ADVANCES IN ELECTRONICS AND ELECTRON PHYSICS, VOLUME 80
EDITOR-IN-CHIEF
PETER W. HAWKES, Centre National de la Recherche Scientifique, Toulouse, France
ASSOCIATE EDITOR
BENJAMIN KAZAN, Xerox Corporation, Palo Alto Research Center, Palo Alto, California
Advances in
Electronics and Electron Physics EDITED BY PETER W. HAWKES, CEMES/Laboratoire d'Optique Electronique du Centre National de la Recherche Scientifique, Toulouse, France
VOLUME 80
ACADEMIC PRESS, INC. Harcourt Brace Jovanovich, Publishers. Boston San Diego New York London Sydney Tokyo Toronto
This book is printed on acid-free paper. COPYRIGHT © 1991 BY ACADEMIC PRESS, INC. ALL RIGHTS RESERVED. NO PART OF THIS PUBLICATION MAY BE REPRODUCED OR TRANSMITTED IN ANY FORM OR BY ANY MEANS, ELECTRONIC OR MECHANICAL, INCLUDING PHOTOCOPY, RECORDING, OR ANY INFORMATION STORAGE AND RETRIEVAL SYSTEM, WITHOUT PERMISSION IN WRITING FROM THE PUBLISHER.
ACADEMIC PRESS, INC. 1250 Sixth Avenue, San Diego, CA 92101
United Kingdom Edition published by
ACADEMIC PRESS LIMITED 24-28 Oval Road, London NW1 7DX
LIBRARY OF CONGRESS CATALOG CARD NUMBER: 49-7504 ISSN 0065-2539 ISBN 0-12-014680-0 PRINTED IN THE UNITED STATES OF AMERICA
91 92 93 94
9 8 7 6 5 4 3 2 1
CONTENTS
CONTRIBUTORS . . . . . . . . . . . . . . . . . . . . . . . . . . vii
PREFACE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix

Discrete Fast Fourier Transform Algorithms: A Tutorial Survey
M. AN, I. GERTNER, M. ROFHEART, AND R. TOLIMIERI
I. Introduction . . . 2
II. Tensor Product Formulation of Cooley-Tukey Algorithms . . . 4
III. Multidimensional Algorithms . . . 16
IV. Line Algorithm . . . 26
V. Parallel Implementation of the Line Algorithm . . . 38
VI. The Fourier Transform in X-ray Crystallography . . . 40
VII. Symmetric Fourier Transforms . . . 51
References . . . 65

Number Theoretic Techniques in Digital Signal Processing
GRAHAM A. JULLIEN
I. Introduction . . . 70
II. Discrete Fourier Transforms . . . 75
III. Fast Fourier Transform (FFT) Algorithms . . . 77
IV. Finite Algebras . . . 80
V. Number Theoretic Transforms . . . 84
VI. Residue Number Systems . . . 121
VII. Implementation of NTTs using the RNS . . . 131
VIII. VLSI Implementations of Finite Algebraic Systems . . . 140
IX. Conclusions . . . 159
Acknowledgments . . . 160
References . . . 160

Information Energy and Its Applications
L. PARDO AND I. J. TANEJA
I. Introduction . . . 166
II. Information Energy and Information Energy Gain for Discrete Random Variables . . . 167
III. Information Energy and Information Energy Gain for Continuous Random Variables . . . 176
IV. Statistical Aspects of Information Energy . . . 188
V. Information Energy and Fuzzy Sets Theory . . . 224
VI. Weighted Information Energy . . . 234
Acknowledgments . . . 238
References . . . 239

Recent Developments in Image Algebra
G. X. RITTER
I. Introduction . . . 243
II. Image Algebra . . . 246
III. A Medley of Consequences . . . 273
Acknowledgments . . . 305
References . . . 305

Image Filtering and Analysis through the Wigner Distribution
GABRIEL CRISTOBAL, CONSUELO GONZALO, AND JULIAN BESCOS
I. Introduction . . . 309
II. The Wigner Distribution . . . 313
III. Wigner Distribution Representation of Images . . . 326
IV. Image Filtering through the Wigner Distribution . . . 344
V. Image Analysis through the Wigner Distribution . . . 359
VI. Applications of the Space (Time)-Frequency Representations . . . 372
VII. Conclusions . . . 387
Acknowledgments . . . 388
References . . . 388

INDEX . . . 399
CONTRIBUTORS

Numbers in parentheses indicate the pages on which the authors' contributions begin.
M. AN (1), Center for Large Scale Computation, The Graduate School and University Center, City University of New York, 25 West 43rd Street, Suite 400, New York, NY 10036
JULIAN BESCOS (309), Instituto de Optica del CSIC, Serrano 121, 28006 Madrid, Spain
GABRIEL CRISTOBAL (309), International Computer Science Institute and EE-CS Dept., University of California Berkeley, 1947 Center Street, Suite 600, Berkeley, CA 94704
I. GERTNER (1), Center for Large Scale Computation, The Graduate School and University Center, City University of New York, 25 West 43rd Street, Suite 400, New York, NY 10036
CONSUELO GONZALO (309), Instituto de Optica del CSIC, Serrano 121, 28006 Madrid, Spain
GRAHAM A. JULLIEN (69), Department of Electrical Engineering, University of Windsor, Windsor, Ontario, Canada N9B 3P4
L. PARDO (165), Departamento de Estadistica e I.O., Facultad de Matematicas, Universidad Complutense de Madrid, 28040 Madrid, Spain
G. X. RITTER (243), Center for Computer Vision Research, Department of Computer and Information Sciences, CSE 301, University of Florida, Gainesville, FL 32611
M. ROFHEART (1), Center for Large Scale Computation, The Graduate School and University Center, City University of New York, 25 West 43rd Street, Suite 400, New York, NY 10036
I. J. TANEJA (165), Departamento de Matematica, Universidad Federal de Santa Catarina, 88.049 Florianopolis, SC, Brazil
R. TOLIMIERI (1), Center for Large Scale Computation, The Graduate School and University Center, City University of New York, 25 West 43rd Street, Suite 400, New York, NY 10036
PREFACE

This topical volume devoted to image science reflects a conscious effort to increase significantly the number of reviews in these Advances on image processing and related topics. The following list of forthcoming articles, and various less formal promises, confirms that this tendency is welcomed by both authors and readers. Of the five chapters here, two are devoted to the transforms that form so large a part of everyday image processing activities. M. An, I. Gertner, M. Rofheart, and R. Tolimieri have written a tutorial essay on the Fourier transform, which will perhaps be useful for teaching purposes as well as serving as a reference text. As befits authors from a Center for Large Scale Computation, one section is devoted to parallel implementation, and crystallographic applications are explored in detail. With the growing realization that number-theoretic transforms are well matched to discrete data, these mappings are attracting much attention. G. A. Jullien gives a detailed and clear account of them, with many examples, that goes from the mathematical background to VLSI implementations. This demystifying account is particularly welcome in this volume. In presenting linear transforms such as the Fourier and number-theoretic transforms, we tend to think of coordinates in direct space and reciprocal space as alternatives. There are, however, many branches of physics in which functions involving both sets of coordinates are encountered. Here, the Wigner distribution frequently offers the simplest representation of some quantity of interest. The best-known application outside quantum mechanics is perhaps coherence theory, but there are others, as G. Cristobal, C. Gonzalo, and J. Bescos explain. After introducing the distribution, they describe its role in image filtering and image analysis in some detail and conclude with speculation about other fields. I. J. Taneja has already contributed a chapter on generalized information measures to this series (Volume 76). In the third chapter of this volume, he and L. Pardo continue this work with an account of information energy and its relation to statistics and fuzzy set theory. In the fourth chapter, G. X. Ritter describes the new and exciting subject of image algebra. A glance at any of the standard textbooks on image processing is enough to show that, although the main divisions of the subject are clear, the material within them is heterogeneous and highly diverse. If any pattern can be discerned, it is that of a very elaborate kaleidoscope, not of a systematic body of scientific knowledge. In an attempt to impose a coherent order on all this diverse material, which is certainly fascinating and
effective but uncoordinated, a number of research groups have been developing image algebras. These are mathematical structures, consisting of rules and operands (images), in terms of which most, if not all, image processing operations can be expressed. The image algebra developed by G. X. Ritter and his colleagues is among the most powerful and flexible; this full account will surely encourage many practitioners of image processing to master this new approach. Not only does it enable us to represent many methods in a coherent way but it has already stimulated many new developments. It only remains for me to thank the contributors for all the trouble they have taken over their chapters, and to present a list of forthcoming articles.
FORTHCOMING ARTICLES

Image Processing with Signal-Dependent Noise, by H. H. Arsenault
Parallel Detection, by P. E. Batson
Bodo von Borries, Pioneer of Electron Microscopy, by H. von Borries
Magnetic Reconnection, by A. Bratenahl and P. J. Baum
Vacuum Microelectronic Devices, by I. Brodie and C. A. Spindt
Sampling Theory, by J. L. Brown
Nanometer-scale Electron Beam Lithography, by Z. W. Chen
Electrons in a Periodic Lattice Potential, by J. M. Churchill and F. E. Holmstrom
The Artificial Visual System Concept, by J. M. Coggins
Speech Coding, by V. Cuperman
Corrected Lenses for Charged Particles, by R. L. Dalglish
The Development of Electron Microscopy in Italy, by G. Donelli
The Study of Dynamic Phenomena in Solids Using Field Emission, by M. Drechsler
Pattern Invariance and Lie Representations, by M. Ferraro
Amorphous Semiconductors, by W. Fuhs
Median Filters, by N. C. Gallagher and E. Coyle
Bayesian Image Analysis, by S. and D. Geman
Applications of Speech Recognition Technology, by H. R. Kirby
Spin-Polarized SEM, by K. Koike
Analysis of Potentials and Trajectories by the Integral Equation Method, by G. Martinez and M. Sancho
The Rectangular Patch Microstrip Radiator, by H. Matzner and E. Levine
Electronic Tools in Parapsychology, by R. L. Morris
Image Formation in STEM, by C. Mory and C. Colliex
Low Voltage SEM, by J. Pawley
Z-Contrast in Materials Science, by S. J. Pennycook
Languages for Vector Computers, by R. H. Perrot
Electron Scattering and Nuclear Structure, by G. A. Peterson
Electrostatic Lenses, by F. H. Read and I. W. Drummond
Energy-Filtered Electron Microscopy, by L. Reimer
CAD in Electromagnetics, by K. R. Richter and O. Biro
Scientific Work of Reinhold Rudenberg, by H. G. Rudenberg
Metaplectic Methods and Image Processing, by W. Schempp
X-ray Microscopy, by G. Schmahl
Accelerator Mass Spectroscopy, by J. P. F. Sellschop
Applications of Mathematical Morphology, by J. Serra
Developments in Ion Implantation Equipment, by M. Setvak
Optimized Ion Microprobes, by Z. Shao
Focus-Deflection Systems and Their Applications, by T. Soma
The Suprenum Project, by O. Trottenberg
Electron Gun Optics, by Y. Uchikawa
Cathode-ray Tube Projection TV Systems, by L. Vriens, T. G. Spanjer and R. Raue
Thin-film Cathodoluminescent Phosphors, by A. M. Wittenberg
Electron Microscopy and Helmut Ruska, by C. Wolpers
Canonical Theory in Electron Optics, by J. Ximen
Parallel Imaging Processing Methodologies, by S. Yalamanchili
Diode-Controlled Liquid-Crystal Display Panels, by Z. Yaniv
ADVANCES IN ELECTRONICS AND ELECTRON PHYSICS, VOL. 80
Discrete Fast Fourier Transform Algorithms: A Tutorial Survey

M. AN, I. GERTNER, M. ROFHEART, AND R. TOLIMIERI*

Center for Large Scale Computation, The Graduate School and University Center, CUNY, New York, New York
I. Introduction . . . 2
II. Tensor Product Formulation of Cooley-Tukey Algorithms . . . 4
A. Introduction . . . 4
B. Tensor Product Algebra . . . 4
C. Stride Permutation . . . 6
D. Multidimensional Tensor Products . . . 9
E. Cooley-Tukey Algorithms . . . 11
F. A Design Example . . . 14
III. Multidimensional Algorithms . . . 16
A. Introduction . . . 16
B. Fourier Transform of Finite Abelian Groups . . . 18
C. Good-Thomas FFT . . . 21
D. Multidimensional Cooley-Tukey Algorithm . . . 25
IV. Line Algorithm . . . 26
A. Introduction . . . 26
B. Prime Case . . . 27
C. Prime Power Case . . . 31
D. General Line Algorithm . . . 33
E. N-dimensional Line Algorithm . . . 35
F. Conclusion . . . 37
V. Parallel Implementation of the Line Algorithm . . . 38
A. Machine Model . . . 38
B. Algorithm . . . 38
VI. The Fourier Transform in X-ray Crystallography . . . 40
A. Introduction . . . 40
B. Sampling . . . 41
C. Crystallographic Groups . . . 44
* This research is sponsored by Defense Advanced Research Projects Agency, DARPA Order No. 6674, monitored by AFOSR under Contract No. F49620-89-C-0020. The views and conclusions contained in this document are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the Defense Advanced Research Projects Agency or the U.S. Government.

Copyright © 1991 by Academic Press, Inc. All rights of reproduction in any form reserved. ISBN 0-12-014680-0
VII. Symmetric Fourier Transforms . . . 51
A. Introduction . . . 51
B. Redundancy Conditions . . . 52
C. A Symmetrized Fourier Transform . . . 54
D. Symmetrized Good-Thomas Algorithm . . . 56
E. Symmetrized Multidimensional Cooley-Tukey . . . 58
F. Implementation Example . . . 60
References . . . 65
I. INTRODUCTION

The increasing importance of large vector and parallel computers in scientific and engineering applications requires new ideas for algorithm design and code generation. The problem of algorithm design is no longer tied to computational complexity but resides to a much greater extent in data flow. In the first part of this work, we discuss the importance of the tensor product in providing a streamlined, powerful language for automatic symbolic manipulation of data flow. As a tool in algorithm design for Cooley-Tukey FFT variants, the tensor product was introduced by Pease (1968) and has found widespread application in Temperton's efforts (1983). Our emphasis is on the tensor product as a linguistic device, paying close attention to the stride permutation algebra, which naturally creates grammatical rules for data manipulation (Rodrigues, 1989). This grammar is not solely for producing the highlighted Cooley-Tukey variants but is directly tied to problems in specific machine implementation where machine parameters make specific demands. Implementation is discussed on the level of multidimensional tensor products that can then be used as modules for Cooley-Tukey algorithms, as well as in other digital signal processing applications. Tensor product methods have been used in several other programming efforts not related to FFTs, including matrix multiplication on the Cray, where for large sizes it has increased the computation rate by a factor of two (Huang and Johnson, 1990).

In the second part, multidimensional FFT algorithms are discussed. Our approach is both abstract and concrete. An abstract Fourier transform is defined. Good-Thomas and generalized Cooley-Tukey FFTs are defined in this context. This coordinate-free approach permits a uniformity that brings out both the similarities and differences among various seemingly different algorithms.
It offers a freedom that can be used to find basic algorithmic structures without having to carry along the details of specific coordinates. Coordinates and representation are introduced at the implementation stage. The line algorithm introduced by I. Gertner (1988) for the multidimensional FFT is taken as a concrete example that is especially suited for parallel machines. This algorithm is closely related to the Nussbaumer-Quandalle FFT algorithm (Nussbaumer, 1982) but, as will be shown, is the basic principle underlying this latter work, which also includes some multiplicative input.

In the last part, we apply our results to the problem of computing the FFT on data admitting redundancies from invariance under certain group actions. These naturally come from the x-ray diffraction of crystals. L. Ten Eyck (1973) was the first to code FFT algorithms taking these redundancies into account. R. Agarwal and G. Bricogne have specialized extensions to include symmetries not covered by Ten Eyck's methods. The large data sets required in many modern applications and the necessity of many FFT computations make this application not only mathematically interesting but of great scientific importance. Good-Thomas and Cooley-Tukey algorithms are tuned to take advantage of this redundancy, and the method of orbit exchange is discussed in detail (An, 1988; An et al., 1990a) for special cases, but we present here for the first time a general account that can be applied to all cases. This method was introduced by M. An and J. Cooley at IBM, Watson Research Center, Yorktown Heights, in a specific application of programming composite-size symmetrized FFTs. L. Auslander abstracted the method in private conversation, resulting in its application to a wide range of problems. In particular, M. An and Ted Prince at NET have worked out highly efficient FFT programs based on "diagonal" symmetries.

Multiplicative FFT algorithms will not be touched on in this work, but will be the main topic of a second work devoted especially to the role of field and ring structures on the indexing set. The S. Winograd program of using such structures has increasingly found its way from being mathematically interesting to the efficient programming stage, as shown by H. Silverman (1977), in programs at IBM, Watson, and by the joint efforts of J. Cooley and Chao Lu (Lu, 1988). The theoretical base of this work was provided by work in the late 1970s and early 1980s by Winograd and Auslander in the one-dimensional case, and by these authors and E. Feig in the restricted multidimensional field case (Auslander et al., 1983). Although these efforts relate to line algorithms and Nussbaumer-Quandalle FFT algorithms, they are based directly on the field structure. Multiplicative methods play a natural and important role in crystallographic FFTs and have guided recent efforts of An and Auslander (1987), Auslander et al. (1988), An and Tolimieri (1989), and Bricogne and Tolimieri (1990).

A continual theme throughout this work is the emphasis and insistence upon global general procedures that can be matched whole to implementations on a variety of machines and computer sizes. This is the crucial advantage of the tensor product formulation. In the crystallographic algorithms, this
global approach required an analogous characterization of crystallographic groups that would permit a unified treatment, including at each step the redundancy afforded by the group, rather than an element-by-element approach. Recent efforts by Cook (1988; Auslander and Cook, 1990) have presented the point groups and their collection of inequivalent representations in an especially convenient way for applications. Johnson (1990) has built on these methods to give an algebraic characterization of Bravais lattices.
II. TENSOR PRODUCT FORMULATION OF COOLEY-TUKEY ALGORITHMS

A. Introduction
For the most part, algorithms will be presented as matrix factorizations where each factor describes a recognizable stage of the computation. Cooley-Tukey algorithms are built from three distinct, highly structured stages that are distributed perhaps several times through the computation: a data movement stage given by a permutation matrix (stride), a data multiplication stage (twiddle), and a Fourier transform stage. The arithmetic advantage comes about in the Fourier transform stages. Cooley-Tukey algorithms appear in various forms that are distinguished by the factors describing the stages and the order in which these factors occur. The factors dictate the implementation of the stages, the degree of parallelism and vectorization for both computation and data transfer stages, and consequently the suitability of the algorithm for a particular machine. In general, this results in programs having complicated looping structures and addressing. Code generation becomes a time-consuming part of the process. The language of tensor products is especially suited to providing tools for understanding and generating the variety of Cooley-Tukey algorithms on the level of symbolic manipulation, and it significantly reduces the cost of matching an algorithm to a machine. The abstract nature of the tensor product makes it applicable to general DSP algorithm design. In this section, we develop the theory of the tensor product and its role in implementation.

B. Tensor Product Algebra
We introduce the definitions of the tensor product. We show that a particular matrix tensor product is naturally identified with a parallel computation, while another is naturally associated to a vector computation. Every matrix tensor product can be written as the product of a parallel factor followed by a vector factor, and vice versa. Denote an $M$-dimensional complex vector space by $\mathbb{C}^M$ and write $x \in \mathbb{C}^M$ as a column vector

$$x = \begin{bmatrix} x_0 \\ \vdots \\ x_{M-1} \end{bmatrix}. \tag{1}$$

The tensor product of two vectors $x \in \mathbb{C}^M$ and $y \in \mathbb{C}^N$ is the vector $x \otimes y \in \mathbb{C}^{MN}$, defined by

$$x \otimes y = \begin{bmatrix} x_0 y \\ \vdots \\ x_{M-1} y \end{bmatrix}. \tag{2}$$

The tensor product of an $M \times M$ matrix $A$ and an $N \times N$ matrix $B$ is the $MN \times MN$ matrix

$$A \otimes B = \begin{bmatrix} a_{0,0}B & \cdots & a_{0,M-1}B \\ \vdots & & \vdots \\ a_{M-1,0}B & \cdots & a_{M-1,M-1}B \end{bmatrix}. \tag{3}$$

The action of $A \otimes B$ on $x \otimes y$ is given by

$$(A \otimes B)(x \otimes y) = Ax \otimes By. \tag{4}$$
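As a numerical illustration (a NumPy sketch, not part of the text's Fortran-oriented treatment), Eqs. (2) and (4) can be checked directly; NumPy's `kron` implements the tensor (Kronecker) product of vectors and matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 3, 4
A = rng.standard_normal((M, M))
B = rng.standard_normal((N, N))
x = rng.standard_normal(M)
y = rng.standard_normal(N)

# Eq. (2): x ⊗ y stacks the scaled copies x_k * y.
assert np.allclose(np.kron(x, y), np.concatenate([xk * y for xk in x]))

# Eq. (4): (A ⊗ B)(x ⊗ y) = Ax ⊗ By.
assert np.allclose(np.kron(A, B) @ np.kron(x, y), np.kron(A @ x, B @ y))
```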
We will repeatedly use the tensor product identities

$$(A + B) \otimes C = (A \otimes C) + (B \otimes C), \tag{5}$$

$$(AC) \otimes (BD) = (A \otimes B)(C \otimes D) \tag{6}$$

for matrices of appropriate sizes. An important consequence of (6) is the factorization

$$A \otimes B = (A \otimes I_N)(I_M \otimes B) = (I_M \otimes B)(A \otimes I_N). \tag{7}$$

Consider the factor

$$I_M \otimes B = \begin{bmatrix} B & & \\ & \ddots & \\ & & B \end{bmatrix}. \tag{8}$$

Take $z \in \mathbb{C}^{MN}$ and segment $z$ into $M$ consecutive segments, each of length $N$:

$$z = \begin{bmatrix} Z_0 \\ \vdots \\ Z_{M-1} \end{bmatrix}. \tag{9}$$
The action of $I_M \otimes B$ on $z$ is performed by computing the action of $B$ on each of these segments, in parallel:

$$(I_M \otimes B)z = \begin{bmatrix} BZ_0 \\ \vdots \\ BZ_{M-1} \end{bmatrix}. \tag{10}$$

We call $I_M \otimes B$ a parallel tensor product factor. The factor

$$A \otimes I_N = \begin{bmatrix} a_{0,0}I_N & \cdots & a_{0,M-1}I_N \\ \vdots & & \vdots \\ a_{M-1,0}I_N & \cdots & a_{M-1,M-1}I_N \end{bmatrix} \tag{11}$$

can be computed as a vector operation, since

$$(A \otimes I_N)z = \begin{bmatrix} a_{0,0}Z_0 + \cdots + a_{0,M-1}Z_{M-1} \\ \vdots \\ a_{M-1,0}Z_0 + \cdots + a_{M-1,M-1}Z_{M-1} \end{bmatrix}, \tag{12}$$

where $a_{j,k}Z_k$ denotes a scalar-vector multiplication and $+$ denotes vector addition. The factor (11) is called a vector tensor product factor. More generally, the expression

$$I_M \otimes C \otimes I_N \tag{13}$$

with an $R \times R$ matrix $C$ is called a mixed-type $(M, N)$ factor, since it can be implemented by $M$ parallel operations of the vector-type operation $C \otimes I_N$. The Fortran structure implementing these actions can be found in Tolimieri et al. (1990). Tensor products are related to the identification between one-dimensional arrays and two-dimensional arrays established by assigning to the $L = MN$-dimensional vector $z$ the $N \times M$ matrix

$$Z = [Z_0\ Z_1\ \cdots\ Z_{M-1}] \tag{14}$$

formed by placing the segments $Z_0, Z_1, \ldots, Z_{M-1}$ as columns. Then $(A \otimes B)z$ corresponds to the $N \times M$ matrix

$$BZA^t, \tag{15}$$

where $A^t$ denotes matrix transpose.
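The identification (14)-(15) can also be checked numerically. In the following sketch (our own illustration; the reshape plays the role of forming $Z$ from the segments $Z_k$), the result of $(A \otimes B)z$ is read back column by column from $BZA^t$:

```python
import numpy as np

rng = np.random.default_rng(1)
M, N = 3, 5
A = rng.standard_normal((M, M))
B = rng.standard_normal((N, N))
z = rng.standard_normal(M * N)

# Z places the M consecutive length-N segments of z as columns (an N x M matrix).
Z = z.reshape(M, N).T

# Eq. (15): (A ⊗ B)z corresponds to B Z A^t, its columns being the output segments.
lhs = np.kron(A, B) @ z
rhs = (B @ Z @ A.T).T.reshape(-1)
assert np.allclose(lhs, rhs)
```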
C. Stride Permutation

The factors $A \otimes I_N$ and $I_N \otimes A$ act by the action of $A$ on $M$-dimensional subvectors formed from the input $MN$-dimensional vector $z$. They differ in implementation by how these subvectors are formed and where the results are placed. The data permutations covering these differences are called stride permutations. They can be implemented, on some processors, by machine instructions (LOAD-STORE), as the components of the input vector are loaded from cache or main memory into registers and as results are stored from registers back into cache or main memory. Set $L = MN$. The $L$-point stride $N$ permutation matrix $P(L, N)$ is defined by

$$P(L, N)(x \otimes y) = y \otimes x, \tag{16}$$

where $x$ and $y$ are arbitrary vectors of sizes $M$ and $N$, respectively. This completely defines $P(L, N)$, since the set of vectors $x \otimes y$ spans $L$-dimensional space as $x$ spans $M$-dimensional space and $y$ spans $N$-dimensional space. Stride permutations are closely related to matrix transposes under the identification of a vector $z$ of size $L$ with the $N \times M$ matrix

$$Z = [Z_0, Z_1, \ldots, Z_{M-1}], \tag{17}$$

defined in the preceding section. We have that $P(L, N)z$ corresponds to the transpose of $Z$ or, equivalently, is formed by running across the rows of $Z$. To compute $P(L, N)z$ we stride through $z$ with stride $N$. For example, if we take $M = 2$ and $N = 3$, then

$$P(6, 3)z = (z_0, z_3, z_1, z_4, z_2, z_5)^t. \tag{18}$$

The elements of $z$ are collected at stride three into three consecutive segments, two elements in each. The first segment begins with $z_0$, the second segment begins with $z_1$, and the third segment begins with $z_2$. In general, $P(MN, N)$ reorders the coordinates at stride $N$ into $N$ consecutive segments of $M$ elements, the $k$th segment beginning with $z_k$. Tensor product identities are greatly influenced by the algebra of stride permutations. The first important result describes how stride permutations are combined to produce new stride permutations.
Theorem 1. If $N = RST$, then $P(N, ST) = P(N, S)P(N, T)$. In particular, $P(NM, M)^{-1} = P(NM, N)$.

A more complete description of the stride permutation algebra appears in An et al. (1989). The main theorem governing data flow in the implementation of tensor actions is the commutation theorem.
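The stride permutation of Eq. (16) and the composition rule of Theorem 1 are easy to exercise numerically. The sketch below (our own helper, `stride_perm`, representing $P(L, N)$ as an index array) reproduces the example (18) and a three-factor instance of Theorem 1:

```python
import numpy as np

def stride_perm(L, N):
    """Return P(L, N) as an index array: P(L, N)z = z[idx].

    P(MN, N) collects the elements of z at stride N into N
    consecutive segments of M = L // N elements each.
    """
    M = L // N
    return np.arange(L).reshape(M, N).T.reshape(-1)

z = np.arange(6)
# Eq. (18): P(6, 3)z = (z0, z3, z1, z4, z2, z5)^t.
assert list(z[stride_perm(6, 3)]) == [0, 3, 1, 4, 2, 5]

# Theorem 1 with N = 24, S = 3, T = 4: P(24, 12) = P(24, 3)P(24, 4).
# For the matrix product P(N, S)P(N, T), the composite index array is idx_T[idx_S].
assert np.array_equal(stride_perm(24, 12), stride_perm(24, 4)[stride_perm(24, 3)])
```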
Theorem 2. For any $M \times M$ matrix $A$ and any $N \times N$ matrix $B$,

$$P(MN, N)(A \otimes B) = (B \otimes A)P(MN, N).$$

The commutation theorem is an important tool for interchanging and manipulating the degree of parallelism and vectorization in an algorithm. In particular,

$$P(MN, N)(A \otimes I_N)P(MN, N)^{-1} = I_N \otimes A, \tag{21}$$

$$P(MN, N)(I_M \otimes B)P(MN, N)^{-1} = B \otimes I_M, \tag{22}$$

from which we can write

$$A \otimes B = (A \otimes I_N)P(MN, N)^{-1}(B \otimes I_M)P(MN, N), \tag{23}$$

$$A \otimes B = P(MN, N)^{-1}(I_N \otimes A)P(MN, N)(I_M \otimes B). \tag{24}$$

Formula (21) decomposes the parallel action $I_N \otimes A$ into an input readdressing stage described by $P(MN, N)^{-1}$, the vector action $A \otimes I_N$, and an output readdressing stage given by $P(MN, N)$. In (23) the tensor product action $A \otimes B$ is decomposed into two vector actions where readdressing occurs at input and after the first computational stage. In all of these cases, the intervening stride permutations provide a mathematical language for describing the readdressing between stages of the computation. Ignoring, for the moment, the actual implementation of the vector operations in (23), the main problem in computing the action of $A \otimes B$ is implementing the stride permutations. In many vector machines this readdressing can be carried out using machine language directly (Tolimieri et al., 1989, 1990). Stride permutations can also be used to change mixed-type factors. Suppose $A$ is a $T \times T$ matrix. Then

$$I_R \otimes A \otimes I_S = P(RST, R)(A \otimes I_{RS})P(RST, R)^{-1}, \tag{25}$$

$$I_R \otimes A \otimes I_S = (P(RT, R) \otimes I_S)(A \otimes I_{RS})(P(RT, R)^{-1} \otimes I_S). \tag{26}$$
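The commutation theorem can be verified numerically by building $P(MN, N)$ as an explicit matrix. The following sketch (our own helper, `stride_perm_matrix`, is an assumption of this illustration, not a routine from the text):

```python
import numpy as np

def stride_perm_matrix(L, N):
    """Stride-N permutation matrix P(L, N) acting on length-L vectors."""
    idx = np.arange(L).reshape(L // N, N).T.reshape(-1)
    P = np.zeros((L, L))
    P[np.arange(L), idx] = 1.0   # (Pz)_i = z_{idx[i]}
    return P

rng = np.random.default_rng(2)
M, N = 3, 4
A = rng.standard_normal((M, M))
B = rng.standard_normal((N, N))
P = stride_perm_matrix(M * N, N)

# Theorem 2: P(MN, N)(A ⊗ B) = (B ⊗ A)P(MN, N).
assert np.allclose(P @ np.kron(A, B), np.kron(B, A) @ P)
```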
In both cases, the $(R, S)$-type factor has been changed into the vector factor $A \otimes I_{RS}$, but the readdressing is different. Similar results hold when changing an $(R, S)$-type factor to a parallel operation. The permutations

$$P(RS, S) \otimes I_T \quad \text{and} \quad I_R \otimes P(ST, T)$$

are especially important for some architectures. The permutation $I_R \otimes P(ST, T)$ can be interpreted as $R$ copies of the permutation $P(ST, T)$. Segmenting the input vector into $R$ vectors of length $ST$, we perform this operation by acting by the stride permutation $P(ST, T)$ on each segment. As with any parallel operation, the same action is taken at different offsets of the input vector. The permutation $P(RS, S) \otimes I_T$ permutes blocks of the input vector. Segmenting the output vector into $RS$ segments of length $T$, the stride permutation $P(RS, S)$ permutes these segments at stride $S$.
D. Multidimensional Tensor Products

Three or more factor tensor products can be defined by induction:

$$A \otimes B \otimes C = A \otimes (B \otimes C). \tag{27}$$

It is easy to see that multidimensional tensor products are independent of how the factors are associated. A two-factor tensor product is naturally related to the identification of a one-dimensional array with a two-dimensional array. A three-factor tensor product (27) relates a one-dimensional array to a three-dimensional array. Multidimensional tensor products correspond to multidimensional arrays but offer a concise and easily manipulated language for describing certain standard operations and data readdressing. The commutation theorem can be used to obtain tensor product identities for multidimensional tensor products. Along with the basic tensor product identities, these identities establish symbolic manipulation rules that can be
used to match a particular tensor product computation to a particular machine. The general theory of multidimensional tensor products including a discussion of several important factorizations can be found in An et al. (1989). In fact, from the basic two-factor Cooley-Tukey algorithm, we can directly apply these identities to derive the Cooley-Tukey for any transform size. The fundamental variants, described in the following section, can be viewed as special cases of the general multidimensional tensor product theory. In general, however, these variants are not sufficiently rich to solve a concrete implementation problem where specific machine parameters control the computation. A design example has been included to show the power of the tensor product as a design tool. At this time, we will state the most important multidimensional tensor product identities. Let N l , ...,4 be positive integers, and ANk an Nk x Nk matrix. See N ( k ) = N , . . N k , N = N , . . N r , and N(0) = 1 .
A_{N₁} ⊗ ··· ⊗ A_{N_r} = ∏_{k=1}^{r} (I_{N(k−1)} ⊗ A_{N_k} ⊗ I_{N/N(k)}),   (29)

A_{N₁} ⊗ ··· ⊗ A_{N_r} = ∏_{k=1}^{r} P_k(I_{N/N_k} ⊗ A_{N_k})P_k⁻¹,   (30)

A_{N₁} ⊗ ··· ⊗ A_{N_r} = ∏_{k=1}^{r} Q_k(I_{N/N_k} ⊗ A_{N_k})Q_k⁻¹,   (31)

where P_k = P(N, N(k)) and Q_k = I_{N(k−1)} ⊗ P(N/N(k−1), N_k). Identity (29) follows from the product rule. Each factor of (29) can be parallelized by the identity

I_{N(k−1)} ⊗ A_{N_k} ⊗ I_{N/N(k)} = P_k(I_{N/N_k} ⊗ A_{N_k})P_k⁻¹,   (32)

and identity (30) results from combining the intervening stride permutations. A second identity parallelizing the factors in (29) is given by

I_{N(k−1)} ⊗ A_{N_k} ⊗ I_{N/N(k)} = Q_k(I_{N/N_k} ⊗ A_{N_k})Q_k⁻¹;   (33)

combining the intervening stride permutations yields (31).
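The parallelization identity (32) can be checked numerically for small sizes. The sketch below (Python/NumPy, helper names ours) builds the stride permutation P(N, N(k)) as a matrix and verifies the identity for a three-factor example.

```python
import numpy as np

def stride_perm_matrix(N, S):
    """Matrix of the stride permutation P(N, S)."""
    P = np.zeros((N, N))
    P[np.arange(N), np.arange(N).reshape(N // S, S).T.ravel()] = 1
    return P

rng = np.random.default_rng(0)
N1, N2, N3 = 2, 3, 2                 # three factors; parallelize factor k = 2
N = N1 * N2 * N3
A = rng.standard_normal((N2, N2))    # an arbitrary N2 x N2 matrix A_{N_2}

# Left side: I_{N(1)} (x) A (x) I_{N/N(2)}
lhs = np.kron(np.eye(N1), np.kron(A, np.eye(N3)))
# Right side: P_2 (I_{N/N_2} (x) A) P_2^{-1} with P_2 = P(N, N(2)) = P(12, 6).
Pk = stride_perm_matrix(N, N1 * N2)
rhs = Pk @ np.kron(np.eye(N // N2), A) @ Pk.T   # P_k is orthogonal
assert np.allclose(lhs, rhs)
```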
An important distinction between the two parallelizations can best be seen when all the Nk are equal. In this case the data readdressing between each stage of (30) is the same, while in (31) the data readdressing varies between each stage, acting on segments of different sizes in the inner loops. The advantage of the uniform data readdressing in (30) for hardwiring was first recognized by Pease (1968).
DISCRETE FAST FOURIER TRANSFORM ALGORITHMS
E. Cooley-Tukey Algorithms

Cooley-Tukey algorithms (Cooley and Tukey, 1965) use the additive structure of the indexing set Z/N, N composite, to identify, as in the preceding section, one-dimensional arrays with two-dimensional or multidimensional arrays. The arithmetic computations are best described on these multidimensional arrays or, equivalently, using the language of tensor products. A history of Cooley-Tukey algorithms can be found in Cooley (1987).

1. Two-Factor Cooley-Tukey Algorithm
The N-point Fourier transform matrix is defined by

F(N) = (w^{jk}),  0 ≤ j, k < N,  w = e^{2πi/N}.   (34)
Direct computation of the action of F(N) on a vector requires N(N − 1) additions and N² multiplications, but much of the arithmetic is redundant. The Cooley-Tukey algorithm (Cooley and Tukey, 1965) and its variants make extensive use of these redundancies. The basic Cooley-Tukey algorithm can be written as the factorization

F(RS) = (F(R) ⊗ I_S)T_S(RS)(I_R ⊗ F(S))P(RS, R),   (35)

where T_S(RS) is a diagonal matrix (twiddle factor). This factorization decomposes the N-point FT, N = RS, into four stages: the stride permutation P(RS, R), the parallel operation I_R ⊗ F(S), the twiddle factor T_S(RS), and the vector operation F(R) ⊗ I_S.
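The four-stage factorization can be verified directly for a small transform size. The following Python/NumPy sketch (our names; w = e^{2πi/N} as above) assembles the right-hand side of (35) and compares it with F(RS).

```python
import numpy as np

def F(N):
    """Fourier transform matrix (w^{jk}) with w = e^{2*pi*i/N}."""
    j, k = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
    return np.exp(2j * np.pi * j * k / N)

def stride_perm_matrix(N, S):
    P = np.zeros((N, N))
    P[np.arange(N), np.arange(N).reshape(N // S, S).T.ravel()] = 1
    return P

R, S = 3, 4
N = R * S
w = np.exp(2j * np.pi / N)
# Twiddle factor: direct sum of the powers of D = diag(1, w, ..., w^{S-1}).
D = np.diag(w ** np.arange(S))
T = np.zeros((N, N), dtype=complex)
for r in range(R):
    T[r*S:(r+1)*S, r*S:(r+1)*S] = np.linalg.matrix_power(D, r)

rhs = (np.kron(F(R), np.eye(S)) @ T
       @ np.kron(np.eye(R), F(S)) @ stride_perm_matrix(N, R))
assert np.allclose(F(N), rhs)
```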
The arithmetic count is RS² multiplications for the parallel factor, N multiplications for the twiddle factor, and SR² for the vector factor. In all, N(R + S) multiplications are required. If R or S is composite, then factorization (35) can be applied to F(R) or F(S). For highly composite numbers N, the multiplication count is of order of magnitude N log N. Analogous results apply to the addition count. The twiddle factor T_S(RS) is defined by the matrix direct sum

T_S(RS) = D⁰ ⊕ D¹ ⊕ ··· ⊕ D^{R−1},  D = D_S(RS),   (36)
where D_S(RS) = diag(1, w, ..., w^{S−1}), w = e^{2πi/RS}. Relative to the stride permutations, we have

T_R(RS) = P(RS, S)T_S(RS)P(RS, S)⁻¹.   (37)

Variants of the two-factor Cooley-Tukey algorithm can be derived by taking transposes and by the commutation theorem. The transpose interchanges the data flow. For example, applying the transpose to both sides of formula (35), we have

F(RS) = P(RS, S)(I_R ⊗ F(S))T_S(RS)(F(R) ⊗ I_S).

The permutation is now on output (decimation in frequency), while in (35) the permutation is on input (decimation in time). Applying the commutation theorem to (35), we have

F(RS) = P(RS, S)(I_R ⊗ F(S))P(RS, R)T_R(RS)(I_S ⊗ F(R))P(RS, S).
Each Fourier transform factor is a parallel factor. As with the multidimensional tensor product factorizations, several "parallel" versions can be given with variations in data readdressing. To the extent that stride permutations are machine-instruction implementable, each of these variations is easily coded without complicated line code. The matching of machine parameters to these options in readdressing is a basic design problem. The same methods produce vector versions and versions containing various degrees of parallelization and vectorization.

2. Radix-Two Cooley-Tukey Algorithms

If the transform size contains more than two factors, then the methods of the preceding section can be reapplied to the Fourier transform factors. As the number of factors increases, the number of possible factorizations increases. Several algorithms have been distinguished over the years. These variants have the same arithmetic but vary as to data flow and as to the type of Fourier transform factor. In this section, we consider the case of transform size N = 2^k.

Cooley-Tukey Algorithm:

F(2^k) = { ∏_{j=1}^{k} (I_{2^{k−j}} ⊗ F(2) ⊗ I_{2^{j−1}})(I_{2^{k−j}} ⊗ T_{2^{j−1}}(2^j)) } Q(2^k),

with the factors ordered so that the j = k factor stands leftmost. The Fourier transform factor

I_{2^{k−j}} ⊗ F(2) ⊗ I_{2^{j−1}}

acts by 2^{k−j} parallel computations of the vector two-point FFT on vectors of size 2^{j−1}. The twiddle factor

I_{2^{k−j}} ⊗ T_{2^{j−1}}(2^j)

acts by 2^{k−j} parallel diagonal actions, where, in general, if N = RS, T_S(RS) is the diagonal twiddle factor of the preceding section.
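The radix-2 factorization is, of course, the familiar iterative FFT. A minimal Python sketch (ours) performs the bit-reversal reorder followed by k butterfly/twiddle stages, using the same sign convention w = e^{2πi/N}, and checks the result against the defining sum.

```python
import cmath

def bit_reverse(x):
    """Reorder x (length 2^k) by reversing the k address bits of each index."""
    n = len(x)
    k = n.bit_length() - 1
    out = [0j] * n
    for i in range(n):
        out[int(format(i, f"0{k}b")[::-1], 2)] = x[i]
    return out

def fft_radix2(x):
    n = len(x)
    a = bit_reverse([complex(v) for v in x])
    m = 2
    while m <= n:                       # stage with butterflies of span m = 2^j
        wm = cmath.exp(2j * cmath.pi / m)
        for start in range(0, n, m):
            for t in range(m // 2):
                u = a[start + t]
                v = (wm ** t) * a[start + t + m // 2]    # twiddle T(m)
                a[start + t] = u + v
                a[start + t + m // 2] = u - v
        m *= 2
    return a

# Check against the defining sum with w = e^{2*pi*i/n}:
x = [1.0, 2.0, 3.0, 4.0, 0.0, -1.0, 2.5, 1.0]
ref = [sum(x[j] * cmath.exp(2j * cmath.pi * j * t / 8) for j in range(8))
       for t in range(8)]
assert all(abs(p - q) < 1e-9 for p, q in zip(fft_radix2(x), ref))
```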
The matrix Q(2^k) is the 2^k-point bit reversal, which can be defined by

Q(2^k)(x₁ ⊗ x₂ ⊗ ··· ⊗ x_k) = x_k ⊗ ··· ⊗ x₂ ⊗ x₁,

where x_j, 1 ≤ j ≤ k, is a two-dimensional vector. Taking the transpose on both sides, we get the Gentleman-Sande FFT (Gentleman and Sande, 1966).

Gentleman-Sande FFT:

F(2^k) = Q(2^k) { ∏_{j=1}^{k} (I_{2^{k−j}} ⊗ T(2^j))(I_{2^{k−j}} ⊗ F(2) ⊗ I_{2^{j−1}}) },

where T(2^m) = T_{2^{m−1}}(2^m). The Pease FFT (Tolimieri et al., 1990) is based on the identity

I_{2^{j−1}} ⊗ F(2) ⊗ I_{2^{k−j}} = P^j(I_{2^{k−1}} ⊗ F(2))P^{−j},

where P = P(2^k, 2). The uniform data flow between the stages is given by combining the intervening stride permutations.
Pease FFT:

F(2^k) = Q(2^k) ∏_{j=1}^{k} T_j(I_{2^{k−1}} ⊗ F(2))P(2^k, 2^{k−1}),

where

T_j = P^j(I_{2^{j−1}} ⊗ T(2^{k−j+1}))P^{−j},

and P = P(2^k, 2^{k−1}). A vectorized version of the Pease FFT was obtained by Korn and Lambiotte (1979).

Korn-Lambiotte FFT:

Complete vectorization is achieved with constant data flow between stages. Taking the transpose results in the Singleton FFT. In each of the above factorizations, a bit reversal is required at input or output. In Cochran (1967),
an FFT algorithm attributed to Stockham is designed that avoids bit reversal by distributing it throughout the different stages of the computation. Complete vectorization is still achieved but, in contrast to the Korn-Lambiotte FFT, the data flow varies from stage to stage. The main formula guiding the derivation of the Stockham FFT is

I_{2^{j−1}} ⊗ F(2) ⊗ I_{2^{k−j}} = Q_j⁻¹(F(2) ⊗ I_{2^{k−1}})Q_j,

where Q_j = Q(2^j) ⊗ I_{2^{k−j}}. Bit reversal is used to vectorize the mixed factor.

Stockham FFT:

F(2^k) = ∏_{j=1}^{k} T_j(F(2) ⊗ I_{2^{k−j}})(P(2^j, 2) ⊗ I_{2^{k−j}}),

where

T_j = Q_j(I_{2^{j−1}} ⊗ T(2^{k−j+1}))Q_j⁻¹.
The Stockham FFT is called a self-sorting FFT, since data movement is not required in one large block at input or output but occurs as small self-corrections during the computation. Temperton (1989) discusses the advantages of this approach. Similar algorithms can be derived for general transform size

N = N₁N₂ ··· N_k

and are called mixed-radix Cooley-Tukey algorithms. These algorithms offer flexibility on high-speed supercomputers and can often avoid the cost of data-transfer bottlenecks. A detailed account of these results can be found in Tolimieri et al. (1989).

F. A Design Example
In this section the language and identities of the tensor product will be used to implement a tensor product operation on a model vector processor. The main idea is to design an algorithm that takes advantage of the architectural features of the model. In particular, vector instructions such as addition and multiplication are most efficient for vectors of size equal to the vector register length. It is essential for good performance that maximum use is made of vector instructions. Vector instructions are carried out on vectors located in vector registers. The instruction
V0 ← V1 + V2

adds the vectors contained in registers V1 and V2 and places the result in register V0. Although vector registers contain a specified fixed number of elements, we can operate on subvectors by various methods of segmenting. These instructions are tied to the LOAD/STORE instructions, which transfer vectors between memory and vector registers. The instruction
Vi ← X, s

loads into the vector register Vi elements from memory beginning at X at stride s. Similarly, a vector register can be stored to memory at any given stride by the instruction
Y ← s, Vk,

which places the contents of the vector register Vk into memory beginning at Y at stride s. The following example exhibits the power of the tensor product as a tool for making efficient use of machine parameters. Let, for example, the vector register length be 64. Let A for simplicity denote the two-point Fourier transform matrix

A = F(2) = ( 1   1
             1  −1 ).
We want to compute A ⊗ I₁₂₈, which operates on a vector of size 256; segmentation is necessary to fit the vector registers. By the commutation theorem,

A ⊗ I₁₂₈ = P(256, 128)(I₂ ⊗ A ⊗ I₆₄)P(256, 2).
The vector operation A ⊗ I₆₄ acts on vector registers of size 64, but loading at stride 2 as indicated by P(256, 2) would require two vector registers of size 128. We use the identity

P(256, 2) = (P(4, 2) ⊗ I₆₄)(I₂ ⊗ P(128, 2)).

The factor I₂ ⊗ P(128, 2), applied first, creates four segments V0, V1, V2, V3 of size 64. The factor P(4, 2) ⊗ I₆₄ then permutes these segments, giving V0, V2, V1, V3. I₂ ⊗ A ⊗ I₆₄ operates on these segments by applying A ⊗ I₆₄ to V0, V2 and to V1, V3. Output can be handled in the same fashion. Several design criteria have been met in this example. There are no unnecessary memory operations, since both the addition and the subtraction from A are performed on the segments before outputting. Also, the segmentation
overlaps computation and memory operations, since as the additions and subtractions are being performed, we can simultaneously load and store other segments, making use of the vector instruction pipeline. We list below the mixed-radix factorizations. The exact form of the twiddle factors can be found in Tolimieri et al. (1989), along with derivations.

Mixed Radix:
N = N₁ ··· N_k,  M_j = N_j ··· N_k,

T(M_j) = T_{M_{j+1}}(M_j)  (twiddle factor),

Q_j(x_{N₁} ⊗ ··· ⊗ x_{N_j}) = x_{N_j} ⊗ ··· ⊗ x_{N₁}  (bit reversal),  Q = Q_k.
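Returning to the design example of this section, the two identities it rests on can be checked numerically. The Python/NumPy sketch below (helper names ours) verifies the commutation identity for A ⊗ I₁₂₈ and the segmentation of P(256, 2).

```python
import numpy as np

def stride_perm_matrix(N, S):
    P = np.zeros((N, N))
    P[np.arange(N), np.arange(N).reshape(N // S, S).T.ravel()] = 1
    return P

A = np.array([[1.0, 1.0], [1.0, -1.0]])     # two-point Fourier transform matrix

# A (x) I_128 = P(256, 128)(I_2 (x) A (x) I_64)P(256, 2)
lhs = np.kron(A, np.eye(128))
rhs = (stride_perm_matrix(256, 128)
       @ np.kron(np.eye(2), np.kron(A, np.eye(64)))
       @ stride_perm_matrix(256, 2))
assert np.allclose(lhs, rhs)

# P(256, 2) = (P(4, 2) (x) I_64)(I_2 (x) P(128, 2)):
# segment each half at stride 2, then permute the four segments.
seg = (np.kron(stride_perm_matrix(4, 2), np.eye(64))
       @ np.kron(np.eye(2), stride_perm_matrix(128, 2)))
assert np.allclose(stride_perm_matrix(256, 2), seg)
```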
III. MULTIDIMENSIONAL ALGORITHMS

A. Introduction
Multidimensional tensor products play a central role in our description and implementation of Cooley-Tukey FFT algorithms. Usually, these algorithms are based on mappings of input and output data into multidimensional arrays and on operations on these multidimensional arrays. The multidimensional FFT directly describes operations on multidimensional arrays, which also admit tensor product formulation. At an appropriate level of abstraction, one-dimensional and multidimensional FFT algorithms can be derived and expressed in exactly the same language; they are distinguished by the presentation of the indexing group. Take positive integers N₁, N₂, ..., N_n and an n-dimensional array of complex data
a = a(j),   (38)

where j = (j₁, ..., j_n), 0 ≤ j_s < N_s. The n-dimensional N₁ × N₂ × ··· × N_n Fourier transform A is given by

A(k) = Σ_j a(j) e^{2πi((k₁j₁/N₁)+···+(k_nj_n/N_n))},   (39)

where k = (k₁, ..., k_n), 0 ≤ k_s < N_s. Computation (39) can be decomposed into a sequence of n one-dimensional Fourier transforms (row-column method):
A(k) = Σ_{j₁=0}^{N₁−1} a_{n−1}(j₁, k₂, ..., k_n) e^{2πi(k₁j₁/N₁)},   (40)

where a_{n−1} denotes the result of applying one-dimensional Fourier transforms in the last n − 1 indices.
This computation can be written as an n-dimensional tensor product by linearly arranging the n-dimensional array with j_n the fastest running variable, followed by j_{n−1}, ..., j₁, and k arranged in the same way. In this way, (40) is equivalent to the action of the n-dimensional tensor product

F(N₁) ⊗ ··· ⊗ F(N_n).   (41)
All the rules established in previous sections for multidimensional tensor products apply, with the word of warning that the mapping between one-dimensional and multidimensional arrays has been transposed. The one-dimensional factors can be computed using Cooley-Tukey FFTs or Winograd/Rader FFTs, and the global computation is "nested" in a variety of ways. For example, if small Winograd FFTs are employed,
F(N_j) = C_jB_jA_j,   (42)
where C_j and A_j are matrices of zeros and ones and B_j is a diagonal matrix, then by the tensor product multiplication rule we can rewrite (41) as

(C₁ ⊗ ··· ⊗ C_n)(B₁ ⊗ ··· ⊗ B_n)(A₁ ⊗ ··· ⊗ A_n).   (43)

The algorithm described by (43) is especially efficient if the preaddition stage A₁ ⊗ ··· ⊗ A_n and the postaddition stage C₁ ⊗ ··· ⊗ C_n can be implemented by special-purpose hardware, for example a systolic array. The main computation stage is given by the diagonal matrix B₁ ⊗ ··· ⊗ B_n. Taking the inverse of (43), the main computation stage is the inverse of this diagonal matrix, and a computationally precise result is obtained. The row-column method requires data transposition between each stage of the computation. For example, writing
F(N₁) ⊗ F(N₂) = P₁(I_{N₂} ⊗ F(N₁))P₁⁻¹(I_{N₁} ⊗ F(N₂)),   (44)

where P₁ = P(N₁N₂, N₁), we compute N₁ N₂-point Fourier transforms on the rows of the two-dimensional array, interchange the rows and columns, and then take N₂ N₁-point Fourier transforms on the rows. The processed data is output by a second row-column interchange. Various implementation and algorithmic schemes have been devised to reduce the cost of data transposition as well as the cost in size and number of Fourier transforms required. The application of tensor product identities with various one-dimensional algorithms computing the factors provides a wide variety of nesting options.
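The row-column computation of (44) is easy to exhibit in code. The following Python/NumPy sketch (ours) computes a two-dimensional N₁ × N₂ transform by row transforms, a transpose, row transforms, and a final transpose, and checks it against direct evaluation.

```python
import numpy as np

def F(N):
    j, k = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
    return np.exp(2j * np.pi * j * k / N)

def ft2_row_column(a):
    N1, N2 = a.shape
    b = a @ F(N2).T        # N1 N2-point transforms on the rows
    b = b.T                # row-column interchange
    c = b @ F(N1).T        # N2 N1-point transforms on the (new) rows
    return c.T             # second row-column interchange

rng = np.random.default_rng(1)
a = rng.standard_normal((3, 5))
direct = F(3) @ a @ F(5).T          # direct evaluation of the 2-D transform
assert np.allclose(ft2_row_column(a), direct)
```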
B. Fourier Transform of Finite Abelian Groups

Consider a finite abelian group A of order M. By a well-known result, we can present A, in several ways, as a direct sum of cyclic groups:

A = Z/M₁ ⊕ ··· ⊕ Z/M_r,  M = M₁ ··· M_r.   (45)

In this presentation, a typical point a ∈ A can be written as a = (a₁, ..., a_r), where a_j ∈ Z/M_j. Addition in A is given by coordinatewise addition. A mapping

χ: A → U,   (46)

where U is the multiplicative group of Mth roots of unity, is called a character of A if

χ(a + b) = χ(a)χ(b),  a, b ∈ A.   (47)

Denote the set of all characters of A by A*, which is a group under the addition rule

(a* + b*)(c) = a*(c)b*(c),  a*, b* ∈ A*, c ∈ A.   (48)

The groups A and A* are isomorphic. To see this, we return to the representation (45) of A. The set A* consists of all mappings y_a, a ∈ A, given by

y_a(b) = e^{2πi((a₁b₁/M₁)+···+(a_rb_r/M_r))},  b ∈ A.   (49)

The mapping

Y: a → y_a

is a group isomorphism of A onto A*. Denote the set of all complex-valued functions on a set X by L(X). The Fourier transform of A is the mapping from L(A) onto L(A*) defined by the formula

F(f)(a*) = Σ_{a∈A} f(a)(a, a*),  a* ∈ A*,   (51)

where (a, a*) = a*(a). This definition depends solely on the group A. It is independent of the isomorphism Y between A and A*. To make it more familiar, define

(F_Y(f))(a) = F(f)(Y(a)),  a ∈ A.   (52)

We see that F_Y(f) ∈ L(A) is given by

F_Y(f)(b) = Σ_{a∈A} f(a) e^{2πi((b₁a₁/M₁)+···+(b_ra_r/M_r))}.   (53)
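Presentation (53) can be evaluated directly for a small group. The Python/NumPy sketch below (ours) computes F_Y(f) for A = Z/M₁ ⊕ Z/M₂ by the defining sum and checks that its matrix is the tensor product F(M₁) ⊗ F(M₂).

```python
import itertools
import numpy as np

M1, M2 = 3, 4
rng = np.random.default_rng(2)
f = rng.standard_normal((M1, M2))

# Defining sum (53) for A = Z/M1 (+) Z/M2:
FY = np.zeros((M1, M2), dtype=complex)
for b1, b2 in itertools.product(range(M1), range(M2)):
    for a1, a2 in itertools.product(range(M1), range(M2)):
        FY[b1, b2] += f[a1, a2] * np.exp(2j * np.pi * (b1*a1/M1 + b2*a2/M2))

def F(N):
    j, k = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
    return np.exp(2j * np.pi * j * k / N)

# Same computation as the tensor product F(M1) (x) F(M2) acting on f
# arranged with a2 the fastest running variable:
assert np.allclose(FY.ravel(), np.kron(F(M1), F(M2)) @ f.ravel())
```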
Suppose A₁ is another finite abelian group and φ is a group isomorphism from A₁ onto A*. The groups A₁ and A are isomorphic, but the following definition depends solely on φ. The Fourier transform F_φ is the mapping of L(A) onto L(A₁) defined by

F_φ(f)(a₁) = F(f)(φ(a₁)),  a₁ ∈ A₁.

Although the Fourier transform F of A depends solely on A, each isomorphism φ of A₁ onto A* gives rise to a linear isomorphism F_φ, which we also call a Fourier transform or, more precisely, a presentation of the Fourier transform. To make sense, these various presentations must be related. The next result states the relationship.

Theorem 3. If φ₁ and φ₂ are isomorphisms of the groups A₁ and A₂ onto A*, respectively, then for any f ∈ L(A),

F_{φ₁}(f)(a₁) = F_{φ₂}(f)(a₂),   (54)

where a₂ = φ₂⁻¹φ₁a₁, a₁ ∈ A₁, a₂ ∈ A₂.
From any one presentation of the Fourier transform, we can compute another by data permutation. Fix an isomorphism φ of A onto A* throughout the following discussion. Consider any subgroup B of A. The dual B⊥ of B, defined by

B⊥ = {a ∈ A: (b, φ(a)) = 1 for all b ∈ B},   (55)

is again a subgroup of A. The isomorphism φ induces several isomorphisms. We list two, with notations for their associated Fourier transform presentations:

B⊥ → (A/B)*,  F₁: L(A/B) → L(B⊥);   (56)

A/B⊥ → B*,  F₂: L(B) → L(A/B⊥),   (57)

where A/B denotes the quotient group. To define the first induced isomorphism, we restrict φ to B⊥ and notice that φ(b⊥), b⊥ ∈ B⊥, acts trivially on B and consequently defines a character on A/B. A function f ∈ L(A) is B-periodic if

f(a + b) = f(a),  a ∈ A, b ∈ B.

The space of B-periodic functions in L(A), denoted by L_B(A), can be identified with the space L(A/B). Consider coset representatives a₀ = 0, a₁, ..., a_{J−1} for A/B. A is the disjoint union of the cosets

B, a₁ + B, ..., a_{J−1} + B,   (58)

and each a ∈ A can be written uniquely as a = a_j + b for some 0 ≤ j < J, b ∈ B. If f is B-periodic, then

f(a_j + b) = f(a_j),  0 ≤ j < J, b ∈ B,   (59)

and f induces uniquely and unambiguously a function on A/B that assigns to the coset a_j + B the value f(a_j). The induced function will also be denoted by f. Consider F_φ(f) for any B-periodic function f. We will show that F_φ(f) vanishes off of B⊥. Take c ∉ B⊥. Then

F_φ(f)(c) = Σ_{j=0}^{J−1} Σ_{b∈B} f(a_j + b)(a_j + b, φ(c)),

which by the B-periodicity of f can be rewritten as

F_φ(f)(c) = Σ_{j=0}^{J−1} f(a_j)(a_j, φ(c)) Σ_{b∈B} (b, φ(c)).

Since c ∉ B⊥, there is a b₀ ∈ B such that (b₀, φ(c)) ≠ 1. From

Σ_{b∈B} (b, φ(c)) = (b₀, φ(c)) Σ_{b∈B} (b, φ(c)),
we have

F_φ(f)(c) = 0   (64)

whenever c ∉ B⊥. For c ∈ B⊥, we proceed as before, except we now have

Σ_{b∈B} (b, φ(c)) = O(B),

where O(B) denotes the number of elements in B. Consequently,

F_φ(f)(c) = O(B) Σ_{j=0}^{J−1} f(a_j)(a_j, φ(c)).   (66)

Viewing f as a function in L(A/B), the summation on the right-hand side of (66) is

F₁(f)(c),   (67)

where F₁ is the Fourier transform of L(A/B) onto L(B⊥) induced by φ. We have proved the next result.

Theorem 4. If f ∈ L(A) is B-periodic, then F_φ(f) vanishes off of the dual B⊥. On B⊥, it is given by

F_φ(f)(b⊥) = O(B)F₁(f)(b⊥),  b⊥ ∈ B⊥,   (68)

where F₁ is the Fourier transform of L(A/B) onto L(B⊥) induced by φ.
This is the key result in the Good-Thomas, Cooley-Tukey, and Line algorithms derived in the following three sections. A function f ∈ L(A) is called B-decimated if f vanishes off of B. By Theorem 4, the Fourier transform F_φ(f) of a B-periodic function f is B⊥-decimated. The next result gives the converse.
Theorem 5. If f ∈ L(A) is B-decimated, then F_φ(f) is B⊥-periodic and, viewed as a function on A/B⊥,

F_φ(f)(c) = F₂(f)(c),  c ∈ A/B⊥,   (69)

where F₂ is the Fourier transform of L(B) onto L(A/B⊥) induced by φ.
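Theorem 4 is easy to illustrate for A = Z/N. In the Python/NumPy sketch below (ours), B is the subgroup of multiples of R in Z/N, N = RS; a B-periodic input has a transform supported on the multiples of S, where it equals O(B) times a smaller transform.

```python
import numpy as np

R, S = 4, 3
N = R * S
rng = np.random.default_rng(3)
g = rng.standard_normal(R)      # f viewed as a function on A/B
f = np.tile(g, S)               # B-periodic: f(a + R) = f(a) on Z/N

def F(N):
    j, k = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
    return np.exp(2j * np.pi * j * k / N)

Ff = F(N) @ f
# F(f) vanishes off B-perp = {0, S, 2S, ...}:
assert np.allclose([Ff[c] for c in range(N) if c % S != 0], 0)
# On B-perp it equals O(B) = S times the R-point transform of g:
assert np.allclose(Ff[::S], S * (F(R) @ g))
```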
C. Good-Thomas FFT
The dimension of a Fourier transform of a finite abelian group A is a parameter that depends upon the presentation of the Fourier transform and
consequently on the isomorphism φ of A onto A* relative to which it is defined. Since the collection of presentations of the Fourier transform of A are all related by data permutation, from a computational point of view the distinction is solely psychological. The Good-Thomas FFT for one-dimensional Fourier transforms is perhaps the most commonly known example of this phenomenon. The problem is to compute the one-dimensional N-point Fourier transform where N = PQ, P and Q relatively prime integers. The cyclic group Z/N is isomorphic to the direct sum of the cyclic groups Z/P and Z/Q:

Z/N ≅ Z/P ⊕ Z/Q.   (70)

The isomorphism usually attached to Z/N,

a → y_a,  y_a(b) = e^{2πi(ab/N)},  a, b ∈ Z/N,   (71)

leads to the one-dimensional N-point Fourier transform, while the isomorphism usually attached to Z/P ⊕ Z/Q,

(a, b) → y_{(a,b)},  y_{(a,b)}(c, d) = e^{2πi((ac/P)+(bd/Q))},  a, c ∈ Z/P, b, d ∈ Z/Q,   (72)

leads to the two-dimensional P × Q Fourier transform. Up to data permutation, they are equivalent computations. In this section, we generalize the above result to the following situation. Suppose throughout that φ is an isomorphism of A onto A* and that we can write A as the direct sum

A = B ⊕ B⊥,   (73)

where B is a subgroup of A and the dual B⊥ is taken relative to φ. Since any direct sum decomposition A = B ⊕ C determines an isomorphism φ of A onto A* such that C = B⊥ relative to φ, the following discussion applies to any direct sum decomposition of A. The projection

b + b⊥ → b⊥,  b ∈ B, b⊥ ∈ B⊥,   (74)

canonically induces a group isomorphism of A/B onto B⊥, which will be assumed throughout the discussion. In this way, we view F₁ as a linear isomorphism of L(B⊥) and F₂ as a linear isomorphism of L(B), where F₁ and F₂ refer to the Fourier transform presentations introduced in the previous section. Take f ∈ L(A) and denote again by f the function on B × B⊥ defined by

f(b, b⊥) = f(b + b⊥),  b ∈ B, b⊥ ∈ B⊥.   (75)
In general, a function h(x, y) defined on a Cartesian product X × Y will be viewed as a collection of functions in L(Y) indexed by X. For fixed x ∈ X, the function h(x, y) ∈ L(Y) is called the x-slice of h and is denoted by h_x(y).
For each b ∈ B, define the decimation D_b(f) of f to the coset b + B⊥ by

(D_b(f))(a) = f(a), a ∈ b + B⊥;  (D_b(f))(a) = 0, otherwise,   (76)

and the translation T_a(f) by

T_a(f)(c) = f(a + c).   (77)

Since

f = Σ_{b∈B} D_b(f),

we have

F_φ(f) = Σ_{b∈B} F_φ(D_b(f)).

We compute F_φ(f) by first computing F_φ of the decimations D_b(f), b ∈ B. The function

g = T_b(D_b(f))   (80)

is B⊥-decimated. By Theorem 5, F_φ(g) is B-periodic, which implies that it is completely determined by its values on B⊥, and since, for c ∈ B, c⊥ ∈ B⊥,

(b, φ(c + c⊥)) = (b, φ(c)),

the following result has been proven.

Theorem 6. For f ∈ L(A), c ∈ B, c⊥ ∈ B⊥,

F_φ(f)(c + c⊥) = Σ_{b∈B} (b, φ(c))F₁(f_b)(c⊥).   (84)

The right-hand summation can be rewritten as

F₂(g_{c⊥})(c),   (85)

where

g_{c⊥}(b) = F₁(f_b)(c⊥).   (86)

Computing F_φ(f) by Theorem 6 proceeds through the following stages:

1. Form the slices f_b ∈ L(B⊥):

f_b(b⊥) = f(b + b⊥).
2. Compute the Fourier transforms of these slices:

g_{c⊥}(b) = F₁(f_b)(c⊥),  b ∈ B, c⊥ ∈ B⊥.

3. Compute the Fourier transforms F₂(g_{c⊥})(c), c ∈ B, and output

F_φ(f)(c + c⊥) = F₂(g_{c⊥})(c).

The flow of the computation is as follows.

Good-Thomas:

L(A) --form slices--> L(B × B⊥) --F₁--> L(B × B⊥) --transpose--> L(B⊥ × B) --F₂--> L(B⊥ × B) --transpose--> L(A).

Applied to the case A = Z/N₁ ⊕ Z/N₂, the Good-Thomas FFT is the row-column method for computing the two-dimensional N₁ × N₂ Fourier transform. If A = Z/N, where N = PQ with P and Q relatively prime, then the isomorphism of cyclic groups

Z/N ≅ Z/P ⊕ Z/Q   (88)
changes the one-dimensional N-point FT to the two-dimensional P × Q FT. Increasingly, the Chinese remainder theorem is identified with this case of the Good-Thomas algorithm. It provides a ring isomorphism and introduces
idempotents into the picture, with important implications for algorithm design (Tolimieri et al., 1989; Temperton, 1989).
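The prime-factor index maps can be made concrete. The Python/NumPy sketch below (ours) uses one standard choice of CRT input and output maps, built from the idempotents Q·(Q⁻¹ mod P) and P·(P⁻¹ mod Q), to turn a 15-point transform into a 3 × 5 two-dimensional transform.

```python
import numpy as np

P, Q = 3, 5
N = P * Q
rng = np.random.default_rng(4)
x = rng.standard_normal(N)

def F(N):
    j, k = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
    return np.exp(2j * np.pi * j * k / N)

Qinv, Pinv = pow(Q, -1, P), pow(P, -1, Q)

# Input map: x[a] goes to position (a*Qinv mod P, a*Pinv mod Q),
# i.e., a = (Q*n1 + P*n2) mod N.
f2 = np.zeros((P, Q))
for a in range(N):
    f2[(a * Qinv) % P, (a * Pinv) % Q] = x[a]

A2 = F(P) @ f2 @ F(Q).T            # 2-D P x Q transform (row-column method)

# Output map: A2[kp, kq] is X[k] for k = (kp*Q*Qinv + kq*P*Pinv) mod N.
X = F(N) @ x
for kp in range(P):
    for kq in range(Q):
        k = (kp * Q * Qinv + kq * P * Pinv) % N
        assert np.isclose(A2[kp, kq], X[k])
```

No twiddle multiplications appear: the coprimality of P and Q makes the cross terms in the exponent vanish, which is exactly the "splitting" of the abstract treatment above.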
D. Multidimensional Cooley-Tukey Algorithm

Consider a finite abelian group A and an isomorphism φ of A onto A*. Let B be a subgroup of A and B⊥ its dual. In this case we no longer assume that the subgroup B "splits" in A in the sense that A is the direct sum of B and some subgroup C. We still have, however, the short exact sequence

0 → B → A → A/B → 0.
Take coset representatives a₀, a₁, ..., a_{J−1} for A/B and coset representatives c₀, c₁, ..., c_{K−1} for A/B⊥. First we will give a straightforward proof of the multidimensional Cooley-Tukey FFT computation of F_φ(f), f ∈ L(A), relative to the subgroup B. The interpretation of the stages will reveal the underlying periodization-decimation. We see that F_φ(f) can be computed in the following sequence of steps:
1. Restrict f to the cosets c_k + B⊥ and form the K B⊥-decimated functions f_k(b⊥) ∈ L(A/B⊥ × B⊥):

f_k(b⊥) = f(c_k + b⊥),  b⊥ ∈ B⊥, 0 ≤ k < K.

2. Compute, for each 0 ≤ k < K,

g_{kj} = F₁(f_k)(a_j),  0 ≤ j < J,

where F₁ is the Fourier transform of L(B⊥) onto L(A/B) induced by φ.

3. Compute the JK products G_{kj}, 0 ≤ j < J, 0 ≤ k < K:

G_{kj} = g_{kj}(c_k, φ(a_j)).

4. Form the J functions h_j(c_k) ∈ L(A/B × A/B⊥):

h_j(c_k) = G_{kj},  0 ≤ j < J, 0 ≤ k < K.
5. Compute, for each 0 ≤ j < J, b ∈ B,

H_j(b) = F₂(h_j)(b),

where F₂ is the Fourier transform of L(A/B⊥) onto L(B) induced by φ.

6. Form F_φ(f) ∈ L(A):

F_φ(f)(a_j + b) = H_j(b),  b ∈ B, 0 ≤ j < J.

Except for the twiddle factor, step 3, which results from the lack of splitting, the flow diagram of the multidimensional Cooley-Tukey FFT is essentially the same as that of the Good-Thomas FFT:

f ∈ L(A)
→ f_k(b⊥) ∈ L(A/B⊥ × B⊥)
→ g_{kj} = F₁(f_k)(a_j) ∈ L(A/B⊥ × A/B)
→ G_{kj} = g_{kj}(c_k, φ(a_j)) ∈ L(A/B⊥ × A/B)
→ h_{jk} = G_{kj} ∈ L(A/B × A/B⊥)
→ H_j(b) = F₂(h_j)(b) ∈ L(A/B × B)
→ F_φ(f) ∈ L(A).
IV. LINE ALGORITHM

A. Introduction
Consider the two-dimensional N × N Fourier transform

F(a, b) = Σ_{j=0}^{N−1} Σ_{k=0}^{N−1} f(j, k) e^{2πi((ja/N)+(kb/N))},  0 ≤ a, b < N.   (91)

The first stage computes

f₁(j, b) = Σ_{k=0}^{N−1} f(j, k) e^{2πi(kb/N)}.   (92)

On a parallel machine, we can view this stage as placing into a processor
labeled b, 0 ≤ b < N, the N numbers

f₁(0, b), ..., f₁(N − 1, b).   (93)

In the second stage, we process by the N-point Fourier transform and read out the results. In the jth address of processor b, we placed f₁(j, b), which can be computed by N multiply-accumulate operations. Once this step is implemented, the N registers are acted on in parallel, and data communication between processors is not required. The goal of the line algorithm is to replace the multiplications solely by additions at the cost of a more complicated data flow. The basic idea is as follows. We fix 0 ≤ S < N and compute the Fourier transform on the "line" through the origin and (1, S):

F(t, tS) = Σ_{j=0}^{N−1} Σ_{k=0}^{N−1} f(j, k) w^{(j+kS)t},  0 ≤ t < N,  w = e^{2πi/N}.   (94)
Set a = j + kS, b = k, where equality is taken mod N. As (j, k) runs over 0 ≤ j, k < N, we have (a, b) running over 0 ≤ a, b < N. Rewriting (94), we have

F(t, tS) = Σ_{a=0}^{N−1} ( Σ_{b=0}^{N−1} f(a − bS, b) ) w^{at},  0 ≤ t < N.   (95)

This computation can be carried out in two stages. First, for each 0 ≤ a < N, we form the sum

f_S(a) = Σ_{b=0}^{N−1} f(a − bS, b).   (96)
If a = 0, this amounts to adding all the data on the line through (0, 0) and the point (−S, 1). In general, we add all the data on the line "parallel" to this line passing through the point (a, 0), 0 ≤ a < N. Placing these N values in a processor labeled S, the N-point Fourier transform computes F(t, tS), 0 ≤ t < N. In the first step of the line algorithm, the indexing set Z/N × Z/N is decomposed into lines passing through the origin that cover the indexing set. Lines other than those described above will be required, but the general idea is the same. Details will be given in the following section. The Fourier transform is then computed, in parallel, on each of these lines with formula (95) or a similar formula describing the preaddition steps required.
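The preaddition sum f_S followed by a single N-point transform can be checked directly. The Python/NumPy sketch below (ours) periodizes along the line through (−S, 1) and compares the result with the full two-dimensional transform on the line (t, tS).

```python
import numpy as np

N, S = 5, 2
rng = np.random.default_rng(5)
f = rng.standard_normal((N, N))

def F(N):
    j, k = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
    return np.exp(2j * np.pi * j * k / N)

# Preaddition: f_S(a) = sum_b f(a - bS, b), indices taken mod N.
fS = np.array([sum(f[(a - b * S) % N, b] for b in range(N)) for a in range(N)])
line_vals = F(N) @ fS                  # F(t, tS) for 0 <= t < N

full = F(N) @ f @ F(N).T               # full two-dimensional transform
assert np.allclose(line_vals, [full[t, (t * S) % N] for t in range(N)])
```

The preaddition uses only N² additions per line, and the remaining work is one N-point transform, which is the trade announced above.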
B. Prime Case

We will present the 2-D line algorithm (Gertner, 1988) in detail for a variety of transform sizes and show that a particular implementation leads to the
Nussbaumer-Quandalle algorithm (Nussbaumer, 1982). Generalizations to higher dimensions can be found in Gertner (1987) and Tolimieri and Gertner (1989). First we consider the case Z/p × Z/p, p prime, and use the inner product

a · b = a₁b₁ + a₂b₂,  a, b ∈ Z/p × Z/p,   (97)

to define duality. The line through a point a = (a₁, a₂) ∈ Z/p × Z/p is the set

L(a) = {ta = (ta₁, ta₂): 0 ≤ t < p}.   (98)

The dual to L(a) is the line L(a⊥), where

a⊥ = (−a₂, a₁).   (99)

It is clear that a · a⊥ = 0. In general, it is not the case that L(a) ≠ L(a⊥). For example, if p = 5, then a = (1, 2) implies a⊥ = (−2, 1) = −2(1, 2) ∈ L(a) and L(a⊥) = L(a).
In general, it is not the case that L(a) # ,!,(a1). For example, if p = 5, then a = (1,2) implies a' = ( - 2 , l ) = -2(1,2) E L(a) and L(aL)= L(a). The following set of lines cover Z i p x Z i p :
To see this, we argue as follows. If a 4 Lp then a = (al, a,), where a , # 0, and a, is invertible mod p. We can write a. = a l ( l ,ay'a,) E L,, l a * . Thus there are p + 1 lines covering Z / p x Zip. Take a function f ( a ) on Z / p x Z / p . The Fourier transform F(f) on the line L j , 0 5 j < p , is given by F(t, t j ) = f ( a ) w @ a ~ +jtaz) = ezxi/P. (101)
c
aEZ/PXZ/P
Setting a, = b, - jb,, a, = b,,
which we see is the p-point Fourier transform of
DISCRETE FAST FOURIER TRANSFORM ALGORITHMS
In the language of the previous section, we form relative to the line L(--j, 1) = L’(1,j). The Fourier transform off on L p is given by
29
by periodizing f
which is the p-point Fourier transform of the periodization off’ relative to the line L( 1,O) = L’(0,l). In all, p 1 p-point Fourier transforms are required to compute two-dimensional FT on Z / p x Z / p . To have the line algorithm match up with the Nussbaumer-Quandalle algorithm (Nussbaumer, 1982), we restrict computation (102) to 1 I t < p, observing that computation (104) computes F(0,O). In matrix language we compute p matrix products of the form
[w^{tb₁}] g_j(b₁),  1 ≤ t < p, 0 ≤ b₁ < p.

Since the sum along each row equals 0, we can carry out this computation by the (p − 1) × (p − 1) matrix product

[w^{tb₁}] (g_j(b₁) − g_j(0)),  1 ≤ t, b₁ < p,

which Nussbaumer calls pointed Fourier transforms.
As shown by Tolimieri (1986), this matrix product can be carried out as polynomial multiplication modulo the cyclotomic polynomial having w as a root.
Example: Line algorithm to compute the 3 × 3 FT. The 2-D input array to be transformed is

f02  f12  f22
f01  f11  f21
f00  f10  f20

Step 1: Summation of input data:

g_j(b₁) = Σ_{a₂=0}^{2} f(b₁ − ja₂, a₂),  j, b₁ = 0, 1, 2,

h(b₂) = Σ_{a₁=0}^{2} f(a₁, b₂),  b₂ = 0, 1, 2.

Step 2: Computation of four one-dimensional three-point DFTs:
A one-dimensional three-point Fourier transform is performed on each of g₀, g₁, g₂, h. Direct computation shows that this gives the 3 × 3 two-dimensional DFT of f.
C. Prime Power Case

Consider a function f(a), a ∈ Z/p² × Z/p², where p is prime. The general prime power case can be handled in exactly the same way (Gertner, 1988). For any a ∈ Z/p² × Z/p², the set

L(a) = {ta: 0 ≤ t < p²}   (108)

is the subgroup of Z/p² × Z/p² generated by a. By a line, we will mean a subgroup of the form L(a) having p² points. If a = (a₁, a₂) with p ∤ a₁ or p ∤ a₂, then L(a) is a line, while if p | a₁ and p | a₂, then L(a) is a subgroup of order p (a ≠ 0). We will prove that the following lines cover Z/p² × Z/p²:

L((1, j)),  0 ≤ j < p²,   (109)

L((kp, 1)),  0 ≤ k < p.   (110)
We first observe that all points of the form a = (a₁, a₂) where p ∤ a₁ are contained in the union of the first collection of lines since, arguing as before, a₁ is invertible mod p² and

(a₁, a₂) = a₁(1, a₁⁻¹a₂).   (112)

If p | a₁ and p | a₂ with a₁ ≠ 0, then

a = a₁(1, (a₁/p)⁻¹(a₂/p)),   (113)

which, since a₁/p is invertible mod p, implies that a is in the union of the first collection. The remaining points must satisfy p | a₁ but p ∤ a₂. Then

a = a₂(a₂⁻¹a₁, 1)   (115)

is in the union of the second collection. There are p² + p lines in all and, arguing as before, we can prove that p² + p p²-point Fourier transforms compute F(f) on all of Z/p² × Z/p². We now have a great deal more redundancy in the computation. For instance, if 0 ≤ j, k < p², j ≠ k, then

L((1, j)) ∩ L((1, k)) = L((p, jp)) when j ≡ k (mod p), and {0} otherwise.   (116)
If 0 ≤ j, k < p, j ≠ k, then

L((kp, 1)) ∩ L((jp, 1)) = L((0, p)).   (117)

If 0 ≤ j < p², 0 ≤ k < p, then

L((1, j)) ∩ L((kp, 1)) = {0}.   (118)

The Fourier transforms F(f) on the lines (109) and (110), respectively, are given by

F(f)(t, tj) = Σ_{b₁=0}^{p²−1} ( Σ_{b₂=0}^{p²−1} f(b₁ − jb₂, b₂) ) w^{tb₁},  w = e^{2πi/p²},   (119)

F(f)(tkp, t) = Σ_{b₂=0}^{p²−1} ( Σ_{b₁=0}^{p²−1} f(b₁, b₂ − kpb₁) ) w^{tb₂},   (120)
which can be computed by p² + p p²-point Fourier transforms. To relate the line algorithm to the Nussbaumer algorithm, we observe that the computation in (119) can be carried out in three parts: the values F(f)(t, tj) for p ∤ t, those for t a nonzero multiple of p, and F(f)(0, 0). The first part is given by the matrix product formed by erasing all rows corresponding to multiples of p in F(p²). Since in each row the sum of those elements determined by striding by p equals zero, we can replace this matrix product by a (p² − p) × (p² − p) matrix product that can be identified with a polynomial product modulo the cyclotomic polynomial having w as a root (Tolimieri, 1986). The computations in the second part are pointed p-point Fourier transforms that, except for (0, 0), correspond to computing F(f) on the subgroup L((p, pj)) of order p. There are p such distinct computations. Arguing in the same way, the computations for the lines (110) require p pointed p²-point Fourier transforms

F(f)((tkp, t)),  p ∤ t, 0 ≤ t < p², 0 ≤ k < p,   (125)
and the p-point Fourier transform

F(f)((0, tp)),  0 ≤ t < p.   (126)

In all, p² + p pointed p²-point Fourier transforms, p pointed p-point Fourier transforms, and one p-point Fourier transform are needed to carry out the computation of F(f). The Nussbaumer-Quandalle algorithm removes redundant computation at the expense of uniformity.

D. General Line Algorithm
The line algorithm can be extended to any transform size and any dimension. See Gertner (1988) for the composite size two-dimensional case. In this section we will extend the prime case to three dimensions (Gertner, 1987). The idea is the same: We compute the Fourier transform on lines in Z/p x Z / p x Z/p. However, since the dual of a line is a plane, the preaddition step periodizes relative to the dual plane. It follows equally from the general periodization-decimation theory that we could compute the Fourier transform on planes by carrying out a preaddition step that periodizes relative to the dual line. This is the route we will take. To compute F ( j ) over all of ( Z / P ) ~we, must determine a collection of planes P that covers ( Z / P ) ~The . minimal number of such planes will be the minimal number of two-dimensional p x p Fourier transforms required to compute F(f') by this method. The following p + 1 planes cover
P(j) = P((1, j, 0), (0, 0, 1)),   0 ≤ j < p,

P(p) = P((0, 1, 0), (0, 0, 1)).   (127)
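As a quick sanity check on the covering claim, the following sketch (function names and conventions are ours, not the authors') enumerates the p + 1 planes of (127) and verifies that their union is all of (Z/p)³ for a few small primes.

```python
# Verify that the p + 1 planes P(j) = <(1, j, 0), (0, 0, 1)>, 0 <= j < p,
# together with P(p) = <(0, 1, 0), (0, 0, 1)>, cover all of (Z/p)^3.

def plane(u, v, p):
    """All points r*u + s*v mod p, 0 <= r, s < p."""
    return {tuple((r * a + s * b) % p for a, b in zip(u, v))
            for r in range(p) for s in range(p)}

def covering_planes(p):
    planes = [plane((1, j, 0), (0, 0, 1), p) for j in range(p)]
    planes.append(plane((0, 1, 0), (0, 0, 1), p))
    return planes

def covers(p):
    pts = set().union(*covering_planes(p))
    return len(pts) == p ** 3

for p in (2, 3, 5):
    assert len(covering_planes(p)) == p + 1
    assert covers(p)
```

The coverage argument mirrors the text: a point (x, y, z) with x ≠ 0 lies on P(yx⁻¹), and a point with x = 0 lies on P(p).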
Computing F(f) over P(j), we have

F(f)(r, rj, s) = Σ_{r'=0}^{p−1} Σ_{s'=0}^{p−1} Σ_{t'=0}^{p−1} f(r', s', t') ω^{r(r'+js')+st'}.   (128)

Setting a = r' + js', b = s', c = t', we can rewrite (128) as

F(f)(r, rj, s) = Σ_{a=0}^{p−1} Σ_{c=0}^{p−1} [ Σ_{b=0}^{p−1} f(a − jb, b, c) ] ω^{ra+sc},   (129)
which is the two-dimensional p × p Fourier transform of the periodization of f relative to the line L((−j, 1, 0)) perpendicular to P(j). Computing F(f) on P(p), we have

F(f)(0, r, s) = Σ_{s'=0}^{p−1} Σ_{t'=0}^{p−1} [ Σ_{r'=0}^{p−1} f(r', s', t') ] ω^{rs'+st'},   (130)
M. AN, I. GERTNER, M. ROFHEART, AND R. TOLIMIERI
which is the two-dimensional p × p Fourier transform of the periodization of f relative to the line L((1, 0, 0)) perpendicular to P(p). In all, p + 1 two-dimensional p × p Fourier transforms are required after the preaddition steps.
If F(f) is computed along lines, then p² + p + 1 one-dimensional p-point Fourier transforms are required, and periodization is over the dual planes to the covering lines (Gertner, 1987):

L((1, j, k)),   0 ≤ j, k < p,
L((0, 1, k)),   0 ≤ k < p,
L((0, 0, 1)).   (132)
An alternate derivation of the "line" algorithm can be given as follows for the case of computing F(f) on planes. Take linearly independent vectors a, b ∈ (Z/p)³ and form the plane

P(a, b) = {ra + sb: 0 ≤ r, s < p}.   (133)

We want to compute the Fourier transform F(f) on the plane P(a, b), where F(f) is taken with respect to the standard inner product of (Z/p)³. Up to output permutation, we can compute, instead, the Fourier transform F₁(f) on P(a, b) relative to the inner product

(ra + sb + tc, r'a + s'b + t'c)₁ = rr' + ss' + tt',   (134)
where a, b, c is a basis of (Z/p)³. Since

F₁(f)(ra + sb) = Σ_{u ∈ (Z/p)³} f(u) ω^{(ra+sb, u)₁},   (135)

by writing u = r'a + s'b + t'c, 0 ≤ r', s', t' < p, we can rewrite (135) as

F₁(f)(ra + sb) = Σ_{r'=0}^{p−1} Σ_{s'=0}^{p−1} g(r', s') ω^{rr'+ss'},   (136)
where

g(r', s') = Σ_{t'=0}^{p−1} f(r'a + s'b + t'c)   (137)
is the periodization of f relative to the line through c. Computation (136) is easily recognized as the two-dimensional p × p Fourier transform of the periodization g(r', s'), 0 ≤ r', s' < p.
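The identity behind (135)–(137) can be checked numerically. The sketch below (names are illustrative, ours only) takes the standard basis a = (1,0,0), b = (0,1,0), c = (0,0,1), so that the inner product (134) coincides with the standard one, and verifies that F(f) on the plane P(a, b) equals the two-dimensional p × p transform of the periodization g.

```python
# Check: F(f)(r, s, 0) equals the 2-D p x p Fourier transform of the
# periodization g(r', s') = sum_{t'} f(r', s', t') for p = 3.
import cmath

p = 3
w = cmath.exp(2j * cmath.pi / p)

# arbitrary test data f on (Z/p)^3
f = {(x, y, z): complex(x + 2 * y + 5 * z + 1)
     for x in range(p) for y in range(p) for z in range(p)}

def dft3(f, u, v, t):
    """Naive 3-D DFT of f at frequency (u, v, t)."""
    return sum(f[x, y, z] * w ** (u * x + v * y + t * z)
               for x in range(p) for y in range(p) for z in range(p))

# periodization of f relative to the line through c = (0, 0, 1)
g = {(x, y): sum(f[x, y, z] for z in range(p))
     for x in range(p) for y in range(p)}

def dft2(g, u, v):
    """Naive 2-D DFT of g at frequency (u, v)."""
    return sum(g[x, y] * w ** (u * x + v * y)
               for x in range(p) for y in range(p))

for r in range(p):
    for s in range(p):
        assert abs(dft3(f, r, s, 0) - dft2(g, r, s)) < 1e-9
```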
E. N-dimensional Line Algorithm

We will derive the three-dimensional p² × p² × p² line algorithm. The general N-dimensional p^R × p^R × p^R case appears in Tolimieri and Gertner (1989). The Fourier transform F(f) will be computed on lines in Z/p² × Z/p² × Z/p² that are defined as cyclic subgroups of order p². For a ∈ (Z/p²)³, the (cyclic) subgroup generated by a = (a₁, a₂, a₃) is

L(a) = {ta = (ta₁, ta₂, ta₃): t ∈ Z/p²}.   (138)
L(a) is called the line through a if it has order p². A set of lines covering (Z/p²)³ is given by

L((1, j, k)),   0 ≤ j, k < p²,
L((pj, 1, k)),   0 ≤ j < p, 0 ≤ k < p²,
L((pj, pk, 1)),   0 ≤ j, k < p.   (139)

In all we have

p²(p² + p + 1) = p⁴ + p³ + p²   (140)

such lines. By the same arguments as in the two-dimensional case, these lines cover (Z/p²)³. The Fourier transform F(f) on these lines is given by the one-dimensional p²-point Fourier transforms

F(f)(ta) = Σ_{b=0}^{p²−1} g_a(b) ω^{tb},   0 ≤ t < p²,   (141)

where g_a(b) is the periodization of f relative to the plane dual to L(a); for example, for a = (1, j, k),

g_a(b) = Σ_{b₂=0}^{p²−1} Σ_{b₃=0}^{p²−1} f(b − jb₂ − kb₃, b₂, b₃).   (142)
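A small sketch (our names, not the authors') can confirm the count (140) and the coverage claim for p = 2, where the 28 lines of (139) cover all 64 points of (Z/4)³.

```python
# Enumerate the covering lines (139) of (Z/p^2)^3 and check the count
# p^2 (p^2 + p + 1) and the coverage claim, for p = 2.

def line(a, n):
    """The cyclic subgroup {t*a mod n: 0 <= t < n}."""
    return {tuple((t * ai) % n for ai in a) for t in range(n)}

def covering_lines(p):
    n = p * p
    return ([(1, j, k) for j in range(n) for k in range(n)] +
            [(p * j, 1, k) for j in range(p) for k in range(n)] +
            [(p * j, p * k, 1) for j in range(p) for k in range(p)])

p = 2
n = p * p
gens = covering_lines(p)
assert len(gens) == p**2 * (p**2 + p + 1)   # = 28 generators for p = 2
pts = set()
for a in gens:
    L = line(a, n)
    assert len(L) == n                      # each line has order p^2
    pts |= L
assert len(pts) == n ** 3                   # the lines cover (Z/4)^3
```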
Example: For p = 2 (transform size 4 × 4 × 4), the lines L((1, j, k)), 0 ≤ j, k < 4, are

y₁ = t; y₂ = 0·t; y₃ = 0·t
y₁ = t; y₂ = 1·t; y₃ = 0·t
y₁ = t; y₂ = 2·t; y₃ = 0·t
y₁ = t; y₂ = 3·t; y₃ = 0·t
y₁ = t; y₂ = 0·t; y₃ = 1·t
y₁ = t; y₂ = 1·t; y₃ = 1·t
y₁ = t; y₂ = 2·t; y₃ = 1·t
y₁ = t; y₂ = 3·t; y₃ = 1·t
y₁ = t; y₂ = 0·t; y₃ = 2·t
y₁ = t; y₂ = 1·t; y₃ = 2·t
y₁ = t; y₂ = 2·t; y₃ = 2·t
y₁ = t; y₂ = 3·t; y₃ = 2·t
y₁ = t; y₂ = 0·t; y₃ = 3·t
y₁ = t; y₂ = 1·t; y₃ = 3·t
y₁ = t; y₂ = 2·t; y₃ = 3·t
y₁ = t; y₂ = 3·t; y₃ = 3·t
The points generated by the following lines on the plane y₁ = 2 mod 4 are

y₁ = 2·t; y₂ = 0·t; y₃ = 1·t   (0,0,0)(2,0,1)(0,0,2)(2,0,3)
y₁ = 2·t; y₂ = 1·t; y₃ = 1·t   (0,0,0)(2,1,1)(0,2,2)(2,3,3)
y₁ = 2·t; y₂ = 2·t; y₃ = 1·t   (0,0,0)(2,2,1)(0,0,2)(2,2,3)
y₁ = 2·t; y₂ = 3·t; y₃ = 1·t   (0,0,0)(2,3,1)(0,2,2)(2,1,3)
y₁ = 2·t; y₂ = 1·t; y₃ = 0·t   (0,0,0)(2,1,0)(0,2,0)(2,3,0)
y₁ = 2·t; y₂ = 1·t; y₃ = 2·t   (0,0,0)(2,1,2)(0,2,0)(2,3,2)
The points generated by lines in the plane y₁ = 0 mod 4 are

y₁ = 0·t; y₂ = 1·t; y₃ = 0·t   (0,0,0)(0,1,0)(0,2,0)(0,3,0)
y₁ = 0·t; y₂ = 1·t; y₃ = 1·t   (0,0,0)(0,1,1)(0,2,2)(0,3,3)
y₁ = 0·t; y₂ = 1·t; y₃ = 2·t   (0,0,0)(0,1,2)(0,2,0)(0,3,2)
y₁ = 0·t; y₂ = 1·t; y₃ = 3·t   (0,0,0)(0,1,3)(0,2,2)(0,3,1)
y₁ = 0·t; y₂ = 0·t; y₃ = 1·t   (0,0,0)(0,0,1)(0,0,2)(0,0,3)
y₁ = 0·t; y₂ = 2·t; y₃ = 1·t   (0,0,0)(0,2,1)(0,0,2)(0,2,3)
As we can see, first we apply algorithm 1 to compute the three-dimensional Fourier transform on the planes y₁ = 1 mod 4 and y₁ = 3 mod 4, then we compute the three-dimensional Fourier transform on the plane y₁ = 2 mod 4, and finally on the plane y₁ = 0 mod 4. As can be seen from the algorithm, the order of computation is totally irrelevant. This enables us to compute the 3-D Fourier transform first for the most important data, and then complete the computation.
F. Conclusion

The N-dimensional Fourier transform with P^R points in each dimension can be computed with

P^{(N−1)(R−1)}(P^{N−1} + P^{N−2} + ⋯ + P + 1)

one-dimensional Fourier transforms on only P^R points. Recall that the standard "row-column" approach would require N·P^{(N−1)R} one-dimensional Fourier transforms. The savings in the number of one-dimensional Fourier transforms in the proposed approach is the ratio of these two counts.
In our example, we have computed the three-dimensional Fourier transform with 28 one-dimensional Fourier transforms on four points each. In the standard approach one would require 48 one-dimensional Fourier transforms on four points each. Moreover, the proposed method computes the 3-D Fourier transform on all the points lying on any single covering line with only one 1-D Fourier transform. For example, to compute the 3-D Fourier transform on the plane y₁ = 0 mod 4, only six 1-D Fourier transforms are required.
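The 28-versus-48 comparison can be checked mechanically. The general count formula used below, P^{(N−1)(R−1)}(P^{N−1} + ⋯ + 1), is our reconstruction of the line count, consistent with the p²(p² + p + 1) count of (140) and with this example.

```python
# Compare the number of 1-D P^R-point transforms used by the line
# algorithm with the row-column method for the text's example
# N = 3, P = 2, R = 2 (a 4 x 4 x 4 transform).

def line_count(N, P, R):
    # our reconstructed formula: P^((N-1)(R-1)) * (P^(N-1) + ... + 1)
    return P ** ((N - 1) * (R - 1)) * sum(P ** i for i in range(N))

def row_column_count(N, P, R):
    return N * P ** ((N - 1) * R)

assert line_count(3, 2, 2) == 28
assert row_column_count(3, 2, 2) == 48
assert line_count(2, 3, 1) == 4   # the p + 1 lines of the 2-D prime case
```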
The proposed algorithmic structure is inherently suitable for parallel/pipelined implementation in hardware. The ideas relating the line algorithm to the Nussbaumer–Quandalle FFT extend to the multidimensional P^R × ⋯ × P^R case. The relationship is one of implementing the one-dimensional FFTs coming from the line algorithm by blocking together those rows whose index contains the same power of P, and erasing redundant multiplications by preadditions.
V. PARALLEL IMPLEMENTATION OF THE LINE ALGORITHM
The line algorithm (Gertner and Rofheart, 1989) can be computed using the following three-stage procedure: During the first stage, the data is downloaded and the summations are performed. In the second stage, independent 1-D FFTs are computed. The third stage is the data upload, which includes the removal of redundant data as well as a permutation. In this section, the realization of these stages is discussed with respect to a simple model of a parallel machine. A VLSI implementation can be found in Gertner and Shamash (1987).

A. Machine Model
For a machine model we consider a multiprocessor externally connected to an I/O channel. The machine must support two communication functions, broadcast and report.

1. Broadcast: This function downloads data from the I/O channel to all processing elements (PEs).
2. Report: This function allows a distinguished PE to upload data to the I/O channel.
This general model covers multiprocessor architectures from buses and trees to hypercubes. The algorithm presented below can be adapted to any of these machines.
B. Algorithm

The computation of the summation stage is realized by the following procedure:

1. Assign the computation of h and the g_j evenly among M or fewer PEs, where M is the number of lines covering Z/P × Z/P.
2. Broadcast the rows of the 2-D array of input data to the PEs.
3. The PE assigned h sums the elements of the ith row received and places that sum in h(i). For each g_j assigned to a PE, the ith row received is rotated ji positions right and summed componentwise with the other received rows. Proceeding in this fashion, g_j(d) will exist at location d of the first row received. The rotation is achieved by an address offset and modulo-P address arithmetic; it requires no data movement.

The next step is the computation of the P + 1 one-dimensional P-point FFTs. These are described by step 2 of the line algorithm and should be performed on the P points g_j(d) and h(d), d = 0, 1, …, P − 1, at the PE assigned to the summation. After this stage, the 2-D P × P DFT has been computed. The third stage is the data upload. There is some redundancy in the data, and the data are permuted among the PEs. Treating the transformed data as a 2-D array, each PE has the lines of output corresponding to the g_j and h assigned to it. This permutation is given as: The 0th column of output is in the PE assigned the computation of h. The 0th row is in the PE assigned the computation of g₀. Neither of these vectors is permuted. The PE that computed g_j contains all the points of the line of output described by L_j = {(t, jt): t = 0, 1, …, P − 1}. That is, the tth element of the vector produced by the PE that computed g_j is the element in row t and column jt of the 2-D DFT. While this output permutation is very regular, methods for handling it are machine dependent. For parallel machines where communications are through a host, the permutation is easy if either modulo address arithmetic or memory-indirect addressing (table lookup) is supported. If sufficient control exists, an orchestrated upload can occur where the PEs report the transformed data in row-major form. Below we present pseudocode for the parallel computation of the line algorithm. P + 1 PEs are allowed for, numbered 0, 1, …, P. PE_j is assigned g_j for j = 0, 1, …, P − 1. PE_P is assigned h.
/* Summation Stage */
for k = 0 to P - 1
    broadcast row k to all PEs;
    if (k != 0)
        do-parallel: for i = 0 to P
            compute partial sum;
        end;
end;
/* 1-D P-point DFT Stage */
do-parallel: for i = 0 to P
    perform 1D FFT(P): h(d), g_j(d) -> u_d;
end;

/* Data Upload Stage */
for k = 0 to P
    /* column 0 */
    if (k = P)
        for t = 0 to P - 1
            F[0, t] = report(PE_k, u_t);
    /* row 0 */
    if (k = 0)
        for t = 1 to P - 1
            F[t, 0] = report(PE_k, u_t);
    if (0 < k < P)
        for t = 1 to P - 1
            F[t, kt] = report(PE_k, u_t);
end;
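A serial sketch of the three stages may make the data flow concrete. The row/column conventions below are our own choice (the text's conventions are ambiguous after reproduction), and all helper names are illustrative; the output is checked against a naive P × P DFT.

```python
# Serial simulation of the three-stage line algorithm for a P x P DFT,
# P prime.  Summation stage: PE_j accumulates g_j(d) = sum_i f[i][(d-j*i)%P]
# (row i rotated right by j*i); PE_P forms h(d) by summing the elements of
# row d.  The output mapping used here: DFT(g_j)(t) lands at output point
# (j*t mod P, t) and DFT(h)(t) at (t, 0) -- this convention is ours.
import cmath

P = 5
w = cmath.exp(2j * cmath.pi / P)
f = [[complex(3 * r + c + 1) for c in range(P)] for r in range(P)]

# Summation stage
g = [[sum(f[i][(d - j * i) % P] for i in range(P)) for d in range(P)]
     for j in range(P)]
h = [sum(f[d][c] for c in range(P)) for d in range(P)]

def dft1(v):
    """Naive 1-D P-point DFT."""
    return [sum(v[d] * w ** (t * d) for d in range(P)) for t in range(P)]

# 1-D DFT stage: one transform per PE
u = [dft1(v) for v in g] + [dft1(h)]

# Upload stage: place each line of output
F = [[None] * P for _ in range(P)]
for j in range(P):
    for t in range(P):
        F[(j * t) % P][t] = u[j][t]
for t in range(P):
    F[t][0] = u[P][t]

naive = [[sum(f[r][c] * w ** (a * r + b * c) for r in range(P)
              for c in range(P)) for b in range(P)] for a in range(P)]
for a in range(P):
    for b in range(P):
        assert abs(F[a][b] - naive[a][b]) < 1e-6
```

Note that the P + 1 lines {(jt, t)} and {(t, 0)} cover every output frequency, so the upload stage fills the whole array.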
VI. THE FOURIER TRANSFORM IN X-RAY CRYSTALLOGRAPHY

A. Introduction
The role of the Fourier transform in x-ray crystallography has been discussed in detail in several books, including Buerger (1942) and Blundell and Johnson (1976). We will briefly set the stage for the application of the results of the preceding parts to the problem of computing the Fourier transform on data respecting crystallographic group symmetry. Although multiplicative structure can play an important role in this case (An and Tolimieri, 1989; Tolimieri and Bricogne, 1990), as in the preceding parts, such structure will play no role in this work. In normalized coordinates, an infinite crystal can be described by an electron density map ρ(x), x ∈ R³, satisfying the periodicity condition

ρ(x + n) = ρ(x),   x ∈ R³, n ∈ Z³.   (145)
The period lattice Z³ is assumed to be the finest for which this periodicity condition holds. In this way, a basic pattern or motif is established in the unit cube and is regularly repeated by integer translations along each component throughout all of R³.
As such, we can expand ρ(x) as a Fourier series

ρ(x) = Σ_{n ∈ Z³} H(n) e^{−2πi n·x},   (146)

where n·x = n₁x₁ + n₂x₂ + n₃x₃. Mathematically, the "structure factors" H(n), n ∈ Z³, uniquely determine ρ(x) but, in general, are not available after x-ray diffraction, which yields only the absolute values |H(n)| for n within some sphere |n| ≤ Δ⁻¹ decided upon by the desired resolution Δ. The second restriction will be discussed further in the next section. The loss of phase information, the "phase problem," greatly complicates the determination of crystal structure. During the last 40 years, many methods have been proposed to overcome the phase problem (Hauptman and Karle, 1953, 1956, 1957). The underlying basis for believing that a solution to this mathematically undetermined problem exists is the existence of physical and chemical constraints that limit the possible outcomes. A summary of some of the various methods can be found in Sayre (1982). These have been successfully and routinely applied to small-structure determination but, until recently, have had only sporadic success in large-structure determination. Probabilistic direct methods (Bricogne, 1984) show great potential for extending the range of applications. Although these ideas are beyond the scope of this work, they invariably require large data sizes and many Fourier transform computations. Classically, the role of the Fourier transform has been limited to visualization purposes, and as such, infrequent and sporadic computations were required. One impact of these new methods has been to move the Fourier transform to center stage as a fundamental computational tool in phase determination and model refinement. In this part, we will incorporate crystallographic symmetry into the computation of the Fourier transform. The first step in this process is to decide on a sampling procedure that reduces the initial computation to one that is both relevant to applications and attainable by digital machines.
This is by no means a simple task, involving many crystallographic and mathematical issues beyond the scope of this work, but we will begin by presenting a sketch of a mathematically consistent approach that at least describes how the finite Fourier transform enters.
B. Sampling

Assuming normalized coordinates throughout, the electron density map ρ of a crystal is invariant under all integer translations, and as such can be viewed as a mapping defined on the torus R³/Z³. A lattice M is any subgroup of R³ of the form

M = Zx₁ + Zx₂ + Zx₃,   (147)
where x₁, x₂, x₃ form a basis of R³. A sampling lattice is any lattice M that contains Z³ as a subgroup of finite index. Sampling ρ on a sampling lattice M uniquely determines a function f on the finite abelian group M/Z³ by the rule

f(a) = ρ(m),   a = m + Z³ ∈ M/Z³.   (148)
We call f the sample function of ρ on M/Z³. The torus R³/Z³ can be identified with the unit cube:

0 ≤ x₁, x₂, x₃ < 1.   (149)
Under this identification, a sampling lattice M is any finite subset of the unit cube that is a group under addition taken mod Z³. The dual of a lattice M,

M⊥ = {y ∈ R³: m·y ∈ Z for all m ∈ M},   (150)
is a lattice in R³. If M is a sampling lattice, then M⊥ is a lattice contained in Z³ as a subgroup of finite index. In fact, we have the group isomorphism

φ: (M/Z³) ≅ (Z³/M⊥)*,   (151)

defined by the formula

φ(a)(b) = e^{2πi m·n},   (152)

where a = m + Z³ ∈ M/Z³ and b = n + M⊥ ∈ Z³/M⊥. The sample function f of ρ on M/Z³ is closely related to the periodized structure factor function F of H on Z³/M⊥, defined by periodizing relative to M⊥:

F(b) = Σ_{m⊥ ∈ M⊥} H(n + m⊥),   (153)

where b = n + M⊥ ∈ Z³/M⊥. First we establish this relationship for sampling lattices of the form

M = (1/N)Z³.   (154)
Since M/Z³ ≅ (Z/N)³, the sample function f is defined on (Z/N)³. The dual lattice

M⊥ = (NZ)³   (155)

satisfies Z³/M⊥ ≅ (Z/N)³, with the consequence that the periodized structure factor function F is defined on (Z/N)³. Since

f(a) = Σ_{n ∈ Z³} H(n) e^{−2πi(n·a)/N},   a ∈ (Z/N)³,   (156)
by writing

n = b + Ng,   0 ≤ b₁, b₂, b₃ < N, g ∈ Z³,   (157)

we have

f(a) = Σ_{b ∈ (Z/N)³} ( Σ_{g ∈ Z³} H(b + Ng) ) e^{−2πi(b·a)/N},   (158)

which can be rewritten as

f(a) = Σ_{b ∈ (Z/N)³} F(b) e^{−2πi(b·a)/N},   a ∈ (Z/N)³.   (159)
This last formula expresses f ∈ L((Z/N)³) as the three-dimensional N × N × N Fourier transform of F ∈ L((Z/N)³). By Fourier inversion,

F(b) = (1/N³) Σ_{a ∈ (Z/N)³} f(a) e^{2πi(b·a)/N},   b ∈ (Z/N)³.   (160)
In general, if M is a sampling lattice and

n_j,   0 ≤ j < J,   (161)

are coset representatives for Z³/M⊥, we have that each n ∈ Z³ can be written uniquely as

n = n_j + m⊥,   0 ≤ j < J, m⊥ ∈ M⊥,   (162)

and

f(a) = Σ_{j=0}^{J−1} F(n_j + M⊥) e^{−2πi n_j·m},   a = m + Z³ ∈ M/Z³.   (163)

This can be rewritten as

f(a) = Σ_{b ∈ Z³/M⊥} F(b) φ(a)(b)⁻¹,   (164)

which expresses f ∈ L(M/Z³) as the Fourier transform of F ∈ L(Z³/M⊥) induced by the isomorphism φ defined in (152). The problem of convergence has not been raised here, along with other issues, including the role of distribution theory in making some of these ideas precise. As stated in the preceding section, x-ray diffraction makes available only a limited number of structure factors |H(n)|, |n| ≤ Δ⁻¹, neglecting loss of phase information. Periodization can be viewed as truncation, but then the corresponding "partial sums" compute only an approximation to the electron density samples.
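The relationship (153)–(159) between samples of ρ and the periodized structure factors can be illustrated numerically. In the sketch below, H is made-up test data with finite support, and all names are ours.

```python
# Sample a trigonometric "electron density" rho(x) = sum_n H(n) e^{-2 pi i n.x}
# on the lattice (1/N)Z^3 and check that the samples equal the transform
# of the periodized structure factors F(b) = sum_{m in (NZ)^3} H(b + m).
import cmath, itertools

N = 4
# finitely supported structure factors H(n), n in {-1, 0, 1}^3 (test data)
H = {n: complex(1 + n[0] + 2 * n[1] - n[2])
     for n in itertools.product((-1, 0, 1), repeat=3)}

def rho(x):
    """Fourier series (146) evaluated at x."""
    return sum(Hn * cmath.exp(-2j * cmath.pi *
                              sum(ni * xi for ni, xi in zip(n, x)))
               for n, Hn in H.items())

def F(b):
    """Periodized structure factors (153) on (Z/N)^3."""
    return sum(Hn for n, Hn in H.items()
               if all((ni - bi) % N == 0 for ni, bi in zip(n, b)))

for a in itertools.product(range(N), repeat=3):
    sample = rho(tuple(ai / N for ai in a))
    transform = sum(F(b) * cmath.exp(-2j * cmath.pi *
                                     sum(bi * ai for bi, ai in zip(b, a)) / N)
                    for b in itertools.product(range(N), repeat=3))
    assert abs(sample - transform) < 1e-8   # this is (159)
```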
C. Crystallographic Groups
Sampled crystallographic data is given as a finite three-dimensional array of numbers. Although computing the Fourier transform of such an array is independent of whether the array comes from sampled electron density or periodized (or truncated) structure factors, when data redundancy is introduced into the picture, this neutrality no longer holds. For this reason, computing the Fourier transform of sampled electron density is called the forward Fourier transform, while computing the (inverse) Fourier transform of periodized structure factors is called the backward Fourier transform. The crystallographic group Γ of a crystal with electron density ρ(x) is defined as the group of all Euclidean motions r of R³ under which ρ(x) is invariant:

ρ(rx) = ρ(x),   r ∈ Γ, x ∈ R³.   (165)
Our primary goal is to use, in Fourier transform computations, the data redundancy present in sampled electron density resulting from this invariance. The classification of crystallographic groups was settled in about 1890, but their importance was not fully realized until the discovery of x-ray diffraction methods by crystallographers in 1912. In particular, these methods allowed for the determination of the crystallographic group of a crystal. Crystallographic group redundancy naturally rests in the space of the crystal, but by straightforward manipulation of the Fourier series representation, it can easily be transformed into constraints on the structure factors. Ultimately, however, this redundancy must be placed on sampled electron density and periodized structure factors to have an effect on the digital computation. In general, this process is straightforward, but it implies that for algorithm design, what is really required is a theory of crystallographic groups sitting inside, in general, group actions on Z/N₁ × Z/N₂ × Z/N₃, rather than the standard theory of Euclidean actions on R³. As a class, crystallographic groups are distinguished by the infinite-crystal assumption that a basic pattern or motif is regularly repeated throughout R³ and by the underlying atomic nature of electron density maps. In normalized coordinates, this motif is completely determined by its values in the unit cube and is regularly repeated by integer translations along each component. We continue to assume that this motif is minimal in the sense that the integer translations contain all translations under which the electron density map is invariant. Let Γ be the crystallographic group of a crystal ρ(x). The preceding discussion implies that the group of integer translations, denoted by Z³, is a maximal abelian normal subgroup of Γ. The atomic nature of electron density maps implies that each element of Γ provides a one-to-one correspondence between the "labeled" atoms of the crystal. Since these
atoms cannot be arbitrarily close, Γ is a discrete subgroup of rigid motions, and the quotient space Γ/Z³ is finite. Consequently, Γ satisfies the short exact sequence

1 → Z³ → Γ → G → 1,   (166)

where G is a finite group and Z³ is a maximal abelian normal subgroup of Γ. This is but one in a series of mathematical characterizations of crystallographic groups that have been given since classification. For others, see Cook (1988) and Henry and Lonsdale (1952). Since Z³ is maximal abelian normal in Γ, the finite group G acts faithfully by inner automorphisms on Z³, resulting in an isomorphism of G into Aut(Z³), the group of all automorphisms of Z³. Identifying Aut(Z³) with GL(3, Z), the group of all invertible 3 × 3 integer matrices, we see that from the short exact sequence we can construct an isomorphism of G into GL(3, Z). In general, any finite group G admitting an isomorphism into GL(3, Z) is called an abstract point group. The short exact sequence determines an integral equivalence class of isomorphisms of the abstract point group G into GL(3, Z), where integral equivalence is defined as follows: Two isomorphisms R₁ and R₂ of G into GL(3, Z) are integrally equivalent if for some M ∈ GL(3, Z),

R₁(g) = M⁻¹R₂(g)M,   g ∈ G.   (167)
The general classification of crystallographic groups can be carried out in three stages.

1. Determine the collection of all abstract point groups: finite groups G that can be embedded in GL(3, Z) in the sense that there is a group isomorphism R of G into GL(3, Z).
2. For each point group G, find the collection of inequivalent embeddings of G in GL(3, Z), where equivalence is defined by conjugation in GL(3, Z).
3. For each point group G and each equivalence class of embeddings of G in GL(3, Z), find all inequivalent extensions of the action of G on Z³ to R³, where equivalence is defined by conjugation in the affine group on R³.

There are several ways of carrying out step 1, but each proceeds by first showing that an element of finite order in GL(3, Z) must have order 1, 2, 3, 4, or 6. One shows that every finite subgroup of SL(3, Z) is isomorphic to a group in the following list.

1. Z/N, N = 1, 2, 3, 4, 6. A cyclic group of order 1, 2, 3, 4, or 6.
2. Z/N ⋊ Aut(Z/N), N = 3, 4, 6. The holomorph of a cyclic group of order 3, 4, or 6.
3. Z/2 ⊕ Z/2. The Klein four-group.
4. (Z/2 ⊕ Z/2) ⋊ Z/3.
5. (Z/2 ⊕ Z/2) ⋊ Aut(Z/2 ⊕ Z/2). The holomorph of the Klein four-group.
In general, Aut(X) denotes the automorphism group of the group X, and ⋊ denotes the semidirect product. The automorphism group Aut(Z/N), N = 3, 4, 6, is isomorphic to Z/2 and is generated by the automorphism α that maps g into −g. The automorphism group Aut(Z/2 ⊕ Z/2) is isomorphic to S₃, the group of permutations of three letters, which acts by these permutations on the nonzero elements of Z/2 ⊕ Z/2. The subgroup of cyclic permutations A₃ is isomorphic to Z/3 and forms the group in 4. The preceding list describes the abstract point groups that admit isomorphisms into SL(3, Z). To extend the list to all abstract point groups, we observe that a subgroup G ⊂ GL(3, Z) not contained in SL(3, Z) has a nontrivial kernel K under the determinant homomorphism of G into Z/2 identified with the set {1, −1}.

CASE 1:
−I₃ ∈ G. G is the group generated by K = G ∩ SL(3, Z) and −I₃. The abstract point groups corresponding to this case have the form K ⊕ Z/2, where K is an abstract point group in the list.

CASE 2: −I₃ ∉ G. If G = K ∪ γK is a coset decomposition, then G' = K ∪ (−γK) is a subgroup of SL(3, Z) isomorphic to G.

In all, up to group isomorphism, there are 20 abstract point groups: the 11 abstract point groups from the list, plus an additional nine given by taking the direct product of groups from this list with Z/2. Up to integral equivalence, there are 73 equivalence classes of embeddings: the arithmetic classes. The 32 crystal classes appearing in standard crystallographic treatments are constructed by subjecting the arithmetic classes to equivalence by conjugation by rational 3 × 3 matrices with determinant 1. The following list provides representatives of the integral equivalence classes of embeddings of the 11 abstract point groups from the list. The term primitive attached to certain of these embeddings refers to those embeddings that are completely reducible over Z. The embeddings are described by the element, elements, or subgroups and elements that generate the images of the embeddings. The names provided correspond to the standard names in the International Tables for Crystallography.
1. Z₁: (a) (primitive).
2. Z₂: (a) (primitive) 2₁; (b) 2₁₁.
3. Z₃: (a) (primitive) 3₁; (b) 3₁₁.
4. Z₄: (a) (primitive) 4₁; (b) 4₁₁.
5. Z₆: (a) (primitive) 6₁.
6. Z₃ ⋊ Z₂: (a) (primitive) 32₁ = 3₁ ∨ ·; (b) (primitive) 32₁₁; (c) 32₁₁₁ = 3₁₁ ∨ ·.
7. Z₄ ⋊ Z₂: (a) (primitive) 422₁ = 4₁ ∨ ·; (b) 422₁₁ = 4₁₁ ∨ ·.
8. Z₆ ⋊ Z₂: (a) (primitive) 622₁ = 6₁ ∨ ·.
9. Z₂ ⊕ Z₂: (a) (primitive) 222₁; (b) 222₁₁; (c) 222₁₁₁; (d) 222₁ᵥ.
10. (Z₂ ⊕ Z₂) ⋊ Z₃: (a) (primitive) 23₁; (b) 23₁₁ = 222₁₁ ∨ ·.
11. (Z₂ ⊕ Z₂) ⋊ S₃: (a) (primitive) 432₁ = 32₁ ∨ ·; (b) 432₁₁ = 32₁₁ ∨ ·; (c) 432₁₁₁ = 32₁₁₁ ∨ ·.

[The generating 3 × 3 integer matrices that accompany each entry in the original are not recoverable from this reproduction and are omitted; the standard representatives appear in the International Tables for Crystallography.]
For algorithm design incorporating point-group crystallographic symmetry, say in Z/N × Z/N × Z/N, the concept of equivalence must be taken in GL(3, Z/N). Two distinct arithmetic classes R₁ and R₂ of an abstract point group G can, for some N, become equivalent in GL(3, Z/N): There exists M ∈ GL(3, Z/N) such that
(R₁)_N(g) = M(R₂)_N(g)M⁻¹,   g ∈ G,   (168)

where R_N(g) is formed by projecting the coefficients of R(g) into Z/N. For example, two generators describing two distinct arithmetic classes of Z₂ become equivalent in GL(3, Z/N) whenever N is odd, under conjugation by a suitable M ∈ GL(3, Z/N). [The explicit matrices in the original are not recoverable from this reproduction.]
In step 3, we fix an arithmetic class

R: G → GL(3, Z)   (169)

and associate to each matrix R(g), g ∈ G, a "nonprimitive translation" T(g) such that

T(g₁g₂) ≡ R(g₁)T(g₂) + T(g₁) mod Z³,   g₁, g₂ ∈ G.   (170)

G acts on R³/Z³ according to the rule

S(g)x = R(g)x + T(g) mod Z³,   (171)

which by (170) implies

S(g₁g₂) = S(g₁)S(g₂),   g₁, g₂ ∈ G,   (172)

where we view T(g) as a vector in R³. We call the action of G on R³/Z³ defined by S an affine action. The group Γ of rigid motions generated by the integer translations Z³ and the products

T(g)R(g),   g ∈ G,   (173)

is a crystallographic group satisfying

1 → Z³ → Γ → G → 1,   (174)

where the action of G on Z³ is given by R. Up to conjugation by A⁺(3), the group of affine motions having determinant +1, there are 230 crystallographic groups. For a cohomological treatment of these results, see Janssen (1973). For example, the group consisting of the integral translations Z³ together with an action of the form (171), for a suitable choice of R and T, is a crystallographic group.
VII. SYMMETRIC FOURIER TRANSFORMS

A. Introduction
Producing a complete package containing programs that take into account all 230 space groups for a variety of transform sizes is a formidable task. Such a package must include all the benefits of fast multidimensional FFT algorithms, including speed of computation and rules for tuning computations to machine parameters with a minimum amount of programming effort. For the especially large data sizes encountered in crystallographic applications, automatic addressing is essential, along with measuring tools for judging the advantages of one algorithm over another. Cooley–Tukey methods compute an N × N × N FFT in N³ log N operations, and the data flow structured by their stride permutations makes implementation relatively
straightforward. Direct use of crystallographic symmetry without fast routines requires N⁶/r² arithmetic operations, where r is the order of the crystallographic group. Removing all redundant arithmetic, if not accompanied by fast algorithms, results in a severe penalty in performance. Several methods have been introduced to keep all the advantages of fast algorithms while building in the advantages of the smaller data sets permitted when data redundancy is used.

B. Redundancy Conditions

Consider a point group G and an isomorphism R of G into GL(3, Z). If ρ is an electron density map invariant under the action of G given by R,

ρ(R(g)x) = ρ(x),   g ∈ G, x ∈ R³,   (175)
then, substituting R(g)x for x in the Fourier series expansion (146),

ρ(R(g)x) = Σ_{n ∈ Z³} H(R(g)*n) e^{−2πi n·x},   (176)

where R(g)* = (R(g)ᵗ)⁻¹, and we have

H(n) = H(R(g)*n),   n ∈ Z³, g ∈ G.   (177)

Sampling ρ on the lattice M = (1/N)Z³, the corresponding sample function f on (Z/N)³ ≅ M/Z³ satisfies

f(a) = f(R_N(g)a),   a ∈ (Z/N)³, g ∈ G.   (178)
R_N(g) ∈ GL(3, Z/N) is formed by reducing mod N the coefficients in R(g). The preceding argument shows that the periodized structure factor function F on (Z/N)³ ≅ Z³/M⊥ satisfies

F(b) = F(R*_N(g)b),   b ∈ (Z/N)³, g ∈ G,   (179)

where R*_N(g) = ((R_N(g))ᵗ)⁻¹. The lattice (1/N)Z³ can be replaced by any sampling lattice M invariant under R:

R(g)M = M,   g ∈ G.   (180)

Denote the induced action of G on M/Z³ by R_M. The dual M⊥ is invariant under R*. Denote the induced action of G on Z³/M⊥ by R*_M. Sampling ρ on M, crystallographic redundancy is expressed by the invariance of the sample
function f under R_M and the invariance of the periodized structure factor function F under R*_M. We will extend these results to the case of an arbitrary crystallographic group action

S(g)x = R(g)x + T(g) mod Z³,   x ∈ R³, g ∈ G,   (181)

where G is a point group. We now have

ρ(R(g)x + T(g)) = ρ(x),   x ∈ R³, g ∈ G,   (182)

which, placed into the Fourier series expansion, implies

H(n) = H(R(g)*n) e^{2πi n·T(g)},   n ∈ Z³, g ∈ G.   (183)

Take any sampling lattice M. If

T(g) ∈ (1/N)Z³,   g ∈ G,

then we can take M = (1/N)Z³. More generally, we can take any sampling lattice M invariant under R satisfying

T(g) ∈ M,   g ∈ G.   (184)

Denote by S_M(g) the induced action of S(g) on M/Z³, and by R*_M the induced action of R* on Z³/M⊥. Sampling ρ on M, we have

f(S_M(g)a) = f(a),   g ∈ G, a ∈ M/Z³,   (185)

F(b) = F(R*_M(g)b) e^{2πi b·T(g)},   b ∈ Z³/M⊥, g ∈ G.   (186)

Since T(g) ∈ M, the inner product b·T(g) in (186) is well defined. In general, the problem of designing symmetrized Fourier transforms can be stated in the following context. Let A be any finite abelian group and φ an isomorphism of A onto A*. An affine action on A is a mapping S of A of the form

Sa = Ra + T,   a ∈ A,   (187)
where R is an automorphism of A and T ∈ A. The set of affine actions of A forms a group: If S₁, S₂ are affine actions,

S₁a = R₁a + T₁,   (188)
S₂a = R₂a + T₂,   (189)

then

(S₁S₂)a = R₁(R₂a + T₂) + T₁ = R₁R₂a + R₁T₂ + T₁,   (190)

and

S₁⁻¹a = R₁⁻¹a + T,   (191)
where T = −R₁⁻¹T₁. An affine action of a point group G on A is an isomorphism S of G into the group of affine actions on A:

S(g)a = R(g)a + T(g).   (192)

Denote by L(A) the space of complex-valued functions f on A. We say that f is S-invariant if

f(S(g)a) = f(a),   a ∈ A, g ∈ G,   (193)

and denote by L_S(A) the space of all S-invariant functions on A. The Fourier transform F_φ(f) satisfies

F_φ(f)(Rᵗ(g)b) = φ(T(g))(b)⁻¹ F_φ(f)(b),   (194)

where Rᵗ(g) is the transpose of R(g) relative to φ:

⟨Rᵗ(g)b, φ(a)⟩ = ⟨b, φ(R(g)a)⟩,   a, b ∈ A.   (195)

In An et al. (1990), we raised the problem of choosing an isomorphism φ such that, relative to a specified point group G and affine action S of G on A, we have

Rᵗ(g) = R(g),   g ∈ G.   (196)
C. A Symmetrized Fourier Transform
Denote by L(N) the space of all complex-valued functions f on (Z/N)³. Take a subgroup G ⊂ GL(3, Z/N), and denote by L_G(N) the space of functions f ∈ L(N) satisfying

f(ga) = f(a),   a ∈ (Z/N)³, g ∈ G.   (197)

Set

G* = {g*: g ∈ G},   (198)

g* = (g⁻¹)ᵗ.   (199)
Define the three-dimensional N × N × N Fourier transform F_N relative to the standard inner product:

(F_N f)(b) = Σ_{a ∈ (Z/N)³} f(a) ω^{a·b},   f ∈ L(N), ω = e^{2πi/N}.   (200)
From the preceding section, we have that FN is a linear isomorphism from L , ( N ) into L, * (N). We will now see how to remove all redundant computations. We will begin with several definitions. For a E ( Z / N ) 3 ,the set G(a) = {ga: 9 E G}
(201)
is called the G-orbit of a. The subgroup of G

Iso_G(a) = {g ∈ G: ga = a}   (202)

is called the isotropy subgroup in G at a. A subset X of (Z/N)³ is called a G-fundamental domain if the following two conditions are satisfied:

1. (Z/N)³ = ∪_{a ∈ X} G(a), and
2. for distinct a, b ∈ X, G(a) ∩ G(b) = ∅ (empty set).

If f ∈ L_G(N), then f is completely determined by its values on any G-fundamental domain X, since f takes on the same value at each point of a G-orbit and the G-orbits through points of X cover (Z/N)³. Denote the elements of X by

x₁, x₂, ..., x_R.   (203)

Since f is G-invariant, the sum in (200) can be grouped over orbits:

(F_N f)(b) = Σ_{r=1}^{R} f(x_r) Σ_{a ∈ G(x_r)} ω^{⟨a,b⟩}.

If Y is a G*-fundamental domain having elements y₁, ..., y_S, then F_N(f) is completely determined by its values on Y:

(F_N f)(y_s) = Σ_{r=1}^{R} f(x_r) C(s, r),

where

C(s, r) = Σ_{a ∈ G(x_r)} ω^{⟨a, y_s⟩}.
Although a computation without redundant calculation has been obtained, the direct application of a fast algorithm is not immediately apparent. In the following sections, fast algorithms computing the Fourier transform on symmetrized data will be obtained with little or no computational redundancy.
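A small one-dimensional illustration of this orbit computation may help. Here G = {±1} acts on Z/8 (the paper works over (Z/N)³; the group, modulus, and test function below are chosen only for concreteness):

```python
import cmath

N = 8
G = [1, -1]                        # point group acting by a |-> g*a mod N
w = cmath.exp(2j * cmath.pi / N)   # the root of unity of formula (200)

def orbit(a):
    return {(g * a) % N for g in G}

# build a fundamental domain X: one representative per G-orbit
X, seen = [], set()
for a in range(N):
    if a not in seen:
        X.append(a)
        seen |= orbit(a)

# a G-invariant function: f(a) = f(-a mod N)
f = [float((a * (N - a)) % N) for a in range(N)]

# F_N(f)(y) = sum over orbit representatives of f(x_r) times the
# orbit exponential sum C, with no redundant evaluations of f
def sym_dft(y):
    return sum(f[x] * sum(w ** (a * y) for a in orbit(x)) for x in X)

# the full, redundant sum of (200), for comparison
full = [sum(f[a] * w ** (a * y) for a in range(N)) for y in range(N)]
```

The symmetrized sum touches only the 5 orbit representatives instead of all 8 points, yet agrees with the full transform at every frequency.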
D. Symmetrized Good-Thomas Algorithm

Suppose an isomorphism φ of A onto A* and a subgroup B of A exist such that

A = B ⊕ B^⊥.   (210)

First we assume that B and B^⊥ are R-invariant, where R is an isomorphism of a point group G into Aut(A). We can apply the Good-Thomas algorithm to the computation of F_φ(f), f ∈ L_R(A). The R-invariance of B and B^⊥ will be used to modify the Good-Thomas algorithm to take advantage of R-invariance of data. The results of this section generalize the orbit exchange method introduced by Myoung An in her thesis (An, 1988) in collaboration with J. Cooley. See An et al. (1990a, 1990b) for two distinct applications. Assume f ∈ L(A) is R-invariant, and define the b^⊥ ∈ B^⊥ slice of f by

f_{b^⊥}(b) = f(b + b^⊥),   b ∈ B.   (211)

The isomorphism φ induces an isomorphism of B onto (A/B^⊥)*. By condition (210) we can identify A/B^⊥ with B and take φ as an isomorphism of B. Denote the corresponding Fourier transform on L(B) by F₁. The action of R on B^⊥ will be denoted by R_{B^⊥}. Set

Iso_G(b^⊥) = {x ∈ G: R_{B^⊥}(x)b^⊥ = b^⊥},   (212)

and call Iso_G(b^⊥) the R_{B^⊥}-isotropy subgroup in G at b^⊥. If G = Iso_G(b^⊥), then b^⊥ is fixed by the action of R. Denote the set of all b^⊥ ∈ B^⊥ fixed by the action of R by F₀. If b^⊥ ∈ F₀, then

f_{b^⊥}(R_B(x)b) = f_{b^⊥}(b),   x ∈ G, b ∈ B,   (213)
and consequently f_{b^⊥} ∈ L(B) is R_B-invariant. If a Fourier transform routine on R_B-invariant data is available, then F₁(f_{b^⊥}) can be computed using this routine. In many applications, however, the percentage of savings using such a routine is small, and in this work we will assume that F₁(f_{b^⊥})(b) has been computed on all of B. Suppose that b^⊥ is not fixed under the action of R. The set of all such points, the movable points, will be denoted by M. A subset M₀ of M can be
found such that

1. M = ∪_{x ∈ G} R_{B^⊥}(x)M₀, and
2. if a and b are points in M₀ and b = R_{B^⊥}(x)a for some x ∈ G, then b = a.

M₀ is a fundamental domain for the action of R_{B^⊥} on M, and X = F₀ ∪ M₀ is a fundamental domain for the action of R_{B^⊥} on B^⊥. Let O(b^⊥) denote the orbit of b^⊥ ∈ B^⊥ under R_{B^⊥}:

O(b^⊥) = {R_{B^⊥}(x)b^⊥: x ∈ G}.   (214)

The set of distinct elements in O(b^⊥) can be given as

R_{B^⊥}(x_j)b^⊥,   0 ≤ j < J,   (215)

where the x_j, 0 ≤ j < J, determine a system of representatives of G/Iso_G(b^⊥). Let f_j be the R_{B^⊥}(x_j)b^⊥ slice of f. The R-invariance of f implies

F₁(f_j)(b) = F₁(f₀)(R(x_j)ᵗb),   (216)

and F₁(f₀)(b), b ∈ B, determines F₁(f_j)(b), b ∈ B, 0 ≤ j < J. Consequently, the first computational stage of the Good-Thomas algorithm is completed once we have computed

F₁(f_{b^⊥})(b),   b^⊥ ∈ F₀ ∪ M₀, b ∈ B.   (217)
Although b^⊥ ∈ M₀ is not fixed under R_{B^⊥}, it is fixed under Iso_G(b^⊥) = G₁, and the b^⊥-slice is G₁-invariant. We can take advantage of this data redundancy if an R_B(G₁)-invariant Fourier transform algorithm is available. As before, we prefer to ignore this savings for uniformity. Suppose we have computed F₁(f_{b^⊥})(b) for b^⊥ ∈ F₀ ∪ M₀ and b ∈ B. In the second computational stage of the Good-Thomas algorithm, we form

g_b(b^⊥) = F₁(f_{b^⊥})(b)   (data permutation)   (218)

and compute F₂(g_b)(b^⊥). Arguing as above, F₂(g_b)(b^⊥), b ∈ B, b^⊥ ∈ B^⊥, is known if F₂(g_b)(b^⊥) is known for b in an R_B-fundamental domain Y in B. It follows that we need to have available g_b(b^⊥), b ∈ Y, b^⊥ ∈ B^⊥. The values of g_b(b^⊥) on b^⊥ ∈ X, b ∈ B, determine the values of g_b(b^⊥), b ∈ Y, b^⊥ ∈ B^⊥, by the formula

g_{R(x)b}(b^⊥) = g_b(R(x)b^⊥),   x ∈ G, b ∈ B, b^⊥ ∈ B^⊥.   (219)
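For orientation, the unsymmetrized core of this section, the classical Good-Thomas (prime-factor) computation for Z/N with N = pq and (p, q) = 1, can be sketched as follows (a standard textbook construction; the function names are ours, not the paper's):

```python
import cmath

def dft(x):
    # naive N-point DFT with kernel exp(-2*pi*i*n*k/N)
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

def good_thomas(x, p, q):
    # N = p*q, gcd(p, q) = 1: CRT input indexing, Ruritanian output
    # indexing, and no twiddle factors between the two stages
    N = p * q
    x2 = [[0j] * q for _ in range(p)]
    for n in range(N):
        x2[n % p][n % q] = x[n]          # n <-> (n mod p, n mod q) by CRT
    rows = [dft(r) for r in x2]           # q-point transforms
    cols = [dft([rows[n1][k2] for n1 in range(p)]) for k2 in range(q)]
    X = [0j] * N
    for k1 in range(p):
        for k2 in range(q):
            X[(k1 * q + k2 * p) % N] = cols[k2][k1]
    return X
```

Comparing good_thomas(x, 3, 5) with dft(x) for any length-15 input shows the two agree; the symmetrized version of this section additionally restricts the slices to a fundamental domain.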
This discussion can be generalized as follows. An affine map S of A diagonalizes relative to the decomposition (210) if we can write

S(b + b^⊥) = S₁(b) + S₁^⊥(b^⊥),   (220)

where S₁ is an affine map of B and S₁^⊥ is an affine map of B^⊥.
Suppose S is now an affine action of the group G on A. We say that S diagonalizes relative to this decomposition if we can write

S(x)(b + b^⊥) = S₁(x)(b) + S₁^⊥(x)(b^⊥),   (221)

where S₁ is an affine action of G on B and S₁^⊥ is an affine action of G on B^⊥. The preceding discussion can be generalized to this setting with slight modifications. The fundamental domain X = F₀ ∪ M₀ in B^⊥ is now taken with respect to the action S₁^⊥, with F₀ the subset of B^⊥ fixed under this action and M₀ the fundamental domain of the movable set relative to this action. For b^⊥ ∈ F₀, the slice f_{b^⊥} ∈ L(B) is S₁-invariant, and a Fourier transform routine on S₁-invariant data in L(B) can be used. The S₁^⊥-orbit in B^⊥ of b^⊥ ∈ M₀ is given by

S₁^⊥(x_j)b^⊥,   0 ≤ j < J,   (222)

where x₀, ..., x_{J−1} is a system of representatives of G/Iso_G(b^⊥) relative to the action S₁^⊥. If f_j is the S₁^⊥(x_j)b^⊥ slice of f, we have

F₁(f_j)(b) = F₁(f₀)(S₁(x_j)ᵗ(b)),   (223)

extending formula (216). It follows that F₁(f₀)(b), b ∈ B, determines F₁(f_j)(b), b ∈ B. Reasoning as before, we can build a Good-Thomas algorithm for S-invariant data.

E. Symmetrized Multidimensional Cooley-Tukey
Let S denote an affine action of a point group G on a finite abelian group A. Fix an isomorphism φ of A onto A*. Relative to any subgroup B of A, we have constructed Cooley-Tukey algorithms computing F_φ(f) for f ∈ L(A). If we take B to be S-invariant, then we can modify the algorithm to take advantage of S-invariance of data. Suppose now f ∈ L_S(A). Consider coset representatives of A/B,

a₀, a₁, ..., a_{J−1}.   (224)

In the first step, we form

g_j(b) = f(a_j + b) ∈ L(B × A/B),   (225)

and take the Fourier transform

F₁: L(B) → L(A/B^⊥)   (226)

induced by φ. Since B is S-invariant, if the coset B_j = a_j + B has a fixed point
under S, we can choose a_j such that

S(x)a_j = a_j,   x ∈ G.   (227)
Since f is S-invariant, we have

g_j(S(x)b) = g_j(b),   x ∈ G, b ∈ B,

implying g_j ∈ L_S(B). If a routine for computing the Fourier transform of S-invariant data on B exists, then it can be used to compute F₁(g_j). It can happen that B_j is S-invariant but has no S-fixed point. The corresponding g_j is not invariant under the S action on B. Data redundancy may still occur if g_j is invariant under the S action of B restricted to some subgroup of G. The greatest savings will occur when the coset B_j is not S-invariant. In this case, define the collection of cosets

C(j) = {S(x)a_j + B: x ∈ G}.   (228)

The data on the coset a_j + B determines the data on each coset in C(j). Consequently, the Fourier transform of f decimated to B_j determines the Fourier transform of f decimated to any coset in C(j). In this way the cosets of A/B are partitioned into three kinds:

1. Those having a fixed point under S. A routine computing the Fourier transform on S-invariant data on B can be used.
2. Those fixed under S but having no fixed point under S.
3. Subsets of cosets formed by the action of S on a coset not fixed under S.
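The three kinds of cosets can be seen in a toy example. Take A = Z/12, B = {0, 6}, and let S be the action of G = {±1} by negation (a hypothetical choice with no translation part, used purely for illustration):

```python
N = 12
B = (0, 6)                      # subgroup of A = Z/12
G = (1, -1)                     # S(x)a = x*a mod N

# the six distinct cosets a + B, a = 0, ..., 5
cosets = [frozenset((a + b) % N for b in B) for a in range(N // len(B))]

def kind(c):
    if any(all((g * a) % N == a for g in G) for a in c):
        return 1                # coset has an S-fixed point
    if {frozenset((g * a) % N for a in c) for g in G} == {c}:
        return 2                # coset fixed as a set, but pointwise moved
    return 3                    # coset is moved onto another coset

kinds = {tuple(sorted(c)): kind(c) for c in cosets}
```

Here 0 + B contains the fixed point 0 (kind 1), 3 + B is mapped onto itself without fixed points (kind 2), and the remaining cosets are exchanged in pairs (kind 3), so only one representative of each pair needs to be transformed.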
Assume B_j is a coset not fixed by S. Choose elements in G,

e = x₀, x₁, ..., x_{M−1},   (229)

where e is the identity in G, such that the cosets

S(x_m)a_j + B,   0 ≤ m < M,   (230)

describe all the distinct cosets in C(j). In this discussion, for notational simplicity, we will redefine g_m as the S(x_m)a_j + B slice of f:

g_m(b) = f(S(x_m)a_j + b),   b ∈ B.   (231)
From the S-invariance of f,

F₁(g_m)(c) = F₁(g₀)(S(x_m)ᵗc).

Consequently, once we compute F₁(g₀) on A/B^⊥, we have computed F₁(g_m) on A/B^⊥, 0 ≤ m < M, by data permutation by S(x_m)ᵗ on A/B^⊥. In the first stage of the generalized Cooley-Tukey algorithm, the number of F₁ computations has been reduced from J, the order of A/B, to the number of coset classes contained in the partition. As in the preceding section, we can generalize the discussion and remove the restriction that B is S-invariant. Details appear in An and Tolimieri (1989).

F. Implementation Example

1. Symmetrized Row-Column Method
The primitive crystallographic groups in the monoclinic and orthorhombic crystal classes (Henry and Lonsdale, 1952) can be incorporated into the row-column method. We will call these groups "diagonal" groups. Of the 230 crystallographic groups, 40 are diagonal. The property of the diagonal groups that admits the row-column method is that they act diagonally on Z/N₁ × Z/N₂ × Z/N₃, in the sense of Section E. Let D denote the group of induced actions on Z/N₁ × Z/N₂ × Z/N₃ of an affine group. Then D acts diagonally if the following condition holds: for x_j ∈ Z/N_j, j = 1, 2, 3,

D(x₁, x₂, x₃) = (D⁽¹⁾x₁, D⁽²⁾x₂, D⁽³⁾x₃),   (232)

where D⁽ʲ⁾ is an affine action of the group on Z/N_j. By the results of Section II.D, we have

F(N₁) ⊗ F(N₂) ⊗ F(N₃) = P₁(I_{N₁N₂} ⊗ F(N₃)) P₂(I_{N₁N₃} ⊗ F(N₂)) P₃(I_{N₂N₃} ⊗ F(N₁)),   (233)
where P₁, P₂, and P₃ are stride permutations. To incorporate the diagonal symmetry groups, we will determine three fundamental domains and mappings between them. For d ∈ D, the action of d on (x₁, x₂, x₃) ∈ Z/N₁ × Z/N₂ × Z/N₃ can be described by

d: (x₁, x₂, x₃) ↦ (d₁x₁ + t₁, d₂x₂ + t₂, d₃x₃ + t₃),   (234)

where (t₁, t₂, t₃) ∈ Z/N₁ × Z/N₂ × Z/N₃ and d_j = ±1, j = 1, 2, 3. Let f be a D-invariant function on Z/N₁ × Z/N₂ × Z/N₃. Define for y₁ ∈ Z/N₁, y₂ ∈ Z/N₂, and y₃ ∈ Z/N₃,
f₁(y₁, x₂, x₃) = Σ_{x₁ ∈ Z/N₁} f(x₁, x₂, x₃) e^{−(2πi/N₁)x₁y₁},
f₂(y₁, y₂, x₃) = Σ_{x₂ ∈ Z/N₂} f₁(y₁, x₂, x₃) e^{−(2πi/N₂)x₂y₂},
f₃(y₁, y₂, y₃) = Σ_{x₃ ∈ Z/N₃} f₂(y₁, y₂, x₃) e^{−(2πi/N₃)x₃y₃}.

The function f₃ is the Fourier transform of f. The D-invariance of f gives rise to redundancy conditions on f₁, f₂, and f₃. Let X₃/D be a fundamental domain of D⁽³⁾. Decompose X₃/D into disjoint subsets by the isotropy subgroups. Denote the disjoint subsets of X₃/D and the associated isotropy subgroups by X₃₁/D, X₃₂/D, ..., X₃K/D and Iso₃₁, Iso₃₂, ..., Iso₃K, respectively. For 1 ≤ k ≤ K, X₂k/Iso₃k is a fundamental domain of the group Iso₃k acting on the space Z/N₂. Then
∪_{k=1}^{K} (X₂k/Iso₃k × X₃k/D)

is a fundamental domain of D⁽²⁾ × D⁽³⁾ acting on Z/N₂ × Z/N₃. Denote by D₁ the redundancy conditions on Z/N₁ × Z/N₂ × Z/N₃ for the function f₁. (D₁ is a group obtained by setting t₁ = 0.) Decompose X₁/D₁, a D₁-fundamental domain in Z/N₁, by the isotropy subgroups Iso₁₁, Iso₁₂, ...,
Iso₁M, into the disjoint subsets X₁₁, X₁₂, ..., X₁M. Then

∪_{m=1}^{M} (X₃/Iso₁m × X₁m)

is a D₁-fundamental domain in Z/N₁ × Z/N₃, where X₃/Iso₁m, 1 ≤ m ≤ M, is an Iso₁m-fundamental domain in Z/N₃. In exactly the same way, we can obtain a fundamental domain of D₂ acting on Z/N₁ × Z/N₂ of the form

∪_{l=1}^{L} (X₁/Iso₂l × X₂l),

where D₂ is the group of redundancy conditions on Z/N₁ × Z/N₂ × Z/N₃ for the function f₂, and X₂₁, X₂₂, ..., X₂L are the disjoint subsets of X₂/D₂ by the isotropy subgroups. The sets
AS = Z/N₁ × ∪_{k=1}^{K} (X₂k/Iso₃k × X₃k),
AS₁ = Z/N₂ × ∪_{m=1}^{M} (X₃/Iso₁m × X₁m),
AS₂ = Z/N₃ × ∪_{l=1}^{L} (X₁/Iso₂l × X₂l)

contain D, D₁, and D₂ fundamental domains, respectively. The functions f and f₁ are determined by their values on AS. The functions f₁ and f₂ are determined by their values on AS₁, and f₂ and f₃ are determined by their values on AS₂. Thus the function f₁ defined on AS can be reindexed by AS₁. The function f₂ on AS₁ can be reindexed by AS₂. Let the orders of
∪_{k=1}^{K} (X₂k/Iso₃k × X₃k),   ∪_{m=1}^{M} (X₃/Iso₁m × X₁m),   ∪_{l=1}^{L} (X₁/Iso₂l × X₂l)

be o₁, o₂, and o₃, respectively. Then the Fourier transform of a D-invariant function f is obtained by

(I_{o₃} ⊗ F(N₃)) OEX₂ (I_{o₂} ⊗ F(N₂)) OEX₁ (I_{o₁} ⊗ F(N₁)),

where OEX₁ and OEX₂ are the reindexing (permutation) matrices for f₁ and f₂.
Notice that the sets AS, AS₁, and AS₂ are in general larger than the fundamental domains of the respective groups. Precise fundamental domains that are contained in these sets can be determined by iterating the decomposition scheme a step further. Consider the set

∪_{k=1}^{K} (X₂k/Iso₃k × X₃k).

We can decompose this set into disjoint subsets by the isotropy subgroups. Denote the disjoint subsets by

(X₂, X₃)₁, (X₂, X₃)₂, ..., (X₂, X₃)_K,

and the respective isotropy subgroups by Iso₁, Iso₂, ..., Iso_K. Then the set

∪_{k=1}^{K} (X₁/Iso_k × (X₂, X₃)_k)

is a precise D-fundamental domain on Z/N₁ × Z/N₂ × Z/N₃. The computation of the Fourier transform of a D-invariant function f can begin by computing f₁ by
⊕_{k=1}^{K} (I_{o_k} ⊗ F_{Iso_k}(N₁)),

where o_k is the order of the set (X₂, X₃)_k, and F_{Iso_k}(N₁) is the Iso_k-symmetrized Fourier transform. In much the same way, we can replace the factors

I_{o₂} ⊗ F(N₂),   I_{o₃} ⊗ F(N₃)

by factors including symmetrized N₂- and N₃-point Fourier transforms.
2. Symmetrized Good-Thomas Algorithm

For N = pq, (p, q) = 1, the Fourier transform of a function f on (Z/N)³ can be computed by

P(F₃(p) ⊗ I_{q³})(I_{p³} ⊗ F₃(q))Q,

where F₃(m) denotes the three-dimensional m-point Fourier transform, and P and Q are permutation matrices. Let G be a finite subgroup of GL(3, Z). G acts on (Z/N)³ by matrix multiplication modulo N. By the Chinese remainder theorem, we have a ring isomorphism:

(Z/N)³ ≅ (Z/p)³ × (Z/q)³.   (235)

This ring isomorphism defines the action (G_p, G_q) on (Z/p)³ × (Z/q)³ corresponding to the action of G on (Z/N)³. A function f on (Z/N)³ is identified by the same isomorphism with a function f(x_p, x_q) on (Z/p)³ × (Z/q)³. Let f be
G-invariant. Define for (y_p, y_q) ∈ (Z/p)³ × (Z/q)³,

f₁(y_p, x_q) = Σ_{x_p ∈ (Z/p)³} f(x_p, x_q) e^{−(2πi/p)⟨x_p, y_p⟩},
f₂(y_p, y_q) = Σ_{x_q ∈ (Z/q)³} f₁(y_p, x_q) e^{−(2πi/q)⟨x_q, y_q⟩}.

f₂ is the Fourier transform of the function f. From the G-invariance of the function f, we have for (g_p, g_q) ∈ (G_p, G_q),

f₁(y_p, x_q) = f₁(g_p* y_p, g_q x_q),   (236)
f₂(y_p, y_q) = f₂(g_p* y_p, g_q* y_q),   (237)

where g_p* = (g_p⁻¹)ᵗ and g_q* = (g_q⁻¹)ᵗ. Denote by X_q/G_q a G_q-fundamental domain in (Z/q)³, and by X_p/G_p* a G_p*-fundamental domain in (Z/p)³. Set

A₁ = (Z/p)³ × X_q/G_q,
A₂ = X_p/G_p* × (Z/q)³.

A₁ contains a (G_p, G_q)- and a (G_p*, G_q)-fundamental domain, and A₂ contains a (G_p*, G_q)- and a (G_p*, G_q*)-fundamental domain. The function f₁ on A₁ can be reindexed by the set A₂ by OEX. Let o_p and o_q be the orders of the sets X_p/G_p* and X_q/G_q, respectively. The function f₂ is then computed by

P′(F₃(q) ⊗ I_{o_p}) OEX (I_{o_q} ⊗ F₃(p)) Q′,

where P′ and Q′ are permutation matrices. The greatest saving that can be gained from the G-invariance of the function f is the reduction of the data and computation by a factor of o(G), the order of G. These savings will be obtained when p³/o_p and q³/o_q are "close" to o(G). In case further reduction is desired (we are obtaining the reduction at the cost of OEX), we can follow one of the following two procedures:

1. Suppose p³/o_p > q³/o_q and p = p₁p₂, (p₁, p₂) = 1. Then
(Z/p)³ ≅ (Z/p₁)³ × (Z/p₂)³.

Decompose X_q/G_q into disjoint subsets X_q1, X_q2, ..., X_qk by the isotropy subgroups Iso_q1, Iso_q2, ..., Iso_qk. Decomposing the fundamental domain in (Z/p₂)³ by the groups Iso_ql, 1 ≤ l ≤ k, into X_pl/Iso_ql, we have that

(Z/p₁)³ × ∪_{l=1}^{k} (X_pl/Iso_ql × X_ql)   (238)

contains a (G_p1, G_p2, G_q)-fundamental domain in (Z/p₁)³ × (Z/p₂)³ × (Z/q)³. We can compute the Fourier transform of a G-invariant function using (238)
as an initial fundamental domain and iterating the procedure described for the two-prime-factorization case.

2. Decompose X_q/G_q by the isotropy types as before. The set

∪_{l=1}^{k} (X_p/Iso_ql × X_ql)   (239)

is a precise (G_p, G_q)-fundamental domain in (Z/p)³ × (Z/q)³. Let F_{Iso_ql}(p) be an Iso_ql-symmetrized Fourier transform. Then f₁ can be computed on (239) by

⊕_{l=1}^{k} (I_{o(X_ql)} ⊗ F_{Iso_ql}(p)),

where o(X_ql) is the order of the set X_ql. In a similar way, the factor F₃(q) ⊗ I_{o_p} can be replaced by a factor built from symmetrized q-point Fourier transforms.
REFERENCES

An, M. (1988). "Group Invariant Finite Fourier Transforms." Ph.D. thesis, The Graduate Center of City University of New York.
An, M., and Auslander, L. (1987). "Fourier transforms that respect crystallographic symmetries." IBM J. Res. Dev. 2(31), 213-223.
An, M., Auslander, L., and Cook, M. (1990b). "Programming Methods for Crystallography." Proceedings of SPIE 1154, 17-28.
An, M., and Tolimieri, R. (1989). "Generalized orbit-exchange and multidimensional Cooley-Tukey algorithm." Submitted.
An, M., and Tolimieri, R. (1989). "Notes on FT computation for P3 symmetry." Working paper, available at Center for Large Scale Computation, CUNY.
An, M., Tolimieri, R., and Cooley, J. (1990a). "Factorization methods for crystallographic FT." Adv. Appl. Math. 11, 358-371.
An, M., Auslander, L., and Cook, M. (1990b). "An algorithm for monoclinic and orthorhombic groups." Submitted.
Auslander, L., and Cook, M. (1990). "An algebraic classification of 3-dimensional crystallographic groups." Adv. Appl. Math., in press.
Auslander, L., Feig, E., and Winograd, S. (1983). "New algorithms for the multidimensional discrete Fourier transform." IEEE Trans. ASSP-31.
Auslander, L., Johnson, R. W., and Vulis, M. (1988). "Evaluating finite Fourier transforms that respect group symmetries." Acta Cryst. A44, 467-478.
Blundell, T. L., and Johnson, L. N. (1976). "Crystallography." Academic Press, London.
Bricogne, G. (1984). "Maximum entropy method." Acta Cryst.
Bricogne, G., and Tolimieri, R. (1990). "Symmetrized FFT algorithms." In "IMA Proceedings on Signal Processing." Springer-Verlag, in press.
Buerger, M. J. (1942). "X-ray Crystallography." John Wiley and Sons, New York.
Cochran, W. T. (1967). "What is the fast Fourier transform?" IEEE Trans. Audio and Electroacoust. AU-15(2), 45-55.
Cook, M. (1988). "Crystallographic Space Groups and Algorithms." Ph.D. thesis, The Graduate Center of City University of New York.
Cooley, J. (1987). "The rediscovery of the fast Fourier transform algorithm." Sixth International Conference on Fourier Transform Spectroscopy, Technical University of Vienna, Austria, August.
Cooley, J., and Tukey, J. W. (1965). "An algorithm for the machine calculation of complex Fourier series." Math. Comput. 19, 297-301.
Ten Eyck, L. F. (1973). "Crystallographic fast Fourier transform." Acta Cryst. A29, 183-191.
Gentleman, W. M., and Sande, G. (1966). "Fast Fourier transforms - for fun and profit." AFIPS Joint Computer Conference 29, 563-578.
Gertner, I. (1987). "Radon transform over finite fields and its application to signal processing." EE report, Technion-Israel Institute of Technology, p. 649.
Gertner, I. (1988). "A new efficient algorithm to compute the two-dimensional discrete Fourier transform." IEEE Trans. ASSP 36(7), 1036-1050.
Gertner, I., and Rofheart, M. (1989). "A parallel algorithm for 2D DFT computation with no interprocessor communication." Submitted for publication.
Gertner, I., and Shamash, M. (1987). "VLSI architectures for multidimensional Fourier transform processing." IEEE Trans. Comput. C-36(11), 1265-1274.
Hauptman, H., and Karle, J. (1953). "Solution of the Phase Problem," ACA Monograph No. 3. Polycrystal Book Service, Pittsburgh, Pennsylvania.
Hauptman, H., and Karle, J. (1956). "Solution of the phase problem." Acta Cryst. 9, 45-55.
Hauptman, H., and Karle, J. (1957). "Solution of the phase problem." Acta Cryst. 10, 515.
Henry, N. F., and Lonsdale, K. (1952). "International Tables for X-ray Crystallography." The Kynoch Press, Birmingham, United Kingdom.
Huang, C. H., and Johnson, J. R. (1990). "A tensor product formulation of Strassen's matrix multiplication algorithm." Preprint.
Janssen, T. (1973). "Crystallographic Groups." North-Holland, Amsterdam.
Johnson, J., Johnson, R. W., Rodriguez, D., and Tolimieri, R. (1990). "A methodology for designing, modifying, and implementing Fourier transform algorithms on various architectures." IEEE Transactions on Circuits and Systems 9(4), 449-500.
Johnson, R. W. (1990). "Algebraic classification of Bravais lattices." Adv. Appl. Math., in press.
Korn, D., and Lambiotte, J. (1979). "Computing the fast Fourier transform on a vector computer." Math. Comput. 33(147), 977-992.
Lu, C. (1988). "Fast Fourier Transform Algorithms for Special N's and the Implementation on VAX." The City University of New York.
Nussbaumer, H. (1982). "Fast Fourier Transform and Convolution Algorithms." Springer-Verlag.
Pease, M. C. (1968). "An adaptation of the fast Fourier transform for parallel processing." JACM 15(2), 252-264.
Rodriguez, D. (1989). "On Tensor Products Formulation of Additive Fast Fourier Transforms Algorithms and Their Implementations." Ph.D. thesis, E.E. Department, City University of New York.
Sayre, D. (1982). "Computational Crystallography." Oxford Science Publications, Oxford, United Kingdom.
Silverman, H. F. (1977). "An introduction to programming the Winograd Fourier transform algorithm (WFTA)." IEEE Trans. Acoust., Speech and Signal Processing ASSP-25(2), 152-165.
Temperton, C. (1983). "Self-sorting mixed-radix fast Fourier transforms." J. Comp. Phys. 52(1), 1-23.
Temperton, C. (1989). "Nesting strategies for prime factor FFT algorithms." J. Comp. Phys. 82(2), 247-268.
Tolimieri, R. (1986). "Multiplicative characters and the discrete Fourier transform." Adv. Appl. Math. 7, 344-380.
Tolimieri, R., An, M., and Lu, C. (1989). "Algorithms for Discrete Fourier Transform and Convolution." Springer-Verlag.
Tolimieri, R., and Gertner, I. (1989). "Fast algorithms to compute the multidimensional discrete Fourier transform." SPIE Real-Time Signal Processing XII 1154, 132-146.
ADVANCES IN ELECTRONICS AND ELECTRON PHYSICS, VOL. 80

Number Theoretic Techniques in Digital Signal Processing

GRAHAM A. JULLIEN
Department of Electrical Engineering, University of Windsor, Windsor, Ontario, Canada
I. Introduction . . . 70
  A. Finite Impulse Response Digital Filtering Using DFTs . . . 73
II. Discrete Fourier Transforms . . . 75
  A. Inverse and Convolution Property . . . 75
III. Fast Fourier Transform (FFT) Algorithms . . . 77
  A. Decimation in Time FFT Algorithm . . . 77
  B. Use of the DFT in Indirect Computation of Convolution . . . 79
IV. Finite Algebras . . . 80
  A. Rings and Fields . . . 80
V. Number Theoretic Transforms . . . 84
  A. Fermat Number Transforms . . . 85
  B. Indirect Convolution Using General NTTs . . . 92
  C. NTTs over Extension Fields . . . 95
  D. Quadratic Residue Rings . . . 110
  E. Multidimensional Mapping . . . 115
  F. Extension of Dynamic Range . . . 118
  G. Binary Implementations . . . 119
VI. Residue Number Systems . . . 121
  A. Algebraic Structure of General Residue Systems . . . 122
  B. The Chinese Remainder Theorem . . . 124
  C. The Associated Mixed Radix Number System . . . 126
  D. Overflow Detection, Scaling, and Base Extension . . . 127
VII. Implementation of NTTs Using the RNS . . . 131
  A. Multiplication Using Index Calculus . . . 133
VIII. VLSI Implementations of Finite Algebraic Systems . . . 140
  A. Modular Arithmetic Elements . . . 140
  B. A Generic Residue Processing Cell . . . 141
  C. VLSI Implementations for ROM Generic Cells . . . 151
IX. Conclusions . . . 159
X. Acknowledgements . . . 160
XI. References . . . 160
Copyright © 1991 by Academic Press, Inc. All rights of reproduction in any form reserved.
ISBN 0-12-014680-0
70
GRAHAM A. JULLIEN
I. INTRODUCTION

This chapter discusses the application of number theory to the intensive computations required in many digital signal processing (DSP) systems. Number theory was always looked upon as one of the last vestiges of pure mathematical pursuit, where the idea of applying the theories to practical purposes was irrelevant. Over the last few decades we have been discovering very practical applications for number theory; among these are applications in coding theory, communications, physics, and digital information (Schroeder, 1985). Our specific interest in this work is the application of number theory to the mechanics of arithmetic computations themselves.

Modern-day computer technology is heavily dependent on the manipulation of numbers; in fact, the automation of arithmetic operations was the quest of all the early computer pioneers, from Pascal to Babbage to von Neumann. The involvement of number theory in this endeavor, however, has strangely been very limited. The use of number theory in helping to perform the basic arithmetic operations has not been accepted by the commercial world. Perhaps the reason for this is the pervasiveness of the binary number system in the mechanization of computer arithmetic. A large number of hardware designers have little or no knowledge that alternative ways of computing arithmetic exist!

There is one field in which the high-speed arithmetic manipulation of numbers is of paramount importance. This is the field of digital signal processing (DSP), and it is here that alternative arithmetic hardware may find a niche. Why is it that some DSP systems are so computationally intensive? Typically the DSP system we are discussing is involved with the processing of streams of data that are indeterminately long; the requirement of the system is to produce processed output data at the same rate that the input data is entering the processor. This is referred to as real-time processing.
Often the arithmetic manipulations are simple in form (cascades of additions and multiplications in a well-defined structure), but the number of operations that have to be computed every second can be enormous. It will be constructive at this point to consider an example to illustrate the processing power required by a typical DSP system. Let us consider a basic problem: the filtering of television pictures in real time. We will assume that a two-dimensional filter is required and is given by the following equation:

y_{m,n} = Σ_{i=−2}^{2} Σ_{j=−2}^{2} w_{i,j} x_{m+i,n+j}.

Here, w_{i,j} is the filter kernel and represents the weights in a two-dimensional averaging procedure around each output sample y_{m,n}. The averaging is
NUMBER THEORETIC TECHNIQUES IN DIGITAL SIGNAL PROCESSING
71
based on a neighborhood of input samples {x_{i,j}} that extends five samples in each dimension and is centered on the output pixel position. The double summation, in fact, represents two-dimensional convolution. Convolution is a basic operation in all linear signal processing problems since it represents the mapping function of an input signal to an output signal through a linear system. The real-time constraint is important since, as we will see, an extremely large computational requirement will be necessary in order to construct the filter.

In order to see the speed of operation required by our filter, we will perform some elementary calculations based on typical requirements for the filter. We assume a 525-line scan with 500 picture elements (pixels) in each line. We also assume that each primary colour is to be filtered independently with the 5 x 5 filter. With an interlaced system we need to filter 30 images every second. Each output pixel (made up of three primary colour sub-pixels) requires 75 multiply/accumulate operations (M/A ops) to implement the filter. There are 262,500 pixels in each image, and so 1.96875 × 10⁷ M/A ops are required for each image. Since we require to filter 30 images per second, the total number of M/A ops per second is 5.9 × 10⁸. Assuming that an M/A op is considered a basic operation in the processor used to filter the image, we need a machine that delivers 590 mega-operations per second (MOPS). In these terms, we require a supercomputer to deliver this performance! Clearly the implementation has to be somewhat different from a large mainframe supercomputer; it would certainly be difficult to fit one into a normal-sized television set!
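The sizing arithmetic above is easy to reproduce in a few lines (all figures are the ones quoted in the text):

```python
lines, pixels_per_line = 525, 500
colours, kernel_taps = 3, 5 * 5        # three primary colours, 5 x 5 kernel
frames_per_second = 30

ops_per_pixel = colours * kernel_taps              # 75 M/A ops per output pixel
pixels_per_image = lines * pixels_per_line         # 262,500 pixels per image
ops_per_image = pixels_per_image * ops_per_pixel   # 1.96875e7 M/A ops
ops_per_second = ops_per_image * frames_per_second # about 5.9e8, i.e. 590 MOPS
pixel_rate = pixels_per_image * frames_per_second  # 7.875e6 pixels per second
```

The pixel rate computed here is the throughput figure used in the hardware discussion that follows.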
The filtering operation has the advantage of not requiring the full features of a general-purpose computer (the filtering operation does not have to be performed within the typical multi-user, complex operating system environment of a mainframe machine), and the computations are simply repeated arithmetic operations in a predetermined structure. In fact, it is rather pointless to talk about the computational requirements in normal general computer terms (MIPS, MOPS, MFLOPS, etc.); it is better to talk in direct terms of the data flow requirements, that is, the throughput rate of the filter in samples per second (in this example, a sample is a pixel). For our example filter we have to filter pixels at a rate of 7.875 × 10⁶ pixels per second. If we provide arithmetic hardware that contains 75 multiply/accumulate circuits, then each circuit has to operate within 120 ns. If we use currently available integrated circuits that operate within 60 ns, then we can build a circuit with only 38 integrated circuits for the basic arithmetic operations, and perform two operations, sequentially, with one circuit. We have therefore reduced the original requirement of a supercomputer to a system containing a relatively small number of chips. It is the purpose of this paper to discuss ways in which number theoretic techniques can be used to perform DSP operations, such as this example filter,
by reducing the amount of hardware involved in the circuitry. Since we are directly concerned with the implementation of DSP arithmetic, it might be helpful to finish off our example by considering the arithmetic details of each multiply/accumulate operation. The representation of each primary colour pixel by 8 bits is more than adequate for very high definition of the image, and the quantization of each filter weight by 8 bits is also very generous. This naturally leads to a 16-bit multiplier structure. Since there is an accumulation of 25 such multiplications for each output pixel, an extra 5 bits are required for this. With 21 bits we can, in fact, compute each output pixel with no error in the computation process. Since 21 bits is a pessimistic upper bound for the computational dynamic range (only required if both the filter coefficients and the input pixels are at their maximum values, a very rare occurrence), we can easily specify a 20-bit range for accurate computation. This calculation leads us to an important observation: we do not need floating-point arithmetic. We can, in fact, perform error-free computation within the dynamic range of a typical floating-point standard mantissa length, with no manipulation of the exponent being required at all! This example represents a fine-grained computation (one in which successive parts of the computation are of a limited dynamic range) and is typical of many DSP algorithms. The limitations we are able to impose on the arithmetic calculations will be a great help in reducing the hardware required in the very high-speed throughput systems discussed in this paper. For the purposes of this paper we restrict ourselves to the computation of linear filter and transform algorithms that have, for the one-dimensional case, the inner product form

y_n = Σ_{m=0}^{N−1} x_m W^{f(n,m)}.
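As a quick check on the bit-growth figures quoted above, the worst-case dynamic range can be tallied in a few lines (a sketch; the constants simply mirror the example, and the variable names are ours):

```python
# Worst-case dynamic range for the example 5x5 filter:
# 8-bit pixels, 8-bit weights, 25 products accumulated per output pixel.
MAX_PIXEL = 255           # largest 8-bit input sample
MAX_WEIGHT = 255          # largest 8-bit coefficient
TAPS = 25                 # 5x5 kernel

max_product = MAX_PIXEL * MAX_WEIGHT       # a single product needs 16 bits
max_accum = TAPS * max_product             # worst-case accumulated output

bits_needed = max_accum.bit_length()
print(max_product, max_accum, bits_needed)  # 65025 1625625 21
```

The result, 21 bits, matches the pessimistic upper bound derived in the text.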
In the case of linear filtering (convolution), f(n, m) = n − m; for the case of a linear transform with the convolution property, W is a primitive Nth root of unity. For the DFT, W^{f(n,m)} = e^{−i(2π/N)nm}. Although this limitation appears overly restrictive, the general form of this equation probably accounts for the vast majority of digital signal processing functions implemented commercially, and it also leads us to a body of literature that uses number theory to indirectly compute the convolution form of the inner product formulation. We will also restrict our investigation of the application of number theoretic techniques to those that use finite field arithmetic directly in the computations. This will not cover those algorithms whose structure is derived via number theoretic techniques but whose computation is performed over the usual complex field (e.g., Winograd Fourier transforms); there is a vast amount of literature already available on this separate subject, and the reader is directed to the work by McClellan and Rader (1979) as a starting point.
NUMBER THEORETIC TECHNIQUES IN DIGITAL SIGNAL PROCESSING
Our discussion in this review concentrates on two areas: the use of finite ring/field operations for computing ordinary integer computations (the residue number system) for high-speed signal processing, and the use of algorithms defined over finite rings/fields in which not only is the arithmetic performed over such finite structures, but the properties of the algorithm also depend on the properties of the finite field/ring over which the algorithm is defined. This is an important differentiation. Our sole concern here is to examine arithmetic and general computations over finite mathematical systems. Our starting point should be the recognized starting point for the modern field of digital signal processing: the fast Fourier transform (FFT). The FFT is not a new transform, but really a name for a set of algorithms that compute the discrete Fourier transform (DFT) with much-reduced arithmetic operations. In particular, multiplications, traditionally a hardware- and/or time-consuming arithmetic operation, are reduced considerably. With the publishing of the first FFT algorithm (Cooley and Tukey, 1965), digital signal processing began to emerge as a field in its own right. Only a few years after that landmark paper, the first applications of finite field transforms to computing convolution appeared. We do not attempt to retrace the history here, but we discuss these two discoveries as initial steps on the road to the application of number theory to the digital processing of signals. The discovery of efficient ways of computing the DFT allowed both the computation of spectra and the implementation of finite impulse response filters to be carried out with much-reduced computational time. The most interesting use of the FFT, in the initial years, was in the latter application. The ability of the DFT to indirectly compute the inner product form for FIR filters comes from its convolution property.
The result is surprising because it seems, on the surface, to involve more work, in that three transforms have to be computed. It is also surprising that a transform with a basis set involving trigonometric functions can be used to efficiently compute convolutions between, say, integer sequences, but it does indeed work. In order to examine this remarkable property, we briefly discuss FIR filters and then introduce the DFT.

A. Finite Impulse Response Digital Filtering Using DFTs
Finite impulse response (FIR) digital filters produce an output based on a weighted sum of present and past inputs. Mathematically, this is a weighted averaging scheme represented by

y_n = Σ_{k=0}^{N−1} h_k x_{n−k},

where {h_n} is the weighting sequence and {x_n} is the input sequence.
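As an illustration of this weighted sum, a direct-form sketch might look as follows (the function name and the treatment of samples before the start of the input, taken as zero, are our own assumptions):

```python
def fir_filter(h, x):
    """Direct-form FIR filter: y[n] = sum_k h[k] * x[n-k], zeros before x[0]."""
    y = []
    for n in range(len(x)):
        acc = 0
        for k, hk in enumerate(h):
            if n - k >= 0:              # samples before the start are zero
                acc += hk * x[n - k]
        y.append(acc)
    return y

# An impulse input reproduces the weighting sequence {h_n}, as noted below.
print(fir_filter([3, 1, 4], [1, 0, 0, 0, 0]))  # [3, 1, 4, 0, 0]
```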
The weighting sequence {h_n} characterizes the filter, and it is the determination of this sequence that constitutes the design of the filter. The sequence is sometimes referred to as the impulse response, by analogy with the equivalent characterization of analog filters. In the digital case the impulse function is a perfectly realizable sequence {1, 0, 0, 0, ...}. If we use this impulse function as the x sequence, the output of the filter will be the {h_n} sequence. We can also define a frequency response for the filter by using an input sequence that is a complex sinusoid, x_n = e^{iΩn}. Here the frequency is simply an angular increment Ω, which locates the value of the input, at sample n, on the unit circle at an angle of Ωn. As with continuous systems, we can invoke a transform to map convolution into multiplication. The transform we use is the z-transform (Jury, 1964), which is defined as

X(z) = Σ_{n=0}^{N−1} x_n z^{−n}.
Using this definition, and the convolution property, we are able to define a transfer function for the FIR filter as

H(z) = Y(z)/X(z) = Σ_{n=0}^{N−1} h_n z^{−n}.

By letting z = e^{iΩ}, we can find the magnitude and phase shift (polar coordinates) of H(e^{iΩ}), and hence the frequency response (H(e^{iΩ}) as a function of Ω). We note that a simple inversion procedure exists: we simply examine the coefficients of H(z) and assign them to delayed samples of the impulse response. H(z) can also be obtained by a polynomial division of Y(z) by X(z); for the FIR filter we know that this division will always have finite length, and so H(z) (for z an indeterminate complex number) has only zeros. For filters that have an infinite-length response (infinite impulse response (IIR) filters), H(z) contains poles as well as zeros; i.e., we represent H(z) as a rational function of polynomials. Closed solutions exist for determining the impulse response of IIR filters based on the roots of H(z) (Jury, 1964). We do not consider IIR filters here, since our interest is in implementing a finite convolution process. IIR filters possess the property that they are able to implement arbitrary frequency magnitude responses with many fewer multiplications than equivalent FIR filters. They have unfortunate side effects, however: possible instability due to inexact multipliers; the possibility of producing limit cycle oscillations; the possibility of overflow oscillations; and difficulty in implementing linear phase responses. Although a great deal of effort has been spent on studying such filters (see, for example, Rabiner and Gold, 1975), it is not clear that the implementation efficiency suitably compensates for the problems. Perhaps the best solution is to concentrate on efficient implementation strategies for FIR filters. This is the main subject of this paper.
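A sketch of evaluating H(z) on the unit circle, z = e^{iΩ}; the simple lowpass weighting sequence used here is our own illustrative choice:

```python
import cmath

def freq_response(h, omega):
    """Evaluate H(z) at z = e^{i*omega}: H = sum_n h[n] * e^{-i*omega*n}."""
    return sum(hn * cmath.exp(-1j * omega * n) for n, hn in enumerate(h))

h = [0.25, 0.5, 0.25]                 # a simple lowpass weighting sequence
H_dc = freq_response(h, 0.0)          # DC gain: the sum of the weights
H_ny = freq_response(h, cmath.pi)     # gain at half the sampling rate
print(abs(H_dc), abs(H_ny))
```

The magnitude at Ω = 0 is the sum of the weights (unity here), and the magnitude at Ω = π is essentially zero, as expected for this lowpass choice.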
II. DISCRETE FOURIER TRANSFORMS
The DFT is defined as a transformation that maps a sequence x_n into a sequence X_k, where

X_k = Σ_{n=0}^{N−1} x_n e^{−ikΩ_N n},     (1)

with Ω_N = 2π/N. In terms of the Fourier transform, this may be regarded as an approximation to the Fourier integral, but it turns out to be a transform in its own right with exact properties. Two of the properties that are important to the subject of this paper are the inverse and convolution properties. In fact, we need the first to prove the second.

A. Inverse and Convolution Property
One important property is that the transform is reversible; i.e., the mapping x_n ↔ X_k exists, with an inverse given by

x_n = (1/N) Σ_{k=0}^{N−1} X_k e^{ikΩ_N n}.     (2)

The proof that the inverse exists yields the basis for the second important property, the convolution property. The easiest way to prove that Eq. (2) is the inverse of Eq. (1) is by direct substitution. We will make the substitution using an appropriate dummy variable, m; the result should yield x_m = x_n, ∀ m = n.
Interchanging the order of summation and gathering terms, we find

x_m = (1/N) Σ_{n=0}^{N−1} x_n Σ_{k=0}^{N−1} e^{jk[m−n]Ω_N},

or

x_m = (1/N) Σ_{n=0}^{N−1} x_n Σ_{k=0}^{N−1} W(k[m − n]).     (3)

The function W(k[m − n]) = e^{jk[m−n]Ω_N} displays rather an interesting property. It is a function on the unit circle in the complex plane with N unique values: 1, e^{j[m−n]Ω_N}, e^{j2[m−n]Ω_N}, ..., e^{j(N−1)[m−n]Ω_N}, for k = 0, 1, 2, ..., N − 1. For [m − n] = KN, where K is some integer, all values of the function are 1, and so the sum Σ_k W(k[m − n]) evaluates to N. For [m − n] ≠ KN we can invoke the fact that the function yields a finite geometric series with the sum

Σ_{k=0}^{N−1} e^{jk[m−n]Ω_N} = (1 − e^{jN[m−n]Ω_N}) / (1 − e^{j[m−n]Ω_N}) = 0,

since e^{jN[m−n]Ω_N} = e^{j2π[m−n]} = 1.
Thus, the only nonzero values generated by Eq. (3) are for [m − n] = KN, and for these values we obtain the result x_m = x_{m+KN}. If we restrict both the sample and transform domains to be periodic, with period N, then we have proved that the transform is invertible. Based on this result for the existence of an exact inverse for the DFT, we can now proceed to examine the convolution property. We start by considering the result of multiplying two transform domain sequences together. Assume that the sample domain sequences are x_n and y_n, with transform domain sequences X_k and Y_k. The result of multiplying the transform domain sequences is a new sequence Z_k = X_k Y_k. Using the DFT of the x sequence explicitly, we have

Z_k = Σ_{n=0}^{N−1} x_n e^{−jknΩ_N} Y_k.     (4)
Inverting Eq. (4) yields

z_m = (1/N) Σ_{k=0}^{N−1} [Σ_{n=0}^{N−1} x_n e^{−jknΩ_N}] Y_k e^{jkmΩ_N}.     (5)

We now interchange the order of summation to find

z_m = Σ_{n=0}^{N−1} x_n [(1/N) Σ_{k=0}^{N−1} Y_k e^{jk(m−n)Ω_N}].     (6)

The inner expression of Eq. (6), in parentheses, is the sequence y_{[m−n]}. Since {y} is periodic, we can retain values from a single period by forming y_{[m ⊕_N (−n)]}. Note that here we are using the notation ⊕_N to indicate addition modulo N. This notation will be expanded later. From Eq. (6) we obtain the final result for the output sequence:

z_m = Σ_{n=0}^{N−1} x_n y_{[m ⊕_N (−n)]}.     (7)

z_m is therefore the convolution of the sequence {x} with the sequence {y}. It is the convolution of two periodic sequences and is often called cyclic, or periodic, convolution. As noted before, we only need to know the samples of a single period in order to perform the calculation. If we require to determine the
periodic convolution of two sequences, then we can perform the calculation indirectly by taking DFTs of each sequence, multiplying in the transform domain, and inverting the result. If we require the convolution of two aperiodic sequences, it turns out that we can still use periodic convolution, provided that we are prepared to perform several periodic convolutions with subsamples of the sequences and assemble the final sequence from partial outputs of the periodic convolutions; these are the so-called overlap/save and overlap/add procedures. We are not able to give full details of such procedures here, but there are many fine textbooks that detail the two approaches (e.g., Rabiner and Gold, 1975). The use of an indirect computation of convolution is advantageous when the amount of computation involved is considerably less than that required using the direct convolution sum. An N-point periodic convolution requires N² multiplications and N² − N binary (two-valued) additions. Can we reduce this by an indirect computation involving DFTs? The answer is yes, but only if the transform can be computed efficiently. An algorithm that efficiently computes a DFT is called a fast algorithm, and the computation of a DFT with such an algorithm is called a fast Fourier transform (FFT). Note that the FFT is not a separate transform, just a fast (or efficient) way of computing a DFT. For simplicity we will discuss the development of fixed-radix decimation algorithms that use a power of two as the radix. These were the first fast algorithms to be developed, and they still represent the major implementation procedure for commercial hardware and software solutions.

III. FAST FOURIER TRANSFORM (FFT) ALGORITHMS
We will briefly examine the classical decimation-in-time algorithm to appreciate the magnitude of the savings that can occur in computing the DFT using a fast algorithm.

A. Decimation in Time FFT Algorithm
There are two basic decimation (sequence-slicing) procedures: decimation over the sample domain (decimation in time, DIT) or decimation over the transform domain (decimation in frequency, DIF). We will consider the first procedure, with the assumption that N contains some power-of-2 factors. The transformation x_n ↔ X_k can be written

X_k = Σ_{n=0}^{N−1} x_n W(kn),     (8)

where W(kn) = e^{−jknΩ_N},
and we can decompose the computation by computing Eq. (8) for the even and odd samples separately, and then putting the two results back together. This yields

X_k = Σ_{r=0}^{N/2−1} x_{2r} W(2rk) + W(k) Σ_{r=0}^{N/2−1} x_{2r+1} W(2rk).

We have now reduced the computational requirement for an N-point DFT to that required for two (N/2)-point DFTs and an extra N multiplications. If we are interested in reducing multiplications, then we have reduced the requirement from N² multiplications to (N²/2) + N multiplications. Although this may not represent a large reduction, we are able to repeat the process by decimating each of the (N/2)-point DFTs into two (N/4)-point DFTs. We can continue in this way until we run out of factors of N. The greatest decimation occurs for N a power of two, in which case the basic computational element reduces to a two-point DFT (which can be computed using real numbers). In this case the number of multiplications reduces to (N/2) log₂ N (in fact, some of these multiplications are trivial, but we will use this figure as an upper bound). The calculation reduces to the calculation of (N/2) log₂ N two-sample DFTs, each with a multiplication by a power of W. This multiplier is popularly referred to as a "twiddle factor." The basic computational element, shown in the signal flow graph in Fig. 1, is often referred to as a "butterfly," based on its structure. Notice that there is only one complex multiplication and two complex additions per butterfly. An example is shown in Fig. 2 for an 8-point transform using 4 log₂ 8 = 4 × 3 = 12 butterfly calculations. Appropriate pairs of shaded nodes represent the addition/subtraction network of the butterfly calculation. Note that although there are 12 multiplications indicated, in fact only five of these are nontrivial (i.e., not a multiplication by unity). Before we leave this brief introduction to fast algorithms, it can be observed that the output sequence has been scrambled in the process of reducing the number of
FIG. 1. Butterfly calculation with twiddle factor (W_N) multiplier.
FIG. 2. FFT signal flow graph for 8-point DFT.
multiplications, and that the structure is not the same from stage to stage. Algorithms are available that do not scramble the output data (or require the input data to be scrambled), and there are also algorithms that maintain a constant structure, but not together (a sort of Heisenberg uncertainty principle for FFTs!). It seems to be a law of nature that as one attempts to reduce the number of arithmetic operations (or at least the most costly, which is multiplication), the structure suffers. This comes back to haunt us when we examine VLSI implementations of such algorithms. There are many works available on the myriad of fast algorithms that have been discovered since the initial algorithm was published (for example, Blahut, 1985). It is our intent here simply to point out that these algorithms exist and to briefly discuss one of the earliest.
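The even/odd decimation described above can be sketched recursively (a minimal illustration, not an optimized in-place implementation; the function name is ours):

```python
import cmath

def fft(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    N = len(x)
    if N == 1:
        return list(x)
    even = fft(x[0::2])                # (N/2)-point DFT of the even samples
    odd = fft(x[1::2])                 # (N/2)-point DFT of the odd samples
    out = [0] * N
    for k in range(N // 2):
        tw = cmath.exp(-2j * cmath.pi * k / N) * odd[k]   # twiddle factor
        out[k] = even[k] + tw          # butterfly: top output
        out[k + N // 2] = even[k] - tw # butterfly: bottom output
    return out

X = fft([1, 2, 3, 4, 0, 0, 0, 0])
print(round(X[0].real))                # X[0] is the sum of the samples: 10
```

The recursion terminates at the two-point DFTs mentioned above; the explicit butterfly loop is where the (N/2) log₂ N multiplication count arises.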
B. Use of the DFT in Indirect Computation of Convolution

Since DFTs can be computed very efficiently, it becomes clear that the operation of convolution can also be computed much more efficiently than by directly implementing the convolution sum. Assuming that we wish to compute circular convolution, an N-point convolution can be computed using two DFTs and an inverse DFT. Since an inverse DFT has the same complexity as a forward DFT, the number of multiplications involved is 3N log₂ N + N. The reduction in multiplications for an N-point circular convolution is (1/N){1 + 3 log₂ N}. For the case of filtering with a fixed set of
weights, we can store the DFT of the weight sequence and obtain a greater reduction of (1/N){1 + 2 log₂ N} multiplications. These savings are reduced if one is interested in aperiodic convolution (the normal filtering operation), but for large filter kernels (large N) the savings can be significant enough to warrant using this indirect approach. We note one unfortunate fact: even if we are convolving only real sequences, it is still necessary to use complex arithmetic. In fact, our reduction calculations above are predicated on the need for complex sequence convolution. There is another unfortunate effect: since the roots of unity are obtained from transcendental functions, we are not able to compute convolutions without error. Even the convolution between sequences of integers, which in the direct form can easily be computed without invoking any errors, will have to be computed using approximations to the roots and will, in general, invoke errors in the final convolution sequence. There are mathematical systems in which roots of unity can be generated and used in calculations without any approximations; in fact, approximations are meaningless there. These mathematical systems are finite algebras, and we devote the remainder of this chapter to discussing their use in the computation of DSP algorithms. It is important to establish some basic concepts and theories relating to the subject of finite algebra; the following section is included for that purpose.
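The round-off effect just described is easy to exhibit: a direct DFT-based cyclic convolution of an integer sequence returns values that are only approximately integers, and rounding is needed to recover the exact result (a sketch; helper names are ours):

```python
import cmath

def dft(x, inverse=False):
    """Direct O(N^2) DFT; inverse=True applies the conjugate kernel and 1/N."""
    N = len(x)
    s = 1j if inverse else -1j
    out = [sum(xn * cmath.exp(s * 2 * cmath.pi * k * n / N)
               for n, xn in enumerate(x)) for k in range(N)]
    return [v / N for v in out] if inverse else out

def cyclic_conv_via_dft(x, y):
    """Indirect cyclic convolution: transform, multiply pointwise, invert."""
    X, Y = dft(x), dft(y)
    Z = [a * b for a, b in zip(X, Y)]
    return dft(Z, inverse=True)

x = [1, 2, 3, 4, 0, 0, 0, 0]
z = cyclic_conv_via_dft(x, x)
# The exact integer answer appears only after rounding away the small
# floating-point error introduced by the transcendental roots of unity.
print([round(v.real) for v in z])   # [1, 4, 10, 20, 25, 24, 16, 0]
```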
IV. FINITE ALGEBRAS

For our specific purposes we will be using algebraic structures that emulate normal integer computations. The entities to be discussed are groups, rings, and fields that contain a finite number of integers as their elements and allow arithmetic operations that are modifications of normal integer arithmetic operations. This will not be an exhaustive review; the reader is referred to standard texts on number theory (e.g., Dudley, 1969, and Vinogradov, 1954), as well as texts on modern algebra (e.g., Burton, 1970, and Schilling and Piper, 1975). As a starting point, we formally define the notions of a ring and a field.
A. Rings and Fields
Definition 1: If R is a nonempty set on which there are defined binary operations of addition, +, and multiplication, ·, such that the following postulates (I)-(VI) hold, we say that R is a ring:

(I) Commutative law of addition, a + b = b + a
(II) Associative law, (a + b) + c = a + (b + c)
(III) Distributive law, a · (b + c) = a · b + a · c
(IV) Existence of an element, denoted by the symbol 0, of R such that a + 0 = a for every a ∈ R
(V) Existence of an additive inverse. For each a ∈ R, there exists x ∈ R such that a + x = 0
(VI) Closure, where it is understood that a, b, c are arbitrary elements of R in the above postulates

A ring in which multiplication is a commutative operation is called a commutative ring. A ring with identity is a ring in which there exists an identity element, 1, for the operation of multiplication, a · 1 = 1 · a = a for all a ∈ R. The ring of integers is a well-known example of a commutative ring with identity. In abstract algebra, the elements of a ring are not necessarily the integers, and not necessarily numbers at all; e.g., they can be polynomials. Given a ring R with identity 1, an element a ∈ R is said to be invertible, or to be a unit, whenever a possesses an inverse with respect to multiplication. The multiplicative inverse, a⁻¹, is an element such that a⁻¹ · a = 1. The set of all invertible elements of a ring is a group with respect to the operation of multiplication and is called a multiplicative group.
Definition 2: If R is a ring and 0 ≠ a ∈ R, then a is called a divisor of zero if there exists some b ≠ 0, b ∈ R, such that a · b = 0.

Definition 3: An integral domain is a commutative ring with identity that has no divisors of zero.

Definition 4: A ring F is said to be a field provided that the set F − {0} is a commutative group under multiplication.
Viewed otherwise, a field is a commutative ring with identity in which each nonzero element possesses an inverse under multiplication. It follows (McCoy, 1972) that every field is an integral domain. Rings (fields) with a finite number of elements are called finite rings (fields). The ring of integers modulo m, denoted here as Z_m, is an example of a finite ring. In every finite field with p elements, the nonzero elements form a multiplicative group. This multiplicative group of order p − 1 is cyclic; i.e., it contains an element, α, whose powers exhaust the entire group. This element is called a generator, or a primitive root of unity, and the period or order of α is p − 1. The order of any element x in the multiplicative group is the least positive integer t such that x^t = 1, x^s ≠ 1, s ∈ [1, t). The order t is a divisor of p − 1, and x is called a primitive t-th root of unity. For every element x of the set F − {0}, the mapping defined by x = α^z, z ∈ {0, 1, ..., p − 2}, is an isomorphism between the additive group of integers with addition modulo p − 1 and the multiplicative group of the field F. The integer z is called the index of x relative to the base α, denoted ind_α x.

1. The Ring of Residue Classes
In order to describe the system, the notion of congruence should be introduced. The basic theorem, known as the division algorithm, is established first.
DIVISION ALGORITHM. For given integers a and b, b not zero, there exist two unique integers q and r such that a = bq + r, r ∈ [0, b). It is clear that q is the integer value of the quotient a/b. The quantity r is the least positive (integer) remainder of the division of a by b and is designated as the residue of a modulo b, written |a|_b. We will say that b divides a (written b | a) if there exists an integer k such that a = bk.

Definition 5: Two integers c and d are said to be congruent modulo m, written c ≡ d (mod m), if and only if m | (c − d).
Since m | 0, c ≡ c (mod m) by definition. An alternative definition of congruence can be stated as follows: two integers c and d are congruent modulo m if and only if they leave the same remainder when divided by m, |c|_m = |d|_m.

Definition 6: A set of integers containing exactly those integers that are congruent, modulo m, to a fixed integer is called a residue class, modulo m.
The residue classes (mod m) form a commutative ring with identity with respect to modulo-m addition and multiplication, traditionally known as the ring of integers modulo m, or the residue ring, and denoted Z_m. The ring of residue classes (mod m) contains exactly m distinct elements. The ring of residue classes (mod m) is a field if and only if m is a prime number, because multiplicative inverses, denoted a⁻¹ (mod m) or |a⁻¹|_m, then exist for each nonzero a ∈ Z_m. Thus the nonzero classes of Z_m form a cyclic multiplicative group of order m − 1, {1, 2, ..., m − 1}, with multiplication modulo m, isomorphic to the additive group {0, 1, ..., m − 2} with addition modulo m − 1.
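A small sketch of inversion in Z_m via the extended Euclidean algorithm (the function names are ours):

```python
def egcd(a, b):
    """Extended Euclid: returns (g, s, t) with a*s + b*t = g = gcd(a, b)."""
    if b == 0:
        return a, 1, 0
    g, s, t = egcd(b, a % b)
    return g, t, s - (a // b) * t

def mod_inverse(a, m):
    """Multiplicative inverse of a in Z_m; exists only when gcd(a, m) = 1."""
    g, s, _ = egcd(a % m, m)
    if g != 1:
        raise ValueError(f"{a} is not invertible modulo {m}")
    return s % m

# Z_17 is a field: every nonzero element is invertible.
print(mod_inverse(9, 17))    # 2, since 9 * 2 = 18 = 1 (mod 17)
# Z_12 is only a ring: gcd(8, 12) != 1, so 8 has no inverse there.
```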
If m is composite, Z_m is not a field. Multiplicative inverses do not exist for those nonzero elements a ∈ Z_m for which gcd(a, m) ≠ 1. Euler's totient function, denoted φ(m), is a widely used number theoretic function and is defined as the number of positive integers less than m and relatively prime to it. It follows that the number of invertible elements of Z_m is equal to φ(m). If m has the prime power factorization m = m₁^{e₁} m₂^{e₂} ··· m_L^{e_L}, then we can write

φ(m) = Π_{i=1}^{L} m_i^{e_i − 1}(m_i − 1).

The Euler-Fermat theorem states that if e is an integer, m is a positive integer, and gcd(e, m) = 1, then e^{φ(m)} ≡ 1 (mod m). The important consequence of this theorem is that there is an upper limit on the order of any element a ∈ Z_m; specifically, the order t is a divisor of φ(m).

2. Galois Fields
For any prime m and any positive integer n, there exists a finite field with m^n elements. This (essentially unique) field is commonly denoted by the symbol GF(m^n) and is called a Galois field in honor of the French mathematician Evariste Galois. Since any finite field with m^n elements is a simple algebraic extension of the field Z_m, a brief review of the basic concepts of the extensions of a given field is presented. Let F be a field. Then any field K containing F is an extension of F. If λ is algebraic over F, i.e., if λ is a root of some irreducible polynomial f(x) ∈ F[x] such that f(λ) = 0, then the extension field arising from the field F by the adjunction of the root λ is called a simple algebraic extension, denoted F(λ). Each element of F(λ) can be uniquely represented as a polynomial

a₀ + a₁λ + ··· + a_{n−1}λ^{n−1},     a_i ∈ F.

This unique representation closely resembles the representation of a vector in terms of the vectors of a basis, 1, λ, λ², ..., λ^{n−1}. The vector space concepts are sometimes applied to extension fields, and F(λ) is considered as a vector space of dimension n over F. The field of complex numbers is an example of an extension of the field of real numbers; it is generated by adjoining a root j = √−1 of the irreducible polynomial x² + 1. If f(x) is an irreducible polynomial of degree n over Z_m, m a prime, then the Galois field with m^n elements, GF(m^n), is usually defined as the quotient field Z_m[x]/(f(x)), i.e., the field of residue classes of polynomials of Z_m[x] reduced modulo f(x). All fields containing m^n elements are isomorphic to each other.
In particular, Z_m[x]/(f(x)) is isomorphic to the simple algebraic extension Z_m(λ), where λ is a root of f(x) = 0. Now that we have some basic background in finite algebraic structures, we can return to the problem of indirectly computing convolution via transforms that exhibit the cyclic convolution property. We will expand our definition of such transforms from the DFT to the NTT (number theoretic transform).

V. NUMBER THEORETIC TRANSFORMS
Number theoretic transforms (NTTs) were discovered independently by
a number of researchers (Nicholson, 1971; Pollard, 1971; Rader, 1972a) as a generalization of the standard DFT with respect to the ring over which the transform is defined. The transform retains the same cyclic convolution property as the DFT, but allows error-free computation with the promise of much faster hardware solutions. The NTT has the same form as the DFT, with the exception that it is computed over a finite ring, or field, rather than over the field of complex numbers:

X_k = Σ_{p, n=0}^{N−1} x_n ⊗_p α^{nk},

where the ring, or field, is R(p) or GF(p), respectively, and α is a root of unity of order N in that ring or field. Note that a new notation has been introduced here. In order to explicitly identify arithmetic operations over finite fields, or rings, we use the symbols ⊕_p, ⊗_p to represent addition and multiplication modulo p, respectively, and Σ_p to represent summation modulo p. Where it is clear which ring or field is being referred to, the subscript may be dropped (except on the summation). This notation is used throughout the rest of the paper and is extended where appropriate. The NTT that has probably been studied more than any other is the Fermat number transform (FNT). This, in large part, is because modified binary arithmetic can be used to perform the computations. It is not clear, given the present body of knowledge on VLSI implementations, that this ability to use modified binary hardware is necessarily a desirable attribute of number theoretic algorithms. It is the intent of this paper to present alternative forms of implementation of number theoretic techniques that do not have to be predicated on the existence of standard computational elements. This opens up the possibility of using a wider variety of number theoretic transforms. Let us, however, start off by exploring number theoretic transforms based on Fermat numbers.
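As a sketch of such a transform, the following computes an NTT over GF(257) (the Fermat prime F_3) and verifies the cyclic convolution property on an integer sequence; the code and parameter choices are illustrative assumptions, not the chapter's implementation:

```python
def ntt(x, p, alpha, inverse=False):
    """Number theoretic transform over Z_p using an N-th root of unity alpha."""
    N = len(x)
    a = pow(alpha, p - 2, p) if inverse else alpha   # alpha^-1 via Fermat
    X = [sum(xn * pow(a, n * k, p) for n, xn in enumerate(x)) % p
         for k in range(N)]
    if inverse:
        ninv = pow(N, p - 2, p)          # N^-1 exists since gcd(N, p) = 1
        X = [(v * ninv) % p for v in X]
    return X

p, N = 257, 8                    # F_3 = 257 is prime; 3 is a primitive root
alpha = pow(3, (p - 1) // N, p)  # an element of order exactly 8

x = [1, 2, 3, 4, 0, 0, 0, 0]
X = ntt(x, p, alpha)
Z = [(a * b) % p for a, b in zip(X, X)]          # pointwise product
print(ntt(Z, p, alpha, inverse=True))   # [1, 4, 10, 20, 25, 24, 16, 0] -- exact
```

Because every output value of this convolution is smaller than the modulus 257, the result is recovered exactly, with no rounding of any kind.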
A. Fermat Number Transforms
Although the NTT allows convolution to be computed without error (unlike the DFT, which introduces errors because of the requirement for arithmetic with coefficients obtained from transcendental functions), the use of such a finite field (or ring) transform is only appropriate when the hardware has no greater complexity than normal computations over the reals. Satisfying this requirement can be a problem when the field modulus is not chosen with the implementation in mind. Among the earliest implementations considered (and still occupying a small, but current, research interest) are Fermat number transforms (FNTs). The FNT is initially defined over a Galois field GF(F_t), where the modulus F_t = 2^{2^t} + 1 is the t-th Fermat number (FN), for those FNs that are prime (primes do not exist for all values of t). This transform brings about three main implementation simplicities (Agarwal and Burrus, 1974):

(1) The arithmetic is very similar to ordinary binary arithmetic because the modulus is close to a power of two.
(2) Simple forms can be found for the multiplicative group generator that reduce the multiplications required to compute the transform to modified bit-pattern shifts.
(3) A standard fast algorithm, obtained from an appropriate FFT, can be used to compute the transform.
The idea of being able to compute over a finite field, but with only slightly modified binary hardware, is obviously appealing, albeit constraining. The FNT pair is defined as

X_k = Σ_{F_t, n=0}^{N−1} x_n ⊗_{F_t} α^{nk},

x_n = N^{−1} ⊗_{F_t} Σ_{F_t, k=0}^{N−1} X_k ⊗_{F_t} α^{−nk}.

There are some points that we should note:

(1) Since N | F_t − 1, it is clear that we can generate a maximally composite N = 2^{2^t} and so use one of the fast algorithms available for the DFT to optimize a computational structure for the FNT.
(2) If we restrict N to be even, N⁻¹ will always exist.
(3) The first four Fermat numbers are prime, and 3 is a primitive root for all four of these primes.
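A quick numerical check of points (1) and (3), here for the Fermat primes F_1 through F_4 (and of the compositeness of F_5); the helper routines are our own naive sketches:

```python
def is_prime(n):
    """Trial-division primality test (adequate for these small checks)."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def order(a, m):
    """Multiplicative order of a modulo m (smallest t with a^t = 1)."""
    t, v = 1, a % m
    while v != 1:
        v = (v * a) % m
        t += 1
    return t

for t in range(1, 5):              # F_1 .. F_4 = 5, 17, 257, 65537
    F = 2 ** (2 ** t) + 1
    # 3 is a primitive root exactly when its order is F - 1
    print(F, is_prime(F), order(3, F) == F - 1)

print(is_prime(2 ** 32 + 1))       # F_5 is composite (641 divides it): False
```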
From points 1 and 3 we find that 3^{2^{2^t}} = 1 (mod F_t). If we wish to find a generator, α, for a multiplicative group of order 2^b (this will allow a transform of length 2^b), then, since 3^{2^{2^t} − 1} = −1, we have α = 3^{2^{(2^t − b)}}. This last expression allows us to determine the relationship between suitable values of α (for ease of implementation) and possible transform lengths. The easiest value of α to choose is clearly α = 2. This will allow multiplication by powers of α to be replaced by shifts modulo F_t. Since F_t is close to a power of two, the hardware should be only moderately more complex than shifts modulo a power of two (binary shifts). It turns out that for α = 2, the order is 2^{t+1}. This can be verified, using ordinary arithmetic, from

2^{2^{t+1}} = 2^{2·2^t} = (2^{2^t} + 1)(2^{2^t} − 1) + 1 ≡ 1 (mod F_t).
The largest Fermat prime is F4, for which the order is 32; the arithmetic is modified 17-bit binary. If we examine F, and F6, we find that they are composite numbers of the form F, = K 2 ' + 2 + 1. We have to pause at this moment to reflect on the fact that 6, t E {5,6} is not a prime number. Clearly we will not be computing over a field, so will the NTT still work? It turns out that the conditions for the existence of the transform still exist when computing over a ring. The restriction is that the transform has to exist over each field constructed from each of the prime factors of the composite modulus. The transform length, N, therefore satisfies N 1 Fj" Vi, where F, = Fj" (this assumes no powers of primes as factors). For F5 and F6 we can show that N = 2 1 + 2is the maximum transform length supported by these Fermat numbers; as a necessary condition, we can see that 2'+2 I (F, - 1). Although we are only computing over a ring of integers SFt,2'+24'K21'2 + 1, and so N-' exists. The length supported by a = 2 is 2'+', as before. For all the Fermat numbers considered here, it turns out that the transform length, N = 2'+', can be achieved by using a value of a = f i (Agarwal and Burrus, 1974);i.e. a2 = 2. The expression for c1 = f i is given by a = 22('-2) ( p - ' ]I),
n
which is clearly representable by a 2-bit (two ones in a field of zeros) generator. For a fast DIT or DIF algorithm, only the first, or last, stage will require odd powers of α, and so most of the multiplications will only require single-bit multipliers. Given the use of α = √2 and the fifth Fermat number, we have a dynamic range of just over 32 bits, with a transform length of N = 128. It is clear that there is an unfortunate link between dynamic range and transform length, assuming the requirement for a fast algorithm. It might be helpful to take an example of convolution over GF(F_2) in order to reinforce the results presented here. In order to illustrate the difference between performing convolution using an NTT and an 8-point DFT, we will choose α = 3² = 9 in order that the NTT may also generate an 8-point transform length.
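The Agarwal-Burrus construction of √2 is easy to check numerically. The sketch below (helper names are ours, not from the text) verifies, for the Fermat prime F_4 = 2^16 + 1, that α = 2^{2^{t−2}}(2^{2^{t−1}} − 1) satisfies α² ≡ 2 and has multiplicative order 2^{t+2} = 64, twice the order reachable with α = 2.

```python
# Sketch verifying the Agarwal-Burrus generator sqrt(2) mod F_t, here for
# the prime F_4 = 2^16 + 1.  Function names are ours, for illustration only.

def fermat(t: int) -> int:
    """Return the t-th Fermat number F_t = 2^(2^t) + 1."""
    return (1 << (1 << t)) + 1

def sqrt2_mod_fermat(t: int) -> int:
    """alpha = 2^(2^(t-2)) * (2^(2^(t-1)) - 1), with alpha^2 = 2 (mod F_t)."""
    Ft = fermat(t)
    return (pow(2, 1 << (t - 2), Ft) * (pow(2, 1 << (t - 1), Ft) - 1)) % Ft

t = 4
Ft = fermat(t)                # 65537, the largest known Fermat prime
alpha = sqrt2_mod_fermat(t)   # 2^4 * (2^8 - 1) = 4080

assert alpha * alpha % Ft == 2                 # alpha really is a root of 2
assert pow(alpha, 1 << (t + 2), Ft) == 1       # order divides 2^(t+2) = 64
assert pow(alpha, 1 << (t + 1), Ft) == Ft - 1  # alpha^32 = -1, so order is 64
assert pow(2, 1 << (t + 1), Ft) == 1           # alpha = 2 only gives N = 2^(t+1)
```

The two ones of the generator show up directly: 4080 = 2^4·(2^8 − 1) is a shifted run of ones in binary, which is what makes multiplication by even powers of α cheap.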
NUMBER THEORETIC TECHNIQUES IN DIGITAL SIGNAL PROCESSING
The transform pair is

    X_k = Σ_{n=0}^{7} x_n · 9^{nk} mod 17,   k ∈ {0, 1, ..., 7},

    x_n = 8^{-1} Σ_{k=0}^{7} X_k · 9^{−nk} mod 17,   n ∈ {0, 1, ..., 7}.

We can concisely show the calculation using matrix notation (the modulo arithmetic is implied):

    X = Tx,     (13)

where T is the transformation matrix, whose elements are given by T_{kn} = α^{kn}. Note that the corresponding DFT transformation matrix has elements given by T_{kn} = e^{−jπkn/4}. We can immediately see the major difference in the computation of the two transforms. Although the DFT uses ordinary arithmetic, the calculations are over the complex field, with elements that are not representable in a finite machine. The NTT, although requiring finite field (modulo) arithmetic, performs its calculations over a finite integer field using elements that are defined without error. As an example calculation, in order to compare the two approaches, we will cyclically convolve the sequence {x} = {1, 2, 3, 4, 0, 0, 0, 0} with itself. This example is chosen to illustrate several points, as can be seen if we compute the cyclic convolution directly:

    {y} = {1, 4, 10, 20, 25, 24, 16, 0}.
If we now compute the aperiodic (normal) convolution we find

    y_k = Σ_{n=0}^{7} x_n · x_{k−n},

or {y} = {1, 4, 10, 20, 25, 24, 16, 0}.
We can observe that both the computed sequences are identical, which shows that it is possible to generate normal aperiodic convolution from periodic convolution by padding the sequences to be convolved with zeros. This forms the basis for the classical techniques of overlap-add and overlap-save for using cyclic convolution to compute normal convolution (Gold and Rader, 1969). As a second point, we also notice that some of the elements of the convolution output lie outside the elements of the field over which the
GRAHAM A. JULLIEN
NTT is to be computed, GF(17). This will be discussed after the example calculations. We generate the forward transform as the matrix multiplication of Eq. (13):

    | 1  1  1  1  1  1  1  1 |        | 1 |     | 10 |
    | 1  9 13 15 16  8  4  2 |        | 2 |     | 16 |
    | 1 13 16  4  1 13 16  4 |        | 3 |     |  6 |
    | 1 15  4  9 16  2 13  8 |  ⊗_17  | 4 |  =  | 11 |
    | 1 16  1 16  1 16  1 16 |        | 0 |     | 15 |
    | 1  8 13  2 16  9  4 15 |        | 0 |     | 13 |
    | 1  4 16 13  1  4 16 13 |        | 0 |     |  7 |
    | 1  2  4  8 16 15 13  9 |        | 0 |     | 15 |
The symbol ⊗_17 indicates matrix multiplication over GF(17). Since we are convolving the sequence with itself, we generate the transform domain pointwise multiplication (over GF(17)) as below, and feed this result to the input of the inverse transform calculation.
Squaring each element of (10, 16, 6, 11, 15, 13, 7, 15)^T over GF(17) gives (15, 1, 2, 2, 4, 16, 15, 4)^T, and the inverse transform matrix has elements 9^{−nk} = 2^{nk}:

    | 1  1  1  1  1  1  1  1 |        | 15 |     |  8 |
    | 1  2  4  8 16 15 13  9 |        |  1 |     | 15 |
    | 1  4 16 13  1  4 16 13 |        |  2 |     | 12 |
    | 1  8 13  2 16  9  4 15 |  ⊗_17  |  2 |  =  |  7 |
    | 1 16  1 16  1 16  1 16 |        |  4 |     | 13 |
    | 1 15  4  9 16  2 13  8 |        | 16 |     |  5 |
    | 1 13 16  4  1 13 16  4 |        | 15 |     |  9 |
    | 1  9 13 15 16  8  4  2 |        |  4 |     |  0 |
The inverse transform requires a final multiplication by 8^{-1} = 15 to obtain the cyclic convolution output:

    15 ⊗_17 (8, 15, 12, 7, 13, 5, 9, 0)^T = (1, 4, 10, 3, 8, 7, 16, 0)^T.
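The whole GF(17) example can be checked in a few lines. The sketch below (function names are ours) implements the 8-point NTT with α = 9 and its inverse, and reproduces the convolution of {1,2,3,4,0,0,0,0} with itself.

```python
# Minimal sketch of the GF(17) example: an 8-point NTT with generator
# alpha = 9, used to convolve {1,2,3,4,0,0,0,0} with itself.

P, N, ALPHA = 17, 8, 9          # modulus, transform length, 8th root of unity

def ntt(x, alpha=ALPHA, p=P):
    """X_k = sum_n x_n * alpha^(n*k) (mod p)."""
    n = len(x)
    return [sum(x[i] * pow(alpha, i * k, p) for i in range(n)) % p
            for k in range(n)]

def intt(X, alpha=ALPHA, p=P):
    """x_n = N^{-1} * sum_k X_k * alpha^(-n*k) (mod p)."""
    n = len(X)
    n_inv = pow(n, -1, p)        # 8^{-1} = 15 over GF(17)
    a_inv = pow(alpha, -1, p)    # 9^{-1} = 2 over GF(17)
    return [n_inv * sum(X[k] * pow(a_inv, m * k, p) for k in range(n)) % p
            for m in range(n)]

x = [1, 2, 3, 4, 0, 0, 0, 0]
X = ntt(x)
y = intt([v * v % P for v in X])      # pointwise square, then invert

assert X == [10, 16, 6, 11, 15, 13, 7, 15]
assert y == [1, 4, 10, 3, 8, 7, 16, 0]   # aperiodic result reduced mod 17
```

Note that the output matches the true convolution {1,4,10,20,25,24,16,0} only as least positive residues mod 17, exactly as discussed in the text.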
We note that some of the elements of the output convolution do not equal the original calculation using the direct convolution sum. They are, in fact, the result of finding the least positive residue, modulo 17, of the correct output. In fact the results are not wrong in the accepted sense; they are simply a mapping of the correct values onto GF(17). We will see later that such results can still be used, and in fact will provide a technique for removing some of the problems of the linkage of dynamic range with transform length. It will be constructive at this point to examine the use of the DFT in performing the same convolution. Because we are performing the calculations over the complex field, we will treat the "real" and "imaginary" parts of the sequences and W elements as the elements of a 2-tuple; the interaction between the "real" and "imaginary" parts will use the usual rules of complex arithmetic. The forward transform (the 8 × 8 matrix of elements W_{kn} = e^{−jπkn/4}, shown to two decimal places, applied to the input (1, 2, 3, 4, 0, 0, 0, 0)) is

    (10.00, 0.00), (−0.41, −7.24), (−2.00, 2.00), (2.41, −1.24),
    (−2.00, 0.00), (2.41, 1.24), (−2.00, −2.00), (−0.41, 7.24).
The transform domain input to the inverse transform is the pointwise multiplication of the forward transform with itself
    (100.00, 0.00), (−52.28, 6.00), (0.00, −8.00), (4.28, −6.00),
    (4.00, 0.00), (4.28, 6.00), (0.00, 8.00), (−52.28, −6.00).
The inverse transform (the conjugate W matrix applied to the vector above) yields, before scaling,

    (8.00, 0.00), (32.00, 0.00), (80.00, 0.00), (160.00, 0.00),
    (200.00, 0.00), (192.00, 0.00), (128.00, 0.00), (0.00, 0.00),

and a final division by 8 gives

    (1.00, 0.00), (4.00, 0.00), (10.00, 0.00), (20.00, 0.00),
    (25.00, 0.00), (24.00, 0.00), (16.00, 0.00), (0.00, 0.00).
The calculations were performed at much higher precision, and so the effect of quantization noise is not seen in this limited-precision format. The quantization noise is present both in the inexact representation of the W coefficients and in the scaling procedures used to keep the data within the dynamic range of the computational system. Even at large precision, zero elements become nonzero, albeit very small, values. The fact that the input sequence is purely real shows up in the Hermitian symmetry of the input to the inverse transform (even symmetry in the real part about the center sample [sample 4] and odd symmetry in the imaginary part). Although we have explicitly shown all details of the calculations (not normally shown in most texts), an examination of the DFT calculation compared to the GF(17) NTT calculation demonstrates the simplicity and error-free behavior of computing indirect convolution over finite fields. We are now in a position to summarize the features of each approach to computing indirect convolution. This is presented in Table I. There are probably more issues than those covered in the table, but the essential points are there. We see that the DFT offers no restrictions on dynamic range, transform length, or type of fast algorithm to be employed, but inherently introduces errors both in its inability to represent the roots of unity with perfect precision, and in the fact that in computing the transform there is computational number growth that has to be checked by scaling. On the other hand, the FNT has some fairly severe algebraic constraints that limit the type of transform that can be computed, but offers completely error-free computation of convolution, with the ability to perform the calculations with modified integer arithmetic and with simply structured coefficients.
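The quantization behavior described above is easy to observe. The sketch below (plain `cmath`, no FFT library; names are ours) runs the same convolution through a floating-point 8-point DFT: the answers are only approximately the integers {1,4,10,20,25,24,16,0}, with tiny residual errors in both real and imaginary parts.

```python
# Sketch: the same indirect convolution via a floating-point DFT, to show
# that the exact integer answer is only approximated.

import cmath

def dft(x, inverse=False):
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[i] * cmath.exp(sign * 2j * cmath.pi * i * k / n)
               for i in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

x = [1, 2, 3, 4, 0, 0, 0, 0]
X = dft(x)
y = dft([v * v for v in X], inverse=True)   # pointwise square, then invert

# The exact answer is {1,4,10,20,25,24,16,0}; floating point gets close,
# but roundoff generally leaves small nonzero errors and imaginary residue.
exact = [1, 4, 10, 20, 25, 24, 16, 0]
assert all(abs(v - e) < 1e-9 for v, e in zip(y, exact))
assert all(abs(v.imag) < 1e-9 for v in y)
```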
TABLE I
COMPARISON OF DFT AND FNT INTEGER SEQUENCE CONVOLUTION

    Feature                 DFT                               FNT
    Transform length        No restriction                    Restricted by requirement N | (p_i − 1) ∀i
    Dynamic range           No restriction                    Restricted by choice of modulus
    Arithmetic              Ordinary complex arithmetic       Modulo integer arithmetic
    Coefficients            Transcendental roots of unity     Integer elements with simple structure
    Precision               Limited by wordlength of the      Error-free
                            computer system
    Algorithmic structure   All "fast" algorithms             Transform length for "fast" algorithms restricted
How are we to set about removing, or at least relaxing, some of the transform length and dynamic range constraints of the original FNT? We will start by considering algebraic "tricks" that can be played, followed by the use of multidimensional algorithms that can be used to dramatically increase transform length with no increase in the dynamic range requirement. We will now expand our horizons to examine general forms of NTTs.

E. Indirect Convolution Using General NTTs
In order to look more carefully at the properties of the NTT, let us introduce it in a formal way. The general form of the number theoretic transform, over a ring R, is given by

    T:  X_k = Σ_{n=0}^{N−1} [x_n ⊙ α^{nk}];   k ∈ {0, 1, 2, ..., N − 1},     (14)

where α is a primitive Nth root of unity in R. The inverse transform has the form
    T^{-1}:  x_n = N^{-1} Σ_{k=0}^{N−1} [X_k ⊙ α^{−nk}];   n ∈ {0, 1, 2, ..., N − 1},     (15)
with the restriction that N^{-1} ∈ R. Over the complex field, the DFT, with α = e^{−j(2π/N)}, is the only transform of this form that exists and possesses the cyclic convolution property, as we demonstrated earlier. The complex field can support a DFT of any length, since Nth roots of unity exist for all N. In a finite algebraic system (finite ring* or field), the length of the transform depends on the choice of the modulus, p, and the generator, α. As part of this formal introduction of NTTs, we will consider the conditions for the existence of a transform. Nicholson (1971) was the first to present these conditions for existence, in his treatise on the algebraic theory of the "generalized DFT." His algebraic system for the generalization was a commutative ring R with identity and without zero divisors (i.e., an integral domain). The necessary and sufficient conditions to generate an NTT of length N are:

(1) That α is a primitive Nth root of unity in R, i.e., α^N = 1, α^k ≠ 1, k = 1, 2, ..., N − 1. If we denote by Ĝ a multiplicative (cyclic) group of R, α is any generator of Ĝ of order N.
(2) That the multiplicative inverse of N, N^{-1}, belongs to R.

For the special case of a field, GF(p), the maximum number of elements in Ĝ is p − 1. More general results for the existence of NTTs over finite commutative rings have been presented by Dubois and Venetsanopoulos (1978). Let R be a finite commutative ring with identity. Then R decomposes uniquely as a direct sum of local rings,** {R_i}, which we will write as R ≅ ⊕ R_i, where ≅ indicates isomorphism. Under this decomposition, an element r ∈ R has an L-tuple representation (r_1, ..., r_L). R supports a generalized DFT of length N if and only if

(1) each R_i contains a primitive Nth root of unity α_i, and
(2) N^{-1} exists in R.

A primitive root of unity in R_i is any generator of a multiplicative group of order N. If I_i is the maximal ideal of R_i, then R_i/I_i is a finite (Galois) field of
* Here the term "ring" implies a commutative ring with identity.
** A local ring is a commutative ring with identity that has a unique maximal ideal I (any element u ∉ I is invertible in a local ring).
m_i elements (Burton, 1970). Thus for any finite ring R, a necessary and sufficient condition can be derived that R supports a length-N generalized DFT if and only if N | gcd(m_i − 1), ∀i ∈ {1, 2, ..., L}. The notation a | b means a divides b; gcd is the greatest common divisor. Practical considerations dictate a selection of rings/fields that support transforms whose parameters lead to efficient implementation of modular arithmetic, either in hardware or software. Most of the reported work on number theoretic transforms has supposed that the hardware will be implemented using the binary number system. In conventional binary arithmetic, residue reduction is particularly easy when the modulus is of the form 2^k ± 1. The choice 2^k does not admit useful transforms since N = 1. Also, in order to simplify multiplications in the binary number system, the cyclic group generator, α, is chosen as a power of two. When this constraint has to be fulfilled, the transform length is usually not close to the maximum length attainable. In addition to the above, the transform length is normally chosen to be highly composite, so that a "fast" algorithm exists for the implementation. A considerable effort has been made to provide rings and fields that allow for adequate dynamic range and satisfy the above constraints, so that the conflicts hidden in these constraints are minimized. In particular, Rader (1972b) proposed transforms defined over the ring of integers modulo Mersenne numbers, M_p = 2^p − 1, p prime. These transforms are referred to as Mersenne number transforms (MNT). In the ring of integers modulo a Mersenne number, 2 is a pth root of unity and −2 is a 2pth root of unity. Thus the transform can be computed without general multiplication, although the use of "fast" algorithms is precluded because the order of the transform is not a power of two and not even highly composite.
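The composite-modulus length condition can be checked concretely for F_5 = 2^32 + 1, whose known factorization is 641 × 6700417 (both factors prime). A short sketch, assuming that factorization:

```python
# Sketch checking the condition N | gcd(p_i - 1) over the ring Z_{F_5},
# using the known factorization F_5 = 641 * 6700417.

from math import gcd

F5 = (1 << 32) + 1
p1, p2 = 641, 6700417
assert p1 * p2 == F5

# The transform must exist over both prime fields GF(p1) and GF(p2),
# so the longest supported length divides gcd(p1 - 1, p2 - 1).
N_max = gcd(p1 - 1, p2 - 1)
assert N_max == 128        # = 2^(t+2) with t = 5, as claimed in the text
```

This confirms that even though F_5 is composite, the ring still supports the full N = 2^{t+2} = 128 transform length.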
And, as we have already discussed, both Rader (1972a) and Agarwal and Burrus (1974) proposed to compute the NTT with a modulus of the form of the tth Fermat number, F_t = 2^b + 1, b = 2^t. Agarwal and Burrus (1974) show that an FNT with α = 2 allows a transform length N = 2^{t+1}, and an FNT with α = 2^{2^{t−2}}(2^{2^{t−1}} − 1) (known as √2, since α² = 2 (mod F_t)) allows N = 2^{t+2}. Thus an FFT-type algorithm can be used. The hardware implementation of a 64-point FNT is described by McClellan (1976). The main disadvantage of both the MNT and the FNT is the rigid relationship between the dynamic range and attainable transform lengths. For example, with a 32-bit word machine using F_5 = 2^{32} + 1, N = 64 for α = 2 and N = 128 for α = √2. There is also a limited choice of possible wordlengths. This point is especially significant when an FNT is used and may result in a mismatch of wordlength and dynamic range required for the particular convolution, because of the large spacing between Fermat numbers. As a modification to the original FNT and MNT, Agarwal and Burrus
(1974) introduced a modified FNT by using ring moduli that had the same implementation properties as a Fermat number: namely, 2^b + 1, b ≠ 2^t. Such moduli have small factors and so yield small transform lengths. The transform length can be artificially increased by using multidimensional transform techniques, as we will see later. Probably the most interesting idea along the lines of modifying the ring modulus while retaining implementation efficiency was proposed by Nussbaumer (1976b), as pseudo-Fermat and pseudo-Mersenne number transforms. A pseudo-MNT is defined with a ring modulus of M_A = (2^p − 1)/q, p composite and q some factor of 2^p − 1, and a pseudo-FNT is defined with a ring modulus of M_S = (2^b + 1)/s, b ≠ 2^t and s some factor of 2^b + 1. Because 2^p − 1 and 2^b + 1 defined as above are not prime, transforms over them will have a short length. Thus, if 2^b + 1 and 2^p − 1 contain small factors, these can be divided out in the pseudo-MNT, or pseudo-FNT, to yield longer transform lengths. The arithmetic implementation of these transforms can still be performed using modified binary arithmetic over the modulus 2^p − 1 or modulus 2^b + 1, for the pseudo-MNT or pseudo-FNT, respectively, followed by a final reduction modulo M_A or M_S. The difficult arithmetic operation (that is not performed easily with binary arithmetic) is therefore relegated to a very small part of the overall computation budget. In a complete search for algebraic properties that lead to useful transforms, and yet are still easily implemented by modified binary arithmetic, the search broadened to include field, or ring, moduli with a 3-bit form (three "ones" in a field of "zeros") as compared to the 2-bit structure of Fermat and Mersenne numbers. These moduli are of the form 2^a ± 2^b ± 1 and have been investigated by Pollard (1976) and Liu et al. (1976). Other rings have also been considered, such as the ring of Eisenstein integers (Dubois and Venetsanopoulos, 1978).
The driving force behind all of the investigations reported has been the use of modified binary arithmetic elements with multiplication by elements of the transform matrix kept as simple as possible, whilst yielding highly composite transform lengths that are reasonably long. It is amazing that algebraic structures exist that allow all of these requirements to be met! In the quest for more degrees of freedom in the algebraic structures that had already been discovered, the use of extension fields, and rings, provided an important tool.

C. NTTs over Extension Fields
Extension fields can be of any dimension; in order to see how such fields are built, we discuss the construction of a second-degree extension field. The concept is a familiar one, even to those not versed in modern algebra or number theory: we simply have to consider the relationship between the field of real
TABLE II
DECOMPOSITION OF COMPLEX ARITHMETIC OPERATIONS TO REAL OPERATIONS

    Operation             Resulting polynomial              Reduction   Result
    (a + ib) + (c + id)   (a + c) + i(b + d)                none        (a + c) + i(b + d)
    (a + ib) * (c + id)   i²(b*d) + i(a*d + b*c) + (a*c)    i² → −1     (a*c − b*d) + i(a*d + b*c)
numbers, ℝ, and the field of complex numbers, ℂ. The field ℂ is in fact built from ℝ by the adjoining of the solution of an irreducible polynomial over ℝ. The polynomial over ℝ that has no solution in ℝ is x² + 1 = 0. We usually refer to the solution as i ∉ ℝ, and we "build" the complex field as a polynomial field in i by the construction ℂ = {ℝ; +, *}, where y = a + ib with a, b ∈ ℝ. The "complex" operators + and * are defined by reducing the polynomial that results from the ordinary algebraic application of addition and multiplication, + and * respectively, by the irreducible polynomial i² + 1 = 0. In the case of addition, the resulting polynomial is first-order and so no reduction is applied, i.e., addition operates pointwise. In the case of multiplication, the resulting polynomial is of second order in i, and so it is reduced. This gives rise to the extra arithmetic operations involved in complex multiplication compared to complex addition: four "real" multiplications and two "real" additions for complex multiplication, compared to only two "real" additions for complex addition. The specific, well-known details are shown in Table II. Note that the addition operator in the polynomial elements of ℂ is formal, in that it is never implemented. We could just as easily write the element (a + ib) as (a, b). The same concept of building fields can be applied to a Galois field, GF(p), and the degree of the extension field, d, produces a field of p^d elements, GF(p^d). In order to build the field, we use an irreducible dth-order polynomial over GF(p). It has been shown by Pollard (1971) that the cyclic convolution property exists over such extension fields, and that the transform length satisfies N | (p^d − 1). A polynomial of degree d of the form f(x) = Σ_{i=0}^{d} a_i x^i, with a_i ∈ GF(p) and a_d ≠ 0, is defined to be irreducible (Gaal, 1971) if it cannot be expressed as a product of two polynomials of positive degree over GF(p). The quotient field GF(p)[x]/(f(x)) is the required Galois field with p^d elements. Up to isomorphism, the field GF(p^d) depends only on the degree of the polynomial and not on its particular form. A special case is d = 2, for which we generate a field GF(p²) which can
TABLE III
x² + 1 OVER GF(7)

    x                 0   1   2   3   4   5   6
    x² + 1            1   2   5  10  17  26  37
    (x² + 1) mod 7    1   2   5   3   3   5   2

TABLE IV
x² + 1 OVER GF(5)

    x                 0   1   2   3   4
    x² + 1            1   2   5  10  17
    (x² + 1) mod 5    1   2   0   0   2
also be written GF(p)[x], where x² − r is the irreducible polynomial with which the field is built. In order to reinforce the ideas, we will go over this case with an example. We will take p = 7; the base field is given by GF(7) = {0, 1, 2, 3, 4, 5, 6; ⊕_7, ⊗_7}. We will assume an irreducible polynomial x² + 1 = 0. In order to examine the assumption, we create Table III. We note that none of the results of applying the polynomial x² + 1 are zero, and so the polynomial is indeed irreducible over GF(7). As a contrast to this, we see in Table IV that the same polynomial defined over GF(5) is reducible. Formally we define −1 as a quadratic nonresidue over GF(7) and as a quadratic residue over GF(5). The determination of −1 as a quadratic residue, or nonresidue, holds special significance for the computation of complex sequences over finite fields. The following theorem tells us which form of primes supports which type of residue or nonresidue. This is a classical theorem, but we will present a proof using indices (Baraniecka and Jullien, 1980) that is normally not used. We will use indices later, and so an introduction here will be useful. A field GF(p) possesses a multiplicative group G_M(p − 1) = {1, 2, ..., p − 1; ⊗_p} = GF(p) − {0}, with p − 1 nonzero elements. The entire group can be generated by repeatedly multiplying a generator, α, by itself. We therefore have a unique mapping between the power of the generator and any element in the group. Thus α^k = e, where e is an element in G_M(p − 1); we call k the index of e. Note that k ∈ G_A(p − 1) = {0, 1, ..., p − 2; ⊕_{p−1}}, an additive group. Indices are the finite field equivalent of logarithms and can be used to map multiplication into addition.
Theorem 1. −1 is a quadratic residue over GF(p) for all odd primes p of the form p = 4K + 1, and −1 is a quadratic nonresidue over GF(p) for all odd primes p of the form p = 4K + 3.

Proof: Clearly p = 4K + 1 or p = 4K + 3 defines the form of any odd number and therefore of any odd prime. Let σ be the index of −1. Then −1 is a quadratic residue exactly when 2τ ≡ σ (mod(p − 1)) has a solution τ (an index for √−1). A solution will only exist for σ divisible by 2; 2^{-1} (mod(p − 1)) does not exist since 2 | (p − 1). −1 is therefore a quadratic residue for σ even and a quadratic nonresidue for σ odd. Since (−1)² ≡ 1 (mod p) and −1 ≠ 1, we have 2σ ≡ 0 (mod(p − 1)) with σ ≠ 0, and so σ = (p − 1)/2. This is even only for p = 4K + 1. ∎
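Theorem 1 can be spot-checked numerically with Euler's criterion: −1 is a quadratic residue mod p exactly when (−1)^{(p−1)/2} ≡ 1 (mod p). A minimal sketch (helper name is ours):

```python
# Sketch of Euler's criterion applied to -1, confirming Theorem 1 for a
# handful of small primes.

def minus_one_is_qr(p: int) -> bool:
    """True iff -1 is a quadratic residue mod the odd prime p."""
    return pow(p - 1, (p - 1) // 2, p) == 1   # p - 1 represents -1 mod p

for p in (5, 13, 17, 29):       # primes of the form 4K + 1
    assert minus_one_is_qr(p)
for p in (3, 7, 11, 19, 31):    # primes of the form 4K + 3
    assert not minus_one_is_qr(p)
```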
Although we can find other quadratic nonresidues with which to build second-degree extension fields, −1 carries special significance in that it allows the emulation of complex arithmetic over a Galois field. If we compute transforms over such fields, we can convolve sequences of complex numbers, a useful property in many communication filtering problems. We see that all primes are not created equal when it comes to emulating complex arithmetic. The development of the extension field operations of multiplication and addition uses the same principle as the development of complex field operations from base real field addition and multiplication; this is shown in Table V (similar to Table II). In the case of a finite field, the polynomial reduction uses arithmetic defined over the base field GF(p). We illustrate the operation using the quadratic nonresidue −1 over GF(p²). Note in the table that we have introduced the notation ⊕_p and ⊗_p for extension field operations of addition and multiplication, respectively; the base field modulus is used for the subscript. The degree of the extension field is obtained from the number of elements within each n-tuple (in this case two); we have explicitly shown each element of GF(p²) as a 2-tuple rather than using the formal operator, +. Note, also, that all implemented operations are base field operations. The result in Table V is a specific result for the quadratic nonresidue of −1. In general, for a quadratic nonresidue of q, we can generate Table VI.

TABLE V
EXTENSION FIELD ARITHMETIC OPERATIONS DECOMPOSED AS BASE FIELD OPERATIONS FOR A QUADRATIC NONRESIDUE OF −1

    Operation            Result
    (a, b) ⊕_p (c, d)    (a ⊕ c, b ⊕ d)
    (a, b) ⊗_p (c, d)    (a ⊗ c ⊖ b ⊗ d, a ⊗ d ⊕ b ⊗ c)
TABLE VI
EXTENSION FIELD ARITHMETIC OPERATIONS DECOMPOSED AS BASE FIELD OPERATIONS FOR A QUADRATIC NONRESIDUE OF q

    Operation            Result
    (a, b) ⊕_p (c, d)    (a ⊕ c, b ⊕ d)
    (a, b) ⊗_p (c, d)    (a ⊗ c ⊕ q ⊗ b ⊗ d, a ⊗ d ⊕ b ⊗ c)
As an example of multiplication over GF(7²), consider (2, 4) ⊗_7 (5, 3). The result is

    (2, 4) ⊗_7 (5, 3) = ({[2 ⊗_7 5] ⊖_7 [4 ⊗_7 3]}, {[2 ⊗_7 3] ⊕_7 [4 ⊗_7 5]})
                      = ({3 ⊖_7 5}, {6 ⊕_7 6}) = (5, 5).
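The 2-tuple rules of Table V take only a few lines to implement; the sketch below (function names are ours) reproduces the GF(7²) worked example.

```python
# Sketch of GF(p^2) 2-tuple arithmetic for the quadratic nonresidue -1,
# i.e., the "complex" rules of Table V.

def ext_add(u, v, p):
    """(a, b) + (c, d) over GF(p^2): pointwise addition mod p."""
    return ((u[0] + v[0]) % p, (u[1] + v[1]) % p)

def ext_mul(u, v, p):
    """(a + ib)(c + id) = (ac - bd) + i(ad + bc), all arithmetic mod p."""
    a, b = u
    c, d = v
    return ((a * c - b * d) % p, (a * d + b * c) % p)

# The worked example over GF(7^2):
assert ext_mul((2, 4), (5, 3), 7) == (5, 5)
assert ext_add((2, 4), (5, 3), 7) == (0, 0)
```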
There are perhaps two questions uppermost in our minds right now: Why does such a structure form a field, and why are such fields useful in implementing NTTs? To answer the first question we need to explore the existence of all the field axioms. We do not offer a formal proof here except to point out what we need to prove: closure, the rules associated with the operations of addition and multiplication, identities, and the existence of the inverse of every element. To answer the second question, the reason such fields are useful is two-fold. Firstly, the fields have p^d elements, compared to the p elements of the base field; this means that transform lengths satisfy N | (p^d − 1), as was pointed out earlier, and so the algebraic link between transform length and modulus of implementation has been relaxed. Secondly, in the case of second-degree extension fields based on the quadratic nonresidue −1, the field provides a natural system for implementing convolution between sequences of complex numbers. The form of a number theoretic transform over extension fields is identical to that of a base field transform, with the exception that the elements are n-tuples, and that the arithmetic follows the rules of polynomial addition and multiplication with reduction by the irreducible polynomial. Details of complex NTTs can be found in Vanwormhoudt (1978) and Baraniecka (1980). In order to reinforce the ideas presented above, let us take the previous example of convolution, but this time we will compute the transform over GF(31²); note that 31 is a Mersenne prime, and of the form 4K + 3, which supports the quadratic nonresidue of −1. First we need to determine the possible transform lengths. The length satisfies N | (31² − 1), or N | 960, and clearly N = 8 is a valid transform length. We can determine suitable generators for an 8-element multiplicative cyclic group by performing a linear search. In general, the
TABLE VII
EIGHTH-ORDER MULTIPLICATIVE GROUP OVER GF(31²)

    α¹        α²       α³        α⁴       α⁵         α⁶       α⁷        α⁸
    (4, 4)    (0, 1)   (27, 4)   (30, 0)  (27, 27)   (0, 30)  (4, 27)   (1, 0)
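Table VII can be regenerated by repeated extension-field multiplication. The sketch below (helper names are ours; the multiplication rule is repeated so the sketch is self-contained) confirms that α = (4, 4) has multiplicative order exactly 8 in GF(31²) built with x² + 1.

```python
# Sketch: regenerate Table VII and confirm that (4,4) generates an
# eighth-order cyclic group over GF(31^2).

def ext_mul(u, v, p):
    a, b = u
    c, d = v
    return ((a * c - b * d) % p, (a * d + b * c) % p)

P = 31
alpha = (4, 4)
powers, acc = [], (1, 0)
for _ in range(8):
    acc = ext_mul(acc, alpha, P)
    powers.append(acc)

assert powers == [(4, 4), (0, 1), (27, 4), (30, 0),
                  (27, 27), (0, 30), (4, 27), (1, 0)]
assert all(t != (1, 0) for t in powers[:-1])   # no smaller power is the identity
```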
search for an Nth-order group over a field GF(p²) can be facilitated by noting that (Baraniecka, 1980)

(1) we need to evaluate only N/2 powers of α, because α^{N/2} = −1 (mod p); and
(2) we only need to find an element β whose order is KN, where K is any positive integer, since α = β^K.
A suitable generator is α = (4, 4); this can be demonstrated by generating the series {α, α², ..., α⁸}. The series is generated in Table VII. We note that α⁸ = (1, 0) and that αⁿ ≠ (1, 0), n < 8, which satisfies the criterion for the multiplicative group. The convolution is shown below. We start with the forward transform. The elements are shown as 2-tuples; the transformation matrix has elements α^{kn}, and the matrix multiplication is taken over the extension field GF(31²). Applied to the input ((1,0), (2,0), (3,0), (4,0), (0,0), (0,0), (0,0), (0,0))^T, the forward transform gives

    ((10,0), (24,27), (29,29), (9,21), (29,0), (9,10), (29,2), (24,4))^T.
The input to the inverse calculation is the transform domain pointwise multiplication shown below:
    ((7,0), (2,25), (0,8), (12,6), (4,0), (12,25), (0,23), (2,6))^T.
The inverse transform (matrix elements α^{−nk}, with α^{−1} = α⁷ = (4, 27)) applied to this vector gives

    ((8,0), (1,0), (18,0), (5,0), (14,0), (6,0), (4,0), (0,0))^T,

with the final reduction by the multiplicative inverse of 8, 8^{-1} = 4:

    ((1,0), (4,0), (10,0), (20,0), (25,0), (24,0), (16,0), (0,0))^T.
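The GF(31²) calculation can be reproduced end to end. The sketch below (all names are ours) builds the 8-point NTT whose elements are 2-tuples over GF(31) with generator α = (4, 4), squares the transform pointwise, and inverts.

```python
# Sketch: 8-point NTT over GF(31^2) with alpha = (4,4), convolving
# {1,2,3,4,0,0,0,0} (as real 2-tuples) with itself.

P, N = 31, 8
ALPHA = (4, 4)

def ext_mul(u, v, p=P):
    a, b = u
    c, d = v
    return ((a * c - b * d) % p, (a * d + b * c) % p)

def ext_pow(u, e, p=P):
    acc = (1, 0)
    for _ in range(e):
        acc = ext_mul(acc, u, p)
    return acc

def ntt(x, alpha):
    out = []
    for k in range(N):
        s = (0, 0)
        for n, xn in enumerate(x):
            t = ext_mul(xn, ext_pow(alpha, n * k))
            s = ((s[0] + t[0]) % P, (s[1] + t[1]) % P)
        out.append(s)
    return out

x = [(1, 0), (2, 0), (3, 0), (4, 0), (0, 0), (0, 0), (0, 0), (0, 0)]
X = ntt(x, ALPHA)
assert X == [(10, 0), (24, 27), (29, 29), (9, 21),
             (29, 0), (9, 10), (29, 2), (24, 4)]

Z = [ext_mul(v, v) for v in X]          # pointwise square
alpha_inv = ext_pow(ALPHA, N - 1)       # alpha^7 = alpha^{-1} = (4, 27)
y8 = ntt(Z, alpha_inv)
n_inv = pow(N, -1, P)                   # 8^{-1} = 4 over GF(31)
y = [(n_inv * a % P, n_inv * b % P) for a, b in y8]

# Modulus 31 exceeds every true output sample (max 25), so the result is exact.
assert [a for a, b in y] == [1, 4, 10, 20, 25, 24, 16, 0]
assert all(b == 0 for a, b in y)
```

Unlike the GF(17) run, no sample is wrapped by the modulus, matching the discussion that follows.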
We can note similarities between the calculation over GF(31²) and the DFT calculation. The transform matrices for both the forward and the reverse transform have the same symmetry; if numbers close to 31 are replaced by their negative equivalents, the similarity is easy to see. We notice the same Hermitian symmetry in the forward transform domain as in the DFT example, and because the modulus is larger than the largest convolution output sample, the convolution is determined exactly rather than with least positive residues for some of the samples, as was the case with the calculation over GF(17). Note also that convolution over GF(31) only yields a maximum transform length of 30, as against the maximum transform length of 960 for GF(31²). We note that the largest power-of-two transform length for GF(31) is 2, whereas the largest power-of-two transform length for GF(31²) is 64. This gives a dramatic example of the usefulness of extension fields for relaxing the relationship between dynamic range and transform length. We can find, quite easily, the maximum power-of-two transform length for an NTT defined over GF(p²), p of the form 4K + 3. We simply have to express the prime in a different way. This is presented in the following theorem (Baraniecka, 1980).

Theorem 2. Given a prime p = 4K + 3 = q·2^r − 1, where r is any positive integer and (q, 2) = 1, an NTT defined over GF(p²) can have a transform length N = 2^B, where the maximum value of B is r + 1.
Proof: The order of the cyclic subgroup used to define the NTT has to divide p² − 1 = q²2^{2r} − q·2^{r+1}. Since q is odd, N | ψ·2^{r+1}, where ψ is odd. Therefore we can always find an N = 2^B, where the maximum value of B = r + 1. ∎
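Theorem 2 is easy to verify numerically for small primes of the form 4K + 3: write p + 1 = q·2^r with q odd, and check that the power of two dividing p² − 1 is exactly 2^{r+1}. A sketch (helper name is ours):

```python
# Sketch checking Theorem 2: for p = q*2^r - 1 (q odd), the largest
# power-of-two transform length over GF(p^2) is 2^(r+1).

def two_adic_valuation(n: int) -> int:
    """Largest r with 2^r dividing n (n > 0)."""
    r = 0
    while n % 2 == 0:
        n //= 2
        r += 1
    return r

for p in (7, 23, 31):                  # primes of the form 4K + 3
    r = two_adic_valuation(p + 1)      # p + 1 = q * 2^r, q odd
    assert two_adic_valuation(p * p - 1) == r + 1

# For p = 31: r = 5, so N = 2^6 = 64, and 64 | 960 = 31^2 - 1 but 128 does not.
assert (31 * 31 - 1) % 64 == 0 and (31 * 31 - 1) % 128 != 0
```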
We can try this out in the previous example: p = 31 = 1·2⁵ − 1, so r = 5 and the maximum transform length is N = 2⁶ = 64, which checks with our calculation from p² − 1 = 31² − 1 = 960. Since one of the reasons for using extension fields is the ability to handle larger transform lengths, particularly transforms of maximum length N = 2^B, the above theorem is important from an implementation perspective. We can always convolve two "real" sequences with a third "real" sequence using the complex extension field transform, by creating a complex sequence consisting of elements made up of a sample from each of the sequences; thus if the two sequences are {x_n} and {y_n}, then the sequence to be convolved is the complex sequence {(x_n, y_n)}. This works because the sequence being convolved with has the form (x_n, 0), which operates on the two parts of the complex sequence independently. This also works for DFT indirect convolution. The two juxtaposed sequences do not have to be from separate sources; they can be from the same source at different sampling times. We can illustrate this below using the same NTT as in the previous example. We assume that all three sequences are the example sequence {1, 2, 3, 4, 0, 0, 0, 0}. The forward transform of the packed input ((1,1), (2,2), (3,3), (4,4), (0,0), (0,0), (0,0), (0,0))^T is

    ((10,10), (28,20), (0,27), (19,30), (29,29), (30,19), (27,0), (20,28))^T.
The input to the inverse calculation is the transform domain pointwise multiplication shown below. Note that one of the vectors is taken from the previous convolution example transform domain output, representing the transform of the sequence {(x_n, 0)}:

    ((7,7), (8,27), (23,8), (6,18), (4,4), (18,6), (8,23), (27,8))^T.
The inverse transform is computed as before, giving

    ((8,8), (1,1), (18,18), (5,5), (14,14), (6,6), (4,4), (0,0))^T,

followed by the final reduction by 8^{-1} = 4:

    ((1,1), (4,4), (10,10), (20,20), (25,25), (24,24), (16,16), (0,0))^T.
Note that both elements in the 2-tuple output sequence produce the correct convolved sequence. In general the juxtaposed sequence will be formed from different sequences, or from the same sequence sampled at different times. One possible problem is the fact that the generator is no longer of a simple form; in fact, we have two elements to manipulate in the generator. Using the rules of the extension field arithmetic, a general multiplication by the generator now requires four base field multiplications and two base field additions. Let us examine the form of the generator for both types of prime field moduli. Although primes of the form p = 4K + 3 allow complex arithmetic rules for second-degree extension fields, GF(p^2), we cannot find a generator for a multiplicative group of order N = 2^B (where B is maximum) that has a simple form (i.e., only one nonzero element). This has been shown by Baraniecka (1980) and is expressed in the following theorem.

Theorem 3. The generator α of the multiplicative group GF(p^2) − {0}, of order N = 2^B, has to have the general polynomial form α = (γ, β) with γ, β ≠ 0 for B = r + 1, where p = q·2^r − 1, q odd.

Proof. Assume β = 0; then α lies in the base field and has multiplicative order dividing p − 1, which does not contain 2^(r+1) as a factor; hence β ≠ 0. Let α = (γ, β) = (c + √δ·d) have order 2^(r+1), where x^2 − δ = 0 is the irreducible polynomial used to build the extension field. Then (c + √δ·d)^(q·2^r) = −1, since α^(N/2) = −1. Because p + 1 = q·2^r, the above can be written as (c + √δ·d)(c + √δ·d)^p = −1. If we employ the binomial theorem and the property δ^((p−1)/2) = −1 (Vinogradov, 1954), hence (√δ)^p = −√δ,
104
GRAHAM A. JULLIEN
we can simplify the above equation to (c + √δ·d)(c − √δ·d) = −1, or c^2 − δd^2 = −1. If we assume c = 0, then δd^2 = 1 has to have a solution, i.e., the index of δ^(−1), ind(δ^(−1)), has to be a multiple of 2. Since ind(δ^(−1)) = p − 1 − ind(δ), then ind(δ) also has to be a multiple of 2, a contradiction, since δ is a quadratic non-residue. It follows that c ≠ 0, which proves that γ ≠ 0.

We now examine primes of the form p = 4K + 1. We can represent such primes in another way: p = s·2^t + 1, where s is odd. The largest multiplicative group with a power-of-2 order has order N = 2^(t+1). Baraniecka (1980) has shown, for such primes, that the generator can always be simplified to (0, a), a single nonzero element form. The following theorem generates this result.

Theorem 4.
Let p = s·2^t + 1, (s, 2) = 1, be an odd prime number. Then

(1) If g is a generator for the multiplicative group GF(p) − {0}, then x^2 − g is an irreducible polynomial in GF(p)[x].
(2) If g is as in (1), then √g has multiplicative order s·2^(t+1) in GF(p^2), where GF(p^2) = {a + b√g : a, b ∈ GF(p)}.
(3) We can find a generator √r of a cyclic subgroup of order 2^(t+1) in GF(p^2), where r = g^(ρs) with (ρ, 2) = 1 and x^2 − r an irreducible polynomial in GF(p)[x].

Proof. (1) Suppose x^2 − g is reducible. Then there exists d ∈ GF(p) such that d^2 = g. From Fermat's theorem (Dudley, 1969), d^(p−1) = 1; but d^(p−1) = (d^2)^((p−1)/2) = g^((p−1)/2) = −1, since g is a generator. This is impossible for an odd prime; hence x^2 − g is irreducible.
(2) (√g)^(s·2^(t+1)) = g^(s·2^t) = g^(p−1) = 1. We now have to prove that s·2^(t+1) is the smallest order of √g. Suppose (√g)^u = 1 for some 1 ≤ u < s·2^(t+1). If 2 | u, so that u = 2v, 1 ≤ v < s·2^t, we have (√g)^(2v) = g^v = 1, which is impossible, since g has order p − 1 = s·2^t. If 2 ∤ u, then (√g)^u = g^((u−1)/2)·√g, and (√g)^u = 1 would place √g in GF(p), contradicting part (1). Hence no such u exists, and √g has order s·2^(t+1).

(3) From (2), (√g)^(s·2^(t+1)) = 1; therefore (√g)^s has order 2^(t+1). Since s is odd, (√g)^s ∉ GF(p), and so x^2 − g^s is irreducible. We also know (Dudley, 1969) that (√g)^(ρs) has order 2^(t+1) for any ρ such that (ρ, 2^(t+1)) = 1, and it is clear (from the prime factor decomposition of 2^(t+1)) that we only need (ρ, 2) = 1. We now set r = g^(ρs). Since ρs is odd, x^2 − r is an irreducible polynomial in GF(p)[x].
The theorem is important because it not only tells us that a simple generator form exists, but also how to find it! This is not the case for the other form
TABLE VIII
ARITHMETIC RULES FOR GF(p^2), p = 4K + 3

Operation             Resulting polynomial                 Reduction    Result
(a + jb) + (c + jd)   j(b + d) + (a + c)                   —            (a + c) + j(b + d)
(a + jb) * (c + jd)   j^2(b*d) + j(a*d + b*c) + (a*c)      j^2 → r      (a*c + r*b*d) + j(a*d + b*c)
of primes. We have, of course, only gained a doubling of the transform length over that offered by the base field, but we should not lose any of the efficiency inherent in using extension fields for p = 4K + 3, since a "complex" multiplication by the elements of the transform matrix only requires two "real" multiplications (plus a "real" addition). We must note here that in producing the first element of the 2-tuple (when the power of α has a zero first element), a multiplication by a constant, r, is required in the calculation. It is generally assumed that multiplication by constants is free, and although this seems a strange statement to make, we will demonstrate this fact in the section on hardware implementation. The rules of arithmetic used in GF(p^2), p = 4K + 3, are shown in Table VIII, where j = √r. We see the constant r in the multiplication result for the first element of the resulting 2-tuple. Let us take an example to see how the transform over this new type of field modulus works; we can find a suitable prime from tabulated data such as Abramowitz and Stegun (1968). We can find the tables under the section on combinatorial analysis, in which a table of least positive and negative primitive roots, along with the factorization of p − 1, for p a prime, is given. Our task is to find a prime p such that the factorization of p − 1 has a power of 2 as one of the factors. As an example we will take p = 29, for which p − 1 = 2^2 · 7. This means that a transform of order 8 is available over the extension field. Using the least positive primitive root from the tables, g = 2, √2 will generate a multiplicative subgroup of order 2^3 · 7 = 56, and so the generator of the multiplicative group of order 8 is given by α = √r = (√2)^(7ρ), where (ρ, 2) = 1. Mapping to indices, we are searching for an element r subject to the constraint ind_2 r = 7ρ. The only indices that satisfy this constraint are 7 × 1 = 7 and 7 × 3 = 21.
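The index search just described is mechanical; a minimal sketch, assuming only p = 29, primitive root g = 2, and the constraint ind_2 r = 7ρ with ρ odd:

```python
# Over GF(29) with primitive root g = 2, find the elements r whose index
# ind_2(r) is an odd multiple of 7; these are the r values whose square
# roots generate the order-8 multiplicative subgroups of GF(29^2).
p, g, s = 29, 2, 7          # p - 1 = 2**2 * 7, so s = 7

index = {pow(g, k, p): k for k in range(p - 1)}   # ind_g(r) for each nonzero r

candidates = sorted(r for r, k in index.items()
                    if k % s == 0 and (k // s) % 2 == 1)
print(candidates)   # [12, 17]
```

The two surviving indices are 7 and 21, giving r = 2^7 = 12 and r = 2^21 = 17 (mod 29), in agreement with the text.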
We can compute the associated values of r from r = 2^(ind_2 r). We therefore obtain r = 12 and r = 17 as the only two values of r. These yield four values of α: α = ±√12 = ±8√2, i.e., (0, 8) and (0, 21); and α = ±√17 = ±9√2, i.e., (0, 9) and (0, 20). The elements of the multiplicative subgroup generated by each of these generators are shown in Table IX. The
TABLE IX
MULTIPLICATIVE SUBGROUPS OVER GF(29^2)

Powers n = 1, ..., 8 of each generator, written as 2-tuples:

(0,8):   (0,8)   (12,0)  (0,9)   (28,0)  (0,21)  (17,0)  (0,20)  (1,0)
(0,9):   (0,9)   (17,0)  (0,8)   (28,0)  (0,20)  (12,0)  (0,21)  (1,0)
(0,20):  (0,20)  (17,0)  (0,21)  (28,0)  (0,9)   (12,0)  (0,8)   (1,0)
(0,21):  (0,21)  (12,0)  (0,20)  (28,0)  (0,8)   (17,0)  (0,9)   (1,0)
elements are listed as 2-tuples (first element, second element). We note from the table that the lower two subgroups are the inverses of the upper two subgroups, in that σ × τ = (1, 0), where σ, τ are corresponding elements of the respective groups. In forming a transform we can use one of the subgroups to generate the transform matrix for the forward transform, and the other to generate the matrix for the inverse transform. This is demonstrated below using the same convolution example as used earlier, with the first and third subgroups used to generate the forward and inverse transforms, respectively. The forward transform is computed as X = F ⊗_29 x, where F = [α^(nk)], n, k = 0, ..., 7, with α = (0, 8), and x = {(1,1), (2,2), (3,3), (4,4), (0,0), (0,0), (0,0), (0,0)}; the result is X = {(10,10), (25,2), (3,3), (7,15), (27,27), (20,14), (22,22), (10,2)}.
The input to the inverse calculation is the transform-domain multiplication shown below. The rightmost vector was generated from the forward transform of the sequence {(1,0), (2,0), (3,0), (4,0), (0,0), (0,0), (0,0), (0,0)}.
The element-wise product is

{(10,10), (25,2), (3,3), (7,15), (27,27), (20,14), (22,22), (10,2)} ⊗_29 {(10,0), (8,23), (3,0), (23,21), (27,0), (8,6), (22,0), (23,8)} = {(13,13), (2,11), (9,9), (8,28), (4,4), (9,0), (20,20), (1,10)}.

The inverse transform is computed with the matrix [α^(−nk)] generated from the third subgroup (α^(−1) = (0, 20)); after the final multiplication by 8^(−1) ⊗_29, the result is {(1,1), (4,4), (10,10), (20,20), (25,25), (24,24), (16,16), (0,0)}.
We can see that each element of the transform matrix, for both the forward and inverse transforms, is of the simple form, but the zero generator element is exchanged between the first and second place in the 2-tuple for adjacent matrix elements. It appears as though we may need to multiplex the arithmetic hardware between the adjacent inner product partial product calculations in the transform matrix/vector multiplication for both the forward and inverse transforms. An interesting result occurs if we consider implementing the transform using a fast algorithm of the type discussed in Section III.A. Consider the decimation-in-time algorithm, as shown in the 8-point transform flow graph of Fig. 2. Only in the final stage do we require multiplications by odd powers of α; the other stages only use even powers of α,
TABLE XA (FORWARD TWIDDLE FACTORS)        TABLE XB (INVERSE TWIDDLE FACTORS)

n:     1       2       3                  n:     1       2       3
w^n:   (0,8)   (12,0)  (0,9)              w^n:   (0,20)  (17,0)  (0,21)
and for these we only require a real multiplication. Thus the hardware only changes at the final stage. We will illustrate the operation of the fast algorithm on the previous convolution example using the 8-point DIT fast algorithm to compute the transforms. The forward transform uses the "twiddle factors" shown in Table Xa, and the inverse transform uses the "twiddle factors" in Table Xb. The fast NTT (FNTT) is displayed below in columns of 2-tuples, with bold italics indicating the result after multiplication. If no twiddle factor is present in the signal flow graph of Fig. 2, a multiplier of unity is used for consistency, and the italics are normal face. Since this is a numerical equivalent of the flow graph of Fig. 2, the operation proceeds from left to right, rather than the reverse for the matrix multiplication notation. The forward transform is shown below with the standard input sequence {(1,1), (2,2), (3,3), (4,4), (0,0), (0,0), (0,0), (0,0)}.
[The stage-by-stage FNTT trace (input, multiply, first-stage butterflies, multiply, second-stage butterflies, multiply, third-stage butterflies, shuffle) is not legible in the source. The shuffled output of the forward transform is {(10,10), (25,2), (3,3), (7,15), (27,27), (20,14), (22,22), (10,2)}.]
The output of the transform is multiplied with the transform of the convolving sequence {(1,0), (2,0), (3,0), (4,0), (0,0), (0,0), (0,0), (0,0)}:
{(10,10), (25,2), (3,3), (7,15), (27,27), (20,14), (22,22), (10,2)} ⊗_29 {(10,0), (8,23), (3,0), (23,21), (27,0), (8,6), (22,0), (23,8)} = {(13,13), (2,11), (9,9), (8,28), (4,4), (9,0), (20,20), (1,10)}.
The result of the multiplication is now used as the input to the inverse transform:
[The inverse FNTT trace is not legible in the source. Starting from the input {(13,13), (2,11), (9,9), (8,28), (4,4), (9,0), (20,20), (1,10)} and using the twiddle factors of Table Xb, the shuffled output is {(8,8), (3,3), (22,22), (15,15), (26,26), (18,18), (12,12), (0,0)},]
and the result is finally multiplied by 8^(−1) = 11:

11 ⊗_29 {8, 3, 22, 15, 26, 18, 12, 0} = {1, 4, 10, 20, 25, 24, 16, 0},

each component of the 2-tuples yielding the required cyclic convolution.
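The whole GF(29^2) convolution can be cross-checked in a few lines. This sketch implements the 2-tuple arithmetic with j^2 = 2 and the generators (0, 8) and (0, 20) from Table IX, using a direct O(N^2) transform rather than the fast algorithm:

```python
# Elements are 2-tuples (a, b) representing a + b*sqrt(2) mod 29.
P, R = 29, 2                      # base prime; j^2 = 2 (x^2 - 2 irreducible)

def mul(u, v):
    a, b = u
    c, d = v
    return ((a * c + R * b * d) % P, (a * d + b * c) % P)

def ntt(x, gen):                  # direct transform with the given generator
    n = len(x)
    out = []
    for k in range(n):
        acc = (0, 0)
        for t, xt in enumerate(x):
            w = (1, 0)
            for _ in range((t * k) % n):
                w = mul(w, gen)
            term = mul(xt, w)
            acc = ((acc[0] + term[0]) % P, (acc[1] + term[1]) % P)
        out.append(acc)
    return out

x = [(v, v) for v in (1, 2, 3, 4, 0, 0, 0, 0)]   # packed 2-tuple input
h = [(v, 0) for v in (1, 2, 3, 4, 0, 0, 0, 0)]   # "real" convolving sequence

X, H = ntt(x, (0, 8)), ntt(h, (0, 8))
prod = [mul(a, b) for a, b in zip(X, H)]
y = ntt(prod, (0, 20))                           # inverse generator (0, 20)
ninv = pow(8, -1, P)                             # 8^{-1} = 11 (mod 29)
y = [((ninv * a) % P, (ninv * b) % P) for a, b in y]
print(y)   # [(1, 1), (4, 4), (10, 10), (20, 20), (25, 25), (24, 24), (16, 16), (0, 0)]
```

Both components of each output 2-tuple reproduce the convolved sequence {1, 4, 10, 20, 25, 24, 16, 0}, confirming the worked example.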
D. Quadratic Residue Rings

While we are on the subject of implementation simplicities, it will be useful to examine some work by Nussbaumer (1976a) on Fermat number transforms. Very often it is required to convolve complex sequences, rather than the real sequences that have been the subject of our discussion so far. We have already seen that extension fields, GF(p^2), where p = 4K + 3, can support finite field equivalents of arithmetic operations over the complex numbers. The only problem is that we have to perform these operations over the base field (the same way that complex arithmetic is computed over the reals), and this involves some unfortunate complexities with regard to the implementation of multiplication. Addition is component-wise, but multiplication involves a "cross-coupling" of the parts of the 2-tuple defining the field elements, including the requirement for four base field multiplications and two base field additions, as was discussed earlier in the definition of second-degree extension fields. Nussbaumer discovered a simplification in this requirement based on what would initially appear to be a wrong way of doing things. He considered the case of Fermat primes, where the prime is of the form p = 4K + 1. This prime will not support a "complete" extension field, since −1 is a quadratic residue; we can, however, form a ring of 2-tuples, CR(F_t), where the ring operators take the form of complex addition and complex multiplication on the elements of the 2-tuples, but all base operations are carried out modulo F_t. What Nussbaumer did was, effectively, to define an extension ring (he did not call it that) that is isomorphic to this complex ring, but where the operations of addition and multiplication are both component-wise. We will define this quadratic residue ring as QR(F_t) = {S: ⊕, ⊗}, where the set S consists of elements that are 2-tuples, a = (a°, a*), a ∈ S, and the operations ⊕ and ⊗ are both component-wise.
We can map the elements of this ring to CR(F_t) in the following way:
Let α = α_r + iα_i; α_r, α_i ∈ GF(F_t), α ∈ CR(F_t); then we define the mapping

a° = α_r ⊕ (j ⊗ α_i)   and   a* = α_r ⊕ (−j ⊗ α_i),

where j = √−1 = 2^(2^(t−1)). Clearly a°, a* ∈ GF(F_t). The inverse mapping can be found to be α_r = 2^(−1) ⊗ (a° ⊕ a*) and α_i = 2^(−1) ⊗ j^(−1) ⊗ (a° ⊕ (−a*)) (we have dropped the subscript F_t for clarity). Let b = (b°, b*) map to β = β_r + iβ_i, apply pointwise addition and multiplication to a and b, and confirm the mapping to CR(F_t).

Addition: c = a ⊕ b ≜ (a° ⊕ b°, a* ⊕ b*)

c° = a° ⊕ b° = α_r ⊕ (j ⊗ α_i) ⊕ β_r ⊕ (j ⊗ β_i) = α_r ⊕ β_r ⊕ j ⊗ (α_i ⊕ β_i)
c* = a* ⊕ b* = α_r ⊕ (−j ⊗ α_i) ⊕ β_r ⊕ (−j ⊗ β_i) = α_r ⊕ β_r ⊕ (−j) ⊗ (α_i ⊕ β_i)

Now apply the inverse mapping to the result over QR(F_t), γ = γ_r + iγ_i:

γ_r = 2^(−1) ⊗ (c° ⊕ c*) = 2^(−1) ⊗ (α_r ⊕ β_r ⊕ α_r ⊕ β_r) = α_r ⊕ β_r
γ_i = 2^(−1) ⊗ j^(−1) ⊗ (c° ⊕ (−c*)) = 2^(−1) ⊗ j^(−1) ⊗ [j ⊗ 2 ⊗ (α_i ⊕ β_i)] = α_i ⊕ β_i.

This verifies the mapping for addition.

Multiplication: c = a ⊗ b ≜ (a° ⊗ b°, a* ⊗ b*)

c° = a° ⊗ b° = [α_r ⊕ (j ⊗ α_i)] ⊗ [β_r ⊕ (j ⊗ β_i)] = [(α_r ⊗ β_r) ⊕ (−(α_i ⊗ β_i))] ⊕ j ⊗ [(α_r ⊗ β_i) ⊕ (α_i ⊗ β_r)]
c* = a* ⊗ b* = [α_r ⊕ (−j ⊗ α_i)] ⊗ [β_r ⊕ (−j ⊗ β_i)] = [(α_r ⊗ β_r) ⊕ (−(α_i ⊗ β_i))] ⊕ (−j) ⊗ [(α_r ⊗ β_i) ⊕ (α_i ⊗ β_r)]

(using j ⊗ j = −1). Now apply the inverse mapping to the result over QR(F_t), γ = γ_r + iγ_i:

γ_r = 2^(−1) ⊗ (c° ⊕ c*) = 2^(−1) ⊗ 2 ⊗ [(α_r ⊗ β_r) ⊕ (−(α_i ⊗ β_i))] = (α_r ⊗ β_r) ⊕ (−(α_i ⊗ β_i))
γ_i = 2^(−1) ⊗ j^(−1) ⊗ (c° ⊕ (−c*)) = 2^(−1) ⊗ j^(−1) ⊗ j ⊗ 2 ⊗ [(α_r ⊗ β_i) ⊕ (α_i ⊗ β_r)] = (α_r ⊗ β_i) ⊕ (α_i ⊗ β_r).

This verifies the mapping for multiplication. For notational convenience we will refer to the a° term as the normal element and the a* term as the conjugate element.
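The two mappings are easy to verify numerically. A minimal sketch over F_2 = 17 (the helper names `to_qr` and `from_qr` are illustrative; the same parameters appear in the worked scalar example below):

```python
# Quadratic residue mapping over GF(17), where j = 2^(2^(t-1)) = 4
# satisfies j^2 = -1 mod 17; arithmetic becomes component-wise in the
# (normal, conjugate) representation.
P = 17
J = 4                     # sqrt(-1) mod 17
JINV = pow(J, -1, P)      # 13
HALF = pow(2, -1, P)      # 9

def to_qr(re, im):        # CR -> QR mapping
    return ((re + J * im) % P, (re - J * im) % P)

def from_qr(n, c):        # QR -> CR inverse mapping
    return ((HALF * (n + c)) % P, (HALF * JINV * (n - c)) % P)

a, b = to_qr(13, 5), to_qr(9, 11)
add = ((a[0] + b[0]) % P, (a[1] + b[1]) % P)     # component-wise add
mul = ((a[0] * b[0]) % P, (a[1] * b[1]) % P)     # component-wise multiply
print(from_qr(*add), from_qr(*mul))   # (5, 16) (11, 1)
```

The component-wise results map back to the same complex-ring sum and product computed directly.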
It might be useful to reinforce the idea behind this quadratic residue mapping with a simple scalar example, using GF(17). The parameters to be used are j = 2^2 = 4; j^(−1) = 13; 2^(−1) = 9. Let α = (13, 5), β = (9, 11). Computing over the ring CR(17) yields:

α ⊕ β = (13 ⊕ 9, 5 ⊕ 11) = (5, 16)

and

α ⊗ β = [(13 ⊗ 9) ⊕ (−(5 ⊗ 11)), (13 ⊗ 11) ⊕ (5 ⊗ 9)] = (11, 1).

Now let us compare performing this calculation using the QR mapping:

α = (13, 5) → a = (13 ⊕ (4 ⊗ 5), 13 ⊕ (−(4 ⊗ 5))) = (16, 10)
β = (9, 11) → b = (9 ⊕ (4 ⊗ 11), 9 ⊕ (−(4 ⊗ 11))) = (2, 16).

Now we compute the addition and multiplication over QR(17), component-wise:

a ⊕ b = (16 ⊕ 2, 10 ⊕ 16) = (1, 9);   a ⊗ b = (16 ⊗ 2, 10 ⊗ 16) = (15, 7).

Finally we map back to CR(17):

(1, 9) → [9 ⊗ (1 ⊕ 9), 9 ⊗ 13 ⊗ (1 ⊕ (−9))] = (5, 16)
(15, 7) → [9 ⊗ (15 ⊕ 7), 9 ⊗ 13 ⊗ (15 ⊕ (−7))] = (11, 1),

and we see that we obtain the same result as the direct complex calculation over CR(17). The method seems involved, in that there are three distinct steps. Note, however, that if the middle step (the component-wise calculation) contains many multiplications and additions, the overhead associated with steps 1 and 3 can be relatively small. We therefore concentrate on the features of step 2, bearing this assumption in mind. There are two main features to the calculation over QR(F_t):
(1) The multiplication operation requires two base field multiplications, compared to four base field multiplications and two base field additions for the calculation over CR(F_t).
(2) The calculations can be carried out without any interaction between the normal and conjugate channels.

The first feature has advantages in minimizing implementation hardware; the second feature provides advantages in testing and fault tolerance of a complex sequence processor. A more comprehensive example is probably in order at this point. We consider the complex convolution of two 4-point sequences over GF(17). The sequences and their convolution sum are shown in equation (16) below, where
the symbol, ⊛, represents the cyclic convolution operator. [The numerical content of Eq. (16) is not legible in the source.]
We now consider the computation of this convolution using the FNTT defined over GF(17). We can compute this by transforming each real and imaginary sequence of both inputs separately. We then multiply these transformed sequences, using the rules of complex arithmetic, and invert the real and imaginary sequences of this result separately. The two inverted sequences will be the real and imaginary components of the convolution sum (Nussbaumer, 1976a). The transform of the real component of the first input sequence is shown in Eq. (17); ⊗_17 represents matrix multiplication over GF(17).
The transformed sequences and their multiplication, using the rules of complex arithmetic, are shown in Eq. (18). [Eqs. (17) and (18) are not legible in the source.]
We now invert the real and imaginary parts of the result as separate sequences. The inverse transform for the real component is shown in Eq. (19):
The final sequence is the combination of the inverse of the real and imaginary components of the transform domain multiplication (shown in
Eq. (20), with the final multiplication by 4^(−1) = 13).
The final result agrees with the original convolution sum in Eq. (16). Now we have to compute the convolution using the mapping to QR(17). First we perform the mapping, as shown below:
Now we compute the convolution for the normal and conjugate sequences separately. The computation for the normal sequence is shown below.
The computation for the conjugate sequence is shown below.
Now all that remains for us to do is to compute the inverse mapping from the quadratic residue ring:
The result matches that obtained directly and using the complex ring transform method. Note the complete independence of the normal and conjugate calculations once the mapping to the quadratic ring is performed. We are able to compute the forward transforms, transform domain multiplication, and inverse transform without invoking any interaction between normal and conjugate channels. Although the quadratic residue ring has been built from Galois fields based on Fermat numbers, the only requirement for such a ring to exist is that the base field have a prime modulus of the form p = 4K + 1, since we have already discovered that such a field contains a quadratic residue for −1. The ramifications of computations over quadratic rings go beyond the simplification of convolution via Fermat NTTs, and it seems that Nussbaumer did not realize the importance of this finding at the time. We will return to quadratic rings later.

E. Multidimensional Mapping

Another technique used to remove the tight coupling of transform length with the algebraic properties of the number of elements in the field is that of multidimensional mapping. This was first pointed out by Rader (1972b) and expanded upon by Agarwal and Burrus (1974). The basic concept is that a one-dimensional convolution can always be implemented by rewriting the one-dimensional input sequence as a multidimensional sequence. The convolution can then be indirectly computed via multidimensional transforms that, in turn, can be computed as a series of short one-dimensional transforms. The final step is a mapping back to the one-dimensional output sequence. As an example of a two-dimensional mapping of an original one-dimensional sequence, consider the cyclic convolution of Eq. (21):

y_n = Σ_{q=0}^{N−1} x_q h[n ⊖_N q].   (21)
We assume that we can write N = LM, where L and M are integers. A change of variables can now be made:

n = l + mL,   q = k + pL;   k, l = 0, 1, ..., L − 1;   p, m = 0, 1, ..., M − 1,
and the convolution now becomes

y_{l+mL} = Σ_{k=0}^{L−1} Σ_{p=0}^{M−1} x_{k+pL} h[⟨(l − k) + (m − p)L⟩],   (22)

where we have dropped the subscript on the modulo-N addition operator. Let us now define two-dimensional arrays for y, x, and h. We will keep the same notation as used by Agarwal and Burrus (1974). Thus

ỹ_{l,m} = y_{l+mL},   x̃_{l,m} = x_{l+mL},   h̃_{l,m} = h_{⟨l+mL⟩},   (23)

and the convolution can be written as Eq. (24):

ỹ_{l,m} = Σ_{k=0}^{L−1} Σ_{p=0}^{M−1} x̃_{k,p} h̃_{⟨l−k⟩_L, ⟨m−p⟩_M}.   (24)
This is a two-dimensional cyclic convolution, and we can compute it indirectly using two-dimensional NTTs. Two-dimensional NTTs can be calculated using one-dimensional NTTs along the rows and then along the columns of the intermediate results. Clearly two-dimensional convolution is a sort of overlay of column-wise, followed by row-wise (or vice-versa), one-dimensional cyclic convolution. If we examine the decomposition of the original one-dimensional sequence, we find that increasing values of the m-index (row index) define a sampling of the original signal by a reduction factor of L, and thus preserve the cyclic nature of the sequence (this new sequence has period M rather than N). Increasing values of the l-index (column index) are contiguous samples of only a segment of the original sequence. Thus, although cyclic convolution will work for the rows, it will not work for the columns, since this sequence is not a periodic subsampling of the original signal. We must therefore compute aperiodic convolution along the columns, and this means invoking one of the two techniques, overlap-add or overlap-save (Gold and Rader, 1969; chapter by T. Stockham, Jr.), available for computing aperiodic convolution from cyclic convolution. Another way of looking at the problem is to consider that although the indices of the 2-D sequences are computed over finite rings, R(L) and R(M), the formation of these rings from the original index sequence was over R(N). The overlap-save technique involves appending at least (L − 1) zero samples to the original column sequences of the x̃ array; the h̃ array is augmented by the periodic extension of the original {h} sequence, as indicated in the index mapping of Eq. (23). The final result will have L correct values and L − 1 incorrect values per column (Agarwal and Burrus, 1974).
Normally, in order to compute fast convolution, we will append L zeros to the x̃ columns (rather than L − 1 zeros), requiring a total two-dimensional array of 2L × M. Two of the rows of the final result will be found to be dependent (except for a cyclic shift) because of this redundancy of one extra row added to the 2-D arrays.
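The index mapping and overlap-save arrangement just described can be checked by brute force (direct modular convolution in place of the NTTs). This sketch uses the chapter's N = 8, L = 2, M = 4 example over GF(29):

```python
# 1-D cyclic convolution of length N computed as a 2-D (2L x M) convolution:
# the x array gets L zero rows appended, the h array is cyclically extended,
# and only the bottom L rows of the 2-D result are valid.
P, N, L, M = 29, 8, 2, 4

x = [1, 2, 3, 4, 0, 0, 0, 0]
h = [1, 2, 3, 4, 0, 0, 0, 0]

xt = [[x[l + m * L] if l < L else 0 for m in range(M)] for l in range(2 * L)]
ht = [[h[(l + m * L) % N] for m in range(M)] for l in range(2 * L)]

# 2-D convolution, cyclic along both reduced index directions
yt = [[sum(xt[k][q] * ht[(l - k) % (2 * L)][(m - q) % M]
           for k in range(2 * L) for q in range(M)) % P
       for m in range(M)] for l in range(2 * L)]

# unscramble the valid bottom rows back to one dimension
y = [yt[L + (n % L)][((n // L) - 1) % M] for n in range(N)]
print(y)   # [1, 4, 10, 20, 25, 24, 16, 0]
```

Replacing the brute-force 2-D convolution with row and column NTTs, as the text goes on to do, gives the same bottom-row values.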
The 2-D NTT is defined in Eq. (25), where the Nth-order generator (α) has been replaced by an Mth-order generator (α^L) and a 2Lth-order generator (α^(M/2)):
Consider taking 1-D transforms along the columns ( p index) of the input,
The index [ k ] corresponds to the column of the 1-D transform result. We can now compute the 2-D NTT by taking 1-D transforms along the rows (k index) of this modified intermediate result:
By computing 2-D transforms for h̃_{l,m} and x̃_{l,m}, we can form the 2-D cyclic convolution by multiplication in the transform domain, followed by inverse transformation:
It now remains to unscramble the resulting 2-D sequence back into a 1-D sequence; this result is the required 1-D cyclic convolution. The entire process appears quite involved, but it allows the use of small-length transforms to implement much longer convolutions. As before, we will take an example to illustrate the procedure, and as before we will consider the cyclic convolution of the sequence {1, 2, 3, 4, 0, 0, 0, 0} with itself. In order to demonstrate the ability of this technique to allow longer convolutions than possible with a direct procedure, we will consider the convolution over GF(29). We have already determined that we can only achieve a direct one-dimensional length-8 transform using a second-degree extension field, GF(29^2). We will decompose the length N = 8 = 4 × 2; if we let L = 2 and M = 4, then we can compute the convolution using 4 × 4 arrays, with a series of length-4 one-dimensional transforms. The parameters for the example are α^L = α^(M/2) = α^2 = 12 (α = √2 itself does not exist in GF(29)), and (2N)^(−1) = 16^(−1) = 20. The sequence {1, 2, 3, 4, 0, 0, 0, 0} is mapped into the h̃ and x̃ arrays as below:
h̃ = [1 3 0 0          x̃ = [1 3 0 0
     2 4 0 0               2 4 0 0
     3 0 0 1               0 0 0 0
     4 0 0 2],             0 0 0 0].
Note the cyclic extension of the h̃ array and the two rows of zeros appended to the x̃ array. The next step is to compute the two-dimensional NTT of each array. We first show the intermediate step after taking one-dimensional NTTs of each column in the two arrays:
(H̃^[k]_m) = [10  7  0  3          (X̃^[k]_m) = [ 3  7  0  0
              3 22  0  4                       25 22  0  0
             27 28  0 28                       28 28  0  0
             22 13  0 23],                      6 13  0  0].
This is followed by one-dimensional NTTs of the rows:
Ĥ = [20  0  0 20          X̂ = [10  0 25  6
      0 16  6 19               18 28  3 22
     25 27  0 27               27 16  0 11
      0 18 15 26],             19 17 22 24].
The next step is to form the element-wise product Ŷ = Ĥ ⊗_29 X̂, and invert the result:

Ŷ = [26  0  0  4
      0 13 18 12
      8 26  0  7
      0 16 11 15],
and the inverse transform is computed by 1-D transforms on the columns and rows with a final multiplication by 16^(−1) = 20. The final inverse transform with multiplication is

f = 16^(−1) ⊗_29 [11  4 28  6      [17 22  9  4
                   6  1  7  0   =    4 20 24  0
                  15 23 24 16       10 25 16  1
                   1  7  0  6]      20 24  0  4].
We now unravel the bottom two rows of the result array to find the output sequence: {1, 4, 10, 20, 25, 24, 16, 0}.

F. Extension of Dynamic Range
An inverse problem to that discussed above is the extension of dynamic range for a given transform length. This seems trivial, in that we can probably find a large transform length for a large dynamic range and simply reduce the
TABLE XI
PRIMES (10 BITS OR LESS) AND TRANSFORM LENGTHS

p      2^q        p      2^q
97     32         577    64
193    64         641    128
257    256        673    32
449    64         769    256
transform length by using powers of the multiplicative group generator. That, however, is not the exercise we wish to perform here. Suppose that we find small (10 bits or less) primes that have desirable properties for the construction of fields in which reasonable power-of-two transform lengths are possible, and our goal is to produce such a length of transform. For example, primes, p, which have the property 2^q | p − 1, will allow transform lengths of 2^q. Table XI shows some small primes and their associated transform lengths, 2^q. The transform lengths shown are quite respectable in terms of usefulness in DSP, but the dynamic range afforded by the size of the prime is probably inadequate. Can we stick several of these transforms together to form a larger dynamic range? The answer is a resounding yes, and we show, in the following sections, that this approach has much larger ramifications than one would initially imagine.
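Table XI can be checked mechanically. This sketch assumes the primes are 97, 193, 257, 449, 577, 641, 673, and 769 (several digits in the printed table are corrupt), and verifies that each quoted length is the largest power of two dividing p − 1:

```python
def two_part(n):
    # largest power of two dividing n (isolates the lowest set bit)
    return n & -n

table = {97: 32, 193: 64, 257: 256, 449: 64,
         577: 64, 641: 128, 673: 32, 769: 256}
for p, length in table.items():
    assert two_part(p - 1) == length
print("all lengths check")
```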
G. Binary Implementations

The interesting fact is that, aside from the flurry of academic and general theoretical interest, all of these efforts have led to very little in the way of special-purpose hardware being built to implement these transforms. It appears that most of the implementations are on general-purpose computers (software implementations) with dubious usefulness. The only hardware approach, using a single base field, has been discussed by McClellan (1976) using a Fermat number transform. A special coding scheme was used to implement the modulo F_t computation. An interesting modification to binary arithmetic was also proposed by Leibowitz (1976). Both of these approaches have important ramifications for the efficient binary implementation of Fermat number transform operations. It is appropriate to discuss these techniques briefly here, for later contrast with the nonbinary implementations to be discussed later.
1. McClellan Approach
The representation of Fermat numbers in binary is only 50% efficient, since only one of the field* elements requires the most significant bit to be one. In the obvious mapping, this element is the largest in the field. The coding scheme introduced by McClellan maps this isolated case to zero. The full mapping is described below. The representation of an element, E, in GF(F_t) is by b + 1 bits {b_b, b_{b−1}, ..., b_0}, where b = 2^t, with the following mapping:

(1) If b_b = 1, then E = 0.
(2) If b_b = 0, then E = Σ_{i=0}^{b−1} ω_i 2^i, where ω_i = 1 if b_i = 1, and ω_i = −1 if b_i = 0.

Using the example given by McClellan for F_2:

10000 represents zero;
01010 represents 2^3 − 2^2 + 2 − 1 = 5;
00011 represents −2^3 − 2^2 + 2 + 1 = −9 (≡ 8).
Over GF(F_t) this representation is valid for all elements. The representation of zero as a special case (b_b = 1) allows simple hardware for arithmetic with zero as one of the operands. General addition involves an ordinary binary addition followed by an adjustment based on the state of the output carry. This is the same complexity as 1's complement binary arithmetic. Multiplication by powers of 2 (required in the formation of the forward or inverse FNT) is a simple cyclic shift with the addition of a logical inversion as bits are fed back into the least significant position. We see that ordinary binary arithmetic elements can be used, with slight modifications to the circuitry. General multiplication, of course, retains the same complexity that it has for binary arithmetic. We note that all general arithmetic operations are performed with only b bits rather than the b + 1 bits of the full representation. McClellan's hardware only computed the transform and not the transform-domain multiplication.
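The coding can be sketched in a few lines, assuming the Fermat number in the example is F_2 = 17 (the printed subscript is garbled, but −9 ≡ 8 only holds modulo 17); the function name is illustrative:

```python
# McClellan-style coding for GF(F_2): b = 4 data bits with symmetrical
# weights {-1, +1}, plus one flag bit that maps the otherwise-unused
# MSB-set pattern to the element zero.
F = 17                       # F_2 = 2**4 + 1
B = 4                        # data bits

def decode(bits):            # bits given MSB (zero flag) first
    if bits[0] == '1':
        return 0
    total = sum((2 ** i if bit == '1' else -2 ** i)
                for i, bit in enumerate(reversed(bits[1:])))
    return total % F

print(decode('10000'), decode('01010'), decode('00011'))   # 0 5 8
```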
2. Leibowitz Approach

Leibowitz introduced a modification that reduced the complexity of the arithmetic hardware. This diminished-1 representation involves a simple translation of 1 bit from the normal binary representation. This contrasts with
* We will assume that the Fermat number chosen is prime.
McClellan's use of symmetrical weightings {−1, 1}. The diminished-1 representation simply adds one to the binary representation; thus {0, 1, 2, ..., 2^b} becomes {1, 2, 3, ..., 0}. Note that this mapping places 0 as the only element that requires b_b = 1, and so corresponds with McClellan's mapping of that element. The other elements are, in general, mapped to different positions than in McClellan's mapping. Leibowitz demonstrates that simpler implementations than McClellan's mapping are possible, and also that general multiplication can be carried out as a simple modification of a binary multiplication. We see that general multiplication is more complicated than binary multiplication (where the majority of computational complexity is centered) and that these representations only work for fields based on Fermat primes. We now introduce an alternative to the single-modulus NTT that has been used to great effect in the implementation of convolution hardware, and that opens the door to some very interesting VLSI implementations; this alternative is the residue number system (RNS). Rather than finding special large moduli that allow modified binary arithmetic hardware to be used, we restrict the modulus to be small enough that look-up tables can be used to implement the arithmetic (these look-up tables are basically unminimized truth tables). We are no longer restricted as to the form of the modulus, just the size, for practical hardware solutions. We can remove the modulus size problem by several parallel computations over a direct sum ring. We start by digressing, somewhat, into the theory of the RNS; this knowledge is then applied to the implementation of flexible NTTs.
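A sketch of the two diminished-1 operations described above, for F_2 = 17 (b = 4); zero operands and zero results, which carry the flag bit, are excluded for brevity:

```python
# Diminished-1 arithmetic mod F = 2^B + 1: elements are stored as value - 1,
# addition is a binary add with a carry-based correction, and multiplication
# by 2 is a left rotation with inversion of the fed-back bit.
B = 4
F = (1 << B) + 1             # F_2 = 17

def add_dim1(dx, dy):        # diminished-1 addition (nonzero operands/result)
    s = dx + dy + 1          # ordinary binary add; the +1 restores the bias
    if s >= (1 << B):        # carry-out: end-around correction
        s = (s & ((1 << B) - 1)) - 1
    return s

def mul2_dim1(dx):           # multiply by 2: rotate left, inverting the fed-back bit
    hi = (dx >> (B - 1)) & 1
    return ((dx << 1) & ((1 << B) - 1)) | (hi ^ 1)

print(add_dim1(5 - 1, 9 - 1) + 1)    # (5 + 9) mod 17 = 14
print(mul2_dim1(12 - 1) + 1)         # (2 * 12) mod 17 = 7
```

Both operations reduce to ordinary binary adder and shifter hardware with small corrections, which is the attraction of the representation.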
VI. RESIDUE NUMBER SYSTEMS

In the first century A.D. the Chinese scholar Sun-Tsu wrote an obscure verse (Fig. 3) that described a rule called t'ai-yen (great generalization) to determine a number having the remainders 2, 3, and 2 when divided by the numbers 3, 5, and 7. Although there is considerable uncertainty about the exact origin of this mathematical concept, as well as about who first discovered it and when, it is clear that the background to number theory was forming quite early in the dawn of modern recorded history. When the secret of the general technique to determine numbers from their residues was discovered, it became known as the Chinese Remainder Theorem in honor of its ancient Chinese origins. Residue arithmetic is based on the ability to perform exact integer computations by the manipulation of residues of the numbers in the computations (Szabo and Tanaka, 1967). It was first examined in the 1950s as a possible computational medium for the vacuum tube computers of the day (Svoboda
FIG. 3. Original verse for the Chinese Remainder Theorem.
and Valach, 1955). The natural fault-tolerant nature of the computational structure was probably a great attraction, considering the unreliability of the contemporary computer systems. Undoubtedly the appearance of the more reliable transistor-based computers, at the turn of the decade, pushed residue arithmetic (also known as the residue number system) into the background. There was some activity around the beginning of the 1960s (Svoboda, 1958; Garner, 1959; Baugh and Day, 1961; Cheney, 1961; Tanaka, 1962), which continued for a small number of researchers, but it was not until towards the end of the following decade that the digital hardware (in particular, large read-only memories) was available to realize appropriately the logical manipulations required to implement the arithmetic (Jenkins and Leon, 1977; Soderstrand, 1977; Jullien, 1978). We spend the rest of this section on the general principles of residue arithmetic; this is then used, in the following section, to implement number theoretic transforms using look-up table techniques.

A. Algebraic Structure of General Residue Systems

A general class of modular systems is constructed as a direct sum of several simple modular structures (either fields or rings) that have moduli that are
pairwise prime integers (no two have a common factor greater than one). The system is itself a ring:

M = prod_{i=1}^{L} m_i,

where M = {m_1, m_2, ..., m_L} is the set of moduli. The integer ring, R(M), is isomorphic to the direct sum of the L subrings R(m_i), i = 1, ..., L:

R(M) ≅ ⊕_{i=1}^{L} R(m_i).   (28)

The direct sum representation of R(M), as denoted by the right side of Eq. (28), is typically called a nonredundant residue number system (RNS). The interval [0, M - 1] is called the legitimate range of the RNS because it represents the useful computational range of the number system. The ring R(M) has the additive identity 0, and we use this to define the additive inverse of a number X, written (-X), as

X ⊕_M (-X) = 0.   (29)

Using ordinary arithmetic we can interpret Eq. (29) as

X + (-X) = M.   (30)

This yields

(-X) = M - X.   (31)
If M is odd, the dynamic range of an RNS becomes [-(M - 1)/2, (M - 1)/2]; if M is even, it is [-M/2, (M/2 - 1)]. Each integer X in the dynamic range is mapped onto the legitimate range and represented as an L-tuple of residue digits (x_1, x_2, ..., x_L), where x_i = X mod m_i for X in the positive half of the dynamic range, and x_i = m_i - (|X| mod m_i) for X in the negative half of the dynamic range. Note that if M is odd, [-(M - 1)/2, -1] maps onto [(M + 1)/2, M - 1]; if M is even, [-M/2, -1] maps onto [M/2, M - 1]. Hence negative numbers map onto the upper half of the legitimate range through additive inverse encoding. Using some more formal notation, we can define residue arithmetic by

A ⊕ B ⇔ (a_1 ⊕_{m_1} b_1, a_2 ⊕_{m_2} b_2, ..., a_L ⊕_{m_L} b_L) = (c_1, c_2, ..., c_L),   (32)

A ⊗ B ⇔ (a_1 ⊗_{m_1} b_1, a_2 ⊗_{m_2} b_2, ..., a_L ⊗_{m_L} b_L) = (d_1, d_2, ..., d_L),   (33)

with A, B ∈ R(M); a_k, b_k ∈ R(m_k). Note the use of the implied operators, ⊕ and ⊗. If the computation were carried out explicitly over the ring R(M), we would be required to perform the operations ⊕_M and ⊗_M. Since c_i (or d_i) is determined entirely from a_i and b_i,
GRAHAM A. JULLIEN
RNS arithmetic is carry-free in the sense that there is no propagation of information from the ith channel to the jth channel, i ≠ j. As an example, consider the RNS defined by m_1 = 7, m_2 = 9, m_3 = 11, m_4 = 13. For this case, M = 9009, the legitimate range is [0, 9008], and the dynamic range is [-4504, 4504]. A positive number X = 300 is encoded as (6, 3, 3, 1), whereas Y = -2 is encoded as (5, 7, 9, 11). Then

X ⊕ Y ⇔ (4, 1, 1, 12) ≅ 298 and X ⊗ Y ⇔ (2, 3, 5, 11) ≅ -600.
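The carry-free channelwise arithmetic of this example can be sketched in a few lines of Python (a minimal sketch; the helper names encode, add, and mul are ours, not the chapter's):

```python
MODULI = (7, 9, 11, 13)   # pairwise relatively prime, so M = 9009

def encode(x):
    # Python's % already maps negative integers onto [0, m), which matches
    # the additive inverse encoding of negative numbers described above.
    return tuple(x % m for m in MODULI)

def add(a, b):
    # Carry-free: each channel is computed independently of the others.
    return tuple((ai + bi) % m for ai, bi, m in zip(a, b, MODULI))

def mul(a, b):
    return tuple((ai * bi) % m for ai, bi, m in zip(a, b, MODULI))
```

With these definitions, encode(300) gives (6, 3, 3, 1), encode(-2) gives (5, 7, 9, 11), and add and mul reproduce the encodings of 298 and -600 respectively.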
Note that the signed numbers are easily manipulated by exactly the same rules as positive numbers after the initial complement encoding is done. However, it is rather difficult to determine that (5, 7, 9, 11) represents a negative number, because there is no explicit sign bit in the code. Therefore signed number arithmetic is easy to realize in RNS codes, but sign testing to control data-dependent decisions is rather difficult.

B. The Chinese Remainder Theorem

As stated previously, the isomorphic mapping between ⊕R(m_k) and R(M) is known as the Chinese Remainder Theorem (CRT) in honor of its ancient Chinese origins. The CRT is given by

X = ( Σ_{k=1}^{L} m̂_k [(m̂_k)_k^{-1} x_k mod m_k] ) mod M,   (34)
with m̂_k = M/m_k, X ∈ R(M), x_k ∈ R(m_k); (·)_k^{-1} is the multiplicative inverse operator, mod m_k. Since (m̂_k, m_k) = 1, the inverse [(m̂_k)_k^{-1}] exists. An alternative expression uses the concept of metric vectors (Svoboda, 1957) and leads to an easy explanation of the isomorphic mapping function. The mapping is

X = ( Σ_{k=1}^{L} m̂_k [(m̂_k)_k^{-1}] x_k ) mod M.   (35)

We can explain the mapping procedure of Eq. (35) by replicating all of the R(M) operations within the individual computational rings, {R(m_k)}; i.e., we will consider the operations ⊕ and ⊗ rather than ⊕_M and ⊗_M. This is embodied in the following theorem.

Theorem 5. Equation (35) is a mapping function for the isomorphism R(M) ≅ ⊕R(m_k).

Proof: The function m̂_k maps to the representation (0, 0, ..., μ_k, 0, ..., 0), where μ_k = m̂_k mod m_k. The L - 1 zero residues correspond to the fact that m̂_k is divisible by all moduli
except m_k. The term [(m̂_k)_k^{-1}] maps to (0, 0, ..., [μ_k^{-1}], 0, ..., 0), and so the product m̂_k ⊗_M [(m̂_k)_k^{-1}] has the representation (0, 0, ..., 1, 0, ..., 0); this is called the kth unit metric vector. Although the ring R(m_k) has elements that have no multiplicative inverse, we can guarantee that the element μ_k has an inverse, since (m̂_k, m_k) = 1. It is now clear that {m̂_k ⊗_M [(m̂_k)_k^{-1}] ⊗_M x_k} has the representation (0, 0, ..., x_k, 0, ..., 0), the kth metric vector of X, and so the modulo M summation is, in fact, a summation of orthogonal metric vectors; this yields the representation for X = (x_1, x_2, ..., x_L). Since x_k ∈ R(m_k), it is clear that Eq. (35) will yield the same result as Eq. (34).

Both forms of the CRT are encountered in the literature, although the form given by Eq. (34) is probably the most frequently used in the number theory literature. As far as implementation is concerned, the form in Eq. (34) requires an extra mod m_k operation to be computed in the inner loop. However, the unit metric vectors {m̂_k ⊗_M [(m̂_k)_k^{-1}]} implied in Eq. (35) are large integers, and hence the arithmetic in the second form requires multiplication by rather large numbers, a feature that is undesirable from the hardware point of view. For a fixed moduli set, the unit metric vectors are constants; this can be used to advantage in mapping hardware. The mapping in Eq. (35) can also lead to a rather efficient scaling strategy (Jullien, 1978).

As a purely historical exercise, let us use the modern version of the CRT to explain the ancient script found in Fig. 3. We first need to give a full translation:

1. The Ancient Verse
The verse uses an example to demonstrate how the CRT can be used to recover a number from its residues. If we read the verse carefully (see the translation in Fig. 4), we see that it is describing a residue number system with the moduli set {3, 5, 7}. The total number of things that we can represent with this set is 3 · 5 · 7 = 105. We find the unit metric vectors to be

3̂ ⊗ (3̂)^{-1} = 35 ⊗ 2 = 70;  5̂ ⊗ (5̂)^{-1} = 21 ⊗ 1 = 21;  7̂ ⊗ (7̂)^{-1} = 15 ⊗ 1 = 15.

The two examples are now clear:

(2, 3, 2) → 140 ⊕ 63 ⊕ 30 = 23;  (1, 1, 1) → 70 ⊕ 21 ⊕ 15 = 1.

Note that the second example is a cunning way of exposing the metric vectors, from which any 3-residue representation with this moduli set can be mapped. Was this a mathematical novelty, or was it used for a purpose? We can imagine that it might have been used to count soldiers, for example. Simply have them group first in threes, then in fives, and finally in sevens, and count the remainder (the last row) in each case.
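The unit metric vectors and the two reconstructions can be checked with a short sketch (pow(x, -1, m), available in Python 3.8+, computes the modular inverse; the names are ours):

```python
MODULI = (3, 5, 7)
M = 3 * 5 * 7   # 105

# kth unit metric vector: mhat_k * [(mhat_k)^(-1) mod m_k]
units = [(M // m) * pow(M // m, -1, m) for m in MODULI]

def crt(residues):
    # modulo M summation of the (orthogonal) metric vectors
    return sum(u * r for u, r in zip(units, residues)) % M
```

Here units evaluates to [70, 21, 15], crt((2, 3, 2)) to 23, and crt((1, 1, 1)) to 1, exactly as in the verse.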
FIG. 4. The translation.
This would certainly have impressed observers; how can somebody count up to 105 soldiers by only observing the last row? Of course, the soldiers would have been doing a lot of the work by regrouping three times. For the binary aficionados: I do not think a binary counting system was described this early in history!
C. The Associated Mixed Radix Number System

It is possible to translate numbers from a residue representation to a mixed radix representation, which is a weighted representation that facilitates sign detection and magnitude comparisons. If the moduli of the RNS are chosen to
be the weights in the mixed radix representation, the mixed radix system is said to be associated with the RNS, and the translation operation is simplified. More specifically, a number X, which falls into the total range of an RNS with moduli {m_1, ..., m_L}, can be represented by L mixed radix digits (a_L, ..., a_1) defined by

X = a_L (m_{L-1} ⋯ m_2 m_1) + ⋯ + a_3 (m_2 m_1) + a_2 m_1 + a_1.   (36)
The importance of an associated mixed radix representation cannot be overestimated in RNS theory, because the mixed radix digits a_L, ..., a_2, a_1 provide a weighted representation of the residue number X that is quite easy to generate. From Eq. (36) it can be seen that a_1 = x_1, directly. The higher-order digits can be generated by the following recursive relationship:

X^{(1)} = X;  a_1 = x_1;
X^{(n)} = (X^{(n-1)} ⊖ a_{n-1}) ⊗ (m_{n-1})^{-1};  a_n = X^{(n)} mod m_n.   (37)
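Over plain integers, the recursion of Eq. (37) amounts to repeated subtract-and-divide; a minimal sketch (in RNS hardware each exact division is realized as a multiplication by (m_{n-1})^{-1} in the remaining channels):

```python
MODULI = (3, 5, 7)

def to_mixed_radix(x):
    # Digits (a1, a2, ..., aL) with X = a1 + a2*m1 + a3*m1*m2 + ...
    digits = []
    for m in MODULI:
        a = x % m
        digits.append(a)
        x = (x - a) // m   # exact division: (x - a) is a multiple of m
    return tuple(digits)
```

For example, to_mixed_radix(23) gives (2, 2, 1), and indeed 2 + 2·3 + 1·15 = 23.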
The most important point to note is that all of the operations can be carried out over the individual rings, so that conversion, magnitude comparison, and sign detection can be performed without resorting to large dynamic range connected adders (Jullien, 1978).

D. Overflow Detection, Scaling, and Base Extension
Overflow detection, scaling, and base extension are RNS operations that, although not as difficult as general division, are considerably more difficult to implement than addition, subtraction, and multiplication. In all three cases the mixed radix converter can form the basis of the operation, since a mixed radix representation is required as an intermediate step in the procedure.

1. Overflow Detection
In order to determine if overflow has occurred, it is necessary to provide additional dynamic range in the RNS, and then to test the result of a computation to see if it has overflowed into the "extra" range. In general, only the final results of a computation must be tested in this way, since overflow has no meaning within the residue algebra itself; it has meaning only when the ring is mapped onto an interval of real numbers when decoding. The extra range needed for this purpose is provided by adding a redundant modulus whose only purpose is overflow detection. A necessary and sufficient condition to check for overflow with one redundant modulus is that the redundant modulus be the largest modulus; i.e., if m_{L+1} is the redundant modulus, then
the required condition is m_{L+1} > max{m_j}, j = 1, ..., L. The occurrence of overflow is then detected if a_{L+1} ≠ 0, where a_{L+1} is the highest-order mixed radix digit of the redundant RNS representation of X. This assumes that the quantity being tested, which has possibly overflowed the original RNS range, is not so large as to overflow the augmented range of the redundant system. This also illustrates that overflow detection requires a mixed radix converter designed to accommodate the augmented residue representation needed for redundancy. It can be seen from this discussion that overflow detection and mixed radix conversion are similar in complexity, and are both considerably more complicated than RNS addition and multiplication. It is fortunate that overflow detection is a relatively infrequent operation in many signal processing problems, in contrast to the much more frequently required addition and multiplication.
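A toy sketch of the redundant-modulus test (the moduli and names below are our own choices, not the chapter's): with working moduli {3, 5} and redundant modulus 7, a value has overflowed the legitimate range [0, 15) exactly when the highest-order mixed radix digit of the augmented system is nonzero:

```python
MODULI = (3, 5)      # legitimate range [0, 15)
REDUNDANT = 7        # m_{L+1} > max(m_j); augmented range [0, 105)

def overflowed(x):
    # Mixed radix conversion over the augmented moduli set; the final
    # digit `a` belongs to the redundant modulus.
    a = 0
    for m in MODULI + (REDUNDANT,):
        x, a = divmod(x, m)
    return a != 0
```

overflowed(12) is False (12 lies inside the original range), while overflowed(20) is True.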
2. Scaling

In signal processing, scaling is a special form of division in which one of the operands is a fixed scale factor; this allows the implementation to be greatly simplified in comparison to a general division. In weighted binary systems, scaling is often accomplished by simple right or left shifts, corresponding to scaling by positive (right shift) or negative (left shift) powers of 2. In residue systems, scaling by a factor that is composed of the product of several of the moduli is relatively easy. In philosophical terms, the presence of scaling is normally associated with inadequacies in the arithmetic hardware and not with a requirement of the algorithm. As such, we can conclude that, in a great many algorithms, the use of scaling is to be kept to a minimum, allowing a reasonable number of closed operations of addition and multiplication to be performed for each scaling operation. Coupling this with simplified scaling procedures does not make scaling the onerous operation that some architecture experts claim.

Since the limitation of scale factors to products of the RNS moduli is usually acceptable in most practical situations, the following discussion concentrates on implementing the operation X_s = Q[X/m_i], where Q[·] denotes the quantization required to produce an integer result. First note that x_{s,k} = (m_i)_k^{-1} ⊗_{m_k} x_k, k ≠ i, if X is an exact multiple of m_i. This condition can be forced on the original number by subtracting the residue, mod m_i. This subtraction is the essence of the quantization function Q. After the subtraction of x_i and multiplication by (m_i)_k^{-1}, the residue representation of Q[X/m_i] is automatically available, although one digit is missing: namely, x_{s,i} = X_s mod m_i.
This procedure can be repeated for other moduli that may be in the scale factor product. For each modulus in the scaling factor, the residue of the scaled number for that modulus will not be computed (this is caused by the nonexistence of the inverse (m_i)_i^{-1} for each m_i in the scaling factor). The scaled number is now represented by a reduced RNS (reduced by the moduli in the scaling factor). This is perfectly acceptable as a number representation because the reduced dynamic range of the reduced RNS is still capable of representing the scaled number X_s. If we write down this general scaling procedure (where the scaling factor is generally defined to be a product of moduli, ∏ m_i) as a recursive relationship, we find

X_s^{(1)} = X;  X_s^{(n)} = (X_s^{(n-1)} ⊖ x_{n-1}) ⊗ (m_{n-1})^{-1}.   (38)
We notice that Eq. (38) is of identical form to Eq. (37), and this allows us to build a complete scaling and conversion structure in the same array. This is readily shown with the following example.

a. A Scaling Example. Define an RNS system with the moduli set {3, 4, 5, 7}, where M = 420, and scale and convert the number 367 by 12, which is the product of the first two moduli. We will add the usual requirement of rounding, rather than truncation, and so will expect an answer of 31. Because of the modulus set reduction during scaling, we will only obtain this result as a two-digit mixed radix number (6, 1), since 31 = 6 × 5 + 1. As a starting point we will add one-half the scaling factor (6) to the original number (367 + 6 = 373) to facilitate the rounding process (Jullien, 1978). The calculations are shown in Table XII. Note that because of the scaling prior to mixed radix conversion (MRC), the indices of the MRC recursive formulation are changed. Invalid results are indicated by dashes.

If we need to use the original RNS (this will certainly be the case if we are to continue computations within the RNS), we then have to regenerate the missing residues. This operation is called base extension and is discussed in the following subsection. We note that base extension allows us to discover the residue of an existing RNS representation in any new ring R(m); in most cases, however, we extend to the same RNS that was in use prior to the scaling operation.
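The scaling example can be checked with a residue-only sketch of the recursion of Eq. (38); the dictionary-based layout and function name are ours:

```python
MODULI = (3, 4, 5, 7)          # M = 420

def scale_by_12_rounded(x):
    # Add half the scale factor (6) so that truncation becomes rounding.
    x += 6
    residues = {m: x % m for m in MODULI}
    for ms in (3, 4):          # scale factor 12 = 3 * 4
        a = residues.pop(ms)   # the residue to subtract: the quantization Q
        residues = {m: ((r - a) * pow(ms, -1, m)) % m
                    for m, r in residues.items()}
    return residues            # reduced RNS over the remaining moduli {5, 7}
```

scale_by_12_rounded(367) returns {5: 1, 7: 3}, the reduced-RNS encoding of X_s = 31 (31 mod 5 = 1, 31 mod 7 = 3).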
3. Base Extension

In order to perform base extension, we have to determine the exact magnitude of the number. Since this is not available using the RNS representation, we need to perform a conversion to a weighted magnitude form. Since we have used the initial section of the mixed radix tree to perform
TABLE XII. SCALING AND MRC FOR Q[373/12] = (6, 1)
scaling, it is clear that the remaining part of the tree can be used for magnitude determination. Assuming that the mixed radix digits have been generated for the scaled number X_s, we note that the missing residues {x_{s,i}}, i ∈ [1, S], can be determined by computing the mixed radix expression of Eq. (39) over R(m_i):

x_{s,i} = (a_1 + a_2 m'_1 + a_3 m'_1 m'_2 + ⋯) mod m_i,   (39)

where {m'_j} are the moduli of the reduced RNS.
The base extension for the example of Table XII is shown in Table XIII. The calculation produces |31|_3 = 1 and |31|_4 = 3, which is correct.

TABLE XIII. BASE EXTENSION FOR THE EXAMPLE IN TABLE XII

If we carry the residues produced in Table XII to the output of the entire scaling, MRC, and
base extension process, we obtain the mapping X_s → (1, 3, 1, 3), which is the correct representation of X_s = 31. There are many other approaches to RNS overflow detection, scaling, and base extension that can be found described in the literature (Soderstrand et al., 1986), but that are beyond the scope of this discussion. Many of these alternate techniques are based on modifications of the Chinese Remainder Theorem that first obtain a weighted number representation (analogous to the mixed radix conversion process) and then perform various tasks to produce the desired result. Another type of approach is based on the concept called "autoscale," which was first introduced by Taylor and Ramnarayanan (1981) for residue-to-decimal conversion, and controlled-precision multiplication (Taylor and Huang, 1982) for modular systems that are characterized by relatively few large moduli (typically three 8-bit moduli). However, when the designer wishes to keep the individual moduli small and to use quite a few of them, the mixed radix conversion technique described above is probably best suited for VLSI implementation. Also, since all of the computations required are operations in RNS arithmetic, it is possible to detect and correct errors that occur within these circuits themselves (Jenkins and Altman, 1988), a feature that is not so directly provided in other approaches. Now that we have the basic structure of the RNS, we can turn our attention back to the subject of convolution via NTTs.
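Base extension itself reduces to evaluating the mixed radix form of the scaled number over each ring being re-introduced; a sketch for the example above (the function and argument names are ours):

```python
def base_extend(digits, weights, new_moduli):
    # digits (a1, a2, ...) and weights (m1', m2', ...) of the reduced RNS;
    # evaluate a1 + a2*m1' + a3*m1'*m2' + ... over each new ring R(m).
    out = {}
    for m in new_moduli:
        x, w = 0, 1
        for a, wm in zip(digits, weights):
            x = (x + a * w) % m
            w = (w * wm) % m
        out[m] = x
    return out
```

For X_s = 31 over the reduced RNS {5, 7}, the mixed radix digits are (1, 6) (31 = 1 + 6 · 5), and base_extend((1, 6), (5, 7), (3, 4)) returns {3: 1, 4: 3}, restoring the full representation (1, 3, 1, 3).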
VII. IMPLEMENTATION OF NTTs USING THE RNS

Although the RNS can be used to implement standard integer arithmetic, the operations are being performed over individual finite rings or fields. Rather than emulate an integer arithmetic calculation, we can use these finite rings/fields to compute complete NTTs. The fact that there is no calculation overflow during the computation means that the problem areas of the RNS (scaling, etc.) are not encountered until we need to evaluate the results of the NTT convolution (after many stages of computation). We can implement each NTT over a relatively small ring/field and simply add enough parallel ring/field computations to provide the dynamic range required. We have now introduced a separation of algebraic versus dynamic range constraints with this new approach to the implementation problem. The concept is presented in Fig. 5, where the input sequence is coded into residues for several parallel NTT computations (four parallel computations are shown in the figure), and the final convolution output is decoded back to a conventional binary output sequence. There is some overhead associated with the RNS approach: the input
FIG. 5. RNS implementation of an NTT convolver.
coding and the output decoding steps. Provided the computation within the parallel RNS hardware is extensive, this overhead can be regarded as small. In order for the parallel computations of the NTT to be valid, each computational ring/field has to have the appropriate algebraic properties. These are the properties required in order to compute an NTT, and each computational ring has to have the same subset over which the final NTT is to be computed; namely, each ring/field has to support a transform of length N. Some of the rings/fields may, of course, support larger transform lengths. If we can find enough rings/fields to support our required transform length, and if their individual moduli are relatively small, then we can relax our requirement for special binary-like moduli (which are required for single ring/field computations in order to simplify the hardware). Our aim is to remove the hardware issue for both addition and multiplication, so that we are not required to restrict the form of the multiplicative group generator and so that transform domain multiplication is not difficult. The latter point is very important, since transform domain multiplication can totally dominate an otherwise efficient transform procedure. It is interesting that, of the many publications on NTT indirect convolution, relatively few deal with the issue of general multiplication modulo the ring/field modulus. Since the NTT only has applications in convolution, the transform domain multiplication step is as important as performing the forward and inverse transforms!
A. Multiplication Using Index Calculus

If we restrict the computational system to fields, then we can simplify multiplication using index calculus. This technique, as applied to NTT computation, has been explored by several workers (Jenkins, 1975; Baraniecka and Jullien, 1980; Jullien, 1980; Nagpal et al., 1983). Index calculus has already been introduced; the simplification of multiplication arises from the ability to map multiplication to addition. The mapping is

ρ^{k_a} ⊗_{m_i} ρ^{k_b} = ρ^{(k_a + k_b) mod (m_i − 1)}.   (39)

This can be coupled with efficient methods of performing modulo addition. One implementation is embodied by the following three-step procedure using only look-up tables (Jullien, 1980):

(1) Find the index, k_i, for each number.
(2) Add indices (modulo m_i − 1).
(3) Perform the inverse index operation.

There is an immediate simplification that can be made to this three-step procedure. Since, in computing the NTT, every multiplier is known a priori, the index mapping can be precomputed, and so step 1 is appropriately simplified. It is now possible to perform the addition using look-up tables, provided that the prime modulus, m_i, is not too large (so that the tables can be kept to a reasonable size). For larger prime moduli, a different technique is used. The addition is performed using a two-modulus RNS so that the tables in the residue calculations can be reduced to reasonable sizes. This seems a strange statement to make based on the fact that the addition is over an additive group with a modulus that may not support a suitable RNS decomposition. Since addition can only overflow the modulus at most once, however, we can compute over a composite modulus m^{(i)} > 2m_i − 2, and correct for any overflow as a final step. m^{(i)} will be selected based on a suitable submodular decomposition, m^{(i)} = m_1^{(i)} m_2^{(i)}. The inverse mapping step can include the overflow correction without increasing the size of the truth table.
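A sketch of both ideas for GF(17) with generator ρ = 3: multiplication via index addition mod 16, with the index addition itself carried out over the submoduli (7, 6) (composite modulus 42 > 2 · 17 − 2) and corrected afterwards. The dict-based tables stand in for the ROM look-up tables, and the function names are ours:

```python
M, RHO = 17, 3
FWD = {pow(RHO, k, M): k for k in range(M - 1)}   # element -> index
INV = {k: pow(RHO, k, M) for k in range(M - 1)}   # index -> element

def add_mod_submoduli(ka, kb, mod):
    # Channelwise addition over the submoduli (7, 6), CRT decode over 42,
    # then a single conditional correction: the true sum can exceed `mod`
    # at most once, since ka, kb < mod and 2*mod - 2 < 42.
    r7 = (ka % 7 + kb % 7) % 7
    r6 = (ka % 6 + kb % 6) % 6
    s = next(x for x in range(42) if x % 7 == r7 and x % 6 == r6)
    return s - mod if s >= mod else s

def gf17_mul(a, b):
    if a == 0 or b == 0:
        return 0            # zero is routed through as a special code
    return INV[add_mod_submoduli(FWD[a], FWD[b], M - 1)]
```

gf17_mul reproduces ordinary multiplication mod 17 for every operand pair.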
Note here that we are using an inner residue system to perform relatively large Galois field multiplication, the complete computation taking place over several Galois fields to form an outer residue system. With this technique we are effectively reusing the inner computation submoduli for each of the outer computation moduli.

1. An Inner Residue System Example

As an example we will take m^{(i)} = 42 = 6 × 7; therefore m_1^{(i)} = 6 and m_2^{(i)} = 7. With this composite modulus we will be able to use primes p < 44/2 = 22. The largest prime that satisfies this condition is 19. Rather than use this as an example, let us repeat the Fermat NTT example from an earlier
section (we will use the fast implementation) over GF(17), but this time using index calculus computed over submoduli. We will use the example of convolving the sequence {1, 2, 3, 4, 0, 0, 0, 0} with itself; the result is {1, 4, 10, 3, 8, 7, 16, 0} over GF(17). The forward and inverse transforms are given by the matrix multiplications discussed earlier; with α = 2 generating the length-8 multiplicative subgroup of GF(17), the transform matrix has entries α^{nk} mod 17 (the powers of 2 cycle through 1, 2, 4, 8, 16, 15, 13, 9), and the inverse transform uses the entries α^{−nk} together with the scale factor 8^{−1} = 15 mod 17:

X(k) = ( Σ_{n=0}^{7} x(n) 2^{nk} ) mod 17,   x(n) = ( 8^{−1} Σ_{k=0}^{7} X(k) 2^{−nk} ) mod 17.
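The whole example can be verified numerically; a sketch using α = 2 (an 8th root of unity in GF(17)) and direct summation rather than the fast flow graph:

```python
M, N = 17, 8
ALPHA = 2                       # 2**8 % 17 == 1, so 2 is an 8th root of unity
ALPHA_INV = pow(ALPHA, -1, M)   # inverse root, 9
N_INV = pow(N, -1, M)           # 8^{-1} mod 17, 15

def ntt(seq, root):
    # Direct O(N^2) evaluation of the transform matrix multiplication.
    return [sum(v * pow(root, n * k, M) for n, v in enumerate(seq)) % M
            for k in range(N)]

x = [1, 2, 3, 4, 0, 0, 0, 0]
X = ntt(x, ALPHA)
Y = [(a * b) % M for a, b in zip(X, X)]        # transform domain multiplication
y = [(N_INV * v) % M for v in ntt(Y, ALPHA_INV)]
```

y evaluates to [1, 4, 10, 3, 8, 7, 16, 0], the cyclic self-convolution of x over GF(17) quoted in the text.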
In order to compute the transform using the signal flow graph of Fig. 2, we need to build a butterfly with premultiplication by a twiddle factor. Since the butterflies will be cascaded, it makes sense to maintain the input and output data of the butterfly in submodular form; this will include index look-up (in preparation for the twiddle factor multiplication in the following stage) for the subtraction output of the butterfly. This output data structure will be maintained through the forward transform, transform domain multiplication, and reverse transform. Note in this realization that the transform domain multiplication is simply treated, in complexity, as a twiddle factor multiplier and does not overwhelm the computational requirements as in the more conventional approaches. The sub-modular elements required by the butterfly, from Fig. 1, are:

(1) Index addition tables and inverse look-up for the "B" input;
(2) Addition tables for the "C" output;
(3) Subtraction tables for the "D" output.
We first pre-store the transform matrix in a submodulus index form. Using the mapping generator ρ = 3, we form forward and inverse mapping tables (Tables XIV and XV). We note that the forward table maps elements of the multiplicative group to the additive group; the inverse table performs the opposite mapping. The multiplicative group has all elements of GF(17) except 0. The additive group has all elements except 16. In using index calculus for multiplication by performing addition, we recognize that a mapping for 0 has no meaning, and so this has to be treated as an exceptional case. The look-up table approach is very efficient in implementing the 0 case, since this case can be treated as the looking-up of a special code that routes through the computational tables so as to produce a zero result. For example, we can use the special code 7. This is a number that will not be generated as the result of
TABLE XIV. FORWARD TABLE (element n → index k, where n = 3^k mod 17)

n:  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
k:  0 14  1 12  5 15 11 10  2  3  7 13  4  9  6  8

TABLE XV. INVERSE TABLE (index k → element 3^k mod 17)

k:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
n:  1  3  9 10 13  5 15 11 16 14  8  7  4 12  2  6
any valid arithmetic operation over either GF(7) or R(6). This code will drive the subsequent look-up table, whose contents at address 7 are zero. The complete look-up tables, with interconnections, are shown in Fig. 6. By repeated use of this network we are able to generate the forward transform, the transform domain multiplications, and the inverse transform. We can
FIG. 6. Modulo 17 NTT butterfly using ROM arrays.
either multiplex the network or make many copies of it, depending on the speed/hardware ratio required. Note that we do not use special "multiplierless" hardware for the transform and then suffer the consequences during the transform domain multiplication, as is advocated by many approaches to the use of NTTs for indirect convolution. In Table XVI and Table XVII we demonstrate the use of the network of Fig. 6 in implementing the convolution. Both tables have the same form. For the forward transform in Table XVI, the input is given in the upper left column. This is converted to a submodulus representation of the forward mapping; e.g., the second input, 2, is mapped, using Table XIV, to 14, and this is converted to the submodulus RNS 2-tuple (0, 2). The application of the network of Fig. 6 results in an index multiplication* by the premultiplier (unity for the forward input), followed by the three look-up stages of inverse mapping, addition/subtraction, and forward mapping. All tables use the submodulus RNS with moduli (7, 6). The four rightmost columns represent the four stages of the butterfly of Fig. 6. The 8-point FNTT requires three stages, as shown by the three sets of look-up columns (vertically arranged). The output of the third stage is shuffled to remove the effect of bit-reversal. Table XVII shows the inverse transform. Note that we have decoded the output from the forward transform to allow a comparison with the results of the matrix multiplication of the example. The transform domain multiplication is performed in look-up c1 of the inverse transform. The final shuffled output of the inverse transform, shown in the lower right column, can be reconstructed and inverse-mapped to yield the sequence {1, 4, 10, 3, 8, 7, 16, 0}, which is the correct sequence. Note that the code (7, 7) represents the final sample of the shuffled output. This is a representation of zero using the special code discussed above.
The reader can verify that all of the look-ups are directly obtainable from the butterfly network of Fig. 6. A useful exercise for the reader might be to take some other sequence(s) and use the network to compute their convolution.

This particular scheme has been used in the hardware construction of a high-speed image convolver (Nagpal et al., 1983). The convolver was built as a peripheral to an image-processing system based on a SEL/27 computer. The architecture used a single two-dimensional computational element (CE) to compute the NTT of four samples taken from a two-dimensional block of image data; the CE was implemented by multiplexing a single two-sample butterfly. As a modification to the complete look-up table implementation, the butterfly additions were computed with binary adders with EPROM correction. Look-up tables were used by the post multiplier in the computational

* Represented by the multiplier and index addition (look-up c1) stages.
TABLE XVI. FORWARD TRANSFORM LOOK-UP RESULTS
element. The outer residue system was computed over the fields GF(641) and GF(769). Both fields support multiplicative subgroups of 128 elements, and so a two-dimensional cyclic convolution of 128 x 128 image points could be implemented. The multiplication and correction hardware used 32K EPROMs (erasable/programmable read-only memories), computing over an
NUMBER THEORETIC TECHNIQUES IN DIGITAL SIGNAL PROCESSING
TABLE XVII
INVERSE TRANSFORM LOOK-UP RESULTS
Input: 10 16 6 11 15 13 7 15
[Remaining tabular data (forward input, multiplier, look-up stages c1 to c4, and shuffled output) not legible in this reproduction.]
inner residue system. Submoduli of 63 and 31 were sufficient to implement the inner residue number system using 32K EPROMs. It is interesting to note that if look-up tables had been used for the Galois field additions, then the entire architecture would have consisted of memory elements (ROMs for the arithmetic and RAMs for the data manipulation). By 1985, the design of full and semi-custom VLSI chips for specialized applications had become feasible with the use of multi-project chips and wafers. Several groups have studied the optimization of computational VLSI circuits for residue arithmetic (Jenkins et al., 1985; Jenkins and Lao, 1987; Soderstrand and Chang, 1986; Bayoumi et al., 1987; Taheri et al., 1988).
In the next section we will explore some of the basic issues in building special computational circuits, with a concentration on the use of look-up tables to generate pre-stored computations rather than the standard practice (in the binary world) of computing the results as they are required. This reverses the accepted practice of implementing small residue calculations with combinational logic (Carhoun et al., 1983), a practice based on a presumed complexity advantage of combinational logic over look-up tables. We show that this assumption is not necessarily valid.
VIII. VLSI IMPLEMENTATIONS OF FINITE ALGEBRAIC SYSTEMS

The thrust of digital signal processing (DSP) algorithms into the mainstream of general signal processing has been due entirely to the advances in integrated circuit fabrication. With the advent of VLSI, a decade ago, DSP has witnessed a large increase in the number of applications of its theory and practice. VLSI implementations naturally adopted standard digital logic components that were developed for the "binary world". Such hardware includes ROMs, adders, multipliers, microprocessors and special DSP processors. This is a perfectly reasonable approach when the custom design, and fabrication, of VLSI hardware is not practical, particularly for small production runs. Now, however, it is possible to both design and fabricate small production runs using the many "silicon foundries" that have appeared over the last several years. Many are oriented to "standard cell" implementations, but it is possible to produce small runs of full custom circuits. In this section we consider that the latter approach is realistic, and concentrate on the implementation of special finite ring high-speed computational cells and architectures that are especially suited to custom VLSI implementations.
A. MODULAR ARITHMETIC ELEMENTS
Although ROMs play a central role in finite ring computational architectures (Jullien, 1978; Bayoumi et al., 1987), the use of direct Boolean logic implementation for modular addition and multiplication is important for many implementations. We start from the assumption that we are only interested in very high-speed implementations. We are naturally led to the use of bit-level systolic arrays (McCanny and McWhirter, 1982) for the architecture of RNS arithmetic elements. These provide the minimum critical path for the pipelined cells and the local communication between adjacent cells that
is sought after in high-speed VLSI implementations. We will look at a recently published result that shows the efficiency of finite ring computations as a direct replacement for binary arithmetic (Jullien et al., 1989).
B. A GENERIC RESIDUE PROCESSING CELL

A pivotal element in the processing of most linear DSP algorithms is the inner product step processor (IPSP). This element is, in essence, a multiplier/accumulator cell (see Section I). In this section we discuss a recent development of a finite ring bit-level systolic cell realization of the IPSP that allows most RNS implementations of linear DSP algorithms to be constructed with large linear systolic chains of the cell (Taheri et al., 1988). This leads us to a true generic cell implementation of DSP algorithms. We also discuss recent results that compare such a generic finite ring cell to the equivalent generic binary cell, a gated full-adder (Jullien et al., 1989).

1. The Finite Ring IPSP
Many matrix operations and DSP algorithms can be implemented using repeated multiply and add operations in a loop. The operation is performed using the IPSP. This processing cell computes
y_out = y_in + (a . x_in),    (40)

where Y is a running accumulator, A is a multiplier, and X is the independent input data. By building a chain of such devices with an initial y_in = 0, we are able to compute the inner product

Y = A . X^T.    (41)
We assume that the a in each IPSP changes accordingly. If we allow the restriction that the elements of vector A are fixed, then we can build a finite ring IPSP that has a complexity identical to simple addition, and a hardware cost function much lower than that of a binary implementation for many practical DSP applications. The finite ring IPSP, which we will give the symbol IPSP_m, provides the relationship

y_out = y_in ⊕_m [a ⊙_m x_in].    (42)
All the inputs and outputs are B-bit ring elements y, a, x ∈ R(m) with B = ⌈log2 m⌉. We can now break down the representation of the ring element X into a binary form and generate the bit-level equivalent of the IPSP_m; we will use the symbol BIPSP_m.
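Before descending to the bit level, the word-level behaviour of a chain of IPSP_m cells can be sketched in a few lines of Python (an illustration; the modulus and data values are arbitrary choices, not from the text):

```python
# Word-level sketch of a chain of finite ring inner product step
# processors (IPSP_m): each cell computes y_out = y_in (+)_m (a (.)_m x_in),
# Eq. (42). Chaining the cells with an initial y_in = 0 yields the inner
# product |A . X^T|_m of Eq. (41).
M = 31  # example ring modulus (illustrative)

def ipsp_m(y_in, a, x_in, m=M):
    return (y_in + a * x_in) % m

def inner_product(A, X, m=M):
    y = 0
    for a, x in zip(A, X):     # one IPSP_m cell per coefficient
        y = ipsp_m(y, a, x, m)
    return y

A = [3, 7, 11, 2]
X = [5, 1, 4, 9]
assert inner_product(A, X) == sum(a * x for a, x in zip(A, X)) % M
```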
FIG. 7. Implementation of the BIPSP_m cell.
The operation of the BIPSP_m, with a fixed multiplier, can be defined by

y_{i+1} = y_i ⊕ [a ⊙ x^[i] ⊙ 2^i],    (43)
where i is the spatial array index, y_{i+1}, y_i, a ∈ R(m), and x^[i] is the ith bit of X_in ∈ R(m). Note that we have made the variables lower-case to indicate operation at the bit level (also note that both a and y are still assumed to be word variables within the ring). The ring operations are shown without the modulus subscript. The implementation of the BIPSP_m cell is shown in Fig. 7. The cell contains a ROM of size B . 2^B bits and a set of steering switches. Although we only need m words of storage, it is convenient to design a cell based on the largest value of m, namely 2^B. Inputs to the cell are y_i and x̂_i; the outputs are y_{i+1} and x̂_{i+1}; each output line is latched. The x input is given the new symbol x̂ to indicate that the bits are circularly shifted by one position for each cell, automatically presenting the correct bit to the steering switches in each cell. This results in a regular, common cell structure; the trade-off is the requirement for extra latches. The ROM stores the operation y_i ⊕ [2^i ⊙ a]. The cell computes the following relationship:
For x̂^[0] = 1:    y_{i+1} = y_i ⊕ [2^i ⊙ a];
For x̂^[0] = 0:    y_{i+1} = y_i.    (44)
x̂^[0] is known as the steering bit, since it is used to determine the direction the y data take through the cell (either through the ROM or around it). If we expand Eq. (42) as

Y_out = Y_in ⊕_m Σ_{j=0}^{B-1} {a ⊙ x^[j] ⊙ 2^j},    (45)
then the output can be computed recursively using the recurrence

y_0 = Y_in;
y_{j+1} = y_j ⊕_m {2^j ⊙_m a ⊙_m x^[j]};
Y_out = y_B.

It can be seen that the operator IPSP_m is equivalent to a linear array containing B stages of BIPSP_m cells. The addition of latches at the output of each cell allows adjacent cells to compute, at the same time, with stable input data. At the end of the computation period, the data are transferred to the latch (which is basically a single-bit storage device), ready for the next computation period. This is known as pipelining, and in this case we have constructed a one-dimensional, or linear, pipeline. Pipelining is the essential ingredient of systolic arrays (Kung and Leiserson, 1978), along with local interconnection between processing cells. We do not have the space in this paper to delve into systolic arrays, but it is clear from the two requirements just stated that an array of these BIPSP_m cells forms a linear systolic array. The linear systolic array structure satisfies the requirements of modularity, homogeneity, and local communication that are sought after in VLSI designs. By repeatedly using the generic cell structure, all closed computations can be performed with such parallel linear systolic structures, including interfacing with normal fixed-point computations (Taheri et al., 1988). We now have a universal bit-level systolic cell that only connects to its immediate neighbours. This cell is the finite ring counterpart to the gated full adder binary bit-level cell (Hatamian et al., 1986), with the added advantage of allowing fixed coefficient multiplication to be implemented with the same cell. In order to provide an example of the operation of the cell, consider the problem of encoding an S-bit binary number to a residue representation for a ring R(m).

2. Binary to Residue Encoding

An S-bit binary number can be reduced, modulo m, as follows:

|X|_m = ⊕_m Σ_{b=0}^{S-1} 2^b ⊙ x^[b].    (46)

Eq. (46) can be computed via the following recursion:
y_0 = 0;
y_{b+1} = y_b ⊕ (2^b ⊙ x^[b]);
|X|_m = y_S.    (47)
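The recursion of Eq. (47) can be checked with a short Python sketch; each loop iteration corresponds to one BIPSP_m stage with a unity coefficient (the values below are illustrative):

```python
# Bit-level sketch of the binary-to-residue encoding recursion of
# Eq. (47): y_0 = 0, y_{b+1} = y_b (+)_m (2^b (.)_m x[b]), |X|_m = y_S.
def encode_residue(x, S, m):
    y = 0
    for b in range(S):
        bit = (x >> b) & 1              # x[b], the serial input bit, LSB first
        y = (y + bit * pow(2, b, m)) % m
    return y

# A 17-bit input reduced modulo 29 (95287 < 2^17)
assert encode_residue(95287, 17, 29) == 95287 % 29
```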
Eq. (47) can be calculated using the BIPSP_m, modified to contain S bits in its data path. A linear array containing S stages of BIPSP_m cells is capable of modulo m reduction. If it is required to multiply the input by a fixed coefficient a, this can be performed without any extra hardware. The stage can also correctly encode two's complement form for the input binary sequence by generating the additive inverse of an element in the ring, modulo m, if required. This is performed by mapping address to content within the final stage (MSB stage) ROM as follows:
Address(i) ⊕ {|-2^{S-1}|_m ⊙ a} => Content(i).    (48)

The procedure is illustrated in Fig. 8 for S = 16, a = 13, m = 11, and input X = -29. The columns represent ROM contents for 16 BIPSP_m cells, with active ROM contents outlined. The ROM addresses are the column on the left, and the serial binary input is shown, starting at the LSB on the left, as the top row. In practice, the binary number will be partitioned into several B-bit summations so that the x̂ data path is the standard B-bit width. This example illustrates the cyclic nature of the ROM contents, formed as elements of an additive group under the ring addition operation. The only information required to generate the entire ROM contents is the first location; it seems, however, that a general ROM structure is the most efficient implementation mechanism, providing much smaller (area x period) complexity measures than direct residue adders.
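The two's complement correction can be verified numerically for the Fig. 8 parameters. The following Python sketch mimics the array's arithmetic in software (it does not model the ROM mechanism itself): every stage but the last uses the positive weight 2^b, and the MSB stage uses the weight -2^{S-1} reduced into the ring.

```python
# Sketch of the two's complement correction: the final (MSB) stage applies
# the weight -2^(S-1) mod m instead of +2^(S-1), so the fixed-coefficient
# product |a * X|_m is produced directly from the two's complement bits.
# Parameters follow the Fig. 8 example: S = 16, a = 13, m = 11, X = -29.
S, a, m, X = 16, 13, 11, -29

bits = [(X >> b) & 1 for b in range(S)]   # 16-bit two's complement, LSB first
y = 0
for b in range(S - 1):
    y = (y + a * bits[b] * pow(2, b, m)) % m
# MSB stage: the address-to-content mapping embeds |-2^(S-1) (.) a|_m
y = (y + a * bits[S - 1] * (-pow(2, S - 1, m))) % m

assert y == (a * X) % m                   # Python % yields the positive residue
```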
3. Other Operations
The BIPSP_m cell can be used to compute all closed finite ring operations over linear bit-level systolic arrays. The reader is directed to the reference (Taheri et al., 1988) for detailed information.

FIG. 8. ROM contents for |-29 ⊙ 13|_11; the top row, 1 1 0 0 0 1 1 1 1 1 1 1 1 1 1 1, is the serial two's complement input, LSB first.
Perhaps one of the most startling uses of the cell is in CRT mapping. This is normally implemented either with binary adders in a true CRT configuration (Soderstrand et al., 1983), or with large ROM tables in an MRC configuration (Jullien, 1978). By combining IPSP modules with a binary modulus base extension scheme (a recently disclosed technique by Shenoy and Kumaresan, 1987), we are able to compute over the individual rings and also produce the output in bit-sliced binary. The basic concept is iteration around a loop that contains base extension to a power-of-two modulus, followed by scaling with that same ring modulus. The base extension employs the mixed radix conversion process discussed earlier, and the output is produced in slices of bits, with the number of bits equal to ⌈log2 m_e⌉, where m_e is the extended ring modulus. If we use a ring modulus of, for example, 32, then the output will be in 5-bit slices. The operations required are all individual ring operations and are associative; therefore we can use the linear systolic structure (with small modifications in data paths) to perform the mapping. Often the mapped data are not required over the precision of the direct sum ring. Scaling strategies can be used to great effect in this situation. The following example illustrates the point. With four 5-bit moduli (say, 32, 31, 29, 27) we have greater than 19 bits of dynamic range. If we only require the mapped output to have about 10 bits of dynamic range, then we can adopt an exact division scaling strategy (Jullien, 1978). By carefully arranging the order of the moduli, we will be able to produce the output in bit slices with only one iteration through the scaling array and no base extension. In the example used here, we arrange the moduli in the order {m1 = 29, m2 = 27, m3 = 31, m4 = 32}, and we plan to scale by 27 x 31. This will reduce the dynamic range to 32 x 29 ≈ 2^10. A possible construction is shown in Fig. 9, with the gray blocks equivalent to five linearly connected cells and the solid block representing five cascades of five latches each. The data line shown superimposed on the gray blocks represents the cyclically rotated parallel data path with access to the top serial bit. The arrow between blocks 1 and 2 indicates that the serial bit for block 1 is obtained from the serial bit used for block 2, rather than from its own serial bit. This allows existing data paths within each cell to be used more effectively. The output is obtained at the bottom of the array, with the ordering of bits as shown. We first define ĝ1, ĝ2, and ĝ4:

ĝ1, ĝ2, and ĝ4 can be calculated in advance. Using the set of constants shown in Eq. (50), the input to the decoder, X1, X2, X3, and X4, can be mapped to
FIG. 9. Scaling and reverse mapping array for four 5-bit moduli.
X̂1, X̂2, X̂3, and X̂4, employing only a single BIPSP_m cell for each input (this BIPSP_m cell will be the first of a block of six cells).
X̂1 = (X1 ⊕_{m1} y) ⊙_{m1} ĝ1
X̂2 = (X2 ⊕_{m2} y) ⊙_{m2} ĝ2
X̂3 = (X3 ⊕_{m3} y) ⊙_{m3} |m2^{-1}|_{m3}
X̂4 = (X4 ⊕_{m4} y) ⊙_{m4} ĝ4    (50)
The addition constant y = m2.m3/2 is used to allow rounding of the estimate to the nearest integer rather than truncation, as normally happens with exact division scaling techniques.
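The effect of the rounding constant can be checked in ordinary integer arithmetic. The following Python sketch is a verification of the arithmetic only, not of the systolic array; the example values match the worked scale/decode example of this section.

```python
# Verification of the exact-division scaling arithmetic: X = 95,287 over
# the moduli {29, 27, 31, 32}, scaled by m2*m3 = 27*31 = 837 with the
# rounding constant y = m2*m3/2 (rounded up to 419).
m1, m2, m3, m4 = 29, 27, 31, 32
X = 95287

residues = (X % m1, X % m2, X % m3, X % m4)
assert residues == (22, 4, 24, 23)

y = (m2 * m3 + 1) // 2                 # 837/2 -> 419
scaled = (X + y) // (m2 * m3)          # rounded estimate of X / 837
assert scaled == 114                   # round(95287 / 837) = round(113.84...)
assert format(scaled, '010b') == '0001110010'
```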
The decoding and scaling operation can be performed using mod m adder blocks if the free constant multiplication property of the cells is used. The details of the operation are:
a. A Scale/Binary Decode Example

In order to observe the action of this decoder, we will take an example of an input number, X = 95,287, which has the residue representation (X1, X2, X3, X4) = (22, 4, 24, 23). Scaling by 27 x 31, and rounding, produces the integer 114 = 0001110010. The multiplicative inverses are given by the set {|m2^{-1}|_{m1}, |m2^{-1}|_{m3}, |m2^{-1}|_{m4}, |m3^{-1}|_{m1}, |m3^{-1}|_{m4}, |m4^{-1}|_{m1}} = {14, 23, 19, 15, 31, 10}, and the initial constants are given by the set {ĝ1, ĝ2, ĝ4} = {12, 5, 13}. Table XVIII illustrates the steps in the process
TABLE XVIII
SCALING AND BINARY DECODING
[Step-by-step look-up values for the inputs (X1, X2, X3, X4) = (22, 4, 24, 23); entries not legible in this reproduction.]
with y = 419 = (13, 14, 16, 3). The table is arranged to follow the block configuration in Fig. 9. The binary output at the bottom is arranged in reverse order; the concatenation of the two 5-bit data streams (01001 11000) clearly represents the correct scaled and rounded output (0001110010 in normal order). Next we consider the implementation of an FIR filter using the BIPSP_m cell, and finish with a detailed example.

4. A Bit-Parallel FIR Filter
From the above we know that an Nth-order fixed-coefficient FIR filter, computed over a finite ring, can be expressed as

|y(n)|_m = ⊕_m Σ_{i=0}^{N-1} a_i ⊙_m x(n - i).    (52)

We can now express the independent data, {X}, as a binary number within the ring, and thus expand Eq. (52) to the form

|y(n)|_m = ⊕_m Σ_{i=0}^{N-1} Σ_{b=0}^{B-1} 2^b ⊙_m (a_i ⊙_m x(n - i)^[b]).    (53)
A single linear array of cells can be used to implement Eq. (53), the addition being performed in a distributed manner as the partial results move along the array. The systolic cell required for this operation is identical to that of Fig. 7, with the addition of a single latch in the steering bit line. This single latch forms a complete word shift after traversing B cells, because of the cyclic shift of data as each cell is traversed. This accomplishes the time shift indicated by the term x^[b](n - i) in Eq. (53). It is interesting to note that, although the original FIR equation assumed word-level operation, it is possible to have the individual bits embedded in the same structure. The use of single-bit systolic arrays is not new and has been reported several times in the literature (for example, McCanny and McWhirter, 1982). To visualize how individual groups of cells, forming IPSP_m's, can be used along with a sliding latch to form a convolution sum, let us take an example of a two-coefficient FIR filter, where each coefficient is 3 bits long. The word- and bit-level relationships are

|y(n)|_m = ⊕_m Σ_{i=0}^{1} a(i) ⊙_m x(n - i) = ⊕_m Σ_{i=0}^{1} Σ_{b=0}^{2} 2^b ⊙_m (a(i) ⊙_m x(n - i)^[b]).    (54)
The filter coefficients a(0), a(1) are stored in a six-cell array as shown in Fig. 10. Each cell is assumed to compute the operations defined by Eq. (46) and to include latches on all of the outputs. The sliding latch on the steering bit line is explicitly shown as a filled square. The inputs are fed in sequentially (the y
FIG. 10. Data flow for the two-coefficient linear systolic FIR filter.
inputs are zero to initialize the accumulated products), and the partial results are also obtained sequentially. The inputs are assumed to be padded with zeros. The upper partial results are the accumulating y values for each cell, and the lower results are the steering bits for each cell. The effect of the sliding latch can be seen by the extra delay on the steering bits after the first three cells. The effect is to delay the entire word (all three bits) by one clock period. The final accumulated result (y6) is the output. In order to verify that this linear systolic array produces the correct filtered output, Table XIX is constructed, showing the six cell outputs for nine clock periods. In order to limit space in the table we
TABLE XIX
DATA OUTPUT FOR THE TWO-COEFFICIENT SYSTOLIC FIR FILTER
[Cell-by-cell partial results (entries such as a0 2^0 x(0)^[0] and a1 x(1)) not legible in this reproduction.]

TABLE XX
EVALUATION OF THE TWO-COEFFICIENT FIR FILTER EQUATION
temporarily drop the special symbol for ring multiplication, except at the output cell. We also assume that the array latches are initially filled with zeros. When a complete word has been assembled, it is written explicitly as a word (not as a bit-level description). Table XX shows the result of applying the word-level filter description (left-hand summation in Eq. (54)) for 0 ≤ n ≤ 3 over the summation index 0 ≤ i ≤ 1; the input is the same as that in Fig. 10. We see that the systolic array generates an identical set of outputs to the equation evaluation; the only difference between the two evaluations is the latency of six clock periods in the systolic array output. The efficiencies inherent in having fixed-coefficient multipliers can also be used to advantage in binary arithmetic (Peled and Liu, 1974), but the resulting structure is not as regular as the systolic array approach discussed here, and cannot be used at such high throughput rates. More is said on the finite ring/binary comparison in the next subsection.
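The equivalence of the word-level and bit-level summations in Eq. (54) can be sketched in Python. The coefficient and sample values below are illustrative choices (the scan of Tables XIX and XX does not preserve the original values); the check is that the two expansions agree.

```python
# Sketch of the two-coefficient bit-level FIR evaluation of Eq. (54):
# the word-level sum |a(0)x(n) + a(1)x(n-1)|_m is expanded into bit-level
# partial products 2^b (.) (a(i) (.) x(n-i)[b]), as accumulated by the
# systolic array (without modelling its latency).
M = 31
a = [5, 3]                     # two 3-bit coefficients a(0), a(1)
x = [6, 2, 7, 1]               # 3-bit input samples, zero-padded for n < 0

def fir_word(n):
    y = 0
    for i in range(2):
        xi = x[n - i] if n - i >= 0 else 0
        y = (y + a[i] * xi) % M
    return y

def fir_bits(n):
    y = 0
    for i in range(2):
        xi = x[n - i] if n - i >= 0 else 0
        for b in range(3):             # B = 3 bits per sample
            y = (y + pow(2, b, M) * a[i] * ((xi >> b) & 1)) % M
    return y

assert all(fir_word(n) == fir_bits(n) for n in range(4))
```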
C. VLSI Implementations for ROM Generic Cells

The finite ring cell has been fabricated in a 3-um p-well double-metal CMOS process that is supported by the Canadian Microelectronics Corporation (CMC) (Jullien et al., 1989). The block diagram of the finite ring cell implementation is shown in Fig. 11. The circuit is a mixture of dynamic and static logic (Dillinger, 1988). In particular, the latches and the ROM storage are dynamic; this saves a great deal of area and is a natural implementation of a high-speed pipelined system. Note that the implementation of the latches is distributed, rather than lumped at the output, and the cyclic shift is performed at the output of the latched data, rather than before the latches. These changes provide for an efficient cell layout; they do not affect the functionality of the cell.

1. Circuit Operation

There are two main sections in this design: the latches and the ROM. The selection of logic implementation is based on minimizing the (area x period)
FIG. 11. Block diagram of the finite ring cell implementation.
product, where the period is the minimum time between adjacent clock pulses at which the circuit still functions correctly. It is interesting that even at the level of latch design there are vital choices to be made. The diagram in Fig. 11 shows transmission gate latches; these are basically two parallel circuits of a single p-channel and a single n-channel MOSFET, each driving an inverter (see Fig. 12a). By arranging the φ1 and φ2 clock pulses to be non-overlapping, the latch operates by turning on the first gate (essentially changing it from an open circuit to a low-resistance switch) and keeping the second off (φ1 on and
FIG. 12. Dynamic latch circuits: (a) transmission gate latch; (b) switched inverter latch.
φ2 off), then reversing the clock pulses with no overlap. This has the effect of moving data through the cell under control of the clock pulses. It is rather like moving boats through a lock system on a canal: the pulses are non-overlapping in the same way that we do not want to open all the lock gates at the same time! A second type of latch is also available; this is the switched inverter (see Fig. 12b). In this circuit we construct a normal static inverter, but allow its output node to be isolated (this takes the place of the parallel n- and p-type transistors in the transmission gate latch). There are trade-offs with both approaches. The transmission gate latch is the faster circuit because the output inverter continues to be driven by the charge on its input after the switch has been disabled; this latch, however, takes the greater area of the two designs. We therefore choose the switched inverter latch to minimize area where it is not in a critical path, and the transmission gate latch to minimize propagation delay when it is in a critical path. All of the y data latches (except for the steering bit) are not in a critical path and use the switched inverter configuration. The steering bit line latch and the ROM data path latches use the transmission gate form. By placing the latches controlled by φ1 before the ROM, some duplication of devices can be eliminated by using the inverters on the y input lines for both the latch function and the decoder complement function. The ROM can be decomposed into four sections: the row decoder, the column decoder, the storage array, and the pull-up drivers. The ROM (see Fig. 13) is configured as eight rows by four columns and allows ring moduli of up to 32 to be implemented. The ROM is dynamic, in that the parasitic capacitance associated with the column select transistors is precharged, and the ROM output is subsequently evaluated by the row select circuitry.
The ROM storage circuit is a matrix of nMOS transistors (see Fig. 14); the ROM is programmed by including or excluding the appropriate row
FIG. 13. ROM block diagram (pull-up drivers; 5-bit input).
FIG. 14. ROM storage circuitry.
transistor. The time periods during which the storage circuitry precharges and the row and column decoders evaluate are overlapped. This is not the usual mode of operation for dynamic logic, since there is a path between the power supply and ground during the overlap, resulting in a larger current than that associated with the parasitic capacitor charging current. From extensive SPICE simulations, however, it is found that the increase in current is only 300 uA when the time periods are overlapped. This represents an excellent trade-off between increased speed and increased current, since the overlapping time periods reduce the critical path by over 7 ns. It is important to note that although the number of transistors in the cell is large, they are placed at virtually the maximum density allowable by the design rules; this is not the case with random logic built from standard cell libraries.

2. Simulations
The full pipeline cycle of the finite ring generic cell is shown in Fig. 15. A maximum pipeline rate of 43 MHz is predicted, and operation close to this has been verified in controlled laboratory tests. The detailed SPICE simulation output for the complete cell is shown in Fig. 16. The simulation parameters were extracted from the final cell masks used to produce the fabricated
FIG. 15. Timing diagram for a complete finite ring cell period: 23 ns, approximately 43 MHz.
FIG. 16. Simulation of the finite ring cell at 43 MHz.
circuit. The cell layout is shown in Fig. 17. Several test cells were produced, including systolic array IPSP_m configurations. One of the test cells is shown in Fig. 18; this implements a single 5-bit IPSP_m and includes appropriate power supply lines, isolation for noise, static protection for the inputs, and high-speed drivers for the outputs. Both the correct operation and the operational speed have been verified.
FIG. 17. Layout of the finite ring cell.
FIG. 18. Test IPSP_m chip.
3. Comparative Study

In a comparison study of the finite ring cell and a fast bit-level binary cell (a gated transmission gate full adder), the ratio of the (area x period) product for the finite ring cell to that for the binary cell was 1.94. A study (Jullien et al., 1988) was conducted into using both approaches to implement a fixed-coefficient 64-tap FIR filter with 8-bit data, 8-bit coefficients, a 20-bit computational range, and a 10-bit output range. The RNS system used the moduli set {32, 31, 29, 27} with 8-bit binary input data and scaled binary output data. The (area x period) product ratio of the binary approach compared to the RNS approach was approximately 4, and the area ratio approximately 3.5. This certainly justifies using the RNS approach rather than the conventional binary implementation for very high throughput-rate digital signal processors. If one also considers the ease with which fault detection can be introduced into the design, and the simple schemes available for testing large arrays of the finite ring cell (Jullien et al., 1989), the case for using RNS arithmetic at the silicon level is very strong. As a final point, we can leave aside the issue of arithmetic complexity and consider high-speed pipelined arithmetic systems only in terms of their inherent differences for fabrication on a VLSI circuit. The essential differences (besides the fact that they are computing over different arithmetic structures) are immediately obvious from considering the basic arithmetic operation of addition. The diagrams of Fig. 19 and Fig. 20 demonstrate this difference. The binary bit-level array requires B cells temporally spaced by B clock periods. This is to allow the propagation of the carry information: one bit per clock period. The data are skewed and deskewed using 2B(B - 1) latches. For a
FIG. 19. Binary bit-level systolic adder.
FIG. 20. Finite ring bit-level systolic adder.
sequence of adders the data can travel through subsequent cells skewed; thus skewing and de-skewing only need to be performed once. The finite ring array consists of B cells without need for skewing and de-skewing latches, since the data travel through the arrays de-skewed. There is no information communicated to other arrays in order to increase the dynamic range, as with the carry propagation in the binary array. The dynamic range is increased by computing with a set of independent arrays, each computing over a different ring. Table XXI shows the differences between the two systolic array approaches, based on the adder comparison. One issue not covered in the table is the conversion arrays (Taheri et al., 1988) required for interfacing the finite ring computation with binary processors. It is clearly important that many arithmetic operations be performed in between such conversions; a perfect example is the computation of long convolution or correlation sequences. On the same subject, there is also an overhead associated with binary arrays, and that is the need for skewing and de-skewing latches. We would, however, expect this overhead to be much smaller than the conversion overhead associated with the finite ring cell.

TABLE XXI
FEATURE COMPARISON OF BINARY AND FINITE RING ADDER ARRAYS

Dimensionality of array.
  Binary cell: Two-dimensional; one dimension corresponds to the dynamic range.
  Finite ring cell: One-dimensional; the dynamic range dimension is entirely within the cell.

Extension of dynamic range.
  Binary cell: Carry propagation along the dynamic range dimension.
  Finite ring cell: Independent extension of dynamic range, retaining the one-dimensional structure of the array.

Clock distribution.
  Binary cell: Two-dimensional clock distribution slows down the array compared to cell speed.
  Finite ring cell: Race-free clocking against the direction of data flow; no loss of speed in array vs. cell.

Arithmetic function.
  Binary cell: Binary addition.
  Finite ring cell: Finite ring multiplier-accumulator.
In summary, we see that the finite ring array removes the array dimension associated with dynamic range, which allows a more flexible clocking scheme to be employed and eliminates the connectivity problems associated with large dynamic-range arrays. We also have the advantage that the multiplication overhead associated with binary arrays is eliminated in the finite ring cell; this assumes that the multiplier is fixed. The final advantage is that finite ring computations can implement integer arithmetic, whereas binary arithmetic computations cannot implement general finite ring arithmetic. Thus for the implementation of number theoretic transforms and quadratic residue systems (e.g., Jenkins and Krogmeier, 1987; Jullien et al., 1987), as examples, the study and application of special approaches to the implementation of finite ring arithmetic is essential.
IX. CONCLUSIONS

This chapter has developed the theory and discussed the applications of number theoretic techniques for the design of high-performance digital signal processing systems. We have developed the techniques from theoretical aspects to a study of some practical applications. In particular, we have discussed, in considerable detail, the application of number theory to the implementation of convolution processes using indirect convolution via number theoretic transforms. The discussion has ranged from basic Fermat transforms to the use of special extension rings for the efficient computation of complex convolution. The basic principles of residue number systems have also been introduced and illustrated by several practical examples. Both RNS and NTT theories have been combined to illustrate the ability to perform finite ring computations through the use of look-up tables. The look-up table approach has been extended to cover the implementation of integer inner product calculations using a generic cellular structure. We have discussed some of the practical VLSI considerations in implementing such cells, and we have illustrated the discussion with circuit diagrams and VLSI layouts. Although the mathematical theory of finite arithmetic is a very old subject, it has been an object of serious study for more than thirty years. The newly emerging digital computer industry of thirty years ago carried out particularly intensive research into the subject, but applications of the concepts have never been successful in general-purpose digital computers, where the deficiencies of nonbinary techniques tend to counterbalance the advantages. However, the modern requirements of real-time digital signal processing and the capabilities of VLSI implementation form an ideal setting for number theoretic
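The indirect-convolution idea summarized above can also be sketched briefly. This is our illustration, not the chapter's implementation: a length-8 number theoretic transform over Z_257 (the Fermat prime F_3 = 2^8 + 1), with root w = 64 of order 8, gives exact integer cyclic convolution as long as output values stay below 257. All parameter choices here are ours.

```python
# Sketch of exact convolution via a number theoretic transform over Z_257.
# Parameters (P, N, W) are our illustrative choices; W = 64 has order 8 mod 257.
P, N, W = 257, 8, 64

def ntt(a, root):
    """Naive O(N^2) transform: sum_n a[n] * root**(k*n) mod P, for each k."""
    return [sum(a[n] * pow(root, k * n, P) for n in range(N)) % P
            for k in range(N)]

def cyclic_convolution(a, b):
    """Transform, multiply pointwise, inverse transform, scale by N^-1 mod P."""
    A, B = ntt(a, W), ntt(b, W)
    C = [x * y % P for x, y in zip(A, B)]
    c = ntt(C, pow(W, -1, P))            # inverse transform uses the root w^-1
    n_inv = pow(N, -1, P)
    return [x * n_inv % P for x in c]

a = [1, 2, 3, 0, 0, 0, 0, 0]
b = [4, 5, 6, 0, 0, 0, 0, 0]
print(cyclic_convolution(a, b))          # exact: [4, 13, 28, 27, 18, 0, 0, 0]
```

Because every step is integer arithmetic modulo 257, the result is exact — there is no round-off, which is the central attraction of the NTT approach discussed in this chapter.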
GRAHAM A. JULLIEN
techniques. Today, these finite arithmetic concepts are used primarily by engineers in research and development, and only occasionally find their way into practical solutions to engineering problems. It remains to be seen whether finite arithmetic concepts will provide important solutions for the signal processing problems of the twenty-first century. The role of finite arithmetic in the implementation of high-speed computational circuitry has taken on different forms over the past three decades. It has always been driven by technology, and the situation has not changed. The technology of today is VLSI, and of tomorrow, ULSI, GLSI, .... We have seen, in this paper, a new approach to implementing VLSI finite arithmetic circuits, and it is clear that our implementation techniques may have to change to fit this implementation vehicle. It is also clear that in the field of DSP, where very fast computations are required, finite arithmetic has a very definite role to play. The secret to using the arithmetic correctly is to allow the requirements of the medium to drive the search for more efficient architectures, not the "shoe-horning" of existing approaches, developed for discrete components, that so often marks our use of emerging technologies.
ACKNOWLEDGEMENTS

The author acknowledges the financial assistance of several grants from the Natural Sciences and Engineering Research Council of Canada. In addition, he thanks the University of Windsor for support in the form of financial assistance and a research professorship. Finally, thanks are due to his former graduate students for many hours of time devoted to the furtherance of the research work that forms the basis for much of this chapter.
REFERENCES

Abramowitz, M. and Stegun, I. A., eds. (1968). Handbook of Mathematical Functions with Formulas, Graphs and Mathematical Tables. National Bureau of Standards, Applied Mathematics Series, 55, 7th printing.
Agarwal, R. C. and Burrus, C. S. (1974). "Fast convolution using Fermat number transforms with applications to digital filtering," IEEE Trans. Acoust., Speech, Signal Processing. ASSP-22(2), 87-97.
Baraniecka, A. Z. (1980). "Digital filtering using number theoretic techniques." Ph.D. dissertation, University of Windsor, Windsor, Ontario, Canada.
Baraniecka, A. Z. and Jullien, G. A. (1980). "Residue number system implementations of number theoretic transforms in complex residue rings," IEEE Trans. on Acoustics, Speech, and Signal Processing. ASSP-28(3), 285-291.
Baugh, R. A. and Day, E. C. (1961). "Electronic sign evaluation for residue number systems." Tech. Rep. No. TR-60-597-32, RCA, Camden, New Jersey, and Burlington, Massachusetts.
Bayoumi, M. A., Jullien, G. A., and Miller, W. C. (1987). "A VLSI implementation of residue adders," IEEE Trans. on Circuits and Systems. CAS-34(3), 284-288.
Blahut, R. E. (1985). Fast Algorithms for Digital Signal Processing. Addison-Wesley, Reading, Massachusetts.
Burton, D. M. (1970). A First Course in Rings and Ideals. Addison-Wesley, Reading, Massachusetts.
Carhoun, D. O., Johnson, B. L., and Redinbo, G. R. (1983). "A synthesis algorithm for recursive finite field FIR digital filters," Proc. 1983 Int. Symp. on Circuits and Systems. 2, 689-693.
Cheney, P. W. (1961). "A digital correlator based on the residue number system," IRE Transactions on Electronic Computers. EC-10, 63-70.
Cooley, J. W. and Tukey, J. W. (1965). "An algorithm for the machine calculation of complex Fourier series," Mathematics of Computation. 19, April, 297-301.
Dillinger, T. E. (1988). VLSI Engineering. Prentice-Hall, Englewood Cliffs, New Jersey.
Dubois, E. and Venetsanopoulos, A. N. (1978). "The discrete Fourier transform over finite rings with application to fast convolution," IEEE Transactions on Computers. C-27(7), 586-593.
Dudley, U. (1969). Elementary Number Theory. W. H. Freeman and Co., San Francisco.
Garner, H. L. (1959). "The residue number system," IRE Trans. Electronic Computers. EC-8, 140-147.
Gold, B. and Rader, C. M. (1969). Digital Processing of Signals. McGraw-Hill, New York.
Hatamian, M. and Cash, G. L. (1986). "High speed signal processing, pipelining, and signal processing," Proceedings of the Int. Conf. on Acoustics, Speech, and Signal Processing, 1173-1176.
Jenkins, W. K. (1975). "Composite number theoretic transforms for digital filtering," Proc. 9th Asilomar Conference on Cir., Sys. and Comp., 458-462.
Jenkins, W. K. and Altman, E. J. (1988). "Self-checking properties of residue number error checkers based on mixed radix conversion," IEEE Transactions on Circuits and Systems. CAS-35(2), 159-167.
Jenkins, W. K. and Krogmeier, J. V. (1987). "The design of dual-mode complex signal processors based on quadratic modular number codes," IEEE Transactions on Circuits and Systems. CAS-34(4), 354-364.
Jenkins, W. K. and Lao, S. F. (1987). "The design of an RNS digital filter module using the IBM MVISA design system," Proceedings of the International Symposium on Circuits and Systems, Philadelphia, Pennsylvania, 122-125.
Jenkins, W. K. and Leon, B. J. (1977). "The use of residue number systems in the design of finite impulse response digital filters," IEEE Trans. on Circuits and Systems. CAS-24, 191-201.
Jenkins, W. K., Paul, D. F., and Davidson, E. S. (1985). "A custom designed integrated circuit for the realization of residue number digital filters," Proc. 1985 Int. Conf. on ASSP, Tampa, Florida, 220-223.
Jullien, G. A. (1978). "Residue number scaling and other operations using ROM arrays," IEEE Trans. on Computers. C-27(4), 325-336.
Jullien, G. A. (1980). "Implementation of multiplication, modulo a prime number, with applications to number theoretic transforms," IEEE Trans. on Computers. C-29(10), 899-905.
Jullien, G. A., Krishnan, R., and Miller, W. C. (1987). "Complex digital signal processing over finite fields," IEEE Transactions on Circuits and Systems. CAS-34(4), 365-377.
Jullien, G. A., Bird, P. D., Carr, J. T., Taheri, M., and Miller, W. C. (1989). "An efficient bit-level systolic cell design for finite ring digital signal processing applications," Journal of VLSI Signal Processing. 1-3 (in press).
Jury, E. I. (1964). Theory and Application of the z-Transform Method. John Wiley and Sons, New York.
Kung, H. T. and Leiserson, C. E. (1978). "Systolic arrays (for VLSI)," Sparse Matrix Proc. 1978, Academic Press, Orlando, Florida, pp. 256-282.
Leibowitz, L. M. (1976). "A simplified binary arithmetic for the Fermat number transform," IEEE Trans. Acoust. Speech Signal Processing. ASSP-24(5), 356-359.
Liu, K. Y., Reed, I. S., and Truong, T. K. (1976). "Fast number theoretic transforms for digital filtering," Electronic Letters. 12, 644-646.
McCanny, J. V. and McWhirter, J. G. (1982). "Implementation of signal processing functions using 1-bit systolic arrays," Electronic Letters. 18(6), 241-243.
McClellan, J. H. (1976). "Hardware realization of a Fermat number transform," IEEE Trans. Acoust. Speech Signal Processing. ASSP-24, 216-225.
McClellan, J. H. and Rader, C. M. (1979). Number Theory in Digital Signal Processing. Prentice-Hall, Englewood Cliffs, New Jersey.
McCoy, N. (1972). Fundamentals of Abstract Algebra. Allyn and Bacon, Boston.
Nagpal, H. K., Jullien, G. A., and Miller, W. C. (1983). "Processor architectures for two-dimensional convolvers using a single multiplexed computational element with finite field arithmetic," IEEE Trans. on Computers. C-32(11), 989-1000.
Nicholson, P. J. (1971). "Algebraic theory of finite Fourier transforms," J. Comput. Sys. Sci. 5, 524-547.
Nussbaumer, H. (1976a). "Digital filtering using complex Mersenne transforms," IBM J. Research and Development. 20, 282-284.
Nussbaumer, H. (1976b). "Complex convolution via Fermat number transforms," IBM J. Research and Development. 20, 498-504.
Peled, A. and Liu, B. (1974). "A new hardware realization of digital filters," IEEE Trans. Acoust. Speech Signal Processing. ASSP-22, 456-462.
Pollard, J. M. (1971). "The fast Fourier transform in a finite field," Math. Comp. 25, 365-374.
Pollard, J. M. (1976). "Implementation of number theoretic transforms," Electronics Letters. 12, 378-379.
Rabiner, L. R. and Gold, B. (1975). Theory and Application of Digital Signal Processing. Prentice-Hall, Englewood Cliffs, New Jersey.
Rader, C. M. (1972a). "The number theoretic DFT and exact discrete convolution," IEEE Arden House Workshop on Digital Signal Processing, January (oral presentation).
Rader, C. M. (1972b). "Discrete convolutions via Mersenne transforms," IEEE Trans. Computers. C-21, 1269-1273.
Schilling, O. F. G. and Piper, W. S. (1975). Basic Abstract Algebra. Allyn & Bacon, Boston.
Schroeder, M. R. (1985). Number Theory in Science and Communication, second edition. Springer Series in Information Sciences, Vol. 7, Springer-Verlag, New York.
Shenoy, A. P. and Kumaresan, R. (1987). "Residue to binary conversion for RNS arithmetic using only modular look-up tables," IEEE Trans. Computers. 38(2), 292-297.
Slotnick (1963).
Soderstrand, M. A. (1977). "A high-speed low-cost recursive digital filter using residue number arithmetic," Proc. IEEE. 65(7), 1065-1067.
Soderstrand, M. A. and Chang, B. (1986). "Design of a high performance FIR digital filter on a CMOS semi-custom VLSI chip," Proc. 1986 ISMM Int. Conf. on Mini and Micro Computers, Beverly Hills, California.
Soderstrand, M. A., Vernia, C., and Chang, J-H. (1983). "An improved residue number system digital-to-analog converter," IEEE Trans. on Circuits and Systems. CAS-30, 903-907.
Soderstrand, M. A., Jenkins, W. K., Jullien, G. A., and Taylor, F. J., eds. (1986). Residue Number System Arithmetic: Modern Applications in Digital Signal Processing. IEEE Press, New York.
Svoboda, A. (1957). "Rational numerical system of residual classes," Stroje Na Zpracování Informací. 5, 9-37.
Svoboda, A. (1958). "The numerical system of residual classes in mathematical machines," Proc. Congreso Internacional De Automatica, Madrid, Spain, October. (Also, Information Processing (Proc. of UNESCO Conference, June 1959), pp. 419-422, 1960.)
Svoboda, A. and Valach, M. (1955). "Operational circuits," Stroje Na Zpracování Informací. 3.
Szabo, N. S. and Tanaka, R. I. (1967). Residue Arithmetic and Its Applications to Computer Technology. McGraw-Hill, New York.
Taheri, M., Jullien, G. A., and Miller, W. C. (1988). "High speed signal processing using systolic arrays over finite rings," IEEE Transactions on Selected Areas in Communications, VLSI in Communications III. 6(3), 504-512.
Tanaka, R. I. (1962). "Modular arithmetic techniques," Tech. Rep. 2-38-62-1A, ASTDR, Lockheed Missiles and Space Co.
Taylor, F. J. and Huang, C. H. (1982). "An autoscale residue multiplier," IEEE Trans. Computers. C-31(4), 321-325.
Taylor, F. J. and Ramnarayanan, A. S. (1981). "An efficient residue-to-decimal converter," IEEE Trans. Circuits and Systems. CAS-28(12), 1164-1169.
Vanwormhoudt, M. C. (1978). "Structural properties of complex residue rings applied to number theoretic Fourier transforms," IEEE Trans. Acoust., Speech and Signal Processing. ASSP-26, 99-104.
Vinogradov, I. M. (1954). Elements of Number Theory. Dover, New York.
Information Energy and Its Applications

L. PARDO
Departamento de Estadística e I.O., Facultad de Matemáticas, Universidad Complutense de Madrid, Madrid, Spain

I. J. TANEJA
Departamento de Matemática, Universidade Federal de Santa Catarina, Florianópolis, Brazil
I. Introduction . . . 166
II. Information Energy and Information Energy Gain for Discrete Random Variables . . . 167
   A. Definition, Properties, and Characterization . . . 167
   B. Joint and Conditional Information Energy . . . 171
   C. Solution to Logic Problems . . . 174
   D. Information Energy Divergence . . . 176
III. Information Energy and Information Energy Gain for Continuous Random Variables . . . 176
   A. Definition and Properties . . . 177
   B. Joint and Conditional Measures of Information Energy and Information Energy Gain . . . 178
   C. Information Energy for Different Probability Distributions . . . 179
   D. Information Energy in the Field of Hyperreal Numbers . . . 181
IV. Statistical Aspects of Information Energy . . . 188
   A. Comparison of Experiments in a Bayesian Context. Relation with Classical Approaches: Lehmann and Blackwell . . . 188
   B. Information Energy in the Design and Comparison of Regression Experiments in a Bayesian Context . . . 195
   C. Information Energy as a Rule of Sequential Sampling . . . 200
   D. Information Energy in the Sequential Design of a Fixed Number of Experiments . . . 209
   E. Information Energy of a Point Process . . . 212
   F. Information Energy, Information Energy Divergence, and Probability of Error . . . 216
   G. Information Energy as an Index of Diversity . . . 218
   H. Markov Chains . . . 223
V. Information Energy and Fuzzy Sets Theory . . . 224
   A. Quantification of Fuzzy Information . . . 224
   B. The Information Energy Gain as a Criterion of Comparison between Fuzzy Information Systems . . . 226
   C. Relation of the Information Energy-FIS Comparison Criterion with the Sufficiency and Lehmann Criteria . . . 231
VI. Weighted Information Energy . . . 234
   A. Definition and Properties . . . 234
   B. Conditional Weighted Information Energy . . . 235
   C. Noiseless Coding Theorems and Weighted Information Energy . . . 236
References . . . 239
I. INTRODUCTION

Let X = {x_1, ..., x_n} and Y = {y_1, ..., y_m} be two statistically independent experiments, and let p_i = Pr(X = x_i), i = 1, 2, ..., n, and q_j = Pr(Y = y_j), j = 1, ..., m, be the probabilities associated with the outcomes of these experiments. Then the property of additivity for independent experiments states that

H(X, Y) = H(X) + H(Y),

where the measure H(X, Y) has the same structure as that of H(X) and H(Y) and is based on the joint probabilities p_i q_j, i = 1, 2, ..., n; j = 1, ..., m. The measures satisfying the above additivity property are very famous in the literature and are known as the measure of uncertainty (Shannon's (1948) entropy) and Rényi's (1961) parametric entropy. Sharma and Taneja (1975) considered a generalized form of additivity by introducing an additional factor, as follows:

H(X, Y) = H(X) + H(Y) + c H(X) H(Y).

The above relation is known as nonadditivity. Following Chaundy and McLeod's (1961) approach, this nonadditivity leads to the parametric entropy considered by Havrda and Charvát (1967). Following Rényi's (1961) approach, it leads to measures considered by Arimoto (1971) and Sharma and Mittal (1975). For a brief review, refer to Taneja (1979). An alternative way to generalize the additivity is by considering generalized additivity of the following type:

H(X, Y) = G(Y)H(X) + G(X)H(Y),

where G(X) and G(Y) are functions of the probabilities p_i and q_j, respectively. Again, following Chaundy and McLeod's approach, the generalized additivity leads one to generalized measures of the entropy type, parametric and trigonometric ones (Taneja, 1975; Sharma and Taneja, 1977).
In particular, when G becomes ½H, the generalized additivity leads to the simple relation

H(X, Y) = H(X)H(Y).

Van der Lubbe et al. (1987) emphasized the above relation and came up with measures called generalized certainty measures. They called them certainty measures rather than uncertainty measures, as in Shannon's case, because some of the properties are of the reverse type. One of the measures belonging to this class was studied long ago by Onicescu (1966) and was called information energy. This measure of information energy, sometimes known as the Gini-Simpson index, has a very simple form and has found applications in classical mechanics (Rao, 1973, p. 175; Onicescu, 1974). Here our aim is to emphasize the statistical applications of information energy. Applications to some related areas such as solving logic problems, connections with hyperreal numbers, noiseless coding, etc., are also made.
II. INFORMATION ENERGY AND INFORMATION ENERGY GAIN FOR DISCRETE RANDOM VARIABLES

A. Definition, Properties, and Characterization
Let

Δ_n = { P = (p_1, ..., p_n) : p_i ≥ 0, Σ_{i=1}^n p_i = 1 }

be the set of all complete finite discrete probability distributions associated with a discrete random variable X having a finite number of values x_1, ..., x_n. The measure of information energy associated with a probability distribution P is given by

ℰ(P) = Σ_{i=1}^n p_i².
This measure of information was introduced by Onicescu (1966). It has many interesting properties, given as follows:

1. Positivity. For all P = (p_1, ..., p_n) ∈ Δ_n, ℰ(P) > 0.
2. Unity. ℰ(P) = 1 iff P = P⁰, where P⁰ ∈ Δ_n is a probability distribution such that one of the probabilities is one and all the others are zero.
3. Normality. ℰ(½, ½) = ½.
4. Symmetry. For all P = (p_1, ..., p_n) ∈ Δ_n, ℰ(P) is a symmetric function of its arguments, i.e.,

   ℰ(p_1, ..., p_n) = ℰ(p_τ(1), ..., p_τ(n)),

   where τ is any permutation of {1, ..., n}.
5. Continuity. For all P ∈ Δ_n, ℰ(P) is a continuous function of P.
6. Expansibility. For all P ∈ Δ_n, ℰ(p_1, ..., p_n, 0) = ℰ(p_1, ..., p_n).
7. Monotonicity. ℰ(1/n, 1/n, ..., 1/n) is a decreasing function of n.
8. Minimality. For all P ∈ Δ_n, ℰ(P) is minimum for the uniform probability distribution, i.e.,

   ℰ(1/n, ..., 1/n) ≤ ℰ(p_1, ..., p_n) ≤ 1,

   with equality on the L.H.S. iff p_i = 1/n ∀ i, and on the R.H.S. iff P = P⁰.
9. Relation to the maximum probability. Let p_max = max{p_1, ..., p_n}, where P = (p_1, ..., p_n) ∈ Δ_n. Then

   p_max² ≤ ℰ(P) ≤ p_max.

10. Convexity. For all P ∈ Δ_n, ℰ(P) is a convex function on Δ_n.
11. Branching or recursivity. For all P = (p_1, ..., p_n) ∈ Δ_n, we have

    ℰ(p_1, p_2, p_3, ..., p_n) = ℰ(p_1 + p_2, p_3, ..., p_n) + (p_1 + p_2)² [ ℰ( p_1/(p_1 + p_2), p_2/(p_1 + p_2) ) − 1 ],

    p_1 + p_2 > 0, n ≥ 3.
12. Productivity or multiplicativity. For all P = (p_1, ..., p_n) ∈ Δ_n, Q = (q_1, ..., q_m) ∈ Δ_m, and

    P * Q = (p_1 q_1, ..., p_1 q_m, ..., p_n q_1, ..., p_n q_m) ∈ Δ_nm,

    we have ℰ(P * Q) = ℰ(P) ℰ(Q).
13. Sum-representation. For all P ∈ Δ_n, we can write

    ℰ(P) = Σ_{i=1}^n h(p_i),

    where h(p) = p², 0 ≤ p ≤ 1.
14. Relation with Shannon's entropy. For all P ∈ Δ_n and a > 0,

    ℰ(P) ≥ 1 − H(P)(log_a e)^{−1},

    where

    H(P) = −Σ_{i=1}^n p_i log_a p_i

    is the well-known Shannon's (1948) entropy.
15. Some equations of information energy. Let e(x) = ℰ(x, 1 − x), for all x ∈ [0, 1]. Then we have:

    (a) e(0) = e(1);
    (b) e(½) = ½;
    (c) e(x) + (1 − x)² e( y/(1 − x) ) = e(y) + (1 − y)² e( x/(1 − y) ),
        for all x, y ∈ [0, 1), x + y ≤ 1;
    (d) For all P = (p_1, ..., p_n) ∈ Δ_n, it follows that

        ℰ(p_1, ..., p_n) = 1 + Σ_{t=2}^n P_t² [ e(p_t / P_t) − 1 ],

        where P_t = p_1 + ... + p_t, t = 2, 3, ..., n.
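Several of the properties above are easy to check numerically. The following short sketch is our own illustration, with an arbitrary pair of distributions; it verifies properties 3, 8, 9, and 12 for the measure ℰ(P) = Σ p_i².

```python
# Numerical check of normality, minimality, the p_max bounds, and multiplicativity.
def energy(p):
    return sum(x * x for x in p)

P = [0.5, 0.3, 0.2]                      # arbitrary distribution on 3 points
Q = [0.6, 0.4]
PQ = [pi * qj for pi in P for qj in Q]   # the product distribution P * Q

assert abs(energy([0.5, 0.5]) - 0.5) < 1e-12            # property 3 (normality)
assert energy([1/3] * 3) <= energy(P) <= 1.0            # property 8 (minimality)
assert max(P) ** 2 <= energy(P) <= max(P)               # property 9
assert abs(energy(PQ) - energy(P) * energy(Q)) < 1e-12  # property 12
```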
The following theorem gives an axiomatic characterization of information energy.
Theorem 1. Let ℰ_n : Δ_n → R (the reals) be a function satisfying

ℰ_nm(P * Q) = ℰ_n(P) ℰ_m(Q)    (1)

for all P ∈ Δ_n, Q ∈ Δ_m, and P * Q ∈ Δ_nm, where

ℰ_n(P) = Σ_{i=1}^n h(p_i)    (2)

and h: [0, 1] → R is a continuous function. Then the function ℰ_n is of the following form:

ℰ_n(P) = Σ_{i=1}^n p_i^α,  α > 0.

Proof: Equation (1) together with (2) gives the functional equation

Σ_{i=1}^n Σ_{j=1}^m h(p_i q_j) = ( Σ_{i=1}^n h(p_i) )( Σ_{j=1}^m h(q_j) ).    (3)

For positive integers r ≤ x and s ≤ y, choose in (3) the particular distributions

P = (r/x, 1/x, ..., 1/x) ∈ Δ_{x−r+1},  Q = (s/y, 1/y, ..., 1/y) ∈ Δ_{y−s+1}.    (4)

This gives

h(rs/xy) + (y − s)h(r/xy) + (x − r)h(s/xy) + (x − r)(y − s)h(1/xy)
    = [h(r/x) + (x − r)h(1/x)][h(s/y) + (y − s)h(1/y)].    (5)

Taking r = s = 1 in (5), i.e., P and Q uniform, we get

h(1/xy) = h(1/x)h(1/y).    (6)

Again, if we take s = 1 in (5) and use (6), we get

h(r/xy) = h(r/x)h(1/y).    (7)

Similarly, putting r = 1 in (5) and again using (6), we get

h(s/xy) = h(1/x)h(s/y).    (8)

Finally, (5) together with (6), (7), and (8) gives

h(rs/xy) = h(r/x)h(s/y),

i.e.,

h(pq) = h(p)h(q),    (9)

for all rationals p, q ∈ [0, 1]. From the continuity of the function h, we can say that (9) is valid for all real numbers p and q such that 0 ≤ p ≤ 1, 0 ≤ q ≤ 1. The nontrivial continuous solutions of the functional equation (9) are given by

h(p) = p^α,  α > 0.    (10)

Equations (10) and (2) together give the required result.

Particular case: In particular, when α = 2, we have

ℰ_n(P) = Σ_{i=1}^n p_i².

Thus the above theorem characterizes information energy in this particular case.
B. Joint and Conditional Information Energy

Let (X, Y) be a joint experiment having a finite number of values, i.e., X = {x_1, ..., x_n} and Y = {y_1, ..., y_m}. Then the joint and marginal probability distributions are given by

p(x_i, y_j),  p(x_i) = Σ_{j=1}^m p(x_i, y_j),  p(y_j) = Σ_{i=1}^n p(x_i, y_j).

The following relations are well known in the literature:

p(x_i, y_j) = p(x_i) p(y_j / x_i) = p(y_j) p(x_i / y_j),

for every i = 1, 2, ..., n; j = 1, 2, ..., m. The joint, marginal, and conditional measures of information energy are defined as follows:

ℰ(X, Y) = Σ_{i=1}^n Σ_{j=1}^m p(x_i, y_j)²,

ℰ(X) = Σ_{i=1}^n p(x_i)²,  ℰ(Y) = Σ_{j=1}^m p(y_j)²,

ℰ(X/Y) = Σ_{j=1}^m p(y_j) ℰ(X/Y = y_j),

and

ℰ(Y/X) = Σ_{i=1}^n p(x_i) ℰ(Y/X = x_i),

with

ℰ(X/Y = y_j) = Σ_{i=1}^n p(x_i / y_j)²,  j = 1, 2, ..., m,

and

ℰ(Y/X = x_i) = Σ_{j=1}^m p(y_j / x_i)²,  i = 1, 2, ..., n.

In a similar way, we can write the above measures for more than two random variables, i.e., ℰ(X, Y, Z), ℰ(X/Y, Z), ℰ(X, Y/Z), etc. Let us denote

𝒬(X) = 1 − ℰ(X).    (11)

The measure 𝒬(X) is known as the quadratic entropy (Vajda, 1968), which is nonnegative in view of property 8. Now, we shall study some interesting properties of the above multivariate measures of information energy, or equivalently of quadratic entropy.
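The definitions above are direct to compute. The sketch below is our own illustration, with a made-up 2×2 joint distribution; it evaluates the joint, marginal, and conditional information energies and confirms that ℰ(X/Y) ≥ ℰ(X), so that the quadratic entropy satisfies 𝒬(X/Y) ≤ 𝒬(X).

```python
# Joint, marginal, and conditional information energy for a 2x2 example.
joint = [[0.4, 0.1],                     # rows: x_i, columns: y_j (made up)
         [0.2, 0.3]]
pX = [sum(row) for row in joint]
pY = [sum(col) for col in zip(*joint)]

E_joint = sum(p * p for row in joint for p in row)       # E(X, Y)
E_X = sum(p * p for p in pX)                             # E(X)
# E(X/Y) = sum_j p(y_j) * sum_i p(x_i / y_j)^2
E_X_given_Y = sum(pY[j] * sum((joint[i][j] / pY[j]) ** 2 for i in range(2))
                  for j in range(2))
Q_X = 1.0 - E_X                                          # quadratic entropy (11)

assert E_X_given_Y >= E_X        # conditioning cannot decrease the energy
assert Q_X >= 0.0
```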
Proposition 1. The following hold:

(a)
(1) 𝒬(X, Y) ≥ 𝒬(X) or 𝒬(Y);
(2) 𝒬(X, Y/Z) ≥ 𝒬(X/Z) or 𝒬(Y/Z);
(3) 𝒬(X/Y) ≤ 𝒬(X);
(4) 𝒬(X/Y, Z) ≤ 𝒬(X/Y) or 𝒬(X/Z);
(5) 𝒬(X, Y) ≤ 𝒬(Y) + 𝒬(X/Y);
(6) 𝒬(X, Y/Z) ≤ 𝒬(Y/Z) + 𝒬(X/Y, Z);
(7) 𝒬(Y, Z/X) ≤ 𝒬(Y/X) + 𝒬(Z/X);
(8) 𝒬(Y/Z) ≤ 𝒬(Y/X) + 𝒬(X/Z).

(b) If X and Y are independent, then

𝒬(X, Y) + 𝒬(X)𝒬(Y) = 𝒬(X) + 𝒬(Y).

(c) Let d(X, Y) = 𝒬(X/Y) + 𝒬(Y/X). Then

(1) d(X, Y) ≥ 0, and d(X, X) = 0;
(2) d(X, Y) = d(Y, X);
(3) d(X, Y) + d(Y, Z) ≥ d(X, Z);
(4) |𝒬(X) − 𝒬(Y)| ≤ d(X, Y);
(5) |𝒬(X_1/X_2) − 𝒬(Y_1/Y_2)| ≤ d(X_1, Y_1) + d(X_2, Y_2).

Let us define 𝒮(X; Y) = 𝒬(X) − 𝒬(X/Y) and 𝒮(X; Y/Z) = 𝒬(X/Y) − 𝒬(X/Y, Z). These can equivalently be written as

𝒮(X; Y) = ℰ(X/Y) − ℰ(X)    (12)

and

𝒮(X; Y/Z) = ℰ(X/Y, Z) − ℰ(X/Y).    (13)

The measures given by (12) and (13) satisfy the additive equality

𝒮(X; Z) + 𝒮(X; Y/Z) = 𝒮(X; Y) + 𝒮(X; Z/Y).

Measure (12) can also be written as

𝒮(X; Y) = Σ_{i=1}^n p(x_i)² D(P_{Y/X=x_i} ‖ P_Y),

where

D(P ‖ Q) = Σ_{i=1}^n p_i² q_i^{−1} − 1,

for all P, Q ∈ Δ_n (q_i > 0 ∀ i), is the well-known Pearson χ²-divergence. Alternatively, let us define

*𝒮(X; Y) = D(P_{X,Y} ‖ P_X * P_Y).    (14)

Then

*𝒮(X; Y) = Σ_{i=1}^n Σ_{j=1}^m p(x_i, y_j)² [p(x_i) p(y_j)]^{−1} − 1.    (15)

In a similar way, we can define

*𝒮(X; Y/Z) = D(P_{X,Y,Z} ‖ P_{X,Z} * P_{Y/Z}).    (16)
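The equivalence between (12) and its Pearson χ² representation is easy to verify numerically. The sketch below is our own, with an arbitrary joint distribution; it computes 𝒮(X; Y) both ways and also checks the bound 𝒮(X; Y) ≤ *𝒮(X; Y).

```python
# Check S(X;Y) = E(X/Y) - E(X) = sum_i p(x_i)^2 * D(P_{Y/X=x_i} || P_Y),
# with D(P||Q) = sum p_i^2 / q_i - 1, and the bound S(X;Y) <= *S(X;Y).
joint = [[0.4, 0.1],
         [0.2, 0.3]]                     # made-up joint pmf: rows x_i, cols y_j
pX = [sum(row) for row in joint]
pY = [sum(col) for col in zip(*joint)]

def D(p, q):                             # Pearson chi-square divergence
    return sum(pi * pi / qi for pi, qi in zip(p, q)) - 1.0

E_X = sum(p * p for p in pX)
E_X_given_Y = sum(pY[j] * sum((joint[i][j] / pY[j]) ** 2 for i in range(2))
                  for j in range(2))
gain = E_X_given_Y - E_X                                            # (12)
chi_form = sum(pX[i] ** 2 * D([joint[i][j] / pX[i] for j in range(2)], pY)
               for i in range(2))
star_gain = D([p for row in joint for p in row],
              [pX[i] * pY[j] for i in range(2) for j in range(2)])  # (14)

assert abs(gain - chi_form) < 1e-12
assert 0.0 <= gain <= star_gain
```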
Based on the above definitions, we have the following result, whose proof can be checked easily. Result 1:
(a) 𝒮(X; Y) ≥ 0 and *𝒮(X; Y) ≥ 0, with equality in both iff X and Y are independent, i.e.,

p(x_i, y_j) = p(x_i) p(y_j),  ∀ i = 1, ..., n; j = 1, ..., m.

(b) 𝒮(X; Y/Z) ≥ 0 and *𝒮(X; Y/Z) ≥ 0, with equality in both iff X and Y are independent given Z, i.e.,

p(x_i, y_j / z_k) = p(x_i / z_k) p(y_j / z_k),  ∀ i = 1, ..., n; j = 1, ..., m; k = 1, ..., l.

(c) 𝒮(X; Y) ≤ *𝒮(X; Y).

(d) D(P ‖ Q) is a nonnegative convex function in the pair (P, Q) ∈ Δ_n × Δ_n.

The measure 𝒮(X; Y) has been studied by L. Pardo (1982a, 1983, 1984a, 1984b, 1987) and Pardo et al. (1985, 1989). It is called the information energy gain. Some applications of these measures to Markov chains having three or more random variables are given in Section IV.H.

C. Solution to Logic Problems
The determination of the number of questions necessary to specify completely the answer to a logical problem of interest is known in information theory by the name of "solutions to logic problems." In other words, it deals with obtaining the minimum number of auxiliary experiments Y = (Y_1, ..., Y_k) (each Y_l has m < n possible results, and the values are not necessarily independent) that completely specify the result x_i of an experiment X, which takes the values x_1, ..., x_n with probabilities p_1, ..., p_n, respectively. A classical example of a logic problem is to find the minimum number of questions needed to name a positive integer less than or equal to N that has been thought of by another person, if that person only replies "yes" or "no" to the questions. Excellent work in this direction using Shannon's entropy can be seen in Yaglom and Yaglom (1969), P. Gil (1981), and Aczél and Daróczy (1975). In this section, we analyze the solutions of problems of this type by using the information energy. In order to do so, we need only two properties of information energy:

(a) The minimum information among the discrete random variables to be managed is achieved for the random variable with equiprobable events, and its value is 1/n.
(b) The maximum information is achieved in the case of the perfectly determined experiment, and its value is 1.

The necessary information to be obtained through an auxiliary experiment to get a perfectly determined experiment X is the quadratic entropy, given by (11). If we consider an auxiliary experiment Y = (Y_1, ..., Y_k), where the Y_l are not necessarily independent and obviously Y is not independent of X, it shall be possible to get the information energy gain given by (12). The minimum number of components of the experiment Y = (Y_1, ..., Y_k) that is necessary to determine perfectly the experiment X is k_0 if it verifies

ℰ(Y_1, ..., Y_{k_0} / X) − ℰ(Y_1, ..., Y_{k_0}) ≥ 1 − ℰ(X).

The difference ℰ(Y_1, ..., Y_{k_0}/X) − ℰ(Y_1, ..., Y_{k_0}) is maximum if ℰ(Y_1, ..., Y_{k_0}/X) is maximum and ℰ(Y_1, ..., Y_{k_0}) is minimum. How do we get the maximum of ℰ(Y_1, ..., Y_k/X)? By choosing Y = (Y_1, ..., Y_k) in such a way that it is perfectly determined by X. How do we get the minimum of ℰ(Y_1, ..., Y_k)? Obviously, by choosing the joint distribution of Y to be discrete uniform. The following example clarifies the situations indicated above.

Example 1: Let there be three cities A, B, and C such that the residents of A always speak the truth, the residents of B always lie, and the residents of C alternately speak the truth and lie. What is the minimum number of questions that can be formulated by the representative (investigator) if the residents reply only "yes" or "no"? In this case, the random variable is given by

X:            1    2    3    4    5    6    7    8    9
Outcome:      AA   AB   AC   BA   BB   BC   CA   CB   CC
Probability:  1/9  1/9  1/9  1/9  1/9  1/9  1/9  1/9  1/9

where, for example, AB represents a person one finds in A and a representative from B. As ℰ(X) = 1/9, the necessary information to determine totally the random variable X through an experiment Y = (Y_1, ..., Y_k) is 1 − ℰ(X) = 8/9. Each Y_l has two different equally probable results, and (Y_1, ..., Y_k) has 2^k results, also equiprobable. If we can select questions of a form totally specified once one knows the value taken by X, then the solution to the problem is given by the minimum number k ∈ N such that

ℰ(Y_1, ..., Y_k / X) − ℰ(Y_1, ..., Y_k) = 1 − 2^{−k} ≥ 8/9.
Thus the answer turns out to be k = 4. In fact, it is easily verified with the following four questions:

1. Am I in one of the cities A or B?
2. Am I in the city C?
3. Do you live in the city C?
4. Do I live in the city A?
Some more examples of the same type have been extensively studied by Morales et al. (1987). ∎
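The counting argument of Example 1 reduces to a one-line search: find the smallest k with 1 − 2^(−k) ≥ 1 − ℰ(X). The following is a minimal sketch of ours (it assumes, as in our reading of Example 1, nine equiprobable outcomes, so ℰ(X) = 1/9):

```python
# Smallest number k of binary, equiprobable questions supplying information
# 1 - 2**(-k), enough to cover the quadratic entropy 1 - E(X).
def min_questions(energy_X):
    k = 1
    while 1 - 2 ** (-k) < 1 - energy_X:
        k += 1
    return k

print(min_questions(1 / 9))   # Example 1: nine equiprobable outcomes -> 4
```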
D. Information Energy Divergence

From the convexity property of information energy, we can write

ℰ( (P + Q)/2 ) ≤ ½ ℰ(P) + ½ ℰ(Q),

for all P, Q ∈ Δ_n. Then the difference

I(P ‖ Q) = ½ ℰ(P) + ½ ℰ(Q) − ℰ( (P + Q)/2 )    (17)

turns out to be a nonnegative quantity. The measure I(P ‖ Q) given by (17) we call the information energy divergence. After simplification, we can write

I(P ‖ Q) = ¼ Σ_{i=1}^n (p_i − q_i)².

The following result can easily be verified.

Result 2: I(P ‖ Q) is a convex function of the pair (P, Q) in Δ_n × Δ_n.
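Both the defining form (17) and the simplified quarter-sum form can be compared directly; the short check below is our own, with arbitrary P and Q.

```python
# I(P||Q) from definition (17) and from the closed form (1/4) * sum (p - q)^2.
def energy(p):
    return sum(x * x for x in p)

def divergence_def(p, q):
    mid = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * energy(p) + 0.5 * energy(q) - energy(mid)

def divergence_closed(p, q):
    return 0.25 * sum((pi - qi) ** 2 for pi, qi in zip(p, q))

P, Q = [0.7, 0.2, 0.1], [0.3, 0.3, 0.4]
assert abs(divergence_def(P, Q) - divergence_closed(P, Q)) < 1e-12
assert divergence_def(P, Q) >= 0.0
```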
III. INFORMATION ENERGY AND INFORMATION ENERGY GAIN FOR CONTINUOUS RANDOM VARIABLES

In this section, we extend the notion of information energy to the continuous case and examine some of its properties.
A. Definition and Properties
Let X be an absolutely continuous random variable having probabilities density function f ( x ) ; then the information energy is defined by
&(X)=
I
f ( x ) 2 dx,
provided the integral exists. The difference between the continuous and the discrete case is worth emphasizing-that is to say, the information energy in the continuous case is not a limit of the information energy in the discrete case. More precisely, let X be a discrete random variable having uniform distribution, i.e., p i = l / n V i = 1, 2,. . . ,n. Then
c p: n
8 ( X )=
= n(l/n)2 = l / n .
i= 1
As n increases, the distribution of X converges to a continuous uniform distribution in (0,l). Let Y be a continuous random variable with uniform distribution in (0,l); then we have b( Y) = 1, while limn+mb ( X ) = 0. Now, we shall show that the information energy may not be invariant with respect to change of variables. For this, consider y = g(x), V x E X , where g is a strictly increasing function of x . Since the mapping from X to Y is oneto-one, we have MXO)
= f*(YO)lS’(XO)l~
where g(x,) = y o . Accordingly,
From properties (1) and (8) of Section II.A, we see that in the discrete case the information energy lies between 0 and 1, i.e., $0 \le \mathscr{E}(X) \le 1$, while this is not so in the continuous case. To check this, let X be a continuous random variable with uniform probability distribution on the interval (0, c), c < 1; then $\mathscr{E}(X) = 1/c > 1$. These differences between the discrete and continuous distributions warn us that results holding for the discrete case cannot be extended directly to the continuous case; they need independent verification.
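The contrast above can be checked numerically. The following is a minimal sketch (assuming NumPy is available; the helper names are ours, introduced only for illustration):

```python
import numpy as np

def discrete_energy(p):
    """Information energy of a discrete distribution: the sum of squared probabilities."""
    return float(np.sum(np.asarray(p, dtype=float) ** 2))

def continuous_energy(f, a, b, n=200_001):
    """Trapezoidal approximation of the integral of f(x)^2 over (a, b)."""
    x = np.linspace(a, b, n)
    y = f(x) ** 2
    dx = x[1] - x[0]
    return float(dx * (y.sum() - 0.5 * (y[0] + y[-1])))

# The discrete uniform on n points has energy 1/n, vanishing as n grows ...
for npts in (10, 100, 1000):
    assert abs(discrete_energy(np.full(npts, 1.0 / npts)) - 1.0 / npts) < 1e-12

# ... while the continuous uniform on (0, 1) has energy 1, not 0,
assert abs(continuous_energy(lambda x: np.ones_like(x), 0.0, 1.0) - 1.0) < 1e-9

# and on (0, c) with c < 1 the energy 1/c exceeds 1 (here 1/0.25 = 4).
c = 0.25
assert abs(continuous_energy(lambda x: np.full_like(x, 1.0 / c), 0.0, c) - 1.0 / c) < 1e-9
```

So the continuous information energy is a genuinely different functional, not the limit of the discrete one.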
L. PARDO AND I. J. TANEJA
The following two results can easily be verified by using standard inequalities for integrals.

Result 3: Let X be a continuous random variable with density function $f(x)$, $x \in (a, b)$, and $f(x) = 0$ for $x \notin (a, b)$. Then
$$\mathscr{E}(X) \ge \frac{1}{b-a},$$
with equality iff $f(x) = 1/(b-a)$ almost everywhere on (a, b).
Result 4: The measure of information energy $\mathscr{E}(X)$ is a convex function of the density $f(x)$ of a continuous random variable X.
As in the discrete case, the convexity property of the information energy (Result 4) allows us to define an information energy divergence in the continuous case.
B. Joint and Conditional Measures of Information Energy and Information Energy Gain
Let X and Y be two continuous random variables defined on the same sample space with joint density function $f(x, y)$. Similarly to the discrete case, the joint information energy of X and Y is defined by
$$\mathscr{E}(X, Y) = \int_{\mathbb{R}}\int_{\mathbb{R}} f(x, y)^2\,dx\,dy.$$
The conditional information energy of X given the value y of Y is defined by
$$\mathscr{E}(X/Y = y) = \int_{\mathbb{R}} f(x/y)^2\,dx.$$
Finally, the conditional information energy of X given Y is defined by
$$\mathscr{E}(X/Y) = E_Y\bigl(\mathscr{E}(X/Y = y)\bigr) = \int_{\mathbb{R}} f(y)\,\mathscr{E}(X/Y = y)\,dy.$$
The joint information energy can also be written as
$$\mathscr{E}(X, Y) = \int_{\mathbb{R}} f(y)^2\,\mathscr{E}(X/Y = y)\,dy.$$
As in the discrete case, let us define the measure popularly referred to as the information energy gain (Pardo, 1982a) as follows:
$$\mathscr{G}(X, Y) = \mathscr{E}(X/Y) - \mathscr{E}(X). \tag{19}$$
The following result gives properties of the above bivariate measures of information energy.

Result 5:
(a) $\mathscr{E}(X/Y) \ge \mathscr{E}(X)$, i.e., $\mathscr{G}(X, Y) \ge 0$, with equality iff X and Y are independent.
(b) If X and Y are independent, then $\mathscr{E}(X, Y) = \mathscr{E}(X)\,\mathscr{E}(Y)$.

In the statistical literature there exist many special probability distributions that are useful for applications. It is quite interesting to find the value of the measure of information energy for these distributions; this is done in the following subsection.
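A short numerical sketch of Result 5 for a discrete bivariate distribution (assuming NumPy; the joint pmf below is an arbitrary example of ours):

```python
import numpy as np

def energy(p):
    """Information energy of a discrete distribution."""
    return float(np.sum(np.asarray(p, dtype=float) ** 2))

def conditional_energy(pxy):
    """E(X/Y) = sum_j p(y_j) * E(X / Y = y_j) for a joint pmf matrix pxy[i, j]."""
    py = pxy.sum(axis=0)
    return sum(pj * energy(pxy[:, j] / pj) for j, pj in enumerate(py) if pj > 0)

def energy_gain(pxy):
    """G(X, Y) = E(X/Y) - E(X), as in eq. (19)."""
    return conditional_energy(pxy) - energy(pxy.sum(axis=1))

# Independent case: the gain vanishes and E(X, Y) = E(X) E(Y), as Result 5 states.
px, py = np.array([0.2, 0.3, 0.5]), np.array([0.6, 0.4])
pxy = np.outer(px, py)
assert abs(energy_gain(pxy)) < 1e-12
assert abs(energy(pxy.ravel()) - energy(px) * energy(py)) < 1e-12

# Dependent case: the gain is strictly positive.
pxy_dep = np.array([[0.4, 0.0], [0.0, 0.3], [0.1, 0.2]])
assert energy_gain(pxy_dep) > 0
```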
C. Information Energy for Different Probability Distributions

Cauchy: $f(x) = (\lambda/\pi)(\lambda^2 + x^2)^{-1}$, $-\infty < x < \infty$, $\lambda > 0$; $\mathscr{E} = 1/(2\pi\lambda)$.

Double exponential: $f(x) = (\lambda/2)\,e^{-\lambda|x|}$, $-\infty < x < \infty$, $\lambda > 0$; $\mathscr{E} = \lambda/4$.

Chi-square: $f(x) = x^{(n/2)-1}e^{-x/2}\big/\bigl(2^{n/2}\Gamma(n/2)\bigr)$, $x > 0$, n a positive integer; $\mathscr{E} = \Gamma(n-1)\big/\bigl(2^{n}\,\Gamma(n/2)^2\bigr)$, $n \ge 2$.

Exponential: $f(x) = \sigma^{-1}e^{-x/\sigma}$, $x, \sigma > 0$; $\mathscr{E} = 1/(2\sigma)$.

F-Snedecor: $f(x) = K(m,n)\,\bigl(\tfrac{m}{n}x\bigr)^{(m/2)-1}\bigl(1 + \tfrac{m}{n}x\bigr)^{-(m+n)/2}\tfrac{m}{n}$, $x > 0$, m and n positive integers, where $K(m,n) = 1/B(m/2, n/2)$; $\mathscr{E} = \tfrac{m}{n}\,B(m-1, n+1)\big/B(m/2, n/2)^2$, $m \ge 2$.

Logistic: $f(x) = e^{-x}(1 + e^{-x})^{-2}$, $-\infty < x < \infty$; $\mathscr{E} = 1/6$.

Lognormal: $f(x) = \bigl(\sigma x (2\pi)^{1/2}\bigr)^{-1} e^{-(\log x - m)^2/(2\sigma^2)}$, $x > 0$; $\mathscr{E} = e^{(\sigma^2/4) - m}\big/(2\sigma\sqrt{\pi})$.

Maxwell-Boltzmann: $f(x) = (4\pi^{-1/2}\beta^{3/2})\,x^2 e^{-\beta x^2}$, $x, \beta > 0$; $\mathscr{E} = \tfrac{3}{2}\sqrt{\beta/(2\pi)}$.

Generalized normal: $f(x) = \bigl(2\beta^{\alpha/2}/\Gamma(\alpha/2)\bigr)\,x^{\alpha-1}e^{-\beta x^2}$, $x, \alpha, \beta > 0$; $\mathscr{E} = 2^{(3/2)-\alpha}\,\beta^{1/2}\,\Gamma(\alpha - \tfrac{1}{2})\big/\Gamma(\alpha/2)^2$, $\alpha > 1/2$.

Pareto: $f(x) = a k^a / x^{a+1}$, $x \ge k > 0$, $a > 0$; $\mathscr{E} = a^2\big/\bigl(k(2a+1)\bigr)$.

Rayleigh: $f(x) = (x/b^2)\,e^{-x^2/(2b^2)}$, $x, b > 0$; $\mathscr{E} = \pi^{1/2}/(4b)$.

Student-t: $f(x) = \dfrac{\Gamma((n+1)/2)}{\sqrt{n\pi}\,\Gamma(n/2)}\bigl(1 + x^2/n\bigr)^{-(n+1)/2}$, $-\infty < x < \infty$, n a positive integer; $\mathscr{E} = \dfrac{\Gamma((n+1)/2)^2\,\Gamma(n + \tfrac{1}{2})}{\sqrt{n\pi}\,\Gamma(n/2)^2\,\Gamma(n+1)}$.

Triangular (symmetric): $f(x) = \dfrac{4}{(b-a)^2}\min(x - a,\; b - x)$, $a < x < b$; $\mathscr{E} = 4\big/\bigl(3(b-a)\bigr)$.

Uniform: $f(x) = 1/(b-a)$, $a < x < b$; $\mathscr{E} = 1/(b-a)$.

Weibull: $f(x) = c b\,x^{c-1}e^{-b x^c}$, $x, b, c > 0$; $\mathscr{E} = c\,b^{1/c}\,2^{(1/c)-2}\,\Gamma(2 - 1/c)$, $c > 1/2$.

Gumbel: $f(x) = \beta^{-1}e^{-(x-\mu)/\beta}\exp\bigl(-e^{-(x-\mu)/\beta}\bigr)$, $-\infty < x < \infty$, $\beta > 0$; $\mathscr{E} = 1/(4\beta)$.

Normal multivariate: $f(x) = (2\pi)^{-k/2}|C|^{1/2}\exp\bigl(-\tfrac{1}{2}(x-\tau)'C(x-\tau)\bigr)$, with mean vector $\tau$ and precision matrix C; $\mathscr{E} = |C|^{1/2}(4\pi)^{-k/2}$.
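Several of the closed forms in the table can be spot-checked by direct numerical integration of $f(x)^2$; the following sketch assumes NumPy, and the parameter values are arbitrary choices of ours:

```python
import numpy as np

def energy_of(f, a, b, n=1_000_001):
    """Trapezoidal approximation of the integral of f(x)^2 over (a, b)."""
    x = np.linspace(a, b, n)
    y = f(x) ** 2
    dx = x[1] - x[0]
    return float(dx * (y.sum() - 0.5 * (y[0] + y[-1])))

lam = 1.5    # Cauchy: E = 1/(2*pi*lam)
assert abs(energy_of(lambda x: (lam / np.pi) / (lam**2 + x**2), -2000, 2000)
           - 1 / (2 * np.pi * lam)) < 1e-6

sigma = 2.0  # Exponential: E = 1/(2*sigma)
assert abs(energy_of(lambda x: np.exp(-x / sigma) / sigma, 0, 100)
           - 1 / (2 * sigma)) < 1e-6

b = 0.7      # Rayleigh: E = sqrt(pi)/(4*b)
assert abs(energy_of(lambda x: (x / b**2) * np.exp(-x**2 / (2 * b**2)), 0, 30)
           - np.sqrt(np.pi) / (4 * b)) < 1e-6

# Logistic: E = 1/6
assert abs(energy_of(lambda x: np.exp(-x) / (1 + np.exp(-x))**2, -60, 60) - 1 / 6) < 1e-6
```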
Remark 1: The above table has recently been extended by J. A. Pardo et al. (1990) for a measure of a more general form, which, interestingly, leads to entropy-type measures and some distance measures.

D. Information Energy in the Field of Hyperreal Numbers
In Section III.A, we have seen that the measure of information energy for a continuous distribution is not a natural extension of the information energy for a discrete distribution. If one tries to define the information energy of a continuous distribution as the limit of the discrete case, then it takes the value zero, which makes such a definition useless. Therefore, the information energy of a continuous distribution has been defined by neglecting this null term. This rather artificial trick causes the information energy of a continuous distribution to have unnatural properties that the information energy of a discrete distribution does not have. Another problem, which is
even more serious, is that these measures of information energy do not lead to a natural definition of the information energy of general mixed discrete and continuous distributions, because separate definitions are adopted for the discrete and continuous cases. In this section we shall give, in a way similar to that given by Ozeki (1980) for Shannon's entropy, a new definition of the information energy based on nonstandard analysis, and we will show that it has natural properties, such as invariance under transformations, which, as seen in Section III.A, are not satisfied by the information energy of continuous random variables. In light of this new definition, the meaning of the continuous information energy given in Section III.A will become clearer.

We begin with the conventional notation used by Keisler (1976) for nonstandard analysis. Denote by $\mathbb{R}$ the field of real numbers and by $\mathbb{R}^*$ the field of hyperreal numbers. An element $x \in \mathbb{R}^*$ is infinitesimal if $|x| < r$ for all positive real r. The natural extension of a real function g is denoted by $*g$. The natural extension of a set $A \subset \mathbb{R}$ is denoted by $*A$. For $x, y \in \mathbb{R}^*$, we write $x \simeq y$ if $x - y$ is infinitesimal. For a finite $x \in \mathbb{R}^*$, the unique real $r \simeq x$ is called the standard part of x; in symbols, $r = \mathrm{st}(x)$. An integral sign always means the Riemann integral.

Let X be a random variable defined on some fixed, but otherwise arbitrary, probability space $(\Omega, \mathcal{A}, P)$, and let F be the distribution function of X. We suppose initially, for simplicity, that the probability distribution is confined to a finite interval [a, b]. Let $\Delta x$ be a positive real number, and consider the sequence of real numbers
$$a = x_0 < x_1 = a + \Delta x < \cdots < x_{n(\Delta x)} = a + n(\Delta x)\,\Delta x,$$
where $n(\Delta x)$ is the least integer n such that $a + n\,\Delta x \ge b$. We denote this sequence by $S^b_a(\Delta x)$.
In this context, the information energy of the random variable X with respect to the sequence $S^b_a(\Delta x)$ is defined by
$$\mathscr{E}_{\Delta x}(X) = \sum_{i=1}^{n(\Delta x)} \bigl(F(x_i) - F(x_{i-1})\bigr)^2.$$
The mapping g that associates with each positive real number $\Delta x$ the value $g(\Delta x) = \mathscr{E}_{\Delta x}(X)$ is defined on $(0, \infty)$; therefore, its natural extension $*g(\Delta x) = *\mathscr{E}_{\Delta x}(X)$ is defined on $*(0, \infty)$. Let $\delta x$ be a positive infinitesimal. The information energy of X, measured with respect to $\delta x$, is defined in the following terms:

Definition 1. The information energy of the random variable X with respect to the positive infinitesimal $\delta x$ is $*\mathscr{E}_{\delta x}(X)$.
Now, we shall particularize the newly defined information energy in the following cases: (a) the random variable X has a discrete distribution; (b) the random variable has an absolutely continuous distribution; (c) the distribution of the random variable X has both a discrete part and an absolutely continuous part.

(a) Let X be a discrete random variable taking on the values $t_1, \ldots, t_r$ $(a < t_1 < \cdots < t_r \le b)$ with probabilities $p_1, \ldots, p_r$. In this case F can be written as
$$F(x) = \sum_{i=1}^{r} p_i\,s(x - t_i), \qquad \text{where } s(x) = \begin{cases} 0 & \text{for } x < 0, \\ 1 & \text{for } x \ge 0. \end{cases}$$
The information energy of the random variable X with respect to $\Delta x$, with $0 < \Delta x < m$ and $m = \min\{(t_2 - t_1), \ldots, (t_r - t_{r-1})\}$, can be rewritten in the form
$$\mathscr{E}_{\Delta x}(X) = \sum_{i=1}^{r} p_i^2. \tag{21}$$
Hence (Keisler, 1976) every hyperreal solution of $0 < \Delta x < m$ is a hyperreal solution of (21). Thus for any positive infinitesimal $\delta x$,
$$*\mathscr{E}_{\delta x}(X) = \sum_{i=1}^{r} p_i^2.$$
That is to say, in this case the information energy given by Definition 1 coincides with the information energy given in Section II.A.

(b) Let X be an absolutely continuous random variable with probability density function f(x). By the mean value theorem, there exist $t_{i-1} \in [x_{i-1}, x_i]$ such that
$$F(x_i) - F(x_{i-1}) = \int_{x_{i-1}}^{x_i} f(x)\,dx = f(t_{i-1})(x_i - x_{i-1}), \qquad i = 1, 2, \ldots, n(\Delta x).$$
Therefore, it follows that
$$\mathscr{E}_{\Delta x}(X) = \sum_{i=1}^{n(\Delta x)} f(t_{i-1})^2 (\Delta x)^2 = \Delta x\,\gamma(\Delta x), \qquad \gamma(\Delta x) = \sum_{i=1}^{n(\Delta x)} f(t_{i-1})^2\,\Delta x, \tag{22}$$
for any $\Delta x > 0$. Hence every real solution of $\Delta x > 0$ is a real solution of (22). Thus for
any positive infinitesimal $\delta x$,
$$*\mathscr{E}_{\delta x}(X) = *(\delta x)\,*\gamma(\delta x),$$
where $\mathrm{st}\bigl(*\gamma(\delta x)\bigr) = \int_a^b f(x)^2\,dx$.

(c) Let X be a random variable with distribution function $F(x) = F_d(x) + F_{ac}(x)$, where $t_1, \ldots, t_r$ are the points of discontinuity:
$$F_d(x) = \sum_{i=1}^{r} p_i\,s(x - t_i) \qquad \text{and} \qquad F_{ac}(x) = \int_{-\infty}^{x} f(t)\,dt.$$
Then, for $0 < \Delta x < m$,
$$\mathscr{E}_{\Delta x}(X) = \sum_{i=1}^{r} p_i^2 + \Delta x\,\gamma(\Delta x) + 2\phi(\Delta x), \tag{23}$$
where $t_{i-1} \in [x_{i-1}, x_i]$ and $\phi(\Delta x) = \sum_{i=1}^{r} p_i\,f(t_{i-1})\,\Delta x$. Hence every real solution of $0 < \Delta x < m$ is a real solution of (23). Thus, for any positive infinitesimal $\delta x$, we have
$$*\mathscr{E}_{\delta x}(X) = *(\delta x)\,*\gamma(\delta x) + \sum_{i=1}^{r} p_i^2 + 2\,*\phi(\delta x),$$
where $*\phi(\delta x)$ is an infinitesimal and $\mathrm{st}\bigl(*\gamma(\delta x)\bigr) = \int_a^b f(x)^2\,dx$.

The following result shows that the newly defined information energy is invariant under transformations. Let $k(\cdot)$ be a continuous and strictly increasing function with both domain and counterdomain the real line, and let X be a random variable with distribution function F. We suppose that the probability distribution is confined to a finite interval [a, b]. We denote $a' = k^{-1}(a)$ and $b' = k^{-1}(b)$. Consider the sequence of real numbers
$$x_0 = a', \quad x_1 = a' + \Delta x, \;\ldots,\; x_n = a' + n(\Delta x)\,\Delta x,$$
where $n(\Delta x)$ is the least integer n satisfying $x_n \ge b'$. This sequence is denoted by $S^{b'}_{a'}(\Delta x)$. The expression
$$\mathscr{E}^k_{\Delta x}(X) = \sum_{i=1}^{n(\Delta x)} \bigl(F(y_i) - F(y_{i-1})\bigr)^2,$$
where $F(y) = P(X \le y)$ and $y_i = k(x_i)$, $i = 1, \ldots, n(\Delta x)$, is called the information energy induced by the function k with respect to the sequence $S^{b'}_{a'}(\Delta x)$. Based on the natural extension $*u(\Delta x) = *\mathscr{E}^k_{\Delta x}(X)$ of the function $u(\Delta x) = \mathscr{E}^k_{\Delta x}(X)$, we can define the information energy of the random variable X induced by the continuous and strictly increasing function k, with respect to the positive infinitesimal $\delta x$, as $*\mathscr{E}^k_{\delta x}(X)$. The following result shows that this information energy is invariant under transformations in the following sense:

Proposition 2.
For any positive infinitesimal $\delta x$ we have
$$*\mathscr{E}^k_{\delta x}(X) = *\mathscr{E}_{\delta x}(k(X)),$$
where k is a continuous and strictly increasing function with both domain and counterdomain the real line.

Since the corresponding identity holds for any real $\Delta x > 0$, it is verified for any positive hyperreal $\delta x$ that
$$*\mathscr{E}^k_{\delta x}(X) = *\mathscr{E}_{\delta x}(k(X)).$$
The following results can easily be verified using nonstandard analysis.
Result 6: Let X be a random variable; then:
(a) For any positive infinitesimal $\delta x$, $*\mathscr{E}_{\delta x}(X) \ge 0$.
(b) Let $\delta x$ and $\delta x'$ be positive infinitesimals such that there exists $c \in \mathbb{R}$, $c > 1$, verifying $\delta x' \ge c\,\delta x$; then, if X is a continuous random variable, we have
$$*\mathscr{E}_{\delta x'}(X) \ge *\mathscr{E}_{\delta x}(X).$$
Assuming the properties of the hyperreal numbers, it is easy to extend the definition of information energy to the case in which the random variable is not bounded.
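The three particularizations (a)-(c), and the monotonicity in Result 6(b), can be checked at finite mesh sizes $\Delta x$. The following is a numerical sketch assuming NumPy; the distributions are illustrative choices of ours:

```python
import numpy as np

def energy_dx(F, a, b, dx):
    """E_dx(X): the sum of squared cdf increments over the grid a, a+dx, ..., >= b."""
    n = int(np.ceil((b - a) / dx))
    grid = a + dx * np.arange(n + 1)
    return float(np.sum(np.diff(F(grid)) ** 2))

# (a) Discrete X on t = (1, 2, 4) with p = (0.2, 0.5, 0.3); minimum gap m = 1.
t, p = np.array([1.0, 2.0, 4.0]), np.array([0.2, 0.5, 0.3])
F_d = lambda x: np.where(np.subtract.outer(np.asarray(x, float), t) >= 0, p, 0.0).sum(axis=-1)
for dx in (0.9, 0.5, 0.01):               # any 0 < dx < m isolates each atom
    assert abs(energy_dx(F_d, 0.0, 5.0, dx) - np.sum(p ** 2)) < 1e-12

# (b) Continuous X with density f(x) = 2x on (0, 1): E_dx / dx -> integral of f^2 = 4/3.
F_c = lambda x: np.clip(x, 0.0, 1.0) ** 2
for dx in (1e-2, 1e-3, 1e-4):
    assert abs(energy_dx(F_c, 0.0, 1.0, dx) / dx - 4.0 / 3.0) < 1e-3

# (c) Mixed: an atom of mass 1/2 at 1/2 plus a uniform(0, 1) component of mass 1/2.
# The discrete part sum p_i^2 survives; the absolutely continuous part is O(dx).
F_m = lambda x: 0.5 * (np.asarray(x, float) >= 0.5) + 0.5 * np.clip(x, 0.0, 1.0)
for dx in (1e-2, 1e-3):
    assert abs(energy_dx(F_m, 0.0, 1.0, dx) - 0.25) < 3 * dx

# Result 6(b): for continuous X, a coarser mesh gives a larger grid energy.
assert energy_dx(F_c, 0.0, 1.0, 1e-2) > energy_dx(F_c, 0.0, 1.0, 1e-4)
```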
Now we consider the bivariate case and define the joint information energy based on nonstandard analysis. Let X and Y be two random variables defined on the same probability space $(\Omega, \mathcal{A}, P)$ with joint distribution function F(x, y). We first consider the case where the domain of F is a rectangle $A = [a_1, b_1] \times [a_2, b_2]$. Let $\Delta x$ and $\Delta y$ be positive real numbers. We partition the interval $[a_1, b_1]$ into subintervals of length $\Delta x$ and $[a_2, b_2]$ into subintervals of length $\Delta y$. The partition points are
$$x_0 = a_1,\quad x_1 = a_1 + \Delta x,\quad x_2 = a_1 + 2\Delta x, \;\ldots,\; x_n = a_1 + n(\Delta x)\,\Delta x,$$
$$y_0 = a_2,\quad y_1 = a_2 + \Delta y,\quad y_2 = a_2 + 2\Delta y, \;\ldots,\; y_m = a_2 + m(\Delta y)\,\Delta y,$$
where $n = n(\Delta x)$ and $m = m(\Delta y)$ are the least integers such that $x_n \ge b_1$ and $y_m \ge b_2$, respectively. We have thus partitioned the rectangle A into $\Delta x$ by $\Delta y$ subrectangles with partition points $(x_i, y_j)$, $1 \le i \le n$, $1 \le j \le m$. This partition is denoted by $S_A(\Delta x, \Delta y)$. Now we shall introduce the following notation:
$$P_{ij} = F(x_i, y_j) - F(x_{i-1}, y_j) - F(x_i, y_{j-1}) + F(x_{i-1}, y_{j-1}),$$
$$P_{\cdot j} = \sum_{i=1}^{n} P_{ij}, \qquad p_{i/j} = \begin{cases} P_{ij}/P_{\cdot j} & \text{if } P_{\cdot j} \neq 0, \\ 0 & \text{if } P_{\cdot j} = 0. \end{cases}$$
In this context, the joint information energy of X and Y with respect to the partition $S_A(\Delta x, \Delta y)$ is defined by
$$\mathscr{E}_{(\Delta x, \Delta y)}(X, Y) = \sum_{i=1}^{n} \sum_{j=1}^{m} P_{ij}^2.$$
The mapping $g_1$ that associates with each pair of positive numbers $\Delta x$ and $\Delta y$ the value $g_1(\Delta x, \Delta y) = \mathscr{E}_{(\Delta x, \Delta y)}(X, Y)$ is defined on $(0, \infty) \times (0, \infty)$; therefore its natural extension $*g_1(\Delta x, \Delta y) = *\mathscr{E}_{(\Delta x, \Delta y)}(X, Y)$ is defined on $*(0, \infty) \times *(0, \infty)$.
Given two positive infinitesimals $\delta x$ and $\delta y$, we define:

Definition 2. The information energy of the random variable (X, Y) with respect to the positive infinitesimals $\delta x$ and $\delta y$ is given by $*\mathscr{E}_{(\delta x, \delta y)}(X, Y)$.
The following result gives properties of these bivariate measures of information energy.
Result 7:
(a) Let (X, Y) be a discrete random variable that takes the pairs of values $(x_i, y_j)$, $i = 1, \ldots, n$ and $j = 1, \ldots, m$, with probabilities $p_{11}, \ldots, p_{nm}$; then it is easy to prove that
$$*\mathscr{E}_{(\delta x, \delta y)}(X, Y) = \sum_{i=1}^{n} \sum_{j=1}^{m} p_{ij}^2.$$
(b) Let (X, Y) be a continuous random variable with probability density function f(x, y); then
$$\mathscr{E}_{(\Delta x, \Delta y)}(X, Y) = \sum_{i}\sum_{j} f(t_{i-1}, u_{j-1})^2\,(\Delta x\,\Delta y)^2 = \Delta x\,\Delta y\;\Psi(\Delta x, \Delta y),$$
with $(t_{i-1}, u_{j-1}) \in (x_{i-1}, x_i] \times (y_{j-1}, y_j]$ and
$$\Psi(\Delta x, \Delta y) = \sum_{i}\sum_{j} f(t_{i-1}, u_{j-1})^2\,\Delta x\,\Delta y.$$
Hence, given two positive infinitesimals $\delta x$ and $\delta y$, it follows that
$$*\mathscr{E}_{(\delta x, \delta y)}(X, Y) = *(\delta x\,\delta y)\,*\Psi(\delta x, \delta y),$$
where
$$\mathrm{st}\bigl(*\Psi(\delta x, \delta y)\bigr) = \int\!\!\int_A f(x, y)^2\,dx\,dy.$$
Now, we define the conditional information energy of X given Y with respect to the partition $S_A(\Delta x, \Delta y)$ by
$$\mathscr{E}_{(\Delta x, \Delta y)}(X/Y) = \sum_{j=1}^{m} P_{\cdot j} \sum_{i=1}^{n} p_{i/j}^2.$$
The mapping $g_2$ that associates with each pair of positive numbers $\Delta x$ and $\Delta y$ the value $g_2(\Delta x, \Delta y) = \mathscr{E}_{(\Delta x, \Delta y)}(X/Y)$ is defined on $(0, \infty) \times (0, \infty)$; therefore its natural extension $*g_2(\Delta x, \Delta y) = *\mathscr{E}_{(\Delta x, \Delta y)}(X/Y)$ is defined on $*(0, \infty) \times *(0, \infty)$. Given two positive infinitesimals $\delta x$ and $\delta y$, we define:
Definition 3. The conditional information energy of X given Y with respect to the positive infinitesimals $\delta x$ and $\delta y$ is given by $*\mathscr{E}_{(\delta x, \delta y)}(X/Y)$.

The following result gives properties of these conditional measures of information energy (Pardo, 1985a).
IV. STATISTICAL ASPECTS OF INFORMATION ENERGY

In this section we study some statistical applications of the measure of information energy. We analyze the importance of this measure in various contexts, such as the comparison of experiments in a Bayesian framework, the design and comparison of regression experiments in a Bayesian framework, the design of a rule for sequential sampling, the sequential design of a fixed number of experiments, the analysis of diversity, point processes, bounds on the probability of error, etc.

A. Comparison of Experiments in a Bayesian Context. Relation with Classical Approaches: Lehmann and Blackwell

Let A be an experiment whose possible results $\theta$ belong to a parameter space $\Theta$. Before we make any decision, let us suppose that its consequences depend on the result $\theta$ of the experiment A. We can observe the realization of an experiment X with statistical space $(\mathscr{X}, \beta_{\mathscr{X}}, P_\theta)_{\theta \in \Theta}$. The observation of X, by providing information about $\theta$, will help us to
make a good decision. Let us suppose that $P_\theta$ is absolutely continuous with respect to a counting measure, or with respect to Lebesgue measure, $\lambda$, and let $f(x/\theta) = (dP_\theta/d\lambda)(x)$ denote its density or probability function. Let us associate with the space $\Theta$ a $\sigma$-algebra $\beta_\Theta$, and over this $\sigma$-algebra let us consider a probability measure $\mu$, absolutely continuous with respect to Lebesgue measure or with respect to the counting measure, $\nu$, with $p(\theta) = (d\mu/d\nu)(\theta)$ denoting its density or probability function. The predictive distribution of X, i.e., the unconditional distribution of X, is given by $f(x)$. Once we realize the experiment X, the knowledge that we have about $\theta$, represented before by the a priori distribution, changes according to the observed value and is represented by the a posteriori distribution $p(\theta/x)$. The information given by observing the value x of X can be quantified as the difference between the information we have about $\theta$ after the realization of the experiment and the information we had before performing it. If we quantify this information using the information energy measure, the information gain about $\theta$ given by the observation x of X is expressed as follows:
$$\mathscr{G}(X, p(\cdot), x) = \mathscr{E}\bigl(p(\cdot/x)\bigr) - \mathscr{E}\bigl(p(\cdot)\bigr),$$
where
$$\mathscr{E}\bigl(p(\cdot)\bigr) = \int_\Theta p(\theta)^2\,d\nu(\theta) \tag{24}$$
and
$$\mathscr{E}\bigl(p(\cdot/x)\bigr) = \int_\Theta p(\theta/x)^2\,d\nu(\theta).$$
The expected information gain about $\theta$ before observing the experiment X is given by
$$\mathscr{G}(X, p(\cdot)) = E_X\bigl(\mathscr{G}(X, p(\cdot), x)\bigr).$$
Therefore, the expected information gain about $\theta$ is
$$\mathscr{G}(X, p(\cdot)) = \int_{\mathscr{X}} \int_\Theta p(\theta/x)^2\,d\nu(\theta)\,f(x)\,d\lambda(x) - \int_\Theta p(\theta)^2\,d\nu(\theta). \tag{25}$$
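A small numerical illustration of the expected gain (25) for a discrete parameter space (a sketch assuming NumPy; the two-point parameter space and uniform prior are arbitrary choices of ours):

```python
import numpy as np
from math import comb

theta = np.array([0.2, 0.8])      # two candidate parameter values
prior = np.array([0.5, 0.5])      # uniform prior p(theta)

def expected_gain(prior, theta, n=1):
    """Expected information energy gain of observing n Bernoulli(theta) trials."""
    gain = -float(np.sum(prior ** 2))            # minus the prior energy
    for k in range(n + 1):                       # k = number of observed successes
        lik = comb(n, k) * theta ** k * (1 - theta) ** (n - k)
        fx = float(np.sum(prior * lik))          # predictive probability of k
        post = prior * lik / fx                  # posterior p(theta / x)
        gain += fx * float(np.sum(post ** 2))    # expected posterior energy
    return gain

g1, g5 = expected_gain(prior, theta, 1), expected_gain(prior, theta, 5)
assert abs(g1 - 0.18) < 1e-9    # closed-form value for this particular example
assert g1 > 0                   # observing an experiment never hurts on average
assert g5 > g1                  # more independent trials, larger expected gain
```

The two inequalities anticipate properties (a) and (c) of Result 9 below, but here they are simply checked numerically for one example.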
An analysis of expression (25) for normal and binomial populations can be found in L. Pardo (1984a, 1984b). If we have to choose among different experiments associated with $\Theta$ in order to obtain information about $\theta$, a natural question arises: Which experiment should we choose? In this context, we shall apply the information energy gain given in (25) to the comparison of experiments according to the
Bayesian approach. This comparison is made on the basis of the knowledge we have after and before the experiment, i.e., according to the information given by the experiment. Let X and Y be two experiments associated with $\Theta$, with statistical spaces $(\mathscr{X}, \beta_{\mathscr{X}}, f(x/\theta))_{\theta \in \Theta}$ and $(\mathscr{Y}, \beta_{\mathscr{Y}}, f(y/\theta))_{\theta \in \Theta}$, respectively.
Definition 4. We say that the experiment X is preferred to the experiment Y with respect to a prior distribution $p(\theta)$, denoted by $X \succeq Y$, if and only if
$$\mathscr{G}(X, p(\cdot)) \ge \mathscr{G}(Y, p(\cdot)). \tag{26}$$
Also, we say that the experiments X and Y are equivalent with respect to $p(\theta)$, denoted by $X \approx Y$, if and only if $X \succeq Y$ and $Y \succeq X$. It is easy to verify that the relation $\succeq$ is a complete preordering. Based on the criterion given in Definition 4, we have the following results (Garcia-Carrasco, 1983):

Result 9:
(a) Let X be any experiment and N the null experiment (i.e., one whose distribution is independent of $\theta$ a.e. $\lambda$); then $X \succeq N$.
(b) Given the compound experiment (X, Y), where X and Y are the corresponding marginal experiments, then $(X, Y) \succeq Y$ (or X), with equality iff $f(y/x, \theta)$ is independent of $\theta$ (respectively, $f(x/y, \theta)$ is independent of $\theta$).
(c) Let $X^{(n)}$ be the resulting experiment after observing X independently n times; then $X^{(n)} \succeq X^{(n-1)}$, $\forall n \ge 2$.
(d) Let X, Y, and Z be three experiments on $\Theta$, with Z independent of X and Y. If $X \succeq Y$ for all prior distributions, then $(X, Z) \succeq (Y, Z)$ for all prior distributions.
(e) Let X, Y, Z, and W be four experiments defined over $\Theta$ such that $X \succeq Y$ and $Z \succeq W$ for all prior distributions. Also, suppose that X is independent of Z, and Y is independent of W. Then $(X, Z) \succeq (Y, W)$ for all prior distributions.
(f) Let $X = (\mathscr{X}, \beta_{\mathscr{X}}, f(x/\theta))_{\theta \in \Theta}$ be an experiment and $(E_i)_{i \in \mathbb{N}}$ a partition of $\mathscr{X}$ by elements of the $\sigma$-algebra $\beta_{\mathscr{X}}$. Consider another experiment $Y = (\mathbb{N}, \beta_{\mathbb{N}}, Q_\theta)_{\theta \in \Theta}$, where $\beta_{\mathbb{N}}$ is the $\sigma$-algebra generated by $(E_i)_{i \in \mathbb{N}}$ and $Q_\theta(E_i) = \int_{E_i} f(x/\theta)\,d\lambda(x)$, $\forall i \in \mathbb{N}$. Then $X \succeq Y$, with equality iff $p(\theta/x) = p(\theta/E_i)$ a.e. $\lambda$, $\forall i \in \mathbb{N}$.
(g) For every statistic $T = T(X^{(n)})$ based on the experiment $X^{(n)}$, it is verified that $X^{(n)} \succeq T$, with equality iff T is a sufficient statistic.
(h) Let S and T be two sufficient statistics on $X^{(n)}$ and $Y^{(n)}$, respectively; then $X^{(n)} \succeq Y^{(n)}$ iff $S \succeq T$.
(i) Let X be the experiment $(\mathbb{R}, \beta_{\mathbb{R}}, P_\theta)_{\theta \in \Theta}$, and let $T = T(x)$ be a strictly monotonic and differentiable function from $\mathbb{R}$ to $\mathbb{R}$. Then X and T are equivalent experiments.
(j) Let $X_1, \ldots, X_n$ be experiments defined over $(\mathbb{R}, \beta_{\mathbb{R}})$, and consider a transformation $h = (h_1, h_2, \ldots, h_m)$ from $\mathbb{R}^n$ to $\mathbb{R}^m$ such that for every fixed $(x_1, x_2, \ldots, x_{n-m})$ the function
$$h_{x_1, \ldots, x_{n-m}}(x_{n-m+1}, \ldots, x_n) = h(x_1, x_2, \ldots, x_n)$$
is bijective (one-to-one and onto) with continuous first partial derivatives. If $Y_1 = h_1(X_1, \ldots, X_n), \ldots, Y_m = h_m(X_1, \ldots, X_n)$, then the compound experiments
$$X^{(n)} = (X_1, \ldots, X_n) \qquad \text{and} \qquad Z^{(n)} = (X_1, \ldots, X_{n-m}, Y_1, \ldots, Y_m),$$
whose probability functions are $f(x_1, \ldots, x_n/\theta)$ and $g(x_1, \ldots, x_{n-m}, y_1, \ldots, y_m/\theta)$, respectively, are equivalent.

Now we analyze the relation between the criterion given in Definition 4 and the classical criterion of Lehmann. We also study its relation with the two well-known criteria of Blackwell, based on the sufficiency property and on decision theory. Lehmann's (1959) definition for comparing two experiments is stated as follows:
Definition 5. The experiment X is preferred to the experiment Y, denoted by $X \succeq_L Y$, iff there exists an experiment U with known distribution independent of X, and a measurable function h(x, u), such that the random variable $H = h(X, U)$ is distributed as Y for every $\theta \in \Theta$.

Based on the above definition, we have the following result (Garcia-Carrasco, 1983):

Result 10: Let $(\mathscr{X}, \beta_{\mathscr{X}}, P_\theta)_{\theta \in \Theta}$ and $(\mathscr{Y}, \beta_{\mathscr{Y}}, Q_\theta)_{\theta \in \Theta}$ be two statistical experiments with the same parameter space. If $X \succeq_L Y$, then $X \succeq Y$.
Now we study the relation between the criterion given in Definition 4 and Blackwell's sufficiency criterion (1951); see also Blackwell and Girshick (1954).

Definition 6. Let $(\mathscr{X}, \beta_{\mathscr{X}}, P_\theta)_{\theta \in \Theta}$ and $(\mathscr{Y}, \beta_{\mathscr{Y}}, Q_\theta)_{\theta \in \Theta}$ be two statistical experiments with the same parameter space. Blackwell's method for comparing two experiments states that the experiment X is sufficient for the experiment Y, denoted by $X \succeq_S Y$, if there exists a measurable transformation $h: \mathscr{X} \times \mathscr{Y} \to \mathbb{R}$ satisfying:
(i) $f(y/\theta) = \int_{\mathscr{X}} h(x, y) f(x/\theta)\,d\lambda(x)$, $\forall \theta \in \Theta$, $\forall y \in \mathscr{Y}$;
(ii) for every fixed $x \in \mathscr{X}$, $h(x, y)$ is a probability density function on $(\mathscr{Y}, \beta_{\mathscr{Y}})$;
(iii) $\int_{\mathscr{X}} h(x, y)\,d\lambda(x) < \infty$, $\forall y \in \mathscr{Y}$.
For a finite parameter space, Pardo (1983) established the following theorem:

Theorem 2. Let $\{\mathscr{X}, \beta_{\mathscr{X}}, f(x/\theta); \theta \in \Theta\}$ and $\{\mathscr{Y}, \beta_{\mathscr{Y}}, f(y/\theta); \theta \in \Theta\}$ be two statistical experiments defined over $\Theta = \{\theta_1, \theta_2, \ldots, \theta_n\}$, and let $p(\theta) = (p(\theta_1), p(\theta_2), \ldots, p(\theta_n))$ be a probability distribution on $\Theta$. If Y is sufficient for X in Blackwell's sense, then $Y \succeq X$.

Proof: Since Y is sufficient for X, there exists a function h that verifies the following:
(i) $f(x/\theta_i) = \int_{\mathscr{Y}} h(x, y) f(y/\theta_i)\,d\lambda(y)$, $\forall i = 1, \ldots, n$ and $x \in \mathscr{X}$;
(ii) $\int_{\mathscr{X}} h(x, y)\,d\lambda(x) = 1$, $\forall y \in \mathscr{Y}$, and $\int_{\mathscr{Y}} h(x, y)\,d\lambda(y) < \infty$, $\forall x \in \mathscr{X}$.

Let $Z = (Z_{\theta_1}(y) = f(y/\theta_1), \ldots, Z_{\theta_n}(y) = f(y/\theta_n))$ be a random vector, where the density of $Z_{\theta_i}$ $(i = 1, \ldots, n)$ is given by
$$f(y/x) = h(x, y)\Big/\int_{\mathscr{Y}} h(x, y)\,d\lambda(y).$$
We suppose that the random variables $Z_{\theta_i}$ $(i = 1, \ldots, n)$ are independent. Applying Jensen's inequality to a suitable convex function $\Phi$ of this vector yields an inequality, (27), between the information energies of the two experiments. Finally, integrating expression (27) with respect to the predictive distribution and simplifying, we get the required result.
When the parameter space is not finite and the family of distribution functions associated with the experiment X is complete, we have the following result (Garcia-Carrasco, 1983):

Result 11: Let $(\mathscr{X}, \beta_{\mathscr{X}}, f(x/\theta))_{\theta \in \Theta}$ and $(\mathscr{Y}, \beta_{\mathscr{Y}}, f(y/\theta))_{\theta \in \Theta}$ be two experiments defined over $\Theta$ such that $(P_\theta)_{\theta \in \Theta}$ is a complete family of distributions. If X is preferred to the experiment Y according to Blackwell's criterion, then $X \succeq Y$.
Now we shall study the relation between Blackwell's criterion based on decision theory and the information energy gain criterion, when the parameter space is finite and the prior distribution is uniform. Let $\Theta = \{\theta_1, \theta_2, \ldots, \theta_n\}$ be a finite parameter space, and let $X = (\mathscr{X}, \beta_{\mathscr{X}}, P_{\theta_i})_{i=1,\ldots,n}$ be an experiment. Let us consider a pair (X, A), where A is a closed bounded convex subset of $\mathbb{R}^n$ whose elements are terminal action points $a = (a_1, a_2, \ldots, a_n)$, i.e., $a_i = L(\theta_i, d)$ $(i = 1, 2, \ldots, n)$ is the loss from action a, and d is an arbitrary decision function. When the state of nature is $\theta_i$, the risk is
$$R_i = R(\theta_i, d) = \int_{\mathscr{X}} L(\theta_i, d(x))\,dP_{\theta_i}(x), \qquad i = 1, 2, \ldots, n.$$
As d varies over all possible decision functions D, for the risk problem we have the set
$$B(X, A) = \{R = (R_1, \ldots, R_n) : d \in D\}.$$
According to Blackwell's definition, we say that the experiment X is more informative than Y, written $X \succeq_B Y$, if for every closed, bounded, and convex set $A \subset \mathbb{R}^n$ we have $B(X, A) \supseteq B(Y, A)$. Reduction to the standard experiment gives a condition equivalent to $X \succeq_B Y$. For any experiment $X = (\mathscr{X}, \beta_{\mathscr{X}}, P_{\theta_i})_{i=1,\ldots,n}$, let $p_{\theta_i}(x)$ be the density of $P_{\theta_i}$ with respect to $nP_0 = P_{\theta_1} + \cdots + P_{\theta_n}$. Let $\mathscr{Z}$ be the set of n-tuples $z = (z_1, z_2, \ldots, z_n)$, $z_i \ge 0$, $\sum_{i=1}^{n} z_i = 1$. For any Borel subset A of $\mathscr{Z}$, let us define
$$m_i(A) = P_{\theta_i}\bigl(\{x \in \mathscr{X} : p(x) = (p_{\theta_1}(x), \ldots, p_{\theta_n}(x)) \in A\}\bigr),$$
so that $m_i$ $(i = 1, 2, \ldots, n)$ is the distribution of z when x has the distribution $P_{\theta_i}$. We now have a new experiment
$$X^* = (\mathscr{Z}, \beta_{\mathscr{Z}}, m_i)_{i=1,2,\ldots,n},$$
called the standard experiment, and the measure
$$m_s = \frac{1}{n}\sum_{i=1}^{n} m_i,$$
defined over $(\mathscr{Z}, \beta_{\mathscr{Z}})$, is called the standard measure.
The following result (Blackwell, 1951) is a valuable tool in the comparison of experiments: Let X and Y be two experiments with standard measures $m_X$ and $m_Y$, respectively. Then $X \succeq_B Y$ if and only if, for every continuous convex function g(z),
$$\int_{\mathscr{Z}} g(z)\,dm_X(z) \ge \int_{\mathscr{Z}} g(z)\,dm_Y(z).$$
Garcia-Carrasco (1983) established the following result:

Result 12: Let $\Theta = \{\theta_1, \theta_2, \ldots, \theta_n\}$ be a finite parameter space with the uniform prior distribution, and let X and Y be two experiments. If $X \succeq_B Y$, then $X \succeq Y$.

B. Information Energy in the Design and Comparison of Regression Experiments in a Bayesian Context
In this section we suppose that the experiments are of the following form: $\mathscr{Y} = \mathbb{R}^n$, $\Theta = \mathbb{R}^k$,
$$Y_j = \sum_{i=1}^{k} \beta_i a_{ji} + e_j, \qquad j = 1, 2, \ldots, n,$$
where $e = (e_1, e_2, \ldots, e_n)$ is normally distributed with mean vector $(0, \ldots, 0)$ and precision matrix P. Also suppose that $\beta$ is normally distributed with mean vector $\beta_0$ and precision matrix $P_0$. The $n \times k$ matrix $A = \{a_{ji}\}$ is called the allocation matrix, and the rows of A are called the allocation vectors. We use the information energy gain to obtain a number of very interesting results:
1. The matrix A that maximizes the information energy gain is the one that maximizes $|A'PA + P_0|$, where $A'$ represents the transpose of A.
2. To achieve maximum information energy gain, it is not necessary to use more than $\frac{1}{2}k(k+1)$ of the given allocation vectors.
3. If there is homoscedasticity, then the information energy gain is maximum when the design matrix is diagonal.
First we calculate the information energy gain provided by a regression experiment about the vector $\beta$, when the initial opinions about $\beta$ are described by a multivariate normal density.

Proposition 3. Let us consider the regression experiment $Y = A\beta + e$, where e has a multivariate normal density with mean vector $(0, \ldots, 0)$ and known precision matrix P, and the $n \times k$ matrix A is known. Suppose the prior knowledge of $\beta$ is expressed by a multivariate normal with mean vector $\beta_0$ and
196
L. PARD0 AND I. J . TANEJA
precision matrix $P_0$. Then
$$\mathscr{E}\bigl(p(\cdot/y)\bigr) = |P_0|^{1/2}\,|P_0^{-1}F + I|^{1/2}\,(4\pi)^{-k/2},$$
and
$$\mathscr{G}(Y, p(\cdot)) = |P_0|^{1/2}(4\pi)^{-k/2}\bigl(|P_0^{-1}F + I|^{1/2} - 1\bigr),$$
where $F = A'PA$ is the Fisher information matrix.

Proof: By the table in Section III.C, the information energy of a multivariate normal distribution with mean vector $\tau$ and precision matrix C is given by
$$\mathscr{E}\bigl(N(\tau, C)\bigr) = |C|^{1/2}(4\pi)^{-k/2}.$$
Since the posterior distribution (DeGroot, 1970) is a multivariate normal with precision matrix $A'PA + P_0$, we have
$$\mathscr{E}\bigl(p(\cdot/y)\bigr) = |F + P_0|^{1/2}(4\pi)^{-k/2} = (4\pi)^{-k/2}\,|P_0|^{1/2}\,|P_0^{-1}F + I|^{1/2},$$
and
$$\mathscr{G}(Y, p(\cdot)) = (4\pi)^{-k/2}|P_0|^{1/2}|P_0^{-1}F + I|^{1/2} - (4\pi)^{-k/2}|P_0|^{1/2} = (4\pi)^{-k/2}|P_0|^{1/2}\bigl(|P_0^{-1}F + I|^{1/2} - 1\bigr).$$
Remarks 2:
(a) If we decide to make experiments until the information energy reaches a certain level, then the fact that $\mathscr{E}(p(\cdot/y))$ is independent of y allows us to state in advance whether a particular experiment will give us the required gain.
(b) If $\phi = M\beta$ is a nonsingular transformation, then $\mathscr{G}(Y, p(\cdot))$ remains the same whether we consider information about $\beta$ or about $\phi$. In fact,
$$\mathscr{G}(Y, p(\cdot)) = |P_0|^{1/2}(4\pi)^{-k/2}\bigl(|P_0^{-1}A'PA + I|^{1/2} - 1\bigr) = (4\pi)^{-k/2}\,\bigl|(MP_0^{-1}M')^{-1}\bigr|^{1/2}\bigl(\bigl|(MP_0^{-1}M')(AM^{-1})'P(AM^{-1}) + I\bigr|^{1/2} - 1\bigr) = \mathscr{G}(Y_1, p^*(\cdot)),$$
where $Y_1 = AM^{-1}\phi + e$, and the prior distribution $p^*(\cdot)$ is a multivariate normal with mean vector $M\beta_0$ and precision matrix $(MP_0^{-1}M')^{-1}$.
Now we shall use the criterion of maximizing the information energy gain given in (25) to establish which linear regression experiment is more informative, $Y_1 = A_1\beta + e_1$ or $Y_2 = A_2\beta + e_2$, when the prior distribution of $\beta$ is a multivariate normal.

Proposition 4. Let us consider the linear regression experiments $Y_1 = A_1\beta + e_1$ and $Y_2 = A_2\beta + e_2$, and suppose the prior knowledge about $\beta$ is given by a k-dimensional multivariate normal distribution with mean vector $\beta_0$ and precision matrix $P_0$. Then $Y_1$ is preferred to the experiment $Y_2$ if and only if
$$|F_1 + P_0| \ge |F_2 + P_0|, \tag{28}$$
where $F_1$ and $F_2$ are the Fisher information matrices.

Proof: As
$$\mathscr{G}(Y_i, p(\cdot)) = (4\pi)^{-k/2}\bigl(|A_i'P_iA_i + P_0|^{1/2} - |P_0|^{1/2}\bigr),$$
it follows that $Y_1$ is preferred to $Y_2$ if and only if
$$|A_1'P_1A_1 + P_0| \ge |A_2'P_2A_2 + P_0|, \qquad \text{i.e., } |F_1 + P_0| \ge |F_2 + P_0|.$$
Remarks 3:
(a) Criterion (26) can be used even when not all the $\beta_i$ are estimable, i.e., when F is singular, whereas in that case the trace of $F^{-1}$ becomes infinite.
(b) It can be immediately proved that a necessary and sufficient condition for the linear regression experiment $Y_1$ to be preferred to $Y_2$ for every positive definite matrix $P_0$ is that the matrix $F = F_1 - F_2$ be positive semidefinite.
(c) If we suppose that $P_0$ and $F_i$, $i = 1, 2$, are nonsingular, criterion (26) for maximizing the information energy gain permits us to compare the linear regression experiments for every precision matrix $P_0$. In fact,
$$|F_1 + P_0| = |P_0|\,|P_0^{-1}F_1 + I| = |P_0|\,|P_0^{-1}F_1|\,|I + (P_0^{-1}F_1)^{-1}| = |F_1|\,|I + F_1^{-1}P_0| \ge |F_1|.$$
It follows that the linear regression experiment $Y_1$ is preferred to $Y_2$ if and only if $|F_1| > |F_2|$. In this case the criterion of maximizing the information energy gain gives rise to the D-optimality criterion. As pointed out by Stone (1959), the conditions under which it is valid are either (1) all the diagonal elements of $P_0$ are small, representing large prior uncertainty for all the
parameters, or (2) all the diagonal elements of F are large, which is usually so if n is large.
(d) By Lemma 3.1 of Stone (1959), the maximum information energy gain can be achieved in a linear regression experiment without using more than $\frac{1}{2}k(k+1)$ of the given allocation vectors.
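The comparison criterion of Proposition 4 can be sketched numerically as follows (assuming NumPy; the random designs, prior, and the gain-helper name are illustrative choices of ours):

```python
import numpy as np

def expected_gain(A, P, P0):
    """G(Y, p(.)) = (4*pi)**(-k/2) * (|A'PA + P0|**(1/2) - |P0|**(1/2))."""
    k = P0.shape[0]
    F = A.T @ P @ A
    return (4 * np.pi) ** (-k / 2) * (
        np.sqrt(np.linalg.det(F + P0)) - np.sqrt(np.linalg.det(P0)))

rng = np.random.default_rng(0)
k, n = 3, 8
P = np.eye(n)                  # homoscedastic, uncorrelated errors
P0 = 0.1 * np.eye(k)           # vague prior precision
A1 = rng.standard_normal((n, k))
A2 = 0.5 * A1                  # a weaker design: smaller Fisher information

F1 = A1.T @ P @ A1
F2 = A2.T @ P @ A2
# Proposition 4: Y1 preferred to Y2 iff |F1 + P0| >= |F2 + P0|,
# and the determinant ordering agrees with the ordering of the gains.
assert np.linalg.det(F1 + P0) >= np.linalg.det(F2 + P0)
assert expected_gain(A1, P, P0) >= expected_gain(A2, P, P0)
```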
Proposition 5. If $P_0 = p_0 I_k$, $P = pI_n$, and $|a_{ji}| \le b$, then the information energy gain is maximum when the matrix $F = A'A$ is diagonal.

Proof: Since $B = (p/p_0)A'A + I_k$ is a positive definite matrix, its determinant is not larger than the product of its diagonal elements (see Rao, 1973). It follows that the determinant, and hence the gain, is maximized when $A'A$ is diagonal, with its diagonal elements as large as the constraint $|a_{ji}| \le b$ allows.
We shall now consider the modifications in the criterion of maximizing the information energy gain imposed by the presence of nuisance parameters. Consider $\beta_1, \ldots, \beta_{k_1}$ as the parameters of interest $(k_1 < k)$, and $\beta_{k_1+1}, \ldots, \beta_k$ as the nuisance parameters. Write
$$Y = A_1\beta_1^* + A_2\beta_2^* + e,$$
where $\beta_1^* = (\beta_1, \ldots, \beta_{k_1})$, $\beta_2^* = (\beta_{k_1+1}, \ldots, \beta_k)$, $\dim(A_1) = n \times k_1$, $\dim(A_2) = n \times (k - k_1)$, and $e = (e_1, \ldots, e_n)$ is normal with mean vector $(0, \ldots, 0)$ and precision matrix I, where I is an identity matrix. It is known that the maximum likelihood estimate for $\beta = (\beta_1^*, \beta_2^*)$ is
$$\hat{\beta} = \begin{pmatrix} \bigl((A_1'A_1)^{-1}A_1' - (A_1'A_1)^{-1}A_1'A_2(A_2'Q_1A_2)^{-1}A_2'Q_1\bigr)\,Y \\ (A_2'Q_1A_2)^{-1}A_2'Q_1\,Y \end{pmatrix},$$
where $Q_1 = I - A_1(A_1'A_1)^{-1}A_1'$. Furthermore, $E(\hat{\beta}) = \beta$, and the covariance matrix is
$$\mathrm{Var}(\hat{\beta}) = \begin{pmatrix} (A_1'A_1)^{-1} + (A_1'A_1)^{-1}A_1'A_2(A_2'Q_1A_2)^{-1}A_2'A_1(A_1'A_1)^{-1} & -(A_1'A_1)^{-1}A_1'A_2(A_2'Q_1A_2)^{-1} \\ -(A_2'Q_1A_2)^{-1}A_2'A_1(A_1'A_1)^{-1} & (A_2'Q_1A_2)^{-1} \end{pmatrix}.$$
Therefore the precision matrix for $\hat{\beta}$ is
$$G = \begin{pmatrix} A_1'A_1 & A_1'A_2 \\ A_2'A_1 & A_2'A_2 \end{pmatrix};$$
moreover, G is the information matrix. If we suppose that the prior distribution of $\beta$ is a k-variate multivariate normal distribution with mean vector $\beta_0$ and precision matrix
$$P_0 = \begin{pmatrix} H & E \\ E' & B \end{pmatrix},$$
the posterior distribution of $\beta$, given the data, is a k-variate multivariate normal with precision matrix $(G + P_0)$. The marginal distribution of $\beta_1^*$ is a $k_1$-variate multivariate normal with precision matrix
$$P_\beta = \bigl(H^{-1}(I + E(B - E'H^{-1}E)^{-1}E'H^{-1})\bigr)^{-1},$$
and the posterior distribution of $\beta_1^*$, conditioned by Y, is a $k_1$-variate multivariate normal with precision matrix
$$P^* = \Bigl((A_1'A_1 + H)^{-1}\bigl(I + (A_1'A_2 + E)L(A_2'A_1 + E')(A_1'A_1 + H)^{-1}\bigr)\Bigr)^{-1},$$
where
$$L = \bigl((A_2'A_2 + B) - (A_2'A_1 + E')(A_1'A_1 + H)^{-1}(A_1'A_2 + E)\bigr)^{-1}.$$
Now we prove the following result:

Proposition 6. Consider the regression model $Y = A_1\beta_1^* + A_2\beta_2^* + e$, where Y is normal with mean vector $A_1\beta_1^* + A_2\beta_2^*$ and precision matrix I, $\beta_1^* = (\beta_1, \ldots, \beta_{k_1})$, $\beta_2^* = (\beta_{k_1+1}, \ldots, \beta_k)$, and suppose $\beta_2^*$ are nuisance parameters. If the elements of the matrix $(P_0^{-1}G)^{-1}$ are small, then
$$\mathscr{G}(Y, p(\cdot)) = (4\pi)^{-k_1/2}\Bigl(\bigl(|G|/|A_2'A_2|\bigr)^{1/2} - |P_\beta|^{1/2}\Bigr),$$
where G is the Fisher information matrix, and the prior distribution of $\beta^* = (\beta_1^*, \beta_2^*)$ is a multivariate normal with mean vector $\beta_0$ and precision matrix
$$P_0 = \begin{pmatrix} H & E \\ E' & B \end{pmatrix}.$$
L. PARDO AND I. J. TANEJA
Proof: Since the posterior distribution of $\beta_1^*$ conditioned by $Y$ is a $k_1$-multivariate normal distribution with precision matrix

$$P^* = \big((A_1'A_1 + H)^{-1}(I + (A_1'A_2 + E)L(A_2'A_1 + E')(A_1'A_1 + H)^{-1})\big)^{-1},$$

then

$$\mathscr{I}(Y, p(\beta_1^*/Y)) = (4\pi)^{-k_1/2}\big(|P^*|^{1/2} - |P_0^*|^{1/2}\big).$$

On the other hand,

$$(G + P_0) = G(I + G^{-1}P_0) = G(I + (P_0^{-1}G)^{-1}).$$

If the elements of the matrix $(P_0^{-1}G)^{-1}$ are small, then $(G + P_0) \approx G$. Thus the posterior precision of $\beta_1^*$ is approximately the marginal precision computed from $G$ alone, namely $A_1'A_1 - A_1'A_2(A_2'A_2)^{-1}A_2'A_1$, whose determinant equals $|G|/|A_2'A_2|$, and the result follows.
Remark 4: (a) The conditions under which the elements of $(P_0^{-1}G)^{-1}$ are small are either (1) all the diagonal elements of $P_0$ are small, corresponding to large prior ignorance of all the parameters, or (2) all the diagonal elements of $G$ are large, corresponding to a "strong" experiment. (b) In order to compare the experiment $Y_1 = C_1\beta_1 + C_2\beta_2 + e_1$ with the experiment $Y_2 = H_1\beta_1 + H_2\beta_2 + e_2$ under the previous hypothesis, it is only necessary to calculate $(|C_2'C_2|/|G_1|)$ and $(|H_2'H_2|/|G_2|)$.

C. Information Energy as a Rule of Sequential Sampling
If the statistician can take observations and at each stage must decide, in light of the amount of information obtained about $\theta$, whether to stop or to
continue and take the next observation, then the following stopping rule based on the information energy (Pardo, 1984a) is defined:

Definition 7. The sequential observation rule states to stop observing after the values $X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n$ have been observed if

$$\mathscr{E}(p(\cdot/x_1,\ldots,x_n)) \ge \delta, \qquad \delta > 0,$$

where $\delta$ is a constant that depends on the amount of information required in each particular problem by the statistician, according to subjective criteria, and to continue observing if

$$\mathscr{E}(p(\cdot/x_1,\ldots,x_n)) < \delta.$$

This rule is called the "sequential sampling plan based on the information energy" (SSPIE). The stopping rule based on the information energy continues sampling until the information given by the posterior distributions exceeds a value preassigned by the statistician. Lindley (1956, 1957) proposed in this context a stopping rule based on Shannon's entropy that takes into account the "precision" of the posterior distribution. El-Sayyad (1969) studied the rule proposed by Lindley for the exponential distribution. The expression $\mathscr{E}(p(\cdot))$ given in (24) is not invariant under a change of description of the parameter value. Now we adapt the expression for the SSPIE to those problems where the objective of the research is to make inferences about the value of $\phi = \phi(\theta)$, with $\phi$ being a monotone function of $\theta$. Specifically, if $\phi = \phi(\theta)$ is a monotone function of $\theta$, then the probability distribution $q(\phi)$ of $\phi$ is given by

$$q(\phi)\,d\phi = p(\theta)\,d\theta,$$

and the information energy is given by

$$\mathscr{E}(q(\phi)) = \int q(\phi)^2\,d\phi = \int p(\theta)^2\,\frac{d\theta}{d\phi}\,d\theta. \qquad (29)$$

It follows that if a sampling scheme is adopted in which we try to obtain a prescribed amount of information about $\phi$, it will be, in general, different from a scheme relevant to $\theta$.
Example 2: Suppose that $X_1,\ldots,X_n$ is a random sample from a Bernoulli distribution with an unknown value of the parameter $\theta$. Suppose also that the prior distribution of $\theta$ is a beta distribution with parameters $a$ and $b$ such that $a > 0$ and $b > 0$. It is well known that in such a case the posterior density of $\theta$ is also a beta distribution, with parameters $a + \sum_{i=1}^n x_i$ and $b + n - \sum_{i=1}^n x_i$.
By the table of Section III.C, the amount of information energy contained in a beta random variable with parameters $a$ and $b$ is

$$\mathscr{E}(B(a,b)) = \frac{\Gamma(2a-1)\,\Gamma(2b-1)\,(\Gamma(a+b))^2}{(\Gamma(a))^2(\Gamma(b))^2\,\Gamma(2a+2b-2)}. \qquad (30)$$

Application of the sampling rule proposed in Definition 7 involves continuing the sampling until the values of $a$ and $b$ obtained are such that (30) has attained a prescribed value. Expression (30) is too complicated to be easily interpreted, but considerable simplification is possible by use of the standard asymptotic Stirling formula, which gives

$$\mathscr{E}(B(a,b)) \approx \left(\frac{(a+b)^3}{4\pi a b}\right)^{1/2}$$

for large values of both $a$ and $b$. It follows that the boundary in the $(a,b)$ diagram is a curve of the form

$$(a+b)^3 = 4\pi\delta^2 ab \qquad (31)$$

(a folium of Descartes).
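A quick numerical sketch (with hypothetical parameter values) compares the exact expression (30), evaluated through log-gamma functions, with the Stirling approximation $((a+b)^3/(4\pi ab))^{1/2}$:

```python
import math

def beta_info_energy(a, b):
    # Exact expression (30) for the information energy of a Beta(a, b) variable.
    log_e = (math.lgamma(2 * a - 1) + math.lgamma(2 * b - 1)
             + 2 * math.lgamma(a + b)
             - 2 * math.lgamma(a) - 2 * math.lgamma(b)
             - math.lgamma(2 * a + 2 * b - 2))
    return math.exp(log_e)

def stirling_approx(a, b):
    # Large-(a, b) approximation ((a + b)^3 / (4*pi*a*b))**0.5.
    return math.sqrt((a + b) ** 3 / (4 * math.pi * a * b))

for a, b in [(10, 10), (40, 25), (80, 80)]:
    print(a, b, beta_info_energy(a, b), stirling_approx(a, b))
```

The agreement improves as $a$ and $b$ grow, as the asymptotic argument suggests.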
Suppose that the prior distribution has $a = a_0$, $b = b_0$. Then, after a sample of size $n$ with cumulative sum $r = \sum_{i=1}^n x_i$, the posterior distribution will have $a = a_0 + r$, $b = b_0 + n - r$. The experimentation can be represented in the $(a,b)$ plane by a path that starts at $(a_0, b_0)$ and is built in the following way: for each observation $x_i = 1$, we move one unit parallel to the $a$-axis, and for each observation $x_i = 0$, one unit parallel to the $b$-axis. Sampling will cease when the path intersects the curve given by (31). Figure 1 illustrates, for $\delta = 3$, two binomial sampling schemes. The continuous path represents an SSPIE when the prior distribution is a beta distribution with parameters $a_0 = 16$, $b_0 = 16$ and the results of successive observations are 1111000000000111111110000000001100000000000011000011110000. One may see that $\delta = 3$ is obtained with $n = 64$. The discontinuous path represents an SSPIE when the prior distribution has $a_0 = 4$, $b_0 = 4$ and the results of successive observations are 0000111111111111111110000000000000000000000000000000000001111111111111100000000000000. The value $\delta = 3$, in this case, is obtained with $n = 86$. By (30), we have

$$\mathscr{E}(B(a_0 + r, b_0 + n - r)) = \frac{\Gamma(2a_0 + 2r - 1)\,\Gamma(2b_0 + 2n - 2r - 1)\,(\Gamma(a_0 + b_0 + n))^2}{(\Gamma(a_0 + r))^2(\Gamma(b_0 + n - r))^2\,\Gamma(2a_0 + 2b_0 + 2n - 2)}.$$
FIG. 1. Two binomial sampling schemes.
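The stopping rule of Definition 7 can be sketched as a short loop that updates $(a, b)$ after each Bernoulli observation and stops once (30) reaches $\delta$. The prior and the data below are hypothetical (a uniform Beta(1, 1) prior and alternating observations), not the sequences of Fig. 1:

```python
import math

def beta_info_energy(a, b):
    # Information energy of Beta(a, b), expression (30), via log-gamma.
    return math.exp(math.lgamma(2 * a - 1) + math.lgamma(2 * b - 1)
                    + 2 * (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b))
                    - math.lgamma(2 * a + 2 * b - 2))

def sspie_binomial(observations, a0=1.0, b0=1.0, delta=3.0):
    # Feed Bernoulli observations one at a time; stop when the posterior
    # information energy first reaches delta (Definition 7).
    a, b = a0, b0
    for n, x in enumerate(observations, start=1):
        a, b = a + x, b + (1 - x)
        if beta_info_energy(a, b) >= delta:
            return n, a, b
    return None  # delta never reached on this sample

# Alternating 0, 1, 0, 1, ... observations (hypothetical data).
result = sspie_binomial([i % 2 for i in range(200)])
print(result)
```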
When $n \to \infty$ and $r/n \to k$, by use of the Stirling formula we have

$$\mathscr{E} \approx \left(\frac{n}{4\pi k(1-k)}\right)^{1/2},$$

so that the stopping boundary (31) reduces to $n \approx 4\pi\delta^2 k(1-k)$: for a given limiting proportion $k$, the rule stops at an essentially fixed sample size, i.e., it depends only on $n$. This means that, for large samples, there is not much to be gained by sequential sampling.
Now we study the SSPIE when the interest is in $\phi(\theta) = 2\arcsin\theta^{1/2}$. Here $d\theta/d\phi = (\theta(1-\theta))^{1/2}$, and evaluating expression (29), we have

$$\mathscr{E}(q(\phi)) = \frac{(\Gamma(a+b))^2\,\Gamma(2a - 1/2)\,\Gamma(2b - 1/2)}{(\Gamma(a))^2(\Gamma(b))^2\,\Gamma(2a + 2b - 1)}.$$

By use of the standard asymptotic Stirling formula, we have

$$\mathscr{E}(q(\phi)) \approx \left(\frac{a+b}{4\pi}\right)^{1/2}.$$

Therefore, the boundary in the $(a,b)$ diagram is a curve of the form $(a+b) = 4\pi\delta^2$, which is a fixed sample size scheme (if the prior distribution has $a = a_0$, $b = b_0$, then $a + b = a_0 + b_0 + n$). Thus the SSPIE turns out to be a fixed sample size rule. One can also use the posterior variance as a stopping rule: sample until the variance of the posterior distribution is sufficiently small. This agrees with the SSPIE result. In fact, the posterior variance of $\phi(\theta) = 2\arcsin\theta^{1/2}$ is, if both $a$ and $b$ are large, approximately equal to $(a+b)^{-1}$, and when it is held constant, it produces the boundary $(a+b) = k$.
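The fixed-sample-size character of the arcsin transformation can be checked numerically: the exact energy of $q(\phi)$ is $B(2a - 1/2,\, 2b - 1/2)/B(a,b)^2$, which the sketch below (hypothetical parameter values) compares with $((a+b)/4\pi)^{1/2}$:

```python
import math

def arcsin_energy(a, b):
    # Energy of q(phi), phi = 2*arcsin(sqrt(theta)), for a Beta(a, b) posterior:
    # B(2a - 1/2, 2b - 1/2) / B(a, b)^2, written with log-gammas.
    return math.exp(math.lgamma(2 * a - 0.5) + math.lgamma(2 * b - 0.5)
                    + 2 * (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b))
                    - math.lgamma(2 * a + 2 * b - 1))

def arcsin_energy_approx(a, b):
    # Stirling approximation ((a + b)/(4*pi))**0.5 -- depends on a + b only.
    return math.sqrt((a + b) / (4 * math.pi))

print(arcsin_energy(50, 30), arcsin_energy_approx(50, 30))
print(arcsin_energy(25, 55), arcsin_energy_approx(25, 55))  # same a + b, similar energy
```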
Now we study the SSPIE when the interest is in $\rho(\theta) = \ln(\theta/(1-\theta))$. Here $d\theta/d\rho = \theta(1-\theta)$, and evaluating expression (29), we have

$$\mathscr{E}(q(\rho)) = \frac{(\Gamma(a+b))^2\,\Gamma(2a)\,\Gamma(2b)}{(\Gamma(a))^2(\Gamma(b))^2\,\Gamma(2a+2b)}.$$

By use of the standard asymptotic Stirling formula, we have

$$\mathscr{E}(q(\rho)) \approx \frac{a^{1/2}b^{1/2}}{2\pi^{1/2}(a+b)^{1/2}}.$$

Then the boundary in the $(a,b)$ diagram is the equilateral hyperbola $ab = 4\pi\delta^2(a+b)$ (if both $a$ and $b$ are large). The posterior variance of $\rho(\theta) = \ln(\theta/(1-\theta))$ is, if both $a$ and $b$ are large, approximately $(a+b)/ab$, and, when it is held constant, gives $(a+b) = k\,ab$. This establishes the equivalence between the use of the posterior variance as a stopping rule and the SSPIE. The above results coincide with those obtained by Lindley (1957) using Shannon's entropy. L. Pardo et al. (1985) studied the SSPIE for the exponential distribution; the results obtained coincide with those obtained by El-Sayyad (1969) for Shannon's entropy. Now we shall study the behavior of the SSPIE when $\Theta = \{\theta_1, \theta_2\}$. After the values $X_1 = x_1, \ldots, X_n = x_n$ have been observed, the information energy can be written as

$$\mathscr{E}(p(\theta/x_1,\ldots,x_n)) = p(\theta_1/x_1,\ldots,x_n)^2 + (1 - p(\theta_1/x_1,\ldots,x_n))^2.$$
Proposition 7. The stopping rule based on the information energy, when $\Theta = \{\theta_1, \theta_2\}$, indicates that sampling is continued after the values $X_1 = x_1, \ldots, X_n = x_n$ have been observed if and only if

$$\tfrac{1}{2}\big(1 - (2\delta - 1)^{1/2}\big) < p(\theta_1/x_1,\ldots,x_n) < \tfrac{1}{2}\big(1 + (2\delta - 1)^{1/2}\big), \qquad \tfrac{1}{2} \le \delta \le 1, \qquad (32)$$

where $\delta$ is the constant that depends on the amount of information required.

Proof: Since

$$\mathscr{E}(p(\theta/x_1,\ldots,x_n)) = p(\theta_1/x_1,\ldots,x_n)^2 + (1 - p(\theta_1/x_1,\ldots,x_n))^2$$

is a convex function with $\mathscr{E}(p(\theta/x_1,\ldots,x_n)) = 1$ iff $p(\theta_1/x_1,\ldots,x_n) = 0$ or $1$, the scheme corresponds to continuing sampling iff

$$1 - c < p(\theta_1/x_1,\ldots,x_n) < c, \qquad (33)$$
where $c$ verifies the equation

$$c^2 + (1 - c)^2 = \delta, \qquad \tfrac{1}{2} \le \delta \le 1.$$

Therefore $p(\theta_1/x_1,\ldots,x_n)$ verifies (32). Now we prove that the sampling scheme based on the information energy is equivalent to a scheme used in a Wald sequential probability ratio test for testing a simple hypothesis $H_0: \theta = \theta_1$ against the simple alternative $H_1: \theta = \theta_2$ (Rohatgi, 1976, p. 612).

Theorem 3. The regions for continuing sampling given by the sampling scheme based on the information energy coincide with the regions for continuing sampling given by the Wald sequential probability ratio test (SPRT) of $H_0: \theta = \theta_1$ against $H_1: \theta = \theta_2$ with stopping bounds

$$A = \frac{p(\theta_1)\big(1 - (2\delta - 1)^{1/2}\big)}{p(\theta_2)\big(1 + (2\delta - 1)^{1/2}\big)} \quad\text{and}\quad B = \frac{p(\theta_1)\big(1 + (2\delta - 1)^{1/2}\big)}{p(\theta_2)\big(1 - (2\delta - 1)^{1/2}\big)},$$
where $\delta$ is the amount of information required.

Proof: Expression (33) may be written in terms of the ratio of posterior probabilities for $\theta_1$ and $\theta_2$, in the form

$$\frac{1-c}{c} < \frac{p(\theta_2/x_1,\ldots,x_n)}{p(\theta_1/x_1,\ldots,x_n)} < \frac{c}{1-c},$$

i.e.,

$$\frac{p(\theta_1)\big(1 - (2\delta - 1)^{1/2}\big)}{p(\theta_2)\big(1 + (2\delta - 1)^{1/2}\big)} < \frac{f(x_1,\ldots,x_n/\theta_2)}{f(x_1,\ldots,x_n/\theta_1)} < \frac{p(\theta_1)\big(1 + (2\delta - 1)^{1/2}\big)}{p(\theta_2)\big(1 - (2\delta - 1)^{1/2}\big)}.$$

It is now apparent that the SSPIE is equivalent to a scheme used in a Wald sequential probability ratio test of the simple hypothesis $H_0: \theta = \theta_1$ against the simple alternative $H_1: \theta = \theta_2$, with stopping bounds $A$ and $B$ as above and strength $(\alpha, \beta)$, where

$$\alpha = \frac{1 - A}{B - A} \quad\text{and}\quad \beta = \frac{A(B - 1)}{B - A},$$
with

$$E_{\theta_1}(N) \approx \frac{\alpha\log B + (1-\alpha)\log A}{E_{\theta_1}(Z)} \quad\text{and}\quad E_{\theta_2}(N) \approx \frac{(1-\beta)\log B + \beta\log A}{E_{\theta_2}(Z)},$$

with $N$ being a random variable that denotes the number of observations needed to reach a decision by using the SPRT. In this case, as the SPRT terminates with probability 1 under both $H_0$ and $H_1$, it follows that the sampling scheme based on the information energy terminates with probability 1; that is to say, it is feasible, by sampling, to reach the amount of information preassigned by the statistician. We observe that, utilizing the approximations to $A$ and $B$ given by Wald, the SSPIE is equivalent to a Wald sequential probability ratio test of $H_0: \theta = \theta_1$ against $H_1: \theta = \theta_2$ with strength

$$\alpha = \frac{\big(p(\theta_2) - p(\theta_1) + (2\delta - 1)^{1/2}\big)\big(1 - (2\delta - 1)^{1/2}\big)}{4\,p(\theta_1)(2\delta - 1)^{1/2}}$$

and

$$\beta = \frac{\big(p(\theta_1) - p(\theta_2) + (2\delta - 1)^{1/2}\big)\big(1 - (2\delta - 1)^{1/2}\big)}{4\,p(\theta_2)(2\delta - 1)^{1/2}},$$

where $Z = \log[f(x/\theta_2)/f(x/\theta_1)]$, and $\delta$ is the amount of information required.
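The equivalence of Theorem 3 can be illustrated by simulation: updating the posterior $p(\theta_1/x_1,\ldots,x_n)$ for Bernoulli data and stopping when it leaves the interval (32) amounts to tracking the likelihood ratio between the bounds $A$ and $B$. The sketch below uses hypothetical values ($\theta_1 = 0.3$, $\theta_2 = 0.7$, equal priors, $\delta = 0.8$):

```python
import math
import random

def run_sspie(theta1=0.3, theta2=0.7, prior1=0.5, delta=0.8,
              true_theta=0.7, seed=1, max_n=10000):
    # Continue sampling while p(theta1 | data) lies inside the interval (32).
    s = math.sqrt(2 * delta - 1)
    lo, hi = 0.5 * (1 - s), 0.5 * (1 + s)
    rng = random.Random(seed)
    post1 = prior1
    for n in range(1, max_n + 1):
        x = 1 if rng.random() < true_theta else 0
        l1 = theta1 if x else 1 - theta1      # likelihood under theta1
        l2 = theta2 if x else 1 - theta2      # likelihood under theta2
        num = post1 * l1
        post1 = num / (num + (1 - post1) * l2)
        if not (lo < post1 < hi):
            return n, post1                    # information energy >= delta
    return None

n_stop, final_post = run_sspie()
print(n_stop, final_post)
```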
Remark 5: (a) For $\Theta = \{\theta_1, \theta_2, \theta_3\}$, it follows that

$$\mathscr{E}(p(\theta/x_1,\ldots,x_n)) = \sum_{i=1}^{3} p(\theta_i/x_1,\ldots,x_n)^2, \qquad p(\theta_3/\cdot) = 1 - p(\theta_1/\cdot) - p(\theta_2/\cdot).$$
Then

$$2p(\theta_1/x_1,\ldots,x_n)^2 + 2p(\theta_2/x_1,\ldots,x_n)^2 + 2p(\theta_1/x_1,\ldots,x_n)\,p(\theta_2/x_1,\ldots,x_n) - 2p(\theta_1/x_1,\ldots,x_n) - 2p(\theta_2/x_1,\ldots,x_n) + 1 - \mathscr{E}(p(\theta/x_1,\ldots,x_n)) = 0 \qquad (34)$$

is a quadratic form; furthermore, it can immediately be proved that it defines an elliptic paraboloid. Intersecting the elliptic paraboloid (34) with the plane

$$\mathscr{E}(p(\cdot/x_1,\ldots,x_n)) = \delta,$$

we obtain the ellipse given by

$$2p(\theta_1/x_1,\ldots,x_n)^2 + 2p(\theta_2/x_1,\ldots,x_n)^2 + 2p(\theta_1/x_1,\ldots,x_n)\,p(\theta_2/x_1,\ldots,x_n) - 2p(\theta_1/x_1,\ldots,x_n) - 2p(\theta_2/x_1,\ldots,x_n) + 1 - \delta = 0,$$

whose axes are

$$3p(\theta_1/x_1,\ldots,x_n) + 3p(\theta_2/x_1,\ldots,x_n) - 2 = 0$$

and

$$p(\theta_1/x_1,\ldots,x_n) - p(\theta_2/x_1,\ldots,x_n) = 0,$$

and whose center is at $(\tfrac{1}{3}, \tfrac{1}{3})$. For $\delta = 0.72$ and $\delta = 0.57$, the above ellipses are given in Fig. 2, and we continue observing a new variable $X_{n+1}$ if

$$\big(p(\theta_1/x_1,\ldots,x_n),\, p(\theta_2/x_1,\ldots,x_n)\big) \in A \cup B \cup C,$$

where $A$, $B$, and $C$ are given in Fig. 2.
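The geometry above can be verified directly: the sketch below checks that the quadratic form (34) vanishes by construction, and that the three-class information energy attains its minimum value 1/3 at the center $(\tfrac{1}{3}, \tfrac{1}{3})$:

```python
def energy3(p1, p2):
    # Information energy for three classes, with p3 = 1 - p1 - p2.
    p3 = 1 - p1 - p2
    return p1 * p1 + p2 * p2 + p3 * p3

def quadratic_form(p1, p2, e):
    # Left-hand side of (34); it vanishes exactly when energy3(p1, p2) = e.
    return (2 * p1 * p1 + 2 * p2 * p2 + 2 * p1 * p2
            - 2 * p1 - 2 * p2 + 1 - e)

print(energy3(1 / 3, 1 / 3))                          # 1/3, minimum on the simplex
print(quadratic_form(0.6, 0.3, energy3(0.6, 0.3)))    # 0 by construction
```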
FIG. 2.
(b) For $\Theta = \{\theta_1, \theta_2, \theta_3, \theta_4\}$, it follows that

$$\mathscr{E}(p(\cdot/x_1,\ldots,x_n)) = \sum_{i=1}^{4} p(\theta_i/x_1,\ldots,x_n)^2, \qquad p(\theta_4/\cdot) = 1 - p(\theta_1/\cdot) - p(\theta_2/\cdot) - p(\theta_3/\cdot).$$

The intersection of $\mathscr{E}(p(\cdot/x_1,\ldots,x_n))$ with $\mathscr{E}(p(\cdot/x_1,\ldots,x_n)) = \delta$ provides the real ellipsoid (35), whose center is at $(\tfrac{1}{4}, \tfrac{1}{4}, \tfrac{1}{4})$. The region in which observation continues is then given by the part of the interior of the real ellipsoid (35) lying in the probability simplex.
D. Information Energy in the Sequential Design of a Fixed Number of Experiments

Let $\mathscr{C}$ be a class of statistical experiments $X$. In this section we concentrate on the following problem: a fixed number, say $m$, of experiments are to be performed, and it is desired to select the experiments $X_1, \ldots, X_m \in \mathscr{C}$ sequentially so as to maximize the expected information energy of the final posterior distribution,

$$\mathscr{E}(X_1,\ldots,X_m;\, p(\cdot)) = E\big(\mathscr{E}(p(\cdot/X_1,\ldots,X_m))\big). \qquad (36)$$

All observations are assumed to be independent in the sense that, given the experiment we want to perform at some stage, and also given the true value of $\theta$, the outcome of the experiment is independent of all previous observations. Moreover, we assume that one can take independent observations at different stages on the same random variable $X \in \mathscr{C}$. DeGroot (1970) studied this problem utilizing Shannon's entropy, while we do it here by using the measure of information energy. By the familiar dynamic programming technique of working backward from the last stage of experimentation, an explicit rule for the construction of the optimal design can be given. The construction of the optimal selection procedure of $X_1,\ldots,X_m$ by backward induction is stated as follows: the optimal sequential procedure must satisfy the requirement that at any stage of the procedure, if the values $X_1 = x_1, \ldots, X_j = x_j$ ($j < m$) have been observed, then the continuation of the procedure must be the optimal sequential procedure for the problem where the prior distribution of $\theta$ is $p(\theta/x_1,\ldots,x_j)$ and the maximum number of observations that can be taken is $m - j$. In the following derivation, it is assumed that the supremum over the class $\mathscr{C}$ is actually attained at some $X \in \mathscr{C}$.
Suppose that, after the first $m - 1$ experiments have been performed, the posterior distribution over $\Theta$ is $p(\theta/x_1,\ldots,x_{m-1})$. Then, obviously, the best choice for the final experiment is an experiment $X_m \in \mathscr{C}$ such that

$$\mathscr{E}(x_1,\ldots,x_{m-1}, X_m;\, p(\cdot)) = \int_{\mathscr{X}_m} \mathscr{E}(p(\cdot/x_1,\ldots,x_m))\, f(x_m/x_1,\ldots,x_{m-1})\, d\lambda(x_m) = \sup_{X \in \mathscr{C}} \mathscr{E}(x_1,\ldots,x_{m-1}, X;\, p(\cdot)).$$

For each $p(\theta/x_1,\ldots,x_{m-1})$, the above expression defines the optimal choice of $X_m$ with respect to the prior distribution $p(\theta/x_1,\ldots,x_{m-1})$. We write

$$\varphi_1(x_1,\ldots,x_{m-1};\, p(\cdot)) = \mathscr{E}(x_1,\ldots,x_{m-1}, X_m;\, p(\cdot)).$$

After performing the first $m - 2$ experiments, suppose that the posterior distribution over $\Theta$ is $p(\theta/x_1,\ldots,x_{m-2})$. Then the best choice for this stage is an experiment $X_{m-1} \in \mathscr{C}$ such that

$$E_{X_{m-1}}\big(\varphi_1(x_1,\ldots,x_{m-2}, X_{m-1};\, p(\cdot))\big) = \int \varphi_1(x_1,\ldots,x_{m-1};\, p(\cdot))\, f(x_{m-1}/x_1,\ldots,x_{m-2})\, d\lambda(x_{m-1}) = \sup_{X \in \mathscr{C}} E_X\big(\varphi_1(x_1,\ldots,x_{m-2}, X;\, p(\cdot))\big).$$

We write

$$\varphi_2(x_1,\ldots,x_{m-2};\, p(\cdot)) = E_{X_{m-1}}\big(\varphi_1(x_1,\ldots,x_{m-2}, X_{m-1};\, p(\cdot))\big).$$

Continuing in this way, the best choice at stage $j$ of the experiment $X_j$ is such that

$$\varphi_{m-j+1}(x_1,\ldots,x_{j-1};\, p(\cdot)) = \sup_{X \in \mathscr{C}} E_X\big(\varphi_{m-j}(x_1,\ldots,x_{j-1}, X;\, p(\cdot))\big).$$

In this way, an optimal procedure at the stage $j = 1$ consists of selecting the random variable $X_1 \in \mathscr{C}$ for which

$$\varphi_m(p(\cdot)) = E_{X_1}\big(\varphi_{m-1}(X_1;\, p(\cdot))\big).$$

The optimal choice at the stage $j = k + 1$, after having performed the first $k$ experiments $X_1 = x_1, \ldots, X_k = x_k$, consists of selecting a random variable $X_{k+1} \in \mathscr{C}$ for which

$$\varphi_{m-k}(x_1,\ldots,x_k;\, p(\cdot)) = E_{X_{k+1}}\big(\varphi_{m-k-1}(x_1,\ldots,x_k, X_{k+1};\, p(\cdot))\big).$$
Now we shall prove that if there exists an experiment $Y \in \mathscr{C}$ that is sufficient for every other experiment in $\mathscr{C}$, then, in accordance with the information energy maximization criterion (36), the optimal selection procedure of $X_1,\ldots,X_m$ consists of taking all $m$ observations on the random variable $Y$.

Proposition 8. Let $Y$ be a sufficient experiment in $\mathscr{C}$, and $\Theta = \{\theta_1,\ldots,\theta_n\}$. The optimal sequential selection procedure of $X_1,\ldots,X_m$ according to the information energy maximization criterion (36) consists of taking all $m$ observations on the random variable $Y$.

Proof: After having performed the first $m - 1$ experiments, the posterior distribution over $\Theta$ is $p(\theta/x_1,\ldots,x_{m-1})$, whereas the statistical spaces associated with the experiments $X$ and $Y$ are $(\mathscr{X}, \beta_{\mathscr{X}}, f(x_1,\ldots,x_{m-1}/\theta))_{\theta\in\Theta}$ and $(\mathscr{Y}, \beta_{\mathscr{Y}}, g(y_1,\ldots,y_{m-1}/\theta))_{\theta\in\Theta}$, respectively. By Result 9(a), the expected information energy obtained by observing $Y$ is not smaller than that obtained by observing any other $X \in \mathscr{C}$. Thus, the $m$th observation should always be made on the random variable $Y$, regardless of which random variables have been selected at the earlier stages and regardless of their observed values. Hence the same holds for any distribution $p(\theta) \in \Theta^*$, and the $(m-1)$st observation is also made on the random variable $Y$. When the same argument is used at each stage, it follows by induction that all the $m$ observations should be made on $Y$.

Proposition 9. Let $Y$ be a sufficient experiment in $\mathscr{C}$ with associated statistical space $(\mathscr{X}, \beta_{\mathscr{X}}, f(y/\theta))_{\theta\in\Theta}$, and suppose that $\{f(y/\theta),\, \theta \in \Theta\}$ is a complete family of distributions. The optimal sequential selection procedure of $X_1,\ldots,X_m$ according to the information energy maximization criterion consists of taking all $m$ observations on the random variable $Y$. The proof follows along the lines of that of Proposition 8.

E. Information Energy of a Point Process
This section deals with the information energy of a point process. After studying the properties relative to the rate of change of the information energy, we prove that, in the class of stationary point processes considered, the information energy is minimum for the Poisson process. Let a sequence of random events occur at the instants $T_i$, where $i \ge 1$, $T_{i+1} > T_i$, and $T_1 > 0$. Let the probability of an event in $(t, t+dt]$ be $\beta(t)\,dt$, with the probability of multiple events being of higher order, and $\beta(t) > 0$ for all $t > 0$; $\beta(t)$ is called the intensity. Let the number of events in $(0,t]$ be $N(t)$, i.e., $N(t)$ is the largest value of $n$ for which $T_n \le t$. This random variable obeys the probability law $P(N(t) = n) = p(n,t)$. Consider those sample functions for which $N(t) = n$. Under this condition, let $K_n(u,t)$ be the joint density of the $n$ variables $(T_1,\ldots,T_n)$, which are the first $n$ components of a random vector $U$; correspondingly, the assumed values $(t_1,\ldots,t_n)$ are the first $n$ components of the vector $u$. Thus the joint probability that $N(t) = n$ and that these events occur in the intervals $(t_1, t_1+dt_1],\ldots,(t_n, t_n+dt_n]$, $0 < t_1 < \cdots < t_n \le t$, $n \ge 1$, is given by

$$p(n,t)K_n(u,t)\,du_n = P\big(T_1 \in (t_1, t_1+dt_1],\ldots,T_n \in (t_n, t_n+dt_n],\; T_{n+1} > t\big),$$

where $du_n$ denotes the infinitesimal volume $dt_1\cdots dt_n$.
We assume that all the $n$-dimensional densities $K_n(u,t)$, $n \ge 1$, exist everywhere in the defined regions for all $t > 0$. This restriction rules out not only multiple events but also periodic processes. Now we define the information energy $\mathscr{E}(t)$ of a point process in the interval $(0,t]$:

Definition 8. We call the information energy of a point process in the interval $(0,t]$ the expression

$$\mathscr{E}(t) = \sum_{n=0}^{\infty} p(n,t)^2 \int_{R_n(t)} K_n(u,t)^2\, du_n, \qquad (37)$$

where $R_n(t)$ denotes the region $0 < t_1 < \cdots < t_n \le t$, and $K_0(u,t) = 1$. Expression (37) may be rewritten in another form:

$$\mathscr{E}(t) = \sum_{n=0}^{\infty} p(n,t)^2\, \mathscr{E}_n(t),$$

where

$$\mathscr{E}_n(t) = \int_{R_n(t)} K_n(u,t)^2\, du_n$$

is the information energy of the random vector $U$ conditioned by $N(t) = n$ and represents the information associated with the locations of the events in $(0,t]$. In order to investigate the derivative $\mathscr{E}'(t)$, we shall first represent our point process as a generalized birth process. Let $\lambda_n(u,t)\,dt$ be the conditional probability of an event in $(t, t+dt]$, given that $N(t) = n$ and that $T_1 = t_1,\ldots,T_n = t_n$, $0 < t_1 < \cdots < t_n \le t$. In the case of $\lambda_0(u,t)$, the only condition is that $N(t) = 0$. Then we can write two continuity equations as follows:

$$p(n, t+dt)K_n(u, t+dt)\,du_n = [1 - \lambda_n(u,t)\,dt]\,p(n,t)K_n(u,t)\,du_n + O(dt\,du_n), \qquad t_n \le t, \qquad (38)$$

and

$$p(n+1, t+dt)K_{n+1}(u^{(n+1)}, t+dt)\,du_n\,dt = \lambda_n(u,t)\,dt\; p(n,t)K_n(u,t)\,du_n + O(dt\,du_n) \qquad (39)$$

for $n \ge 1$, where $u^{(n+1)}$ is the vector $u$ whose components are $t_1, t_2, \ldots, t_n$ and whose $(n+1)$st component $t_{n+1}$ is replaced by $t$. If the functions are differentiable with respect to $t$, then we may let $dt \to 0$ and $du_n \to 0$; the special cases for $n = 0$ are obtained similarly. For all $n \ge 0$, we have

$$\frac{\partial}{\partial t}\big[p(n,t)K_n(u,t)\big] = -\lambda_n(u,t)\,p(n,t)K_n(u,t) \qquad (40)$$

and

$$p(n+1,t)K_{n+1}(u^{(n+1)}, t) = \lambda_n(u,t)\,p(n,t)K_n(u,t). \qquad (41)$$
The derivative of $\mathscr{E}(t)$ is given by

$$\mathscr{E}'(t) = \sum_{n=1}^{\infty} \int_{R_{n-1}(t)} \big(p(n,t)K_n(u^{(n)},t)\big)^2\, du_{n-1} + \sum_{n=0}^{\infty} \int_{R_n(t)} 2\,p(n,t)K_n(u,t)\,\frac{\partial}{\partial t}\big[p(n,t)K_n(u,t)\big]\, du_n,$$

where the first term results from the differentiation of the multiple integral with respect to the outer limit $t$, and the second term results from the differentiation of the integrand. Next we shift the index in the first sum and make use of the "birth" equations, (40) and (41). Then

$$\mathscr{E}'(t) = \sum_{n=0}^{\infty} \int_{R_n(t)} \big(\lambda_n(u,t)\,p(n,t)K_n(u,t)\big)^2\, du_n - 2\sum_{n=0}^{\infty} \int_{R_n(t)} \lambda_n(u,t)\big(p(n,t)K_n(u,t)\big)^2\, du_n$$

$$= \sum_{n=0}^{\infty} \int_{R_n(t)} \big(p(n,t)K_n(u,t)\big)^2\big(\lambda_n(u,t)^2 - 2\lambda_n(u,t)\big)\, du_n$$

$$= E\Big(p(N(t),t)\,K_{N(t)}(U,t)\,\big(\Lambda_{N(t)}(U,t)^2 - 2\Lambda_{N(t)}(U,t)\big)\Big),$$
where $\Lambda_{N(t)}(U,t)$ is a random variable depending on the other random variables $N(t), T_1,\ldots,T_{N(t)}$, with sample values $\lambda_n(u,t)$. In other words, $\Lambda_{N(t)}(U,t)\,dt$ is the conditional probability of a birth, expressed as a random variable. Based on the above considerations, we have the following results (Morales et al., 1985):

Result 13:
(a) On the class of point processes describable by the generalized birth equations (40) and (41), with a fixed intensity function $\beta(t)$, for any given $t$, $\mathscr{E}(t)$ is increasing (decreasing) in $t$ if and only if $\beta(t) > 2$ ($\beta(t) < 2$) and the process is the Poisson process, except for a set of events of probability zero.
(b) Let $\{N(t)\}_{t\ge 0}$ be a homogeneous Poisson process with $\beta(t) = \beta$. Then

$$\mathscr{E}(t) = \exp\big(t\beta(\beta - 2)\big).$$

(c) The information energy of a point process in the interval $(0,t]$ is bounded below by the expression

$$\mathscr{E}(t) \ge \sum_{n=0}^{\infty} p(n,t)^2\, \frac{n!}{t^n}.$$

(d) If the number of events that can occur in the interval $(0,t]$ is less than or equal to $n$, the above bound is minimum for the probability distribution

$$p_i = \frac{a_i}{\sum_{j=0}^{n} a_j}, \qquad a_i = \frac{t^i}{i!}, \quad i = 0,\ldots,n.$$
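Result 13(b) can be checked directly from Definition 8: for a homogeneous Poisson process the conditional densities are uniform over $R_n(t)$, so $\mathscr{E}_n(t) = n!/t^n$, and the series (37) sums to $\exp(t\beta(\beta-2))$. A short numerical sketch:

```python
import math

def poisson_energy_series(beta, t, terms=200):
    # Sum of p(n, t)^2 * n!/t^n with p(n, t) = exp(-beta*t) * (beta*t)^n / n!.
    # Each term simplifies to exp(-2*beta*t) * (beta^2 * t)^n / n!.
    total = 0.0
    for n in range(terms):
        log_term = (-2 * beta * t + n * math.log(beta * beta * t)
                    - math.lgamma(n + 1))
        total += math.exp(log_term)
    return total

beta, t = 1.5, 2.0
print(poisson_energy_series(beta, t), math.exp(t * beta * (beta - 2)))
```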
Let us now consider a special class of stationary point processes, namely that for which all the densities $K_n(u,t)$, $n \ge 1$, are uniform over $R_n(t)$ for all $t > 0$. Nawrotzky (1962) has shown that this class is equivalent to the class of mixed Poisson processes. A mixed Poisson process is a mixture of Poisson processes with intensities $\Lambda$, where $\Lambda$ is a nonnegative random variable with $E(\Lambda) = \beta$. Based on this, we have the following result:
Proposition 10. For the mixed Poisson process with intensity $E(\Lambda) = \beta$, the information energy $\mathscr{E}(t)$, for a given $t$, is minimum if and only if the process is Poisson, except for a set of sample functions of probability zero.

Proof: The information energy of such a stationary point process is given by

$$\mathscr{E}(t) = E\big(e^{t(\Lambda^2 - 2\Lambda)}\big).$$

Let $S_t(x) = e^{t(x^2 - 2x)}$. Then

$$S_t'(x) = t(2x - 2)\,e^{t(x^2 - 2x)},$$

and

$$S_t''(x) = \big(t^2(2x - 2)^2 + 2t\big)\,e^{t(x^2 - 2x)} > 0 \quad \forall\, t > 0,\; \forall\, x.$$

Thus a straight line that is tangent to $S_t(x)$ at the point $x = \beta$ must lie below $S_t(x)$ except at that one point. Then

$$\mathscr{E}(t) = E(S_t(\Lambda)) \ge S_t(\beta) + S_t'(\beta)\,E(\Lambda - \beta) = S_t(\beta) = \exp\big(t\beta(\beta - 2)\big),$$

with equality if and only if $\Lambda = \beta$ with probability one. Thus the unique minimum of $\mathscr{E}(t)$ occurs when the process is Poisson, except for a set of sample functions of probability zero.
F. Information Energy, Information Energy Divergence, and Probability of Error

The problem we shall be dealing with is that of estimating the class (state) $\theta$ of a given pattern (observation) $x$. It will be assumed that the pattern-generating mechanism is adequately described by the following statistical model. Let there be $n$ possible pattern classes $\Theta = \{\theta_1, \theta_2, \ldots, \theta_n\}$ with prior probabilities $p(\theta_i) = \Pr(\Theta = \theta_i)$, $i = 1,\ldots,n$. Let $\mathscr{X}$ be the pattern space and suppose that, for a given $\theta_i$, $x$ has the class-conditional distribution $f(x/\theta_i)$, $i = 1,\ldots,n$. We assume that $p(\theta_i)$ and $f(x/\theta_i)$ are completely known. In this context, the decision rule that minimizes the probability of error is the Bayes decision rule, which, for a given $x$, chooses the hypothesis (pattern class) that maximizes the posterior probability of $\theta$; ties are broken arbitrarily. Using this rule, the partial probability of error for a given $x$ is expressed by

$$p(e/x) = 1 - \max\{p(\theta_1/x),\, p(\theta_2/x),\, \ldots,\, p(\theta_n/x)\}.$$

Prior to observing $X$, the probability of error $P_e$ associated with $X$ is defined as the expected probability of error, i.e.,

$$P_e = E_X(p(e/X)) = \int_{\mathscr{X}} p(e/x)\, f(x)\, dx.$$

In recent years, researchers have paid attention to the problem of bounding this probability of error for two-class and multiple-class problems, taking some information, divergence, and distance measures into consideration (Kailath, 1967; Kanal, 1974; Chen, 1976; Boekee and Van der Lubbe, 1979; Taneja, 1985, 1989). Our aim here is to give bounds on the probability of error in terms of the information energy and the information energy divergence.

1. Information Energy and the Probability of Error

This subsection deals with the relationship between the probability of error and the information energy. We have the following results (Devijver, 1974):

Result 14:
(a) $P_e \le 1 - \mathscr{E}(\Theta/X)$,
(b) $1 - \mathscr{E}(\Theta/X)^{1/2} \le P_e$,
(c) $\tfrac{1}{2}(1 - \mathscr{E}(\Theta/X)) \le P_e$,
where

$$\mathscr{E}(\Theta/X) = \int_{\mathscr{X}} \left(\sum_{i=1}^{n} p(\theta_i/x)^2\right) f(x)\, dx.$$

Remark 6:
(a) In order to achieve equality in Result 14a, either one of the posterior probabilities $p(\theta_i/x)$ is equal to unity and all the others are zero, or all the $p(\theta_i/x)$ are equal. Clearly this means that if there exists some subspace of $\mathscr{X}$ over which more than one posterior pattern class probability is different from zero, then all pattern classes have to be equally a posteriori probable over that subspace.
(b) The upper bound on $P_e$ given by the information energy is as tight as possible, since the sign of equality in Result 14a may hold for any value of $P_e$. This conclusion does not apply to the lower bounds derived in 14b and 14c; for these bounds, the sign of equality holds only if $P_e = 0$. It is the purpose of the following result to establish the class of lower bounds that are as tight as possible for given finite $n$.
(c) Devijver (1974) called the information energy the Bayesian distance.

Result 15:

$$P_e \ge \frac{n-1}{n}\left(1 - \left(\frac{n\,\mathscr{E}(\Theta/X) - 1}{n - 1}\right)^{1/2}\right),$$

with equality if and only if

$$p(\theta_1/x) = \max_{1\le i\le n} p(\theta_i/x) = 1 - P_e$$

and

$$p(\theta_i/x) = \frac{P_e}{n-1} \quad \text{for } i \ne 1.$$
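For a single observation $x$, the bounds of Results 14 and 15 can be checked pointwise on the posterior vector; a sketch with a hypothetical three-class posterior:

```python
import math

def pointwise_bounds(posterior):
    # Pointwise versions: energy E = sum p_i^2, error = 1 - max p_i.
    n = len(posterior)
    energy = sum(p * p for p in posterior)
    p_err = 1 - max(posterior)
    upper = 1 - energy                               # Result 14(a)
    lower_b = 1 - math.sqrt(energy)                  # Result 14(b)
    lower_c = 0.5 * (1 - energy)                     # Result 14(c)
    lower_15 = ((n - 1) / n) * (1 - math.sqrt((n * energy - 1) / (n - 1)))
    return p_err, upper, lower_b, lower_c, lower_15

p_err, ub, lb_b, lb_c, lb_15 = pointwise_bounds([0.5, 0.3, 0.2])
print(p_err, ub, lb_b, lb_c, lb_15)
```

For this posterior, the Result 15 bound is noticeably tighter than 14(b) and 14(c).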
2. Information Energy Divergence and the Probability of Error
Now, using the divergence measure of information energy defined in (18), we shall present bounds on the probability of error. We consider two pattern classes $\Theta = \{\theta_1, \theta_2\}$. The information energy divergence measure between the two class-conditional distributions $f(x/\theta_1)$ and $f(x/\theta_2)$ is given by
where
Therefore
and
G. Information Energy as an Index of Diversity
G. Information Energy as an Index of Diversity

When the observations from a population are classified according to several categories, the uncertainty of the population may be quantified by means of several measures from information theory. The diversity of the population is intuitively intended as a measure of the average variability of classes in it, based on the number of classes and their relative frequencies. Consider a finite population of $N$ individuals that is classified, according to a classification process or factor $X$, into $M$ classes or species $x_1,\ldots,x_M$. We denote by $\mathscr{X}$ the set of all categories or classes,

$$\mathscr{X} = \{x_1,\ldots,x_M\}.$$

Rao (1982) established that a measure of diversity is a function $\mathscr{D}$
satisfying the following conditions: (a) $\mathscr{D}(P) \ge 0$ for all $P \in \Delta_M$, and $\mathscr{D}(P) = 0$ iff $P$ is degenerate; (b) $\mathscr{D}$ is a concave function of $P$ in $\Delta_M$. We shall refer to $\mathscr{D}(P)$ as the diversity measure within a population $\mathscr{X}$ characterized by the measure $P$. Condition (a) is a natural one, since a measure of diversity should be nonnegative and should take the value zero when all individuals of a population are identical, i.e., when the associated probability measure is concentrated at a particular point of $\mathscr{X}$. Condition (b) is motivated by the consideration that the diversity in a mixture of populations should not be smaller than the average of the diversities within the individual populations. The quadratic entropy measure given in (11) satisfies both conditions; thus it can be used as a diversity index, and in this section we use it as such. This diversity index, denominated the Gini-Simpson index, was first introduced by Gini (1912) and later by Simpson (1949). For a brief, interesting history of this index, see Good's comments in Patil and Taillie (1982). For several interpretations of this index, see Rao (1982). Bhargava and Uppuluri (1975) and Rao (1982) gave characterizations of this index. For applications and further discussion, see Agresti and Agresti (1978), Bhargava and Doyle (1974), Lieberson (1969), Nei (1973), and Nayak (1985). Now we analyze the asymptotic distribution of $\hat{\mathscr{D}} = \mathscr{D}(\hat{p})$, as well as its applications to testing hypotheses, where $\hat{p}$ is the vector of observed relative frequencies. Assume that a sample of size $n$ is drawn at random from the population. Let there be $Y_1$ observations in the first category, $Y_2$ observations in the second category, and so on to $Y_M$ observations in the $M$th category, so that $Y_1 + Y_2 + \cdots + Y_M = n$. We assume that $(Y_1, Y_2, \ldots, Y_{M-1})$ follows a multinomial distribution with parameters $(n, p_1, p_2, \ldots, p_{M-1})$. Then the MLE (maximum likelihood estimator)

$$\hat{p}_i = Y_i/n$$

is a consistent estimate of $p_i$ for $i = 1, 2, \ldots, M$, and the estimate

$$T_n = h(Y_1/n, Y_2/n, \ldots, Y_M/n)$$

is also consistent for $h(p_1, p_2, \ldots, p_M)$ if $h$ is continuous. Bickel and Doksum (1977, p. 135) have shown that if $\partial h/\partial p_i$
exists and is continuous for all $i = 1,\ldots,M$, then

$$n^{1/2}\big(T_n - h(p_1, p_2, \ldots, p_M)\big) \xrightarrow[n\to\infty]{L} N(0, \sigma^2),$$

where

$$\sigma^2 = \sum_{i=1}^{M} p_i\left(\frac{\partial h}{\partial p_i}\right)^2 - \left(\sum_{i=1}^{M} p_i\,\frac{\partial h}{\partial p_i}\right)^2.$$

We have the following result:
Result 16: If we consider the estimate $\hat{\mathscr{D}}$ obtained by replacing the $p_i$ values by the observed proportions $\hat{p}_i = Y_i/n$, $i = 1,\ldots,M$, then

$$n^{1/2}\big(\hat{\mathscr{D}} - \mathscr{D}(p_1, p_2, \ldots, p_M)\big) \xrightarrow[n\to\infty]{L} N(0, \sigma^2),$$

where

$$\sigma^2 = 4\left(\sum_{i=1}^{M} p_i^3 - \left(\sum_{i=1}^{M} p_i^2\right)^2\right).$$
This result is used by Agresti and Agresti (1978) for testing the following hypotheses:

(a) $H_0: \mathscr{D}(P) = D_0$ against one-sided or two-sided alternatives, i.e., the diversity of a population equals a specified value against one-sided or two-sided alternatives. Under $H_0$, the statistic

$$Z = \frac{n^{1/2}(\hat{\mathscr{D}} - D_0)}{(\hat{\sigma}^2)^{1/2}}$$

has approximately a standard normal distribution for sufficiently large $n$. Clearly, large values of $\hat{\mathscr{D}}$ support $\mathscr{D} > D_0$, so that large values of $Z$ tend to discredit $H_0: \mathscr{D} = D_0$ against $H_1: \mathscr{D} > D_0$. In this case we reject $H_0$ at level $\alpha$ if $z > z_\alpha$, where $z_\alpha$ is such that $P(Z \ge z_\alpha) = \alpha$. Similar arguments may be applied in the remaining cases.

(b) $H_0: D_1 = D_2$ (the diversities of two independent populations are equal) against one-sided or two-sided alternatives. Under $H_0$, the statistic

$$Z = \frac{\hat{\mathscr{D}}_1 - \hat{\mathscr{D}}_2}{(\hat{\sigma}_1^2/n_1 + \hat{\sigma}_2^2/n_2)^{1/2}}$$

has approximately a standard normal distribution, where the subscript $i$ denotes population $i$, and $n_i$ denotes the sample size in population $i$ ($i = 1, 2$).

(c) $H_0: D_1 = D_2 = \cdots = D_r$ against the alternative that at least two populations have different diversities. Let us consider several ($r$) populations and suppose we have a
random sample of size $n_i$ from the $i$th population. By an application of Cochran's theorem (Nayak, 1983), the statistic

$$T = \sum_{i=1}^{r} \frac{n_i(\hat{\mathscr{D}}_i - \bar{\mathscr{D}})^2}{\hat{\sigma}_i^2}$$

is asymptotically distributed as a $\chi^2$-variable with $(r-1)$ d.f., where $\hat{\mathscr{D}}_i$ and $\hat{\sigma}_i^2$ are the estimates of the quadratic entropy $\mathscr{D}_i$ and the variance $\sigma_i^2$ of population $i$, respectively, and

$$\bar{\mathscr{D}} = \left(\sum_{i=1}^{r} \frac{n_i \hat{\mathscr{D}}_i}{\hat{\sigma}_i^2}\right)\Big/\left(\sum_{i=1}^{r} \frac{n_i}{\hat{\sigma}_i^2}\right).$$
Thus, we reject $H_0$ at level $\alpha$ if $T > \chi^2_{r-1,\alpha}$.

Remark 7: From Result 16, an approximate $1 - \alpha$ level confidence interval for $\mathscr{D}(P)$ is given by

$$\hat{\mathscr{D}} \pm z_{\alpha/2}\,\frac{\hat{\sigma}}{n^{1/2}}.$$

Furthermore, the minimum sample size guaranteeing a specified limit of error $\varepsilon$ with a small risk is

$$n^* = \left[\left(\frac{\hat{\sigma}\, z_{\alpha/2}}{\varepsilon}\right)^2\right] + 1.$$
We can obtain a test for deviation from uniformity. This test is based on the following result (Nayak, 1985):

Result 17: If we consider estimates of the population entropies obtained by replacing the $p_i$ values by the observed proportions $\hat{p}_i = Y_i/n$, $i = 1,\ldots,M$, then under $H_0: p_1 = p_2 = \cdots = p_M = 1/M$,

$$T(\hat{p}_1,\ldots,\hat{p}_M) = -nM\,\mathscr{D}(\hat{p}) + n(M-1)$$

is asymptotically distributed as a $\chi^2$-variable with $(M-1)$ d.f.

As a consequence of Result 17, we reject $H_0: p_1 = p_2 = \cdots = p_M = 1/M$ at the level $\alpha$ if $T > \chi^2_{M-1,\alpha}$, i.e., if

$$\mathscr{D}(\hat{p}) < \big(n(M-1) - \chi^2_{M-1,\alpha}\big)/nM.$$

Since in the non-null case

$$n^{1/2}\big(\mathscr{D}(\hat{p}) - \mathscr{D}(P)\big) \xrightarrow[n\to\infty]{L} N(0, \sigma^2),$$
L. PARDO AND I. J. TANEJA
the asymptotic power function is given by
$$\beta(P) = F_{N(0,1)}\left(\frac{n^{1/2}}{\sigma(P)}\left[\frac{M-1}{M} - \frac{\chi^2_{M-1;\alpha}}{nM} - D(P)\right]\right),$$
where by $F_{N(0,1)}(x)$ we denote $P(X \leq x)$ when $X$ has a normal distribution with mean 0 and variance 1.

Now we suppose that the population can be divided into $r$ nonoverlapping subpopulations, called strata, as homogeneous as possible with respect to the diversity associated with $X$. Let $N_k$ be the number of individuals in the $k$th stratum (so that $\sum_{k=1}^{r} N_k = N$), and let $p_{ik}$ denote the probability that a randomly selected individual in the $k$th stratum belongs to the class or species $x_i$ ($i = 1, \ldots, M$, $k = 1, \ldots, r$). Thus, $\sum_{i=1}^{M} p_{ik} = 1$. Let $p_{i\cdot}$ be the probability that a randomly selected individual in the whole population belongs to the class $x_i$ ($p_{i\cdot} = \sum_{k=1}^{r} (N_k/N)\, p_{ik}$, $i = 1, \ldots, M$). Then the Gini-Simpson population diversity associated with $X$ is given by
$$D(X) = 1 - \sum_{i=1}^{M} p_{i\cdot}^2.$$
Assume that a stratified sample of size $n$ is drawn at random from the population, independently in the different strata. We hereafter suppose that the sample is chosen by proportional allocation: a sample of size $n_k$ is drawn at random with replacement from the $k$th stratum, where $n_k/n = N_k/N$. If $\hat{f}_{ik}$ denotes the relative frequency, in the whole sample, of individuals belonging to the class $x_i$ in the $k$th stratum (and hence $\sum_{i=1}^{M} \hat{f}_{ik} = n_k/n$), and $\hat{p}_{i\cdot} = \sum_{k=1}^{r} \hat{f}_{ik}$, then the diversity in the sample with respect to the classification process or factor $X$ may be quantified by means of the analogue estimate, the Gini-Simpson sample diversity $\hat{D}_{st} = \hat{D}_{st}(X) = 1 - \sum_{i=1}^{M} \hat{p}_{i\cdot}^2$. Following the ideas in Nayak (1985), M. A. Gil (1989) established the following result:

Result 18: The random variable $n^{1/2}(\hat{D}_{st} - D)$ is asymptotically distributed (as $n_k \to \infty$, $k = 1, \ldots, r$) according to a normal distribution with mean zero and variance $\sigma_{st}^2$, the stratified analogue of the variance $\sigma^2$ appearing in Result 16.
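The stratified estimate can be sketched as follows (a minimal illustration with hypothetical counts; the estimator $\hat D_{st} = 1 - \sum_i \hat p_{i\cdot}^2$ with pooled frequencies $\hat p_{i\cdot} = \sum_k n_{ik}/n$ is assumed as described above):

```python
def stratified_gini_simpson(strata_counts):
    # strata_counts[k][i] = number of sampled individuals of class x_i in
    # stratum k, under proportional allocation; n is the total sample size.
    n = sum(sum(stratum) for stratum in strata_counts)
    M = len(strata_counts[0])
    # Pooled relative frequencies p_i = sum_k n_ik / n
    p = [sum(stratum[i] for stratum in strata_counts) / n for i in range(M)]
    return 1.0 - sum(x * x for x in p)
```

Two strata with class counts (10, 10) and (20, 60) pool to $\hat p_{\cdot} = (0.3, 0.7)$, giving $\hat D_{st} = 0.42$; the point estimate coincides with the simple-random-sampling estimate on the pooled counts, while Result 19 below concerns the (smaller) asymptotic variance.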
In general, in a heterogeneous population the stratification may produce a gain in precision in the estimates of characteristics of the whole population. When we try to estimate the diversity in the whole population by means of a
large sample, one can ensure a gain in precision from stratified random sampling over simple random sampling, whatever the stratification may be. On the other hand, such a gain in precision is small unless the discrepancies between the frequency vector in each stratum and that in the whole population differ greatly from stratum to stratum. In the following result, M. A. Gil (1989) formalized the comments above for the asymptotic variances:

Result 19: It is verified that $\sigma_{st}^2 \leq \sigma^2$, with equality if and only if there is only one stratum, or $\sum_{i=1}^{M} (1 - 2p_{i\cdot})(N/N_k)\, p_{ik}$ does not depend on $k$ ($k = 1, \ldots, r$).
On the basis of Result 18, we can now construct procedures (a) to select the minimum sample size guaranteeing a specified limit of error with a small risk; (b) to define confidence intervals with a specified confidence coefficient; and (c) to define tests of hypotheses. According to Result 19, if we deal with large samples, we can respectively guarantee (a) a decrease in sample size, (b) a decrease in the length of the confidence interval, and (c) an increase in the test power, when passing from simple random sampling to stratified random sampling.

H. Markov Chains
This section deals with the applications to Markov chains of the measures of information energy given in Section II.B.

Definition 9. A finite or infinite sequence of random variables $X_1, X_2, \ldots$ forms a Markov chain, denoted by $X_1 \ominus X_2 \ominus \cdots$, if for each $i$ the random variable $X_{i+1}$ is conditionally independent of $(X_1, \ldots, X_{i-1})$ given $X_i$.

Based on the above definition, the conditional information energy and the information energy gain given in Section II.B satisfy the following result.

Proposition 11. (a) $X_1, X_2, \ldots$ forms a Markov chain, i.e., $X_1 \ominus X_2 \ominus \cdots$, iff
$$\mathcal{G}(X_1, \ldots, X_{i-1}; X_{i+1}/X_i) = 0 \quad \text{or} \quad {}^*\mathcal{G}(X_1, \ldots, X_{i-1}; X_{i+1}/X_i) = 0$$
for each $i$, where $\mathcal{G}$ and ${}^*\mathcal{G}$ are given by (12) and (14), respectively.
(b) If $(X, Y, Z)$ forms a Markov chain, i.e., $X \ominus Y \ominus Z$, then
(1) $\mathcal{G}(X; Y) \geq \mathcal{G}(X; Z)$,
(2) $\mathcal{E}(X/Y) \geq \mathcal{E}(X/Z)$.

(c) If $(X, Y, Z, V)$ forms a Markov chain, i.e., $X \ominus Y \ominus Z \ominus V$, then
(1) $\mathcal{G}(X; V) \leq \mathcal{G}(Y; Z)$,
(2) ${}^*\mathcal{G}(X; V) \leq {}^*\mathcal{G}(Y; Z)$.

Proof: Part (a) follows from Result 1(a). Part (b)(1) follows immediately from part (a) and equality (15). Part (b)(2) follows from part (b)(1). Part (c)(1) follows from part (b)(1) by considering the two sub-Markov chains $(X, Z, V)$ and $(X, Y, Z)$. Part (c)(2) follows from the convexity of $D(P \| Q)$ given by Result 1(d).
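The data-processing behaviour in part (b)(2) can be checked numerically along a chain $X \ominus Y \ominus Z$. Below is a small sketch (hypothetical transition matrices), taking $\mathcal{E}(X) = \sum_x p(x)^2$ and $\mathcal{E}(X/Y) = \sum_y p(y) \sum_x p(x/y)^2$:

```python
def information_energy(p):
    # Onicescu's information energy E(X) = sum p_i^2
    return sum(x * x for x in p)

def output_marginal(p, K):
    # Distribution of the channel output: q[b] = sum_a p[a] * K[a][b]
    return [sum(p[a] * K[a][b] for a in range(len(p))) for b in range(len(K[0]))]

def conditional_energy(p, K):
    # E(A/B) = sum_b p(b) * sum_a p(a/b)^2 for input dist p and channel K
    q = output_marginal(p, K)
    total = 0.0
    for b in range(len(q)):
        post = [p[a] * K[a][b] / q[b] for a in range(len(p))]
        total += q[b] * information_energy(post)
    return total

# Chain X -> Y -> Z with hypothetical kernels
p_x = [0.5, 0.5]
K_xy = [[0.9, 0.1], [0.2, 0.8]]          # p(y/x)
K_yz = [[0.7, 0.3], [0.4, 0.6]]          # p(z/y)
K_xz = [[sum(K_xy[x][y] * K_yz[y][z] for y in range(2))
         for z in range(2)] for x in range(2)]

e_xy = conditional_energy(p_x, K_xy)     # E(X/Y)
e_xz = conditional_energy(p_x, K_xz)     # E(X/Z)
```

Here $\mathcal{E}(X) = 0.5 \leq \mathcal{E}(X/Z) \leq \mathcal{E}(X/Y)$: the farther link $Z$ carries less information about $X$, so conditioning on it raises the energy less.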
V. INFORMATION ENERGY AND FUZZY SETS THEORY

A. Quantification of Fuzzy Information
Experiments will now be considered in which the person responsible for observation cannot always crisply perceive their outcomes; rather, each observable elementary event may only be associated with a fuzzy subset of the sample space (Zadeh, 1965) or, more precisely, with fuzzy information, as intended by Okuda et al. (1978), Tanaka et al. (1979), and Zadeh (1968). It is defined as follows:

Definition 10. A fuzzy information, $X$, from the experiment $(\mathcal{X}, \beta_{\mathcal{X}}, P_\theta)_{\theta \in \Theta}$ is a fuzzy event on $\mathcal{X}$ that is characterized by a Borel-measurable membership function $\mu_X$, which associates with each exact observation $x$ on $\mathcal{X}$ a real number in $[0, 1]$, the value $\mu_X(x)$ representing the "grade of membership" of $x$ in $X$ (Tanaka et al., 1979).

The scheme in Fig. 3 explains the mechanism that leads to the obtainment of fuzzy information according to the notation of Definition 10. We shall suppose that the family of probability measures is dominated by a $\sigma$-finite measure, $\lambda$, so that they may be described through their density functions, $f(x/\theta)$, with respect to this measure $\lambda$. In addition, assume that the set of all available fuzzy observations from the experiment $(\mathcal{X}, \beta_{\mathcal{X}}, P_\theta)_{\theta \in \Theta}$ satisfies the "orthogonality constraint" determining
FIG. 3. Process leading to fuzzy information associated with a random experiment: randomness takes the true parameter value $\theta$ to an exact observation $x$, and fuzziness takes $x$ to the fuzzy information $X$ with grade of membership $\mu_X(x)$.
a fuzzy information system, which is defined (Tanaka et al., 1979) as follows:

Definition 11. A fuzzy information system, $X^*$, associated with the experiment $(\mathcal{X}, \beta_{\mathcal{X}}, P_\theta)_{\theta \in \Theta}$ is a fuzzy partition (orthogonal system) of $\mathcal{X}$ by means of fuzzy events $X$ on $\mathcal{X}$, i.e., $\sum_{X \in X^*} \mu_X(x) = 1$ for all $x \in \mathcal{X}$.

From now on, we consider the Bayesian framework, which supposes the existence of a prior probability measure $\tau(\theta)$ on a measurable space $(\Theta, \beta_\Theta)$, where $\beta_\Theta$ is a $\sigma$-field on $\Theta$. (We usually assume that $\Theta$ is a subset of a Euclidean space.) We denote by $p(\theta)$ the probability density function with respect to a $\sigma$-finite measure $\nu$. The mathematical model for a random experiment containing fuzzy observations may be completed by the introduction, based on Zadeh's approach, of the conditional probability distribution on $X^*$, given the state or parameter value $\theta \in \Theta$, by
$$\mathcal{P}(X/\theta) = \int_{\mathcal{X}} \mu_X(x)\, f(x/\theta)\, d\lambda(x), \qquad X \in X^*;$$
the marginal probability distribution on $X$ by
$$\mathcal{P}(X) = \int_{\Theta} \mathcal{P}(X/\theta)\, p(\theta)\, d\nu(\theta);$$
and the posterior probability distribution on $\Theta$, given the fuzzy information $X \in X^*$, by
$$p(\theta/X) = \frac{\mathcal{P}(X/\theta)\, p(\theta)}{\mathcal{P}(X)}.$$
With these concepts it is possible to establish an operative model for a random experiment with prior probabilistic uncertainty (randomness in the experimental outcomes) and actual fuzzy imprecision (fuzziness in the observation). Thus, although the probabilistic framework is not enough by itself to provide a suitable model characterizing such a random experiment, the theory of fuzzy sets complements probability theory and
supplies concepts permitting us finally to construct that model in the probabilistic setting. More precisely, the approach based on the assimilation of each imprecise observable event with fuzzy information, and involving the notion of fuzzy information systems and Zadeh's definition of the probability of a fuzzy event, allows us to pass from the original probability space $(\mathcal{X}, \beta_{\mathcal{X}}, P_\theta)$ to the new probability space $(X^*, \beta_{X^*}, \mathcal{P}(X/\theta))$, where $\beta_{X^*}$ is a $\sigma$-field on the (nonfuzzy) set $X^*$. The main advantage of this approach is that many statistical problems with imprecise data can be mathematically handled as statistical problems with a finite number of exact data (although the first problem is essentially an extension of the second one). On the basis of this argument, several measures have been extended in previous papers (M. A. Gil, 1988; M. A. Gil et al., 1984, 1985a, 1985b; Menendez, 1986; L. Pardo et al., 1986a, 1986b). Now we consider the extension to this context of the information energy gain.

Definition 12. The quantity of information of the fuzzy information system $X^*$ concerning $\Theta$ is defined as the value
$$\mathcal{G}(X^*, p(\cdot)) = \sum_{X \in X^*} \mathcal{P}(X) \int_{\Theta} p(\theta/X)^2\, d\nu(\theta) - \int_{\Theta} p(\theta)^2\, d\nu(\theta). \qquad (42)$$
By using Jensen's inequality it is easy to check that $\mathcal{G}(X^*, p(\cdot)) \geq 0$, with equality iff $f(x/\theta)$ is independent of $\theta$.

B. The Information Energy Gain as a Criterion of Comparison between Fuzzy Information Systems
In several papers (M. A. Gil, 1988; M. A. Gil et al., 1984, 1985a, 1985b; Menendez, 1986; Menendez et al., 1989; L. Pardo et al., 1986a, 1986b), the problem of comparing two random experiments has been considered when the available experimental information on which conclusions will be based is not exact, but rather may be described by means of fuzzy events. In other words, well-known criteria to compare experiments have been extended to the case in which the "prior information" concerning the experimental outcomes involves probabilistic uncertainty due to randomness, and the "currently available information" after the experimental performance contains fuzzy imprecision. Thus, for instance, assume that a drug manufacturer has developed a drug that cures an unknown fraction $\theta$ of patients. To make posterior inferences about $\theta$, the director of a clinic considers the experiment consisting of observing the drug's effectiveness in a patient drawn at random from the population of patients in the clinic. This Bernoulli experiment may be characterized in terms of the probability space $(\mathcal{X}, \beta_{\mathcal{X}}, P_\theta)$, where $\mathcal{X} = \{0, 1\}$
($0$ = noncured patient, $1$ = cured patient), $\beta_{\mathcal{X}}$ = the smallest Borel $\sigma$-field on $\{0, 1\}$, $P_\theta(0) = 1 - \theta$, $P_\theta(1) = \theta$. If the director does not have enough time to obtain an exact conclusion about the effectiveness of the drug, but can only indicate that $\mathcal{C}$ = "the patient is more or less cured" or $\bar{\mathcal{C}}$ = "the patient is more or less not cured," then the available experimental information can easily be assimilated with fuzzy events on $\mathcal{X}$.

In this section we give a definition for comparing two fuzzy information systems (FIS) based on the definition of the amount of information given in (42) (L. Pardo et al., 1986a), and we analyze its properties.

Definition 13. The FIS $X_1^*$ is preferred to the FIS $X_2^*$, written $X_1^* \succeq X_2^*$, if and only if
$$\mathcal{G}(X_1^*, p(\cdot)) \geq \mathcal{G}(X_2^*, p(\cdot)),$$
where $X_1^*, X_2^* \in E^*$ (the set of fuzzy information systems) and $p(\theta)$ is the prior distribution. We say that the FIS $X_1^*$ is indifferent to the FIS $X_2^*$, written $X_1^* \approx X_2^*$, iff $X_1^* \succeq X_2^*$ and $X_2^* \succeq X_1^*$. We call this criterion the information energy-FIS comparison criterion.

The information energy-FIS comparison criterion given by Definition 13 admits the following properties:

Proposition 12.
(a) The relation $\succeq$ determines a partial preordering on the set of fuzzy information systems $E^*$.
(b) If $\mathcal{N}^* \in E^*$ is a fuzzy information system on a null experiment $\mathcal{N}$, then $X^* \succeq \mathcal{N}^*$ for all $X^* \in E^*$.
(c) Let $X_1^*, X_2^* \in E^*$ be two fuzzy information systems on $\mathcal{X}_1, \mathcal{X}_2$, respectively. Then $X_1^* \times X_2^* \succeq X_1^*$, where $X_1^* \times X_2^* = \{(X_1, X_2) / X_1 \in X_1^*,\ X_2 \in X_2^*$ and $\mu_{(X_1, X_2)}(x_1, x_2) = \mu_{X_1}(x_1)\mu_{X_2}(x_2)\}$ is called a combined FIS. Furthermore, $(X_1^* \times X_2^*) \approx X_1^*$ if and only if $\mathcal{P}(X_1/X_2, \theta)$ does not depend on $\theta$, $\forall X_1 \in X_1^*$, $\forall X_2 \in X_2^*$.
(d) Let $X^*$ be a FIS and let $X_{(n)}^*$, $n \in \mathbb{N}$, be a fuzzy random sample (associated with the random sample $X^{(n)}$) from $X^*$, where $X_{(n)}^*$ is the set consisting of all combined fuzzy information systems of $n$ elements in $X^*$. Then $X_{(n+1)}^* \succeq X_{(n)}^*$.
(e) Let $X_1^*, X_2^*$, and $X_3^*$ be FIS on $\mathcal{X}_1, \mathcal{X}_2$, and $\mathcal{X}_3$, respectively, such that $X_3^*$ is independent of $X_1^*$ ($\mathcal{P}(X_i \times X_3/\theta) = \mathcal{P}(X_i/\theta)\mathcal{P}(X_3/\theta)$) and of $X_2^*$. If $X_1^* \succeq X_2^*$, $\forall p(\theta)$, then $X_1^* \times X_3^* \succeq X_2^* \times X_3^*$, $\forall p(\theta)$.
(f) Let $X_1^*, X_2^*, X_3^*$, and $X_4^*$ be FIS on $\mathcal{X}_1, \mathcal{X}_2, \mathcal{X}_3$, and $\mathcal{X}_4$, respectively, such that $X_1^* \succeq X_2^*$ and $X_3^* \succeq X_4^*$, for any prior distribution on $\Theta$. If $X_3^*$ is
independent of $X_1^*$, and $X_4^*$ is independent of $X_2^*$, then
$$X_1^* \times X_3^* \succeq X_2^* \times X_4^*.$$
(g) If $X^* = \{X_m / m \in M\}$ is a refinement of the fuzzy information system $X_0^* = \{X_j' / j \in J\}$ ($X^*$ is a refinement of $X_0^*$ if there exists a subset $J(m) \subset J$ such that $\mu_{X_m}(x) = \sum_{j \in J(m)} \mu_{X_j'}(x)$, with $\{J(m), m \in M\}$ a partition of $J$), then $X_0^* \succeq X^*$.
(h) Let $X^* \in E^*$ and let $X_{(n)}^*$ ($n \in \mathbb{N}$) be a fuzzy random sample from $X^*$. Let $T_0$ be a mapping from $X_{(n)}^*$ such that $T_0(X_{(n)}^*) \in E^*$. Then $X_{(n)}^* \succeq T_0(X_{(n)}^*)$. Furthermore, if $\mathcal{P}(X^1, \ldots, X^n/\theta, t_0) = \mathcal{P}(X^1, \ldots, X^n/t_0)$ for all $t_0 \in T_0(X_{(n)}^*)$, then $X_{(n)}^* \approx T_0(X_{(n)}^*)$.
Proof: (a) It is immediate that the relation $\succeq$ is reflexive and transitive.
(b) Since $\mathcal{P}(\mathcal{N}/\theta) = \mathcal{P}(\mathcal{N})$ for all $\theta \in \Theta$, we have
$$p(\theta/\mathcal{N}) = p(\theta), \qquad \forall \theta \in \Theta;$$
hence
$$\mathcal{G}(\mathcal{N}^*, p(\cdot)) = 0;$$
therefore
$$X^* \succeq \mathcal{N}^* \qquad \forall X^* \in E^*.$$
(c) For each $\theta \in \Theta$ and $X_1 \in X_1^*$ fixed, we define a random variable taking on the values $p(\theta/X_1, X_2)$, $X_2 \in X_2^*$, with probabilities $\mathcal{P}(X_2/X_1)$. Let us consider the convex function $\phi(x) = x^2$; then by using Jensen's inequality, we have
$$p(\theta/X_1)^2 = \left[\sum_{X_2 \in X_2^*} \mathcal{P}(X_2/X_1)\, p(\theta/X_1, X_2)\right]^2 \leq \sum_{X_2 \in X_2^*} \mathcal{P}(X_2/X_1)\, p(\theta/X_1, X_2)^2.$$
Multiplying by $\mathcal{P}(X_1)$, summing over $X_1^*$, and integrating with respect to $\theta$, we get
$$\mathcal{G}(X_1^*, p(\cdot)) \leq \mathcal{G}(X_1^* \times X_2^*, p(\cdot)).$$
The equality holds in Jensen's inequality if and only if $p(\theta/X_1, X_2)$ does not depend on $X_1$ and $X_2$, i.e., $\mathcal{P}(X_1, X_2/\theta)$ does not depend on $\theta$.
(d) This is immediate from property (c).
(e) Since $X_1^* \succeq X_2^*$ for all $p(\theta)$, we may take $p(\theta) = p(\theta/X_3)$ to obtain
$$\mathcal{G}(X_1^*, p(\theta/X_3)) \geq \mathcal{G}(X_2^*, p(\theta/X_3)).$$
Multiplying by $\mathcal{P}(X_3)$ and summing over $X_3^*$, we get
$$\mathcal{G}(X_1^* \times X_3^*, p(\cdot)) \geq \mathcal{G}(X_2^* \times X_3^*, p(\cdot)).$$
(f) This is immediate from property (e).
(g) For each $\theta \in \Theta$ and each $X_m$, we define a random variable
$$Z_{\theta, X_m}(X_j') = p(\theta/X_j'), \qquad j \in J(m),$$
with probabilities $\mathcal{P}(X_j')/\mathcal{P}(X_m)$. Then
$$E(Z_{\theta, X_m}) = \sum_{j \in J(m)} \frac{\mathcal{P}(X_j')}{\mathcal{P}(X_m)}\, p(\theta/X_j') = p(\theta/X_m).$$
Let us consider again the convex function $\phi(x) = x^2$. Then we can write, using Jensen's inequality,
$$p(\theta/X_m)^2 \leq \sum_{j \in J(m)} \frac{\mathcal{P}(X_j')}{\mathcal{P}(X_m)}\, p(\theta/X_j')^2.$$
Multiplying by $\mathcal{P}(X_m)$ and summing over $M$, we get
$$\sum_{m \in M} \mathcal{P}(X_m)\, p(\theta/X_m)^2 \leq \sum_{j \in J} \mathcal{P}(X_j')\, p(\theta/X_j')^2.$$
Integrating with respect to $\theta$, we obtain $\mathcal{G}(X^*, p(\cdot)) \leq \mathcal{G}(X_0^*, p(\cdot))$.
(h) The proof is similar to that for property (g).

The following intuitive fact can now be formalized: the presence of fuzziness (which entails an absence of exactness) in the observation of outcomes from a probabilistic information system entails a loss of information. Such a formalization is carried out by means of $\mathcal{G}(X^*, p(\cdot))$ and $\mathcal{G}(\mathcal{X}, p(\cdot))$.
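Proposition 13, stated next, formalizes this loss; for a finite parameter set and a finite sample space it is easy to verify numerically. Below is a minimal sketch (hypothetical likelihoods and memberships), with the gain computed as expected posterior energy minus prior energy, as in (42):

```python
def bayes_posterior(prior, lik):
    # p(theta/evidence) for likelihoods lik[theta]
    m = sum(pr * l for pr, l in zip(prior, lik))
    return [pr * l / m for pr, l in zip(prior, lik)]

def energy(p):
    return sum(v * v for v in p)

def energy_gain(prior, lik_table):
    # G = sum_e P(e) * E(posterior/e) - E(prior); lik_table[theta][e]
    gain = -energy(prior)
    for e in range(len(lik_table[0])):
        lik = [row[e] for row in lik_table]
        pe = sum(pr * l for pr, l in zip(prior, lik))
        gain += pe * energy(bayes_posterior(prior, lik))
    return gain

def fuzzy_likelihoods(lik_table, memberships):
    # P(fuzzy event/theta) = sum_x mu(x) * f(x/theta), one column per fuzzy event
    return [[sum(mu[x] * row[x] for x in range(len(row))) for mu in memberships]
            for row in lik_table]

prior = [0.5, 0.5]
f = [[0.6, 0.3, 0.1],               # f(x/theta_1), hypothetical
     [0.1, 0.3, 0.6]]               # f(x/theta_2)
mus = [[1.0, 0.5, 0.0],             # an orthogonal system:
       [0.0, 0.5, 1.0]]             # mu_{X1}(x) + mu_{X2}(x) = 1 for all x

g_exact = energy_gain(prior, f)                          # G(exact experiment, p)
g_fuzzy = energy_gain(prior, fuzzy_likelihoods(f, mus))  # G(X*, p)
```

Here the fuzzy gain is 0.125, below the exact gain of about 0.179: fuzziness costs information.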
Proposition 13. Let $X^*$ be an FIS on the experiment $(\mathcal{X}, \beta_{\mathcal{X}}, P_\theta)_{\theta \in \Theta}$. Then
$$\mathcal{G}(X^*, p(\cdot)) \leq \mathcal{G}(\mathcal{X}, p(\cdot)),$$
whatever the prior distribution on $\Theta$ may be.

Proof: For each $\theta \in \Theta$ and each $X \in X^*$, we define a random variable $Z_\theta(x) = p(\theta/x)$ with density function
$$\frac{\mu_X(x)\, f(x)}{\mathcal{P}(X)}, \qquad f(x) = \int_\Theta f(x/\theta)\, p(\theta)\, d\nu(\theta).$$
Let us consider again the convex function $\phi(x) = x^2$. Then we can write
$$\phi(E(Z_\theta(X))) = p(\theta/X)^2.$$
Applying Jensen's inequality, we have
$$p(\theta/X)^2 \leq \int_{\mathcal{X}} p(\theta/x)^2\, \frac{\mu_X(x)\, f(x)}{\mathcal{P}(X)}\, d\lambda(x),$$
whence, multiplying by $\mathcal{P}(X)$, summing over $X^*$ (by orthogonality, $\sum_{X \in X^*} \mu_X(x) = 1$), and integrating with respect to $\theta$, we obtain $\mathcal{G}(X^*, p(\cdot)) \leq \mathcal{G}(\mathcal{X}, p(\cdot))$.
C. Relation of the Information Energy-FIS Comparison Criterion with the Sufficiency and Lehmann Criteria

DeGroot (1970) has stated a preference relation between two probabilistic information systems when the information available from them is exact. We now proceed to extend DeGroot's method to the more general case in which the available information from a potential experiment is "fuzzy." Let $\mathcal{X}_1$ and $\mathcal{X}_2$ be two experiments with associated statistical spaces $(\mathcal{X}_1, \beta_{\mathcal{X}_1}, f(x_1/\theta))_{\theta \in \Theta}$ and $(\mathcal{X}_2, \beta_{\mathcal{X}_2}, f(x_2/\theta))_{\theta \in \Theta}$, respectively. Suppose that the divisions of exact information $x_1$ and $x_2$ can be obtained with the conditional probabilities $f(x_1/\theta)$ and $f(x_2/\theta)$, respectively, where $\theta$ is the true state of nature. Suppose that the available information from $\mathcal{X}_1$ and $\mathcal{X}_2$ belongs to the fuzzy information systems $X_1^*$ and $X_2^*$, respectively. Now, we define the idea of sufficiency of fuzzy information systems (M. L. Menendez et al., 1989):
Definition 14. The FIS $X_1^*$ is sufficient for the FIS $X_2^*$, written $X_2^* \preceq X_1^*$, if and only if there exists a nonnegative function $h$ on the product space $X_1^* \times X_2^*$ for which the following two relations are satisfied:
$$\mathcal{P}(X_2/\theta) = \sum_{X_1 \in X_1^*} h(X_2/X_1)\, \mathcal{P}(X_1/\theta), \qquad \forall \theta \in \Theta, \quad \forall X_2 \in X_2^*, \qquad (43)$$
and
$$\sum_{X_2 \in X_2^*} h(X_2/X_1) = 1, \qquad \forall X_1 \in X_1^*. \qquad (44)$$

It is intuitively clear that if $X_1^*$ is a sufficient fuzzy information system for the fuzzy information system $X_2^*$, then the statistician should never observe the fuzzy information system $X_2^*$ when $X_1^*$ is available, because observing $X_2^*$ is equivalent to observing $X_1^*$ and then subjecting the outcome to a nonnegative function $h$ that can only obscure any information about the value of $\theta$ that may be contained in that outcome. The following example illustrates the application of this criterion.

Example 3: In an immunology process, a quarter of a large population of mice received a standard dose of a bacterium determining a character $C$, whereas half of the same population received a standard dose of another bacterium determining a character $D$. Consequently, the proportions of mice with characters $C$ and $D$ are, respectively, $0.25$ and $0.5$. Suppose that the proportion $\theta$ of mice having both characters is unknown.
On the other hand, assume that the mechanisms of analysis for the presence of characters $C$ and $D$ in the population are not quite exact. More precisely, assume that the analysis of each mouse for the presence of character $C$ only permits us to distinguish between the fuzzy observations $\mathcal{C}$ = "the mouse seems more or less to have $C$" and $\bar{\mathcal{C}}$ = "the mouse seems more or less not to have $C$ (or to have $\bar{C}$)," which the investigator assimilates with the membership functions $\mu_{\mathcal{C}}(C) = 0.75$, $\mu_{\mathcal{C}}(\bar{C}) = 0.25$, $\mu_{\bar{\mathcal{C}}}(C) = 0.25$, $\mu_{\bar{\mathcal{C}}}(\bar{C}) = 0.75$. Assume also that the analysis of each mouse for the presence of character $D$ only permits us to distinguish between the fuzzy observations $\mathcal{D}$ = "the mouse has $D$ quite sharply" and $\bar{\mathcal{D}}$ = "the mouse has not $D$ (or has $\bar{D}$) quite sharply," which the investigator assimilates with the membership functions $\mu_{\mathcal{D}}(D) = 0.9$, $\mu_{\mathcal{D}}(\bar{D}) = 0.1$, $\mu_{\bar{\mathcal{D}}}(D) = 0.1$, $\mu_{\bar{\mathcal{D}}}(\bar{D}) = 0.9$.

Let $X$ denote the experiment in which a random individual leading to the fuzzy information $\mathcal{C}$, in the analysis for the presence of character $C$, is observed for the presence of character $D$. Let $Y$ denote the experiment in which a random individual leading to the fuzzy information $\mathcal{D}$, in the analysis for the presence of character $D$, is observed for the presence of character $C$. Then, the (conditional given $\mathcal{C}$) probabilities associated with $X$ are given by
$$p_\theta(1) = (4\theta + 1)/3, \qquad p_\theta(0) = 1 - p_\theta(1) = (2 - 4\theta)/3$$
(where $(X = 1)$ is $D$, and $(X = 0)$ is $\bar{D}$), and the (conditional given $\mathcal{D}$) probabilities associated with $Y$ are given by
$$q_\theta(1) = (3.2\theta + 0.1)/2, \qquad q_\theta(0) = 1 - q_\theta(1) = (1.9 - 3.2\theta)/2$$
(where $(Y = 1)$ is $C$, and $(Y = 0)$ is $\bar{C}$). The fuzziness in the available information for the experiments $X$ and $Y$ leads, respectively, to the fuzzy information systems $\mathcal{D}^* = \{\mathcal{D}, \bar{\mathcal{D}}\}$ and $\mathcal{C}^* = \{\mathcal{C}, \bar{\mathcal{C}}\}$, whose probability distributions are given by
$$\mathcal{P}_\theta(\mathcal{D}) = (3.2\theta + 1.1)/3, \qquad \mathcal{P}_\theta(\bar{\mathcal{D}}) = (1.9 - 3.2\theta)/3,$$
$$\mathcal{P}_\theta(\mathcal{C}) = (3.2\theta + 1.1)/4, \qquad \mathcal{P}_\theta(\bar{\mathcal{C}}) = (2.9 - 3.2\theta)/4.$$
Consequently, the function
$$h(\mathcal{C}/\mathcal{D}) = \tfrac{3}{4}, \qquad h(\bar{\mathcal{C}}/\mathcal{D}) = \tfrac{1}{4}, \qquad h(\mathcal{C}/\bar{\mathcal{D}}) = 0, \qquad h(\bar{\mathcal{C}}/\bar{\mathcal{D}}) = 1$$
satisfies conditions (43) and (44) in Definition 14, so that $\mathcal{D}^*$ is sufficient for $\mathcal{C}^*$.
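The arithmetic of Example 3 can be checked mechanically. The sketch below encodes the fuzzy-event probabilities $\mathcal{P}_\theta(\mathcal{D}) = (3.2\theta + 1.1)/3$, $\mathcal{P}_\theta(\mathcal{C}) = (3.2\theta + 1.1)/4$, etc. (as derived above, with $h(\mathcal{C}/\mathcal{D}) = 3/4$), and verifies the sufficiency relations (43) and (44) at an arbitrary admissible $\theta$:

```python
def d_star_probs(theta):
    # P_theta over D* = {D-tilde, not-D-tilde} in experiment X
    return ((3.2 * theta + 1.1) / 3, (1.9 - 3.2 * theta) / 3)

def c_star_probs(theta):
    # P_theta over C* = {C-tilde, not-C-tilde} in experiment Y
    return ((3.2 * theta + 1.1) / 4, (2.9 - 3.2 * theta) / 4)

# Kernel h(C-event / D-event); for each D-event the values sum to 1 (eq. 44)
h = {('C', 'D'): 0.75, ('c', 'D'): 0.25,
     ('C', 'd'): 0.00, ('c', 'd'): 1.00}

theta = 0.1                      # any value in the admissible range
d = dict(zip(('D', 'd'), d_star_probs(theta)))
c = dict(zip(('C', 'c'), c_star_probs(theta)))
# eq. (43): P(C-event/theta) = sum_D h(C-event/D-event) * P(D-event/theta)
recovered = {cv: sum(h[(cv, dv)] * d[dv] for dv in ('D', 'd')) for cv in ('C', 'c')}
```

The dictionary `recovered` coincides with `c` for every $\theta$, confirming that $\mathcal{D}^*$ is sufficient for $\mathcal{C}^*$.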
The following results indicate that, under certain conditions, the criterion given in Definition 13 is more widely applicable.
Proposition 14. (a) Let $X_1^*, X_2^*$ be two fuzzy information systems, and let $p(\theta)$ be a prior probability on $\Theta = \{\theta_1, \ldots, \theta_n\}$. If the fuzzy information system $X_2^*$ is sufficient for the fuzzy information system $X_1^*$, then
$$\mathcal{G}(X_1^*, p(\cdot)) \leq \mathcal{G}(X_2^*, p(\cdot)).$$
(b) Let $X_1^*$ be a sufficient fuzzy information system for a fuzzy information system $X_2^* \in E^*$, with parameter space $\Theta$. If, for every mapping $g: X_2^* \to \mathbb{R}^*$,
$$\sum_{X_2 \in X_2^*} g(X_2)\, \mathcal{P}(X_2/\theta) = 0 \quad \forall \theta \in \Theta$$
implies that $g(X_2) = 0$, $\forall X_2 \in X_2^*$, then for any prior distribution $p(\theta)$ on $\Theta$ we have
$$\mathcal{G}(X_2^*, p(\cdot)) \leq \mathcal{G}(X_1^*, p(\cdot)).$$
The proofs are similar to those given in Theorem 2 and Result 11, respectively. Now we establish the relation between the previous criterion and Lehmann's (1959) criterion.

Definition 15. The FIS $X_1^*$ is preferred to the fuzzy information system $X_2^*$ according to Lehmann's criterion, written $X_1^* \succ_L X_2^*$, if there exists an FIS $\mathcal{U}^*$ with
$$\mathcal{P}_{X_1, \mathcal{U}}(X_1, \mathcal{U}/\theta) = \mathcal{P}(X_1/\theta)\, \mathcal{P}(\mathcal{U}), \qquad \forall \theta \in \Theta, \quad \forall X_1 \in X_1^*, \quad \mathcal{U} \in \mathcal{U}^*,$$
and a mapping $S$ defined over $X_1^* \times \mathcal{U}^*$ such that $\mathcal{Y}^* = S(X_1^* \times \mathcal{U}^*)$ is an FIS with
$$\mathcal{P}(\mathcal{Y}/\theta) = \sum_{(X_1, \mathcal{U}) \in A(\mathcal{Y})} \mathcal{P}(X_1 \times \mathcal{U}/\theta),$$
where
$$A(\mathcal{Y}) = \{(X_1, \mathcal{U}) \in X_1^* \times \mathcal{U}^* / S(X_1, \mathcal{U}) = \mathcal{Y}\},$$
such that the conditional probability distribution over $X_2^*$ given $\theta$ coincides with the conditional probability distribution over $\mathcal{Y}^*$ given $\theta \in \Theta$.

The following results indicate that, under some conditions, the criterion given in Definition 13 is more widely applicable.
Proposition 15. Suppose that the FIS $X_1^*$ is preferred to the FIS $X_2^*$ according to Lehmann's criterion. Then
$$X_2^* \preceq X_1^*.$$
The proof is similar to the one in Result 10.

P. Gil et al. (1990), under a Bayesian approach, analyze the relationship between the criterion based on Blackwell's sufficiency and some criteria based on well-known information measures. L. Pardo et al. (1988) first consider the following problem: for any prior distribution on $\Theta$, for any class $F$ of experiments, and for any fixed number $n$ of divisions of fuzzy information, determine a procedure that maximizes the terminal information energy given by
$$\mathcal{G}(X^{*1}, \ldots, X^{*n}; p(\cdot)) = \sum_{X^1 \in X^{*1}} \mathcal{P}(X^1)\left[\cdots\left[\sum_{X^n \in X^{*n}} \mathcal{P}(X^n/X^{n-1}, \ldots, X^1)\; \mathcal{G}\big(X^{*n}, p(\cdot/X^1, \ldots, X^{n-1})\big)\right]\cdots\right].$$
More studies in this direction can be seen in the works of L. Pardo (1984a, 1984b), L. Pardo et al. (1986b), Menendez et al. (1989), etc.
VI. WEIGHTED INFORMATION ENERGY

A. Definition and Properties
The information energy of a random variable $X$, $\mathcal{E}(X)$, depends only on the probabilities with which the various outcomes occur. In order to distinguish the outcomes $x_1, \ldots, x_n$ of a goal-directed experiment according to their importance with respect to a given qualitative characteristic of the system, we shall associate with each outcome $x_i$ a positive number $u_i > 0$ directly proportional to its importance. We call $u_i$ the weight of the outcome $x_i$. A probabilistic experiment for which we assign a weight to each result $x_i$ will be called a weighted probabilistic experiment. For all $P = (p_1, \ldots, p_n) \in \Delta_n$, we write $U = (u_1, \ldots, u_n)$, $u_i > 0$, $i = 1, 2, \ldots, n$, for the set of weights. For a weighted probabilistic experiment, Theodorescu (1977) gave the idea of weighted information energy, and Pardo (1981a) modified it by introducing a denominator, giving
$$\mathcal{E}(p_1, \ldots, p_n; u_1, \ldots, u_n) = \frac{\sum_{i=1}^{n} u_i p_i^2}{E(U)}, \qquad (45)$$
where $E(U) = \sum_{i=1}^{n} u_i p_i$. Here it is understood that all weights and probabilities involved are positive. The weighted information energy (45) satisfies many interesting properties, given as follows:

1. $\mathcal{E}(p_1, \ldots, p_n; u_1, \ldots, u_n)$ is a symmetric function with respect to all the pairs $(u_i, p_i)$, $i = 1, \ldots, n$.
2. $\mathcal{E}(p_1, \ldots, p_n; u_1, \ldots, u_n)$ is invariant under positive homotheties with respect to the weights, i.e.,
$$\mathcal{E}(p_1, \ldots, p_n; \lambda u_1, \ldots, \lambda u_n) = \mathcal{E}(p_1, \ldots, p_n; u_1, \ldots, u_n), \qquad \lambda > 0.$$
3. $\mathcal{E}(1/n, \ldots, 1/n; u_1, \ldots, u_n) = 1/n$.
4. (Weighted branching property):
$$\mathcal{E}(p_1, \ldots, p_{j-1}, p', p'', p_{j+1}, \ldots, p_n; u_1, \ldots, u_{j-1}, u', u'', u_{j+1}, \ldots, u_n)$$
$$= \mathcal{E}\left(p_1, \ldots, p_{j-1}, p' + p'', p_{j+1}, \ldots, p_n; u_1, \ldots, u_{j-1}, \frac{p'u' + p''u''}{p' + p''}, u_{j+1}, \ldots, u_n\right)$$
$$- \frac{(p' + p'')(p'u' + p''u'')}{E(U)}\left[1 - \mathcal{E}\left(\frac{p'}{p' + p''}, \frac{p''}{p' + p''}; u', u''\right)\right],$$
where $E(U) = p_1 u_1 + \cdots + p_{j-1} u_{j-1} + p'u' + p''u'' + p_{j+1} u_{j+1} + \cdots + p_n u_n$.
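A small numerical check of these properties (a sketch, assuming the form $\mathcal{E}(p; u) = \sum_i u_i p_i^2 / \sum_i u_i p_i$ of Eq. (45), with hypothetical numbers):

```python
def weighted_energy(p, u):
    # E(p_1,...,p_n; u_1,...,u_n) = sum(u_i p_i^2) / sum(u_i p_i)
    return (sum(ui * pi * pi for pi, ui in zip(p, u))
            / sum(ui * pi for pi, ui in zip(p, u)))

# Property 3: uniform probabilities give 1/n regardless of the weights
uniform = weighted_energy([0.25] * 4, [1.0, 5.0, 2.0, 7.0])

# Weighted branching: split the last outcome (prob 0.5, merged weight 2.8)
# into p', p'' = 0.2, 0.3 with weights u', u'' = 4, 2, so that the merged
# weight is (p'u' + p''u'')/(p' + p'') = 2.8 and E(U) is unchanged.
p_full, u_full = [0.2, 0.3, 0.2, 0.3], [1.0, 2.0, 4.0, 2.0]
p_merged, u_merged = [0.2, 0.3, 0.5], [1.0, 2.0, 2.8]
EU = sum(ui * pi for pi, ui in zip(p_full, u_full))
s, su = 0.5, 0.2 * 4.0 + 0.3 * 2.0        # p'+p'' and p'u'+p''u''
lhs = weighted_energy(p_full, u_full)
rhs = (weighted_energy(p_merged, u_merged)
       - (s * su / EU) * (1.0 - weighted_energy([0.2 / s, 0.3 / s], [4.0, 2.0])))
```

Both sides of the branching identity agree (about 0.2545 here), and scaling all weights by a common $\lambda > 0$ leaves the value unchanged (property 2).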
This property shows how the weighted information energy behaves when two elementary events are replaced by their union. The weighted information energy (45) has been axiomatically characterized by J. A. Pardo (1985). L. Pardo (1986) introduced and characterized a parametric weighted information energy depending on a scalar parameter.

B. Conditional Weighted Information Energy
If we know the result of a random variable $Y$ related to the random variable $X$, the probabilities are modified, but the weight of the outcome $x_i$ ($i = 1, \ldots, n$) remains unchanged. Hence, if the random variable $Y$ takes on the values $y_1, \ldots, y_m$, we have for each $y_j$ a probability distribution $p(x_1/y_j), \ldots, p(x_n/y_j)$, and we can define the conditional weighted information energy, given the value $y_j$ of the random variable $Y$, by the expression
$$\mathcal{E}_U(X/y_j) = \frac{\sum_{i=1}^{n} u_i\, p(x_i/y_j)^2}{E(U/y_j)},$$
where
$$E(U/y_j) = \sum_{i=1}^{n} u_i\, p(x_i/y_j)$$
for each $j = 1, \ldots, m$. The conditional weighted information energy of $X$, given the random variable $Y$, is defined by the expression
$$\mathcal{E}_U(X/Y) = \sum_{j=1}^{m} \pi_j\, \mathcal{E}_U(X/y_j),$$
where
$$\pi_j = \frac{p(y_j)\, E(U/y_j)}{E(U)}, \qquad \sum_{j=1}^{m} \pi_j = 1.$$
Thus we can write
$$\mathcal{E}_U(X/Y) = \frac{1}{E(U)} \sum_{j=1}^{m} p(y_j) \sum_{i=1}^{n} u_i\, p(x_i/y_j)^2.$$
Pardo (1981a) established that $\mathcal{E}_U(X/Y) \geq \mathcal{E}_U(X)$.

C. Noiseless Coding Theorems and Weighted Information Energy
Let us assume that we have a set of $n$ code words $w_i$ with probabilities $p_i$ and lengths $N_i$ for each $i = 1, 2, \ldots, n$. The code words are built from the code alphabet $S = (a_1, \ldots, a_D)$, $D \geq 2$. It is well known that there exists a uniquely decipherable instantaneous code with the lengths $N_i$ ($i = 1, \ldots, n$) satisfying the Kraft inequality
$$\sum_{i=1}^{n} D^{-N_i} \leq 1. \qquad (46)$$
In order to distinguish the code words $w_i$ of a goal-directed experiment according to their importance with respect to a given qualitative characteristic of the system, we shall associate with each code word $w_i$ a weight $u_i > 0$ directly proportional to its importance. Accordingly, we define the following weighted mean code-word length:
$$\mathcal{L}_U(P) = \frac{1}{E(U)}\left[\sum_{i=1}^{n} p_i\, u_i^{1/2}\, D^{-N_i/2}\right]^2. \qquad (47)$$
In particular, when all the weights are equal, i.e., $u_i = u$ for each $i = 1, 2, \ldots, n$, the mean code-word length reduces to
$$\mathcal{L}(P) = \left[\sum_{i=1}^{n} p_i\, D^{-N_i/2}\right]^2. \qquad (48)$$
Based on the above considerations, we have the following theorem.

Theorem 4. Let $N_i$ be the length of the code word $w_i$ for each $i = 1, \ldots, n$, satisfying the Kraft inequality (46). Then the weighted mean code-word length (47) satisfies
$$\mathcal{L}_U(P) \leq \mathcal{E}_U(X), \qquad (49)$$
with equality iff
$$D^{-N_i} = \frac{p_i^2 u_i}{\sum_{j=1}^{n} p_j^2 u_j}, \qquad i = 1, \ldots, n.$$

Proof: Let $a_i$ and $b_i$ be nonnegative real numbers. Then Hölder's inequality is given by
$$\sum_{i=1}^{n} a_i b_i \leq \left(\sum_{i=1}^{n} a_i^p\right)^{1/p}\left(\sum_{i=1}^{n} b_i^q\right)^{1/q}, \qquad p > 1, \quad 1/p + 1/q = 1, \qquad (50)$$
with equality iff there are numbers $\lambda_1 \geq 0$ and $\lambda_2 \geq 0$, not both 0, such that
$$\lambda_1 a_i^p = \lambda_2 b_i^q, \qquad i = 1, \ldots, n.$$
Take
$$p = q = 2, \qquad a_i = p_i u_i^{1/2}, \qquad b_i = D^{-N_i/2}$$
in (50). We get
$$\sum_{i=1}^{n} p_i\, u_i^{1/2}\, D^{-N_i/2} \leq \left(\sum_{i=1}^{n} p_i^2 u_i\right)^{1/2}\left(\sum_{i=1}^{n} D^{-N_i}\right)^{1/2}, \qquad (51)$$
with equality iff
$$\lambda_1 p_i^2 u_i = \lambda_2 D^{-N_i}, \qquad i = 1, \ldots, n.$$
As $\sum_{i=1}^{n} D^{-N_i} \leq 1$, squaring (51) and dividing by $E(U)$, we get the required result, i.e., the inequality (49), since $\mathcal{E}_U(X) = \sum_{i=1}^{n} p_i^2 u_i / E(U)$.
The following theorem gives bounds on the weighted mean code-word length.

Theorem 5. With a proper choice of the lengths $N_1, \ldots, N_n$, the weighted mean code-word length (47) satisfies the inequalities
$$D^{-1}\, \mathcal{E}_U(X) < \mathcal{L}_U(P) \leq \mathcal{E}_U(X). \qquad (52)$$

Proof: Choose the code-word lengths $N_i$ to satisfy
$$D^{-N_i} \leq \frac{p_i^2 u_i}{\sum_{j=1}^{n} p_j^2 u_j} < D^{-N_i + 1} \qquad (53)$$
for all $i = 1, 2, \ldots, n$. Taking the sum over all $i = 1, 2, \ldots, n$ on the left-hand side of inequality (53), we recover the Kraft inequality $\sum_{i=1}^{n} D^{-N_i} \leq 1$, so such lengths exist. Raising both sides of (53) to the power $1/2$ and multiplying by $p_i u_i^{1/2}$, we obtain
$$D^{-1/2}\, \frac{p_i^2 u_i}{\left(\sum_{j=1}^{n} p_j^2 u_j\right)^{1/2}} < p_i\, u_i^{1/2}\, D^{-N_i/2} \leq \frac{p_i^2 u_i}{\left(\sum_{j=1}^{n} p_j^2 u_j\right)^{1/2}}$$
for all $i = 1, 2, \ldots, n$. Summing over all $i = 1, 2, \ldots, n$, squaring, and dividing by $E(U)$, we get the required result (52).
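Theorems 4 and 5 can be illustrated with the length choice used in the proof. The sketch below assumes the reconstructed forms $\mathcal{L}_U(P) = (\sum_i p_i u_i^{1/2} D^{-N_i/2})^2 / E(U)$ and $N_i = \lceil -\log_D (p_i^2 u_i / \sum_j p_j^2 u_j) \rceil$:

```python
import math

def weighted_energy(p, u):
    # E_U(X) = sum(u_i p_i^2) / sum(u_i p_i)
    return (sum(ui * pi * pi for pi, ui in zip(p, u))
            / sum(ui * pi for pi, ui in zip(p, u)))

def code_lengths(p, u, D=2):
    # N_i = ceil(-log_D q_i), q_i = p_i^2 u_i / sum_j p_j^2 u_j, as in (53)
    Q = sum(ui * pi * pi for pi, ui in zip(p, u))
    return [math.ceil(-math.log(pi * pi * ui / Q, D)) for pi, ui in zip(p, u)]

def weighted_mean_length(p, u, N, D=2):
    # L_U(P) = (sum_i p_i u_i^(1/2) D^(-N_i/2))^2 / E(U)
    EU = sum(ui * pi for pi, ui in zip(p, u))
    s = sum(pi * math.sqrt(ui) * D ** (-Ni / 2) for pi, ui, Ni in zip(p, u, N))
    return s * s / EU

p, u, D = [0.5, 0.25, 0.25], [1.0, 2.0, 1.0], 2
N = code_lengths(p, u, D)
kraft = sum(D ** -n for n in N)
L = weighted_mean_length(p, u, N, D)
E = weighted_energy(p, u)
```

Here $N = [1, 2, 3]$, the Kraft sum is $0.875 \leq 1$, and $E/D < L \leq E$ ($0.175 < 0.30625 \leq 0.35$), as Theorem 5 asserts.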
ACKNOWLEDGMENTS

This work was written during the second author's stay at the Departamento de Estadística e I.O., Universidad Complutense de Madrid, Spain; he is thankful to the above-mentioned university for providing facilities and financial support.
REFERENCES

Aczél, J., and Daróczy, Z. (1975). On Measures of Information and Their Characterizations. Academic Press, New York.
Agresti, A., and Agresti, B. F. (1978). In Statistical Methodology (Schuessler, ed.), pp. 204-237.
Arimoto, S. (1971). Information and Control 19, 181-194.
Beckenbach, E. F., and Bellman, R. (1971). Inequalities. Springer-Verlag, Berlin.
Bhargava, T. N., and Doyle, P. H. (1974). J. Theor. Biology 43, 241-251.
Bhargava, T. N., and Uppuluri, V. R. R. (1975). Metron B VI, 1-13.
Bickel, P. J., and Doksum, K. (1977). Mathematical Statistics. Holden-Day, Oakland, California.
Blackwell, D. (1951). Proc. 2nd Berkeley Symp., pp. 93-102. University of California Press, Berkeley.
Blackwell, D., and Girshick, M. A. (1954). Theory of Games and Statistical Decisions. Wiley, New York.
Boekee, D. E., and van der Lubbe, J. C. A. (1979). Pattern Recognition 11, 353-360.
Capocelli, R. M., Gargano, L., Vaccaro, U., and Taneja, I. J. (1985). Proc. International Conference on Cybernetics and Society, Tucson, Arizona, pp. 78-82.
Chaundy, T. W., and McLeod, J. B. (1960). Proc. Edinburgh Math. Notes 43, 7-8.
Chen, C. H. (1976). Information Sciences 10, 159-171.
DeGroot, M. (1970). Optimal Statistical Decisions. McGraw-Hill, New York.
Devijver, P. A. (1974). IEEE Trans. on Comp. C-23, 70-80.
El-Sayyad, G. M. (1969). Technometrics 11, 40-42.
Garcia-Carrasco, M. P. (1983). Proc. XIII Congreso de la Sociedad Española de Estadística e Investigación Operativa, Valladolid, Spain, pp. 65-72.
Gil, M. A. (1988). Ann. Inst. Statist. Math. 40, 627-639.
Gil, M. A. (1989). Commun. Statist.-Theory Meth. 18(4), 1521-1526.
Gil, M. A., Lopez, M. T., and Gil, P. (1984). Kybernetes 13, 245-251.
Gil, M. A., Corral, N., and Gil, P. (1985a). European J. Oper. Res. 22, 26-34.
Gil, M. A., Lopez, M. T., and Gil, P. (1985b). Fuzzy Sets and Systems 15, 65-78, 129-145.
Gil, P. (1981). Teoría Matemática de la Información. ICE, Madrid.
Gil, P., Gil, M. A., Menendez, M. L., and Pardo, L. (1990). Fuzzy Sets and Systems 37, 183-192.
Gini, C. (1912). Studi Economico-giuridici della Facoltà di Giurisprudenza dell'Università di Cagliari, a. III, Parte II.
Guiasu, S. (1977). Information Theory with Applications. McGraw-Hill International, New York.
Hardy, G. H., Littlewood, J. E., and Polya, G. (1934). Inequalities. Cambridge University Press, London.
Havrda, J., and Charvat, F. (1967). Kybernetika 3, 30-35.
Kailath, T. (1967). IEEE Trans. on Commun. Tech. COM-15, 52-60.
Kanal, L. N. (1974). IEEE Trans. on Inform. Theory IT-20, 687-722.
Keisler, H. J. (1976). Foundations of Infinitesimal Calculus. Prindle, Weber and Schmidt, Massachusetts.
Lehmann, E. L. (1959). Testing Statistical Hypotheses. Wiley, New York.
Lieberson, S. (1969). Amer. Soc. Rev. 34, 850-862.
Lindley, D. V. (1956). Ann. Math. Statist. 27, 986-1005.
Lindley, D. V. (1957). Biometrika 44, 179-186.
Lissack, T., and Fu, K. S. (1976). IEEE Trans. on Inform. Theory IT-22, 34-45.
Marshall, A. W., and Olkin, I. (1979). Inequalities: Theory of Majorization and Its Applications. Academic Press, New York.
McFadden, J. A. (1965). J. Soc. Indust. Appl. Math. 13, 988-994.
Menendez, M. L. (1986). Ph.D. Thesis, Universidad Politécnica, Madrid.
Menendez, M. L., and Pardo, L. (1989). European Journal of Operational Research, in press.
Menendez, M. L., Pardo, J. A., and Pardo, L. (1989). Fuzzy Sets and Systems 32, 81-91.
Morales, D., Pardo, L., and Quesada, V. (1985). Estadística Española 107, 5-14.
Morales, D., Pardo, L., and Quesada, V. (1987). Tech. Rep. 6/1987, Departamento de Estadística e I.O., Universidad Complutense de Madrid.
Nawrotzki, K. (1962). Ein Grenzwertsatz für homogene zufällige Punktfolgen (Verallgemeinerung eines Satzes von A. Rényi). Math. Nachr. 24, 201-217.
Nayak, T. K. (1983). Ph.D. Thesis, Univ. Pittsburgh, USA.
Nayak, T. K. (1985). Commun. Statist.-Theory and Methods 14(1), 203-215.
Nei, M. (1973). Proc. Nat. Acad. Sci. 70, 3321-3323.
Okuda, T., Tanaka, H., and Asai, K. (1978). Inform. and Control 38, 135-147.
Onicescu, O. (1966). C. R. Acad. Sci., Ser. A 263, 841-842.
Onicescu, O. (1974). Rev. Roum. Math. Pures et Appl. 19(4), 473-475.
Ozeki, K. (1980). Information and Control 47, 94-106.
Pardo, J. A. (1985). Estadística Española 94, 113-122.
Pardo, J. A., Menendez, M. L., Taneja, I. J., and Pardo, L. (1990). Journal of Combinatorics, Information & System Sciences, in press.
Pardo, L. (1981a). Trabajos de Estadística e Investigación Operativa 32, 11-20.
Pardo, L. (1981b). Estadística Española 90, 11-20.
Pardo, L. (1982a). Estadística Española 94, 113-122.
Pardo, L. (1982b). Real Academia de Ciencias Exactas, Físicas y Naturales de Madrid 76, 80-92.
Pardo, L. (1982c). Real Academia de Ciencias Exactas, Físicas y Naturales de Madrid LXXVI, 903-906.
Pardo, L. (1983). Proc. Third European Young Statisticians Meeting, Leuven, Belgium, pp. 140-147.
Pardo, L. (1984a). Estadística Española 104, 23-34.
Pardo, L. (1984b). Estadística Española 105, 27-43.
Pardo, L. (1984c). Proc. XIV Congreso de la Sociedad Española de Estadística e Investigación Operativa, Granada, Spain, pp. 327-334.
Pardo, L. (1984d). In Cybernetics and Systems Research 2 (R. Trappl, ed.), pp. 541-545. North-Holland, Amsterdam.
Pardo, L. (1985). Trabajos de Estadística e Investigación Operativa 36, 78-93.
Pardo, L. (1986a). Information Sciences 40, 155-164.
Pardo, L. (1986b). Statistica 46, 243-251.
Pardo, L. (1987). Real Academia de Ciencias Exactas, Físicas y Naturales de Madrid 81, 102-115.
Pardo, L., and Menendez, M. L. (1985). Proc. First I.F.S.A. Congress, Palma de Mallorca, Spain, pp. 55-61.
Pardo, L., and Menendez, M. L. (1989). Journal of Combinatorics, Information & System Sciences 14, 163-171.
Pardo, L., Morales, V., and Quesada, V. (1985). Trabajos de Estadística e Investigación Operativa 36, 233-242.
Pardo, L., Menendez, M. L., and Pardo, J. A. (1986a). In Cybernetics and Systems '86 (R. Trappl, ed.), pp. 599-606. D. Reidel, The Netherlands.
Pardo, L., Menendez, M. L., and Pardo, J. A. (1986b). Kybernetes 15, 189-194.
Pardo, L., Menendez, M. L., and Pardo, J. A. (1988). Fuzzy Sets and Systems 25, 95-105.
Patil, G. P., and Taillie, C. (1982). J. Amer. Stat. Assoc. 77, 548-567.
Perez, A. (1967). Rev. Roumaine Math. Pures Appl. 12, 1341-1347.
Rao, C. R. (1973). Linear Statistical Inference and Its Applications. John Wiley and Sons, New York.
INFORMATION ENERGY AND ITS APPLICATIONS
24 I
Rao, C. R. (1982). Utilitas Mathematics 21,273-282. Renyi, A. (1961). Proc. th. Berk. Symp. Math. Stat. Prohl. 1,547-561. Rohatgi, V. K. (1976). An Introduction to Prohahility Theory and Mathematical Statistics. John Wiley and Sons, New York. Shannon, C. E. (1948). Bell System Tech. J . 27, 379-423; 623-656. Sharma, B. D., and Mittal, D. P. (1975). J . Math. Sci. 10,280-40. Sharma, B. D.. and Taneja, I. J. (1975).Metrilia 22,205-215. Sharma, B. D., and Taneja, I. J. (1977).Electron Inform. Kybern. (E.1.K) 13,419-433. Simpson, E. H. (1949). Narure 163,688. Stone, M. (1959). Ann. Math. Statist. 30, 55-79. Tanaka, H.. Okuda, T., and Asai, K. (1979). Advances in Fuzzy Sets Theory and Applications. North-Holland”. 303 - 320. Taneja, I. J. (1975).Ph.D. Thesis. University of Delhi, Delhi, India. Taneja, I. J. (1979).J . Comb. Injbrm. & System. Sri. 4, 253-274. Taneja, 1. J. (1985).Pattern Recognition Letters 3, 361-368. Taneja, I. J. (1989). Advances in Electronics and Electron Physics, Vol. 16, pp. 328-413. Academic Press. Theodorescu, A. (1977). Trabajos de Estadistica e Iniwrigacicin Operatioa 27, 183-206. Vajda, 1. (1968). Problems of Inform. Transm. 4, 6- 14. Van der Lubbe, J. C. A., Boxma, Y., and Boekee, D. E. (1984). Information Sciences 32, 187-215. Van der Lubbe, 1. C. A., Boekee, D. E., and Boxma, Y. (1987).lnjirmation Sciences 41, 139-169. Yaglom, A. M., and Yaglom, I. M. (1969). Prohabiliti et In/brmation. Dunod, Paris. Zadeh, L. A. (1965). Inform. and Control 8,338-353. Zadeh, L. A. (1968).J . Math. h a / . Appl. 23,421-427.
ADVANCES IN ELECTRONICS AND ELECTRON PHYSICS, VOL. 80
Recent Developments in Image Algebra

G. X. RITTER
Center for Computer Vision Research
Department of Computer and Information Sciences
University of Florida
Gainesville, Florida
I. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 243
II. Image Algebra . . . . . . . . . . . . . . . . . . . . . . . . 246
    A. Induced Operations on Images . . . . . . . . . . . . . . . 247
    B. Set Theoretic Operations . . . . . . . . . . . . . . . . . 253
    C. Examples of Pixel Level Operations . . . . . . . . . . . . 254
    D. Templates . . . . . . . . . . . . . . . . . . . . . . . . 257
    E. Generalized Image Products . . . . . . . . . . . . . . . . 261
    F. Linear and Lattice Transforms . . . . . . . . . . . . . . 265
III. A Medley of Consequences . . . . . . . . . . . . . . . . . . 273
    A. Examples of Intermediate and Higher-Level Transforms . . . 274
    B. Generalized Matrix Products . . . . . . . . . . . . . . . 282
    C. Template Decomposition . . . . . . . . . . . . . . . . . . 285
    D. Image Algebra and Artificial Neural Networks . . . . . . . 292
    E. Recursive Processes . . . . . . . . . . . . . . . . . . . 299
Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . 305
References . . . . . . . . . . . . . . . . . . . . . . . . . . . 305
I. INTRODUCTION

Image algebra is a mathematical theory concerned with the transformation and analysis of images. Although the current focus is on the analysis and transformation of images by computers, the main goal is the establishment of a comprehensive and unifying mathematical theory of image transformations, image analysis, and image understanding in the discrete as well as the continuous domain.

The idea of establishing a unifying theory for concepts and operations encountered in image and signal processing is not new. Thirty years ago, Unger proposed that many algorithms for image processing and analysis could be implemented in parallel using a "cellular array" machine (Unger, 1958). These cellular array machines were inspired by the work of von Neumann in the 1950s (von Neumann, 1951). Realization of von Neumann's cellular array machines was made possible with the advent of VLSI technology. NASA's massively parallel processor, or MPP (Batcher, 1980), and the CLIP series of computers developed by M. J. B. Duff and his colleagues (Duff, 1982; Fountain et al., 1988) represent the classic embodiment of von Neumann's original automaton. A more general class of cellular array computers are pyramids (Uhr, 1983) and Thinking Machines Corporation's Connection Machine (Hillis, 1985). In an abstract sense, the Connection Machine is a universal cellular automaton with an additional mechanism added for non-local communication.

Many operations performed by these cellular array machines can be expressed in terms of simple elementary operations. These elementary operations create a mathematical basis for a theoretical formalism capable of expressing a large number of algorithms for image processing and analysis. In fact, a common thread among designers of parallel image processing architectures is the belief that large classes of image transformations can be described by a small set of standard rules that induce these architectures. This belief led to the creation of mathematical formalisms that were then used to aid the design of special-purpose parallel architectures. Matheron and Serra's Texture Analyzer (Klein and Serra, 1972), ERIM's (Environmental Research Institute of Michigan) Cytocomputer (Sternberg, 1983; McCubbrey and Lougheed, 1985), and Martin Marietta's GAPP (Cloud and Holsztynski, 1984) are examples of this approach. The formalism associated with these cellular architectures is that of pixel neighborhood arithmetic and mathematical morphology.

Mathematical morphology is the part of image processing that is concerned with image filtering and analysis by structuring elements. It grew out of the early work of H. Minkowski and H. Hadwiger on geometric measure theory and integral geometry (Minkowski, 1903, 1911; Hadwiger, 1957), and entered the modern era through the work of G. Matheron (1975) and J. Serra (1982) of the Ecole des Mines in Fontainebleau, France. Matheron and Serra not only formulated the modern concepts of morphological image transformations, but also designed and built the Texture Analyzer System. Since those early days, morphological operations and techniques have been applied from low-level to intermediate to high-level vision problems. Among some recent research papers on morphological image processing are Crimmins and Brown (1985), Haralick et al. (1987a, 1987b), and Maragos and Schafer (1986, 1987a, 1987b). Serra and Sternberg were the first to unify morphological concepts and methods into a coherent algebraic theory specifically designed for image processing and image analysis. Sternberg was also the first to use the term "image algebra" (Sternberg, 1980, 1985). More recently, P. Maragos (1985) introduced a new theory unifying a large class of linear and nonlinear systems under the theory of mathematical morphology. However, despite these profound accomplishments, morphological methods have some well-known limitations. For example, such fairly common image processing techniques as feature extraction based on convolution, Fourier-like transformations, chain coding, histogram equalization, image rotation, and image registration and rectification are, with the exception of a few simple cases, either extremely difficult or impossible to express in terms of morphological operations. The failure of a morphologically based image algebra to express a fairly straightforward U.S. government-furnished FLIR (forward-looking infrared) algorithm was demonstrated by P. Miller at Perkin-Elmer (1983).

The failure of morphological image algebra to provide a universal image processing algebra is due to its set-theoretic formulation, which is based on the Minkowski addition and subtraction of sets (Hadwiger, 1957). These operations ignore the linear domain, transformations between different domains (spaces of different dimensionalities), and transformations between different value sets, e.g., sets consisting of real, complex, or vector-valued numbers. On the other hand, the algebra presented in this paper includes these concepts and also incorporates and extends the morphological operations.

The development of image algebra grew out of a need, by the U.S. Air Force Systems Command, for a common image-processing language. Defense contractors do not use a standardized, mathematically rigorous and efficient structure that is specifically designed for image manipulation. Documentation by contractors of algorithms for image processing, and of the rationale underlying algorithm design, is often accomplished via word description or analogies that are extremely cumbersome and often ambiguous. The result of these ad hoc approaches has been a proliferation of nonstandard notation and increased research and development cost.
In response to this chaotic situation, the Air Force Armament Laboratory (AFATL) of the Air Force Systems Command, in conjunction with the Defense Advanced Research Projects Agency (DARPA), supported the early development of image algebra with the intent that the fully developed structure would subsequently form the basis of a common image-processing language. The goal of AFATL was the development of a complete, unified algebraic structure that provides a common mathematical environment for image-processing algorithm development, optimization, comparison, coding, and performance evaluation. The development of this structure proved highly successful, capable of fulfilling the tasks set forth by the government, and it is now commonly known as image algebra.

Research and development of image algebra theory and technology continues at an accelerated pace. An image algebra workbench is under development by a team consisting of participants from the University of Florida, the Environmental Research Institute of Michigan (ERIM), Software Leverage Inc. of Boston, and Honeywell's Research and Development Center in Minneapolis, Minnesota. This effort is sponsored by the U.S. Air Force Systems Command and will provide a wide variety of image-processing software tools, including an image algebra Ada translator and interpreter. Various image algebra-based high-speed architectures for image processing are under development at Texas Instruments (Dallas, Texas), Honeywell Inc., and the University of Florida. Image algebra has been implemented on the CM2 Connection Machine, ERIM's Cytocomputer, Honeywell's PREP (a recirculating pipeline architecture), and transputers. Several image algebra programming languages have been developed. These include Image Algebra Fortran (IAFORTRAN) (IVS Inc., 1988), Image Algebra C (IAC) (Perry, 1987), an Image Algebra Language (IAL) implementation on transputers (Crookes et al., 1990), and an Image Algebra Connection Machine *Lisp (Wilson et al., 1988).

Unfortunately, there is often a tendency among engineers to confuse or equate these languages with image algebra. An image algebra programming language is not image algebra, which is a mathematical theory. An image algebra programming language usually implements a particular subalgebra of the full image algebra. In addition, some implementations, such as preprocessors, often result in a decrease in computational performance. These restrictions and limitations in implementation are a result of several factors, the most pertinent being development costs and current hardware constraints. They are not limitations of image algebra, and they should not be confused with the capability of image algebra as a mathematical tool for image manipulation. The capability of image algebra should become evident from the theory and examples described in the subsequent sections.

II. IMAGE ALGEBRA

Image algebra as defined in this paper is a heterogeneous or many-valued algebra in the sense of Birkhoff (Birkhoff and Lipson, 1970; Ritter et al., 1990), with multiple sets of operands.
Manipulation of images for purposes of image enhancement, analysis, and understanding involves operations not only on images, but also on different types of values and quantities associated with these images. Thus, the basic operands of image algebra are images and certain values or quantities associated with images. Roughly speaking, an image consists of two things: a collection of points (of some topological space) and values associated with these points. Images are therefore endowed with two types of information, namely the spatial relationships of the points, and also some type of numeric or other descriptive information associated with these points. To make these notions mathematically precise, we formally define the concepts of value set, point set, and image.

A homogeneous or single-valued algebra is a heterogeneous algebra with only one set of operands. In other words, a homogeneous algebra is simply a set together with a finite number of operations (Ritter et al., 1990). In image algebra, homogeneous algebras are referred to as value sets. An arbitrary value set will be denoted by F.

A point set is simply a subset of some topological space. We reserve the bold letters X, Y, and W to denote point sets. Elements of point sets are called points and will be denoted by lower-case bold letters, e.g., x ∈ X.

Given a point set X and a value set F, an F-valued image a on X is the graph of a function a: X → F. Thus, an F-valued image a on X is of the form

    a = {(x, a(x)): x ∈ X},    (1)

where a(x) ∈ F. The set of image values of a is the range of the function a (which is a subset of F). An element (x, a(x)) of the image a is called a picture element or pixel, where x is called the pixel location and a(x) the pixel value at location x. The set of all F-valued images on X is denoted by F^X. Here we follow the usual mathematical convention of denoting the set of all functions from a set A to a set B by B^A.

A. Induced Operations on Images
Operations on and between F-valued images are the natural induced operations of the algebraic system F. For example, if ∘ is a binary operation on F, then ∘ induces a binary operation on F^X (again denoted by ∘) that is defined as follows: Let a, b ∈ F^X. Then

    a ∘ b = {(x, c(x)): c(x) = a(x) ∘ b(x), x ∈ X}.    (2)

Induced unary operations are defined in a similar fashion. In fact, any function f: F → F induces a function F^X → F^X, again denoted by f, and defined by

    f(a) = {(x, c(x)): c(x) = f(a(x))}.    (3)

The operations defined by Eqs. (2) and (3) are called induced pixel level operations. It follows from the definition of induced pixel operations that the set F^X together with the induced pixel level operations inherits most, if not all, of the algebraic properties of the value set F.

As an example we consider the set of real-valued images on X. Here F = R, where R denotes the set of real numbers. Replacing ∘ in Eq. (2) by the binary operations of addition, multiplication, and maximum, we obtain

    a + b = {(x, c(x)): c(x) = a(x) + b(x), x ∈ X},    (4)

    a · b = {(x, c(x)): c(x) = a(x) · b(x), x ∈ X},    (5)
and

    a ∨ b = {(x, c(x)): c(x) = a(x) ∨ b(x), x ∈ X},    (6)

respectively. These are the basic binary operations for real-valued images. It follows from Eqs. (4)-(6) that the ring (R^X, +, ·) and the lattice (R^X, ∨) behave very much like the ring and lattice of real numbers. In view of the fact that the operations between real-valued images are induced by the operations between real numbers, this should come as no great surprise. Therefore, manipulating real-valued images is analogous to manipulating real numbers, and our familiarity with the real number system provides us with instant familiarity with the induced system R^X.

The same observations hold for the induced system F^X. If we know the system F, then we know the induced system F^X. In image algebra it is always assumed that the algebraic system F is known and that the algebraic properties of F^X are then derived from this knowledge. It is important to note, however, that even though the algebraic properties of F^X are derived from those of F, the overall mathematical structure of F^X is quite distinct from that of F. Elements of F^X carry spatial information while those of F generally do not. In addition, (F^X, ∘) need not be isomorphic (algebraically equivalent) to (F, ∘). Usually the induced algebraic structure F^X is weaker than the algebraic structure of F. The succeeding discussion demonstrates this for the ring of real-valued images.

Analogously to the development of the algebra of real numbers, other binary and unary operations on real-valued images can be derived from the basic pixel operations (Eqs. (4)-(6)), either directly or in terms of series expansion. However, as mentioned earlier, image algebra assumes familiarity with the value set F, which in this case is the set of real numbers R. Thus, the remaining operations on R^X are again induced by the corresponding operations on R. Two of these operations, commonly used in image processing, are exponentiation and the computation of logarithms. In particular, if a and b are real-valued images on X, then
    a^b = {(x, c(x)): c(x) = a(x)^b(x) if a(x) ≠ 0, otherwise c(x) = 0, x ∈ X}.    (7)

As we are dealing with real-valued images, we follow the rules of real arithmetic and restrict the binary operation to those pairs of images a, b for which a(x)^b(x) ∈ R whenever a(x) ≠ 0. This prevents the creation of complex pixel values such as (-1)^{1/2}. The inverse of exponentiation is defined in the usual way by taking the logarithm, namely

    log_a b = {(x, c(x)): c(x) = log_{a(x)} b(x), x ∈ X}.    (8)

As for real numbers, log_a b is defined only for those images a and b for which a(x) > 0 and b(x) > 0 for all x ∈ X.
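As an illustration outside the original text, the induced pixel level operations of Eqs. (2)-(8) can be sketched in Python by representing an F-valued image literally as its graph, i.e., a dict from points to values. The helper names (`induce`, `img_pow`, `img_log`) are mine, not the chapter's:

```python
import math

def induce(op, a, b):
    """Eq. (2): a binary operation op on F induces a pointwise operation on images."""
    return {x: op(a[x], b[x]) for x in a}

def img_pow(a, b):
    """Eq. (7): exponentiation, with the convention c(x) = 0 where a(x) = 0."""
    return {x: (a[x] ** b[x] if a[x] != 0 else 0.0) for x in a}

def img_log(a, b):
    """Eq. (8): log_a b, defined only where a(x) > 0 and b(x) > 0."""
    return {x: math.log(b[x], a[x]) for x in a}

X = [(0, 0), (0, 1), (1, 0), (1, 1)]
a = {x: float(i + 1) for i, x in enumerate(X)}   # pixel values 1, 2, 3, 4
b = {x: 2.0 for x in X}                          # constant image with k = 2

s = induce(lambda r, t: r + t, a, b)   # Eq. (4): a + b
p = induce(lambda r, t: r * t, a, b)   # Eq. (5): a . b
m = induce(max, a, b)                  # Eq. (6): a v b
```

Because the induced operations act pixelwise, any identity of the real numbers holds pixelwise in R^X as well.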
An image a ∈ F^X is called a constant image if all its pixel values are the same, i.e., if a(x) = k for some fixed k ∈ F and for all x ∈ X. Constant images are used in order to define the concept of scalar operations, where scalar values are elements of F. In particular, if k ∈ F and a, b ∈ F^X, where a is the constant image a(x) = k for all x ∈ X, then we define

    k ∘ b = a ∘ b.    (9)

If F = R and we replace the operation ∘ in Eq. (9) by the operations of addition, multiplication, and maximum, then we obtain

    k + b = a + b,   k · b = a · b,   and   k ∨ b = a ∨ b,    (10)

respectively. Other scalar/image operations are also naturally induced from previous operations. Thus, for k, a, and b as above, we obtain the following definitions using Eqs. (7) and (8):

    b^k = b^a,   k^b = a^b,   and   log_k b = log_a b.    (11)

In the definition of log we assume, of course, that k > 0 and b(x) > 0 for all x. We also note that exponentiation is defined even when a(x) = 0. It follows from Eqs. (10) and (11) that

    -b = {(x, -b(x)): x ∈ X}    (12)

and

    b^{-1} = {(x, c(x)): x ∈ X and c(x) = [b(x)]^{-1} if b(x) ≠ 0, otherwise c(x) = 0}.    (13)
Subtraction and division can now be defined using (12) and (13), respectively:

    a - b = a + (-b)   and   a/b = a · b^{-1}.    (14)

Image negation can also be used to define the minimum of two real-valued images and the absolute value of an image, namely

    a ∧ b = -((-a) ∨ (-b))   and   |a| = a ∨ (-a),    (15)
respectively.

Suppose (F, γ, ∘) is a ring with unity, and 0 and 1 denote the zero and unit elements of F, respectively. Then the induced structure (F^X, γ, ∘) is also a ring with unity. The zero image is the constant image 0 = {(x, 0): x ∈ X}, and the unit image is the constant image 1 = {(x, 1): x ∈ X}. The images 0 and 1 have the obvious properties a γ 0 = a and a ∘ 1 = a. As mentioned earlier, this does not mean that the two structures (F, γ, ∘) and (F^X, γ, ∘) are isomorphic. For real-valued images, b · b^{-1} does not necessarily equal 1, although b · b^{-1} · b = b. Thus, in contrast to (R, +, ·), the ring (R^X, +, ·) is not a division ring. However, it is a von Neumann ring, since each element b has a pseudo-inverse b^{-1}.
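The pseudo-inverse claim can be checked concretely. This is my own sketch using dict-based images (names illustrative), not code from the chapter:

```python
def pseudo_inv(b):
    """Eq. (13): c(x) = 1/b(x) where b(x) != 0, and c(x) = 0 elsewhere."""
    return {x: (1.0 / v if v != 0 else 0.0) for x, v in b.items()}

def mul(a, b):
    """Eq. (5): pointwise product of two real-valued images."""
    return {x: a[x] * b[x] for x in a}

b = {(0, 0): 2.0, (0, 1): 0.0, (1, 0): -4.0}
bi = pseudo_inv(b)

almost_one = mul(b, bi)     # 1 where b != 0, but 0 at (0, 1): not the unit image
back = mul(almost_one, b)   # yet b . b^{-1} . b == b, so b^{-1} is a pseudo-inverse
```

The zero pixel at (0, 1) is exactly why (R^X, +, ·) fails to be a division ring while still being a von Neumann ring.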
Inequalities between real-valued images can be defined in terms of maximum (or minimum) by a ≤ b if and only if a ∨ b = b. If the point set X contains more than one point, then it is possible to have two real-valued images a and b such that a ≠ a ∨ b and b ≠ a ∨ b. Thus we have cases where neither a ≤ b nor b ≤ a need hold. Hence the lattice (R^X, ∨) is only partially ordered by the induced order and can, therefore, not be isomorphic to the totally ordered lattice (R, ∨). These examples corroborate our earlier claim that the induced structure F^X is somewhat weaker than the structure of F.

Of course, scalar operations and the computation of the absolute value of an image could just as well have been induced by use of Eq. (3). For example, we could have used the real-valued function f(r) = |r| in order to define |a| = f(a). These comments also apply to many other image algebra operations. A surprising number of more complicated operations can be obtained from very short sequences of elementary pixel level operations.

The induced functions obtained through the application of (3) have their own limitations. Chief among these is their restriction to pixel values. The evaluation f(a(x)) of the composition of a with f takes place on the value set F. The induced structure does not provide for spatial manipulation. In order to obtain induced operations that provide for spatial manipulation of image data, we compose spatial domain functions with the function a. In particular, if f: X → Y is a function and a ∈ F^Y, then we define the induced image a(f) ∈ F^X by

    a(f) = {(x, a(f(x))): x ∈ X}.    (16)

In Section C, it will become evident that the induced image a(f) can just as easily be obtained by an image-template operation. However, in various cases Eq. (16) is more translucent and computationally more efficient than an image-template convolution. Also, in addition to (16), another type of image algebra spatial operation is provided when the spatial domain X is a subset of a vector space. In this case the vector space is viewed as a value set with the usual operations of vector addition. This is especially useful in practice, where X is usually, but not always, a subset of Euclidean n-space R^n. As a matter of fact, the most commonly used set of points X is a rectangular subset of the set Z^2 = Z × Z (here Z denotes the set of integers) of the form X = {(i, j): 1 ≤ i ≤ m, 1 ≤ j ≤ n}. An image a ∈ F^X may be restricted in the natural way to
the point set Y ⊂ X. For application purposes, image algebra also allows the restriction of a to a subset of X specified by an image-dependent property such as Y = {x ∈ X: a(x) ∈ S}, where S ⊂ F. This type of restriction is denoted by a double vertical bar and provides a useful tool for expressing various algorithmic procedures. For example, if a is a real-valued image and Y = {x ∈ X: a(x) ≥ T}, where T denotes a given threshold value, then we define a‖_{≥T} by a‖_{≥T} = a|_{{x ∈ X: a(x) ≥ T}}. In this case the domain of a‖_{≥T}, denoted by domain(a‖_{≥T}), is the set of all locations where a meets or exceeds the threshold T.

Suppose a ∈ F^X and b ∈ F^Y, where X and Y are subsets of the same topological space. Then the extension of a to b on Y is defined by

    a|(b, Y)(x) = a(x) if x ∈ X, and b(x) if x ∈ Y\X,    (17)

where Y\X = {y ∈ Y: y ∉ X}. In actual practice, the user will have to specify the function (image) b on Y.

The need for versatile and simply formulated pixel level operations has led to the construction of new functions in terms of more elementary image algebra operations. Because of their common usage in image processing, several of these functions have become part of a standard set of image algebra functions. In addition to such commonly used image-valued functions as the trigonometric, exponential, and logarithmic functions, this set includes the following generalization of the characteristic function: Given a ∈ F^X and S ∈ (2^F)^X, where 2^F denotes the power set of F so that S(x) ⊂ F for each x ∈ X, then

    χ_S(a) = {(x, c(x)): c(x) = 1 if a(x) ∈ S(x), otherwise c(x) = 0}.    (18)
Obviously, if S: X → 2^F is a constant function, i.e., S returns the same set S(x) for each x ∈ X, then χ_S is the usual characteristic function. We also note that χ_S returns a Boolean-valued image regardless of the type of value set F used. The Boolean values c(x) = 0 or 1 in (18) represent the zero and unit of the algebraic system F and thus need not be the real numbers 0 and 1.

Pixel level image comparison provides a simple application example of the generalized characteristic function. Given the image b ∈ R^X, we define S_{≤b} ∈ (2^R)^X by S_{≤b}(x) = {r ∈ R: r ≤ b(x)}. The functions S_{<b}, S_{=b}, and S_{>b} are defined analogously. Thus, for example, S_{>b}(x) = {r ∈ R: r > b(x)} and S_{=b}(x) = {r ∈ R: r = b(x)}. Substituting these set functions for S in Eq. (18) yields

    χ_{S_{>b}}(a) = {(x, c(x)): c(x) = 1 if a(x) > b(x), else c(x) = 0},
    χ_{S_{≤b}}(a) = {(x, c(x)): c(x) = 1 if a(x) ≤ b(x), else c(x) = 0}, etc.    (19)
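A small sketch (mine, not the chapter's) of the generalized characteristic function and the comparison images of Eqs. (18) and (19), again with dict-based images:

```python
def chi(S, a):
    """Eq. (18): c(x) = 1 if a(x) is in S(x), else 0; S maps points to subsets of F."""
    return {x: (1 if a[x] in S(x) else 0) for x in a}

a = {(0, 0): 3.0, (0, 1): 7.0, (1, 0): 5.0}
b = {x: 5.0 for x in a}

# Eq. (19): substituting the set functions S_{>b} and S_{<=b} for S
chi_gt = {x: (1 if a[x] > b[x] else 0) for x in a}    # chi_{S_{>b}}(a)
chi_le = {x: (1 if a[x] <= b[x] else 0) for x in a}   # chi_{S_{<=b}}(a)

# a constant set function S recovers the ordinary characteristic function
c = chi(lambda x: {3.0, 5.0}, a)
```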
In order to reduce and simplify notation, we define χ_{>b} = χ_{S_{>b}}, χ_{≤b} = χ_{S_{≤b}}, and so on. The comparison image χ_{>b}(a) can also be written in terms of more elementary operations as

    χ_{>b}(a) = [(a - b) ∨ 0]^{-1} · [(a - b) ∨ 0].    (20)

Obviously, if a = b, then (a - b) ∨ 0 = 0 and χ_{>b}(a) = 0^{-1} · 0 = 0, since by definition of exponentiation 0^{-1} = 0. The remaining comparison functions can now be expressed in terms of χ_{>b}, namely

    χ_{≤b}(a) = [χ_{>b}(a)]′,   χ_{≥b}(a) = [χ_{<b}(a)]′,   and   χ_{=b}(a) = χ_{≤b}(a) · χ_{≥b}(a),    (21)

where ′ denotes Boolean complementation. Whenever b is the constant image with gray values equal to k, it is customary to replace b by k in the above definitions.

The image algebra defined thus far is characterized by the heterogeneous algebra
    ((F, ∘), (F^X, ∘̄, ∘̃)),    (22)

where (F, ∘) denotes the given system, (F^X, ∘̄) the induced system with ∘̄ denoting the operation (2) induced by ∘, and ∘̃ the scalar operation (9) between F and F^X induced by ∘. In order to reduce notational overhead, we use the same symbol ∘ to denote the three distinct operations ∘, ∘̄, and ∘̃, and represent the heterogeneous algebra (22) as the triple

    (F^X, F, ∘).    (23)

In (22) above, we have purposely ignored the induced operation (16) and the operations of image restriction and extension. These operations are special cases of image/template operations described in Section II.F.

We conclude this section by extending the set of operations of (23) to include an operation that turns an image into a scalar value. Suppose that X is finite, say X = {x_1, x_2, ..., x_m}, and that (F, γ) is a commutative semigroup (i.e., γ is an associative and commutative binary operation on F). Then the global reduce operation Γ on F^X induced by γ is defined as
    Γa = Γ_{x∈X} a(x) = a(x_1) γ a(x_2) γ ... γ a(x_m),    (24)
where a ∈ F^X. Thus, Γ: F^X → F. In particular, if F = R and γ = + or γ = ∨, then (24) becomes

    Σa = Σ_{x∈X} a(x) = a(x_1) + a(x_2) + ... + a(x_m)    (25)

or

    ∨a = ∨_{x∈X} a(x) = max{a(x): x ∈ X} = a(x_1) ∨ a(x_2) ∨ ... ∨ a(x_m),    (26)
respectively. Although in image processing by computer the spatial domain X is always finite, the global reduce operation Γ need not be restricted to finite sets. Natural extensions to infinite sets are usually inherent for different binary operations, value sets, and point sets. For example, if X is a compact subset of R^n and a ∈ R^X is continuous, then Eqs. (25) and (26) have the form

    Σa ≡ ∫_X a(x) dx    (27)

and

    ∨a ≡ ∨_{x∈X} a(x) ≡ sup_{x∈X} a(x),    (28)
respectively.

B. Set Theoretic Operations

Given a set S, the operations of union and intersection determine the Boolean algebra (2^S, ∪, ∩), where 2^S denotes the power set (set of all subsets) of S. Image algebra extends the heterogeneous algebra (23) by adjoining the two algebras (2^F, ∪, ∩) and (2^X, ∪, ∩). The operations of union and intersection are then used to define various other set-theoretic operations. For instance, set subtraction is defined in terms of intersection: A\B = A ∩ B′, where B′ denotes the complement of the set B. Thus, if A, B ⊂ F, then A\B = {r ∈ F: r ∈ A and r ∉ B}.

Two prime concepts associated with a function are its domain and range. These concepts provide two key image algebra operations that allow us to map objects from F^X to 2^F and 2^X. In particular, for a ∈ R^X, domain(a‖_{≥∨a}) ⊂ X is the set of points on which a achieves its maximum, while range(a) ⊂ R is the set of all values determined by a.
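These set-theoretic operations are ordinary finite set operations; the following sketch is my own, with the maximizing-point-set and range examples written out:

```python
a = {(0, 0): 1.0, (0, 1): 4.0, (1, 0): 4.0, (1, 1): 2.0}

A = {1.0, 2.0, 3.0}
B = {2.0, 3.0, 5.0}
diff = A - B                  # set subtraction A \ B = A intersect B'

values = set(a.values())      # range(a), a subset of R
peak = max(values)            # the maximum pixel value of a
argmax = {x for x, v in a.items() if v == peak}   # points where a achieves its maximum
```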
It is often necessary to select some element from a given set. The sup of compact subsets of R, denoted by ∨, is an example of an operation that selects a specific element from a set, and ∨[range(a)] is an important quantity in various image-processing tasks. Obviously, ∨[range(a)] = ∨a for compact X ⊂ R^n and continuous a ∈ R^X. This indicates that various mappings from 2^F or 2^X into F or X can be realized via previously defined operations. However, one function from 2^F → F or 2^X → X guaranteed by axiomatic set theory (Dugundji, 1966) and not obtainable from our previously defined operations is the choice function. For this reason we add the choice function to our arsenal of image algebra operations. The choice function, denoted by the word "choice," when applied to a set returns (chooses) an arbitrary element of that set.

Another fundamental notion associated with axiomatic set theory is the cardinality of a set. In image algebra the cardinality function, denoted by the word "card," is a function from some power set into the set N ∪ {∞}, where N denotes the set of natural numbers N = {0, 1, 2, ...} and ∞ is a special symbol called "positive infinity." Specifically, card is a function card: 2^F ∪ 2^X → N ∪ {∞} defined by

    card(S) = n if S is finite and has n elements,
              0 if S is the empty set,
              ∞ if S is neither finite nor empty.    (29)
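For finite sets, card and choice have direct counterparts in code; a sketch of mine (helper names are illustrative, and only the finite cases of Eq. (29) are covered):

```python
def card(S):
    """Eq. (29) restricted to finite sets: 0 for the empty set, n for n elements."""
    return len(S)

def choice(S):
    """The choice function: returns an arbitrary element of a nonempty set S."""
    return next(iter(S))

S = {(0, 0), (0, 1), (1, 0)}
n = card(S)
e = choice(S)
```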
It follows from the above definition that card does not differentiate between different types of "infinities," such as countably infinite or uncountably infinite, and thus cannot distinguish nonequivalent infinite sets. The theoretician can always extend image algebra to include the true cardinality of a set. However, in practice, i.e., image processing by computer, only knowledge of the number of elements of finite sets is required. Only finite elements of 2^F and 2^X occur in actual computer vision tasks. Despite this limitation, it is obvious (to those familiar with axiomatic set theory) that the operations of union, intersection, card, and choice provide image algebra the capability of effectively implementing a wide variety of (finite) set-theoretic concepts in both the spatial and the value domains.

C. Examples of Pixel Level Operations
In this section we express several typical image processing tasks in terms of image algebra code. The examples in this and subsequent sections demonstrate that image algebra, in contrast to other current high-level computer languages, has the distinct advantage of brevity, mathematical preciseness,
and translucency. However, as with any language, there are usually various ways of describing the same objective. For example, if we are interested in knowing the number of pixels whose pixel values correspond to the integer k in some integer-valued image a, we could write

    n := card(domain(a‖_{=k})),

where a‖_{=k} corresponds to the image a restricted to the points where a assumes the value k. Alternatively, we could express the code for finding n in the more compact form

    n := Σχ_{=k}(a).
Example 1 (Semithresholding): In semithresholding the objective is to retain only those values of an image that are above some threshold m and below another threshold n > m. Since

χ_{(m,n]}(a) = χ_{>m}(a) · χ_{≤n}(a),      (30)

the image algebra formulation of this algorithm is simply

b := a · (χ_{>m}(a) · χ_{≤n}(a)).

In the above and subsequent examples of this section we assume that all images under discussion are real-valued images. Particularly nice examples that exhibit the brevity and translucency of image algebra code are order statistics of an image.
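A minimal executable sketch of this statement under the same hypothetical dict-based model of images (the names chi and semithreshold are ours, not the chapter's):

```python
# Sketch of b := a * (chi_{>m}(a) * chi_{<=n}(a)); images are modeled as
# dicts from points to reals. An illustration only.
def chi(pred, a):
    """Characteristic image: 1 where pred(a(x)) holds, 0 elsewhere."""
    return {x: (1.0 if pred(v) else 0.0) for x, v in a.items()}

def pointwise_mul(a, b):
    return {x: a[x] * b[x] for x in a}

def semithreshold(a, m, n):
    mask = pointwise_mul(chi(lambda v: v > m, a), chi(lambda v: v <= n, a))
    return pointwise_mul(a, mask)

a = {(0, 0): 2.0, (1, 0): 5.0, (2, 0): 9.0}
b = semithreshold(a, m=3.0, n=6.0)   # keeps only values in (3, 6]
```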
Example 2 (Moments as descriptors of regions (Hu, 1962)): For any image a = {(x, a(x)): x ∈ X}, with X ⊂ Z² a rectangular m × n array of points, moments of order p + q are defined as

m_pq = Σ_i Σ_j i^p j^q a(i, j),

and central moments as

μ_pq = Σ_i Σ_j (i − ī)^p (j − j̄)^q a(i, j),

where ī = m_10/m_00 and j̄ = m_01/m_00. The image algebra translation of the moments is simply

m_pq = Σ (i^p · j^q · a).
G. X. RITTER
Defining the mean images ī and j̄ by

ī = (Σ i · a)/(Σ a)  and  j̄ = (Σ j · a)/(Σ a),      (34)

the nonzero central moments are then given by the following translation:

μ_pq = Σ [(i − ī)^p · (j − j̄)^q · a].      (35)
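The moment formulas translate directly under the dict-based sketch used earlier (an illustration only; the function names are ours):

```python
# Illustrative computation of m_pq and the central moments mu_pq for a
# small image on a rectangular grid; dict representation is our own sketch.
def moment(a, p, q):
    return sum((i ** p) * (j ** q) * v for (i, j), v in a.items())

def central_moment(a, p, q):
    m00 = moment(a, 0, 0)
    ibar, jbar = moment(a, 1, 0) / m00, moment(a, 0, 1) / m00
    return sum(((i - ibar) ** p) * ((j - jbar) ** q) * v
               for (i, j), v in a.items())

# a symmetric 3 x 3 blob centered at (1, 1)
a = {(i, j): 1.0 for i in range(3) for j in range(3)}
print(moment(a, 0, 0))          # area m_00: 9.0
print(central_moment(a, 1, 0))  # first central moment vanishes: 0.0
```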
Example 3 (Mean and standard deviation): Here again X denotes a rectangular m × n array of points, and a ∈ R^X. The image algebra formulation of the mean and standard deviation of the pixel values of a looks exactly like the mathematical formulation of the mean and standard deviation of an ensemble of numbers.

Mean:

μ := (1/mn) Σ a.

Standard deviation:

σ := [(1/mn) Σ (a − μ)²]^{1/2}.
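A quick executable rendering of these two statements, again under the assumed dict representation of images:

```python
import math

# Mean and standard deviation of the pixel values of a, mirroring the
# image algebra statement mu := (1/mn) * sum(a); a sketch only.
def mean_std(a):
    vals = list(a.values())
    mu = sum(vals) / len(vals)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in vals) / len(vals))
    return mu, sigma

a = {(i, j): float(i + j) for i in range(2) for j in range(2)}  # values 0,1,1,2
mu, sigma = mean_std(a)   # mu = 1.0
```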
Example 4 (Euclidean isometries): Euclidean isometries of images can be realized by use of Eq. (16). Suppose X ⊂ R², where R² denotes the Euclidean plane, z ∈ R² is some fixed point, and x = (x₁, x₂) ∈ X. Define a translation of X by the vector z as the function f(x) = x + z. Then a shift of a by the vector z is obtained by writing

b := a(f⁻¹).

Here b is an image on Y = f(X). In a similar fashion we obtain the following isometries:

Reflection about a point (origin):

b := a(h⁻¹),

where h(x) = −x.

Reflection across the line y = x:

b := a(g⁻¹),

where g(x) = x′ and x′ = (x₂, x₁).
Rotation (through angle θ):

b := a(r⁻¹),

where r(x) = (x₁ cos θ − x₂ sin θ, x₁ sin θ + x₂ cos θ).
Example 5 (Image magnification): Let a and X be as in the previous example. For a fixed x₀ ∈ X, define f: X → 2^{Z²} by

f(x) = {x₀ + (2x₁, 2x₂), x₀ + (2x₁ + 1, 2x₂ + 1), x₀ + (2x₁ + 1, 2x₂), x₀ + (2x₁, 2x₂ + 1)}.      (36)

Declare x ∈ X to be related to y ∈ Z² if y ∈ f(x). Then the magnification of a by a factor of 2 about the point x₀ is obtained by writing

b := a(f⁻¹).
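Both the isometries of Example 4 and the relation b := a(f⁻¹) can be sketched by composing a dict-modeled image with a one-to-one point map; the representation and helper names below are assumptions of this illustration, not the chapter's:

```python
# Sketch of b := a(f^{-1}) for a one-to-one f: the transformed image b
# lives on Y = f(X) and satisfies b(f(x)) = a(x). Illustration only.
def spatial_transform(a, f):
    """Given a one-to-one f on points, return b = a(f^{-1}) on Y = f(X)."""
    return {f(x): v for x, v in a.items()}

a = {(0, 0): 1.0, (1, 0): 2.0}
shift = lambda x: (x[0] + 3, x[1] + 5)     # translation by z = (3, 5)
reflect = lambda x: (-x[0], -x[1])         # reflection about the origin

b = spatial_transform(a, shift)
c = spatial_transform(a, reflect)
```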
D. Templates

Templates are special types of images. In particular, an F-valued template from Y to X is an element of (F^X)^Y. If t ∈ (F^X)^Y, then for notational convenience we define t_y = t(y) in order to denote the image t(y) ∈ F^X for each y ∈ Y. The pixel values t_y(x) of the image t_y = {(x, t_y(x)): x ∈ X} are called the weights of the template t at the point y, and y is called the target point of the image t_y. The pixel values of the template t = {(y, t_y): y ∈ Y}, on the other hand, are images. It follows from Section II.B that F^X (together with the induced operations) is a value set. The operations on templates are, therefore, the operations induced by F^X. Thus, if ∘ is a binary operation on F^X, we define the template r = s ∘ t by
s ∘ t = {(y, r_y): r_y = s_y ∘ t_y, y ∈ Y}.      (37)
Equivalently, the template r can be defined pointwise by

r_y = (s ∘ t)_y = s_y ∘ t_y  for every y ∈ Y.      (38)
For real-valued templates, the basic binary operations reflect those of real-valued images (Eqs. (4)-(6)). More precisely, if s and t are real-valued templates from Y to X, then addition, multiplication, and maximum between s and t are defined pointwise as follows:
s + t  by  (s + t)_y = s_y + t_y,      (39)
s · t  by  (s · t)_y = s_y · t_y,      (40)
s ∨ t  by  (s ∨ t)_y = s_y ∨ t_y.      (41)
FIG. 1. Pictorial example of a translation-invariant template.
If t is a real- or complex-valued template, then the support of the image function t_y is denoted by S(t_y); that is, S(t_y) = {x ∈ X: t_y(x) ≠ 0}. It is often possible and convenient to describe real-valued templates with finite support pictorially. For example, consider the case X = Z². Let y = (x, y) be an arbitrary point of X and set x₁ = (x, y − 1), x₂ = (x + 1, y), and x₃ = (x + 1, y − 1). We now define a template t ∈ (R^X)^X by defining, for each y ∈ X, its weights as t_y(y) = 1, t_y(x₁) = 3, t_y(x₂) = 2, t_y(x₃) = 4, and t_y(x) = 0 if x is not an element of the set {y, x₁, x₂, x₃}. It follows from our definition that t has support S(t_y) = {y, x₁, x₂, x₃}, and nonzero weights as shown in Fig. 1. The shaded cell in the pictorial representation of t indicates the location of the target point y. The weights of t_y in the complement of S(t_y) are all zero. Thus, the pictorial representation of S(t_y) together with the nonzero weights and the location of y completely specify t_y.

Observe that the template t described in the previous paragraph has the property that for each triple x, y, z ∈ X, we have that t_y(x) = t_{y+z}(x + z). Elements of (F^X)^X satisfying this property are called translation-invariant templates. A variant template is a template that is not translation-invariant. Translation-invariant templates provide a convenient tool for illustrating template operations pictorially. Suppose X = Z². Consider the translation-invariant templates s, t ∈ (R^X)^X shown in the top portion of Fig. 2. Then the basic binary operations of sum, product, and maximum of these two templates are as shown in the bottom portion of the figure. As for real-valued images, more complex template operations can now be defined in terms of the basic operations. For example, we define the exponentiation s^t by
(s^t)_y = (s_y)^{t_y},      (42)
FIG. 2. The three basic binary template operations.
the logarithm log_s t by

(log_s t)_y = log_{s_y} t_y,      (43)

and the minimum s ∧ t by

(s ∧ t)_y = s_y ∧ t_y,      (44)
and so forth. A template t ∈ (F^X)^Y is called a constant template if t_y = t_{y'} for all y, y' ∈ Y. A scalar template is a constant template with the additional property that t_y(x) = t_{y'}(x') for all pairs y, y' ∈ Y and x, x' ∈ X. Analogous to scalar images, we let k represent the scalar template t if t_y(x) = k, and define the scalar operation
k ∘ s = t ∘ s.      (45)
For real-valued templates, we replace ∘ by the usual arithmetic and logic operations in order to obtain such scalar operations as
k + s = t + s,   k · s = t · s,   k ∨ s = t ∨ s,   s^k = s^t,   and   k^s = t^s.      (46)
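Under the same hypothetical dict model, a template can be represented as a map y ↦ t_y, with the induced operations acting pointwise on the images t_y; a small sketch of Eqs. (39) and (46) (names and representation are our own illustration):

```python
# Templates modeled as dicts y -> (dict x -> weight); the induced
# operations act pointwise on the images t_y, as in Eqs. (39)-(41).
def template_add(s, t):
    return {y: {x: s[y].get(x, 0.0) + t[y].get(x, 0.0)
                for x in set(s[y]) | set(t[y])}
            for y in s}

def template_scalar_add(k, s):
    # k + s, where k stands for the scalar template with all weights k
    return {y: {x: k + w for x, w in s[y].items()} for y in s}

s = {(0, 0): {(0, 0): 1.0, (1, 0): 3.0}}
t = {(0, 0): {(0, 0): 2.0}}
r = template_add(s, t)             # weight 3.0 at both support points
r2 = template_scalar_add(10.0, s)  # shifts every weight of s by 10
```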
Suppose f: F → F and t ∈ (F^X)^Y. Then the template f(t) is defined pointwise by

[f(t)]_y = f(t_y),      (47)

where f(t_y) is defined by Eq. (3).
Similarly, if f: W → Y is one-to-one, then the analogue of Eq. (16) is given by

[t(f)]_w = t_{f(w)},      (48)

where w ∈ W. Thus t(f) ∈ (F^X)^W. In view of the above-described template operations, it is obvious that the induced structure on (F^X)^Y mirrors the induced structure of F^X. Of course, one major difference is the type of value set. This is reflected by the global reduce operations. Again, we suppose that (F, γ) is a commutative semigroup and that Y is finite, say Y = {y₁, y₂, ..., y_n}. Then the global reduce operation Γ on (F^X)^Y is defined as

Γt = Γ_{y∈Y} t_y = t_{y₁} γ t_{y₂} γ ··· γ t_{y_n},      (49)
where t ∈ (F^X)^Y. Since the induced structure (F^X, γ) is also a commutative semigroup, the operation (49) is well defined. We note that Γt is not a scalar value but an image, namely Γt ∈ F^X. Also, analogous to Eqs. (25) and (26), by substituting R = F and + = γ or ∨ = γ, we obtain the equations

Σt = Σ_{i=1}^{n} t_{y_i}      (50)

and

∨t = ∨_{i=1}^{n} t_{y_i},      (51)
respectively.

One type of template of particular practical importance is the parameterized template. A parameterized F-valued template from Y to X with parameters in P is a function of the form

t: P → (F^X)^Y.

Here P is called the set of parameters, and each p ∈ P is called a parameter for t. It follows from the definition that for each p ∈ P, t(p) is an F-valued template from Y to X. Thus, a parameterized F-valued template from Y to X gives rise to a family of regular F-valued templates from Y to X, namely {t(p): p ∈ P} ⊂ (F^X)^Y. We conclude this section by providing an application example of a (parameterized) template operation.

Example 6 (Image histogram): Suppose X is a rectangular m × n array, Y = {j ∈ Z: k ≤ j ≤ K} for some fixed pair of integers k and K, and
P = {a ∈ R^X: range(a) ⊂ Y}. For each a ∈ P, we define t(a) ∈ (R^Y)^X by

t(a)_x(j) = 1 if a(x) = j, and t(a)_x(j) = 0 otherwise.

The image h ∈ R^Y obtained from the code

h := Σ t(a)

is the histogram of a. This follows from the fact that by (50), h = Σ_{x∈X} t(a)_x and, hence, h(j) = Σ_{x∈X} t(a)_x(j).

E. Generalized Image Products
In this section we establish the concept of generalized image products. In terms of image processing, generalized image products constitute the most powerful tool of image algebra. In subsequent sections we will examine specific instances of these products, their applications, and relationships to several well-established mathematical theories. To begin with, we assume that X is a finite point set, say X = {x₁, x₂, ..., x_m}, (F, γ) is a commutative semigroup, F₁ and F₂ are two given value sets, and

∘: F₁ × F₂ → F      (53)

is a binary operation. Then ∘ induces a binary operation

⊛: F₁^X × F₂^Y → (F^Y)^X      (54)

as follows. Let a ∈ F₁^X, b ∈ F₂^Y and define t = a ⊛ b ∈ (F^Y)^X by

t_x(y) = a(x) ∘ b(y).      (55)
Thus, the induced image product ⊛, called the generalized outer image product, combines two images and produces a template. This is in contrast to (2), where the induced operation combines two images of the same value type and the same spatial domain to produce an image of the same value type and domain as the input images. Note also that finiteness of X is not necessary for defining the template t in Eq. (55). However, in order to combine two images of different spatial domains whose resultant is a scalar-valued (i.e., F-valued) image, we employ the global reduce operation Γ, which requires finiteness of X. Specifically, the generalized image product of a with b induced by ∘ and γ is the binary operation

⊕: F₁^X × F₂^Y → F^Y,      (56)

defined by

a ⊕ b = Γ(a ⊛ b).      (57)
It follows that
a ⊕ b = t_{x₁} γ t_{x₂} γ ··· γ t_{x_m},      (58)

where t = a ⊛ b. Hence, if c = a ⊕ b, then the pixel value of c at a point y ∈ Y is given by

c(y) = t_{x₁}(y) γ ··· γ t_{x_m}(y).      (59)
Operations between images and templates are similar to (54) and (56). A very important operation is obtained when F₂^Y is replaced by (F₂^X)^Y in the above discussion. In this case we define an induced operation

⊛: F₁^X × (F₂^X)^Y → (F^Y)^X,      (60)

where for a ∈ F₁^X and t ∈ (F₂^X)^Y, the template r = a ⊛ t is defined by

r_x(y) = a(x) ∘ t_y(x).      (61)

The generalized image/template product of a with t, induced by ∘ and γ, is defined as

a ⊕ t ≡ Γ(a ⊛ t).      (62)
Thus, if b = a ⊕ t, then

a ⊕ t = b = {(y, b(y)): b(y) = Γ(a ∘ t_y), y ∈ Y},      (63)

where the induced image product c_y = a ∘ t_y is defined by c_y(x) = a(x) ∘ t_y(x). Simple substitution shows how (63) is derived from (62). Since r = a ⊛ t and b = a ⊕ t, we have

b = a ⊕ t = Γ(a ⊛ t) = Γr = Γ_{x∈X} r_x = r_{x₁} γ ··· γ r_{x_m}.      (64)

Therefore, by (61),

b(y) = r_{x₁}(y) γ ··· γ r_{x_m}(y) = [a(x₁) ∘ t_y(x₁)] γ ··· γ [a(x_m) ∘ t_y(x_m)],      (65)

or, equivalently,

b(y) = c_y(x₁) γ ··· γ c_y(x_m) = Γ_{x∈X} c_y(x) = Γc_y = Γ(a ∘ t_y).      (66)
Of course, we could have defined the binary operation

⊕: F₁^X × (F₂^X)^Y → F^Y      (67)

directly by using Eq. (63). Given a template t ∈ (F^Y)^X, the transpose of t, denoted by t′, is an element of (F^X)^Y and is defined by

t′_y(x) = t_x(y)      (68)
for each x ∈ X and y ∈ Y. The notion of transpose plays an important role in dual operations. Suppose

∘′: F₂ × F₁ → F      (69)

is a binary operation. Then, analogously to (60), ∘′ induces a binary operation

⊛′: (F₂^X)^Y × F₁^X → (F^Y)^X,      (70)

where r = t ⊛′ a ∈ (F^Y)^X is defined pointwise as

r_x(y) = t_y(x) ∘′ a(x).      (71)

The induced template/image operation

⊕′: (F₂^X)^Y × F₁^X → F^Y      (72)

is then defined by

t ⊕′ a = Γ(t ⊛′ a),      (73)

where b = t ⊕′ a is given by

b(y) = Γ_{x∈X} t_y(x) ∘′ a(x).      (74)
In contrast to (67), here t ∈ (F₂^X)^Y. The operations a ⊕ t and t ⊕′ a are known as generalized backward and forward image transforms, respectively. An important observation is that for either backward or forward transforms, the input image a is an F₁-valued image on the coordinate set X while the output image b is an F-valued image on the coordinate set Y. Thus template operations are capable of transforming images with certain range values defined over a given point set into images with entirely different values defined over point sets of possibly different spaces. If the operations ∘ and ∘′ are dual operations in the sense that
f₁ ∘ f₂ = f₂ ∘′ f₁  for all f₁ ∈ F₁ and f₂ ∈ F₂,      (75)

then t_y(x) ∘′ a(x) = a(x) ∘ t_y(x) = a(x) ∘ t′_x(y). Therefore

t ⊕′ a = a ⊕ t′.      (76)

If F₁ = F₂ and ∘ = ∘′, i.e., ∘ is a commutative operation on F₁, then Eq. (74) has the form

t ⊕ a = a ⊕ t′      (77)

or, equivalently,

t′ ⊕ a = a ⊕ t.      (78)
In other words, the forward transform induced by γ and ∘ can be computed as a backward transform using transposes. This observation is not only of theoretical significance but also important when implementing image algebra in hardware or software. Since templates are special types of images, the induced binary operations ⊛ and ⊕ for templates reflect the corresponding operations for images. The same observation holds for the dual operations ⊛′ and ⊕′. In particular, the binary operation ∘ of Eq. (53) induces a binary operation
⊛: (F₁^W)^X × (F₂^X)^Y → [(F^W)^Y]^X,      (79)

which, when applied to a pair s ∈ (F₁^W)^X and t ∈ (F₂^X)^Y, results in a template u = s ⊛ t ∈ [(F^W)^Y]^X. Here the template u is defined as

u_x = {(y, u_x(y)): y ∈ Y}  for each x ∈ X,      (80)

where we use the convention [u_x]_y = u_x(y). It follows that the basic equation defining u is

[u_x]_y(w) = s_x(w) ∘ t_y(x).      (82)

Note that u is an image (template) whose values are templates. The operation ⊕, resulting in the template r = s ⊕ t ∈ (F^W)^Y, is obtained as before through the global reduce operation Γ:

r = s ⊕ t = Γ(s ⊛ t).      (83)

Since Γ(s ⊛ t) = u_{x₁} γ ··· γ u_{x_m}, we have that

r_y(w) = [u_{x₁}]_y(w) γ ··· γ [u_{x_m}]_y(w)

and, therefore, by (82),

r_y(w) = Γ_{x∈X} s_x(w) ∘ t_y(x).      (87)
Pictorially, we can view s ⊕ t as a functional composition, where t is first applied as an F₂-valued template from Y to X, followed by the F₁-valued template s from X to W, as shown in Fig. 3.
FIG.3. Illustration of template composition.
F. Linear and Lattice Transforms

Substitution of different value sets and specific binary operations for γ and ∘ in the definition of generalized image or image/template products results in a wide variety of different (and novel) image transforms. The value sets we are concerned with in this section are the real numbers R and the extended real numbers R_{±∞} = R ∪ {∞, −∞}. To be more specific, R will denote the ring (R, +, ·), where + and · denote the usual operations of addition and multiplication, respectively. The set R_{±∞} will denote the bounded l-group (R_{±∞}, ∨, ∧, +, +′). For a definition of bounded l-groups we refer the reader to Birkhoff (1984). Here the operations ∨ and ∧ are the lattice operations of least upper bound and greatest lower bound, respectively, on the (complete) lattice R_{±∞}. The operation + corresponds to regular addition on R and is extended to R_{±∞} as follows:
r + ∞ = ∞ + r = ∞,
r + (−∞) = (−∞) + r = −∞,
(−∞) + ∞ = ∞ + (−∞) = −∞,      (88)
where r ∈ R. The equations in (88) ensure that −∞ acts as a null element in the system (R_{±∞}, ∨, +). The operation +′ is identical to the operation + except on the set {∞, −∞}, where we define
(−∞) +′ ∞ = ∞ +′ (−∞) = ∞.      (89)
This introduces an asymmetry between the corresponding operations + and +′ with respect to the set {∞, −∞}. Similarly to the set of complex numbers C, the lattice operation ∨ of a bounded l-group allows for the definition of an additive conjugate element as
follows. If r ∈ R_{±∞}, then the additive conjugate of r is the unique element r* defined by

r* = −r if r ∈ R,   r* = −∞ if r = ∞,   and   r* = ∞ if r = −∞.      (90)
Thus, (r*)* = r. This implies the following relation:

r ∧ s = (r* ∨ s*)*.      (91)
Substitution of the ring R for the value sets F₁, F₂, and F in (62), replacing ∘ by multiplication and γ by addition, results in the generalized (backward) convolution operator

a ⊕ t = {(y, b(y)): b(y) = Σ_{x∈X} a(x) · t_y(x), y ∈ Y},      (92)

where a ∈ R^X and t ∈ (R^X)^Y. The generalized forward convolution operation is obtained from (72) by proper substitution and has the form

t ⊕ a = {(y, b(y)): b(y) = Σ_{x∈X} t_y(x) · a(x), y ∈ Y},      (93)

where t ∈ (R^X)^Y and a ∈ R^X. Several comments are now in order. First, we note that in (92) a is a real-valued image on X, while a ⊕ t is a real-valued image on Y. It follows that generalized convolutions can be used to change the dimensionality, size, and geometric shape of real-valued images. If F₁ = R and F₂ = R^n, then the convolution operator ⊕ transforms real-valued images into vector-valued images. Similarly, substitution of the ring C of complex numbers for F₁, F₂, and F results in a convolution operator ⊕ for complex-valued images. Various other value sets, such as the integers, the natural numbers, etc., could be substituted. Thus, the generalized convolution operator ⊕ defined in (92) and (93) need not be restricted to only the real numbers. Since t_y(x) = 0 whenever x ∉ S(t_y), we have that
Σ_{x∈X} a(x) · t_y(x) = Σ_{x∈S(t_y)} a(x) · t_y(x),      (94)

where we use the convention Σ_{x∈S(t_y)} a(x) · t_y(x) = 0 whenever S(t_y) = ∅. For computational purposes, Eq. (94) is of prime importance. According to (94), the new pixel value b(y) depends only on the values of a(x) and t_y(x) for which x ∈ S(t_y). Thus, the smaller the cardinality of S(t_y), the smaller the
number of multiplications and additions that need to be performed to compute the value b(y). Furthermore, the topology of the support becomes particularly important when one considers mapping image transforms to particular types of parallel computer architectures.

There is an easy generalization of the operation ⊕ whenever (F, γ) is a monoid (a semigroup with an identity), with identity, say, 0. Suppose that a ∈ F₁^X, t ∈ (F₂^W)^Y, and X and W are subsets of the same topological space. Then (67) generalizes to

a ⊕ t = {(y, b(y)): b(y) = Γ_{x∈X∩W} a(x) ∘ t_y(x), y ∈ Y},      (95)

where b(y) = 0 whenever X ∩ W = ∅. The forward transform t ⊕ a is defined in a similar fashion. In this general setting, the convolution operator ⊕ becomes

a ⊕ t = {(y, b(y)): b(y) = Σ_{x∈X∩S(t_y)} a(x) · t_y(x), y ∈ Y}      (96)

since

Σ_{x∈X∩W} a(x) · t_y(x) = Σ_{x∈X∩S(t_y)} a(x) · t_y(x).
Definition (95) and its implication (96) prove useful when expressing algorithms in image algebra code (Ritter et al., 1990). The operation (83), when applied to real (or complex)-valued templates, results in the template r = s ⊕ t, where r is defined by

r_y(w) = Σ_{x∈X} t_y(x) · s_x(w),  w ∈ W.      (98)

Equation (98) is simply (87) with multiplication and addition substituted for ∘ and Γ, respectively. Before discussing substitution of the value set R_{±∞} for F₁, F₂, and F, we provide three simple application examples of the convolution operator ⊕.
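Before the examples, the generic product itself can be sketched with pluggable operations γ and ∘, which makes the kinship between the linear convolution (92) and the later lattice convolutions explicit. The routine below is an assumed toy model (1-D images as dicts, restriction to X ∩ W as in (95)), not an implementation from the text:

```python
from functools import reduce

# b(y) = Gamma_{x} a(x) op t_y(x): gamma is the reduce operation, op the
# pairwise operation. The same routine yields the linear convolution
# (sum, *) and a max-plus lattice product (max, +). Illustration only.
def generalized_product(a, t, op, gamma):
    b = {}
    for y, ty in t.items():
        vals = [op(a[x], w) for x, w in ty.items() if x in a]  # x in X ∩ W
        b[y] = reduce(gamma, vals)
    return b

a = {0: 1.0, 1: 4.0, 2: 2.0}
# translation-invariant template with unit weights on a 1-D array
t = {y: {x: 1.0 for x in (y - 1, y, y + 1)} for y in a}

linear = generalized_product(a, t, lambda u, v: u * v, lambda u, v: u + v)
dilate = generalized_product(a, t, lambda u, v: u + v, max)
```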
Example 7 (Local averaging): Let a be an image on a rectangular array X ⊂ Z². Let Y = Z² and t ∈ (R^{Z²})^Y be the 3 × 3 neighborhood template, each of whose nine weights equals 1.
268
G . X. RITTER
Then the image b obtained from the code

b := (1/9)(a ⊕ t)

represents the image obtained from a by local averaging, since the new pixel value is given by b(y) = (1/9) Σ_{x∈S(t_y)} a(x). We again note the brevity of image algebra code.
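A runnable sketch of this statement for a small constant image, with a treated as zero off its domain (our own modeling choice, not the chapter's code):

```python
# Local averaging b := (1/9)(a (+) t) with the 3 x 3 unit-weight template,
# restricted to the array X; a dict-based sketch only.
def local_average(a):
    b = {}
    for (i, j) in a:
        nbrs = [a.get((i + di, j + dj), 0.0)
                for di in (-1, 0, 1) for dj in (-1, 0, 1)]
        b[(i, j)] = sum(nbrs) / 9.0   # b(y) = (1/9) * sum over S(t_y)
    return b

a = {(i, j): 9.0 for i in range(3) for j in range(3)}
b = local_average(a)
print(b[(1, 1)])   # interior point: 9.0
```

Note how the boundary pixels are dimmed because the missing neighbors contribute zero, which is exactly the restriction issue discussed next.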
As an important remark, we note that the image a ⊕ t is an image on all of Z², with zero values outside of the array X ⊂ Z². Obviously, computers are not capable of storing images defined on infinite arrays. Furthermore, in practice one is only interested in the image (1/9)(a ⊕ t) restricted to the array X, that is, (1/9)(a ⊕ t)|_X, where |_X denotes the restriction to X. This problem could be solved as follows: Let s ∈ (R^X)^X be defined by s_y = (t_y)|_X for each y ∈ X, where t is the template defined in Example 7. Then (1/9)(a ⊕ s) provides the desired finite image, since (a ⊕ t)|_X = a ⊕ s. Thus, the question arises: "Why not simply define t as a template from X to X instead of from Z² to Z²?" The rationale for defining the template as we did is that this template can be used for smoothing any two-dimensional image independent of its array size X. The reason for this is that when defining an image b in a program, one is usually forced to declare its dimensions, i.e., the size of its underlying array X. In particular, an image algebra program statement of the form b := a ⊕ t means to replace b pointwise by a ⊕ t, so that the value of b at location y is the value of a ⊕ t at location y. That is, the array on the left side of the program statement induces a restriction on the right-side image array. In short, we make the convention that the image algebra equation b = (a ⊕ t)|_X, where X is the domain of b, corresponds to the image algebra program statement b := a ⊕ t. Thus, a programmer is not faced with the task of redefining t for a different-sized image, as would be the case if he or she had defined t ∈ (R^X)^X for a given X. In fact, this is the way we have embedded image algebra into a variety of languages, including FORTRAN and Common Lisp (IVS Inc., 1988; Wilson et al., 1988). Examples of image algebra FORTRAN (IAF) can be found in Ritter et al. (1990). In Section II.A we pointed out that Eq. (16) can also be obtained using image/template operations.
As an example, we express image magnification (Example 5) in terms of a generalized convolution.

Example 8 (Image magnification): Suppose X ⊂ R² is an m × n array, Y = Z², P = {p: p = (x₀, k), where x₀ ∈ X and k is a positive integer}, and a is an image on X. Given a pair of real numbers r = (r₁, r₂), define [r] = ([r₁], [r₂]), where [rᵢ] denotes truncation of rᵢ to the nearest integer. For each y ∈ Y and p = (x₀, k), define t(p)_y(x) = 1 if x = [(y − x₀)/k + x₀], and t(p)_y(x) = 0 otherwise. Then b = a ⊕ t(x₀, k) represents the magnification of a by the factor k
about the point x₀. Thus, once this parameterized template has been defined, all a potential user of this template needs to supply is the magnification factor k, the point about which to magnify the image, and, in order to retain all the information, a declaration of b to be of at least dimension km × kn. This example also shows how a template transformation is capable of changing the size of an image. As our third example of the operation ⊕, we present the convolution of two
templates.

Example 9 (Template composition under ⊕): Let s and t be as in Fig. 2. Then the template s ⊕ t is given pictorially by the convolution of the weight arrays of s and t.
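A sketch of the computational payoff of such a factorization, using an assumed dict-based convolution and a separable 3 × 3 box kernel (the helper conv and the kernels are illustrative, not from the text):

```python
# a (+) r = (a (+) s) (+) t for a separable template: a 3 x 3 box kernel r
# factors into a 1 x 3 row kernel s and a 3 x 1 column kernel t, using
# 2n instead of n^2 multiplications per pixel (here 6 instead of 9).
def conv(a, offsets_weights):
    b = {}
    for (i, j) in a:
        b[(i, j)] = sum(w * a.get((i + di, j + dj), 0.0)
                        for (di, dj), w in offsets_weights)
    return b

row = [((di, 0), 1.0) for di in (-1, 0, 1)]                        # s: 1 x 3
col = [((0, dj), 1.0) for dj in (-1, 0, 1)]                        # t: 3 x 1
box = [((di, dj), 1.0) for di in (-1, 0, 1) for dj in (-1, 0, 1)]  # r

a = {(i, j): float(i + j) for i in range(4) for j in range(4)}
direct = conv(a, box)
factored = conv(conv(a, row), col)   # same result, fewer multiplications
```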
Template composition and decomposition are the primary reasons for introducing operations between generalized templates. Composition and decomposition of templates provide a tool for algorithm optimization. For instance, if s and t are as in the example above, and r = s ⊕ t, then computation of a ⊕ r = a ⊕ (s ⊕ t) by (a ⊕ s) ⊕ t uses six local multiplications instead of nine. In general, if r is an n × n template, and s and t are decompositions of r into 1 × n and n × 1 templates, respectively, then the computation of a ⊕ r by (a ⊕ s) ⊕ t uses 2n multiplications instead of n². General methods for template decomposition and applications of decompositions to algorithm optimization can be found in Ritter and Gader (1987); Davidson (1989a); and Gader and Dunn (1989). Our next goal is to replace the operations ∘ and γ by the appropriate lattice operations of the value set R_{±∞}. We first extend the notion of the conjugate defined in (90) to R_{±∞}-valued images and templates. For a ∈ R_{±∞}^X and t ∈ (R_{±∞}^X)^Y, the conjugate of a is the image a* ∈ R_{±∞}^X defined by a*(x) = (a(x))*, while the conjugate of t is the template t* ∈ (R_{±∞}^Y)^X defined by t*_x(y) = (t_y(x))*. Observe that t*_x(y) = −t_y(x), where −(−∞) = ∞ and −(∞) = −∞. Also, since R_{±∞} is a lattice, for images a, b ∈ R_{±∞}^X we have
a ∧ b = (a* ∨ b*)*.      (99)
Let R₋∞ = R ∪ {−∞}. There is a noteworthy similarity between the sublattice (R₋∞, ∨, +) of R_{±∞} and the ring (R, +, ·). If we view the operation ∨ as "addition" and + as "multiplication" in R₋∞, then −∞ acts as the "additive identity" and 0 as the "multiplicative identity." The comparative behavior is as follows:

0 + r = r + 0 = r in R₋∞ compares to 1 · r = r · 1 = r in R,

(−∞) ∨ r = r ∨ (−∞) = r in R₋∞ compares to 0 + r = r + 0 = r in R,

and

(−∞) + r = r + (−∞) = −∞ in R₋∞ compares to 0 · r = r · 0 = 0 in R.
If one views the ringlike structure of (R₋∞, ∨, +) as the actual ring (R, +, ·), then it becomes natural to replace real addition and multiplication in (92) and (93) by the operations ∨ and +, respectively. More precisely, substituting + for ∘ and ∨ for γ in (63), as well as R_{±∞} for F₁, F₂, and F, results in the lattice convolution

a ⊞ t = {(y, b(y)): b(y) = ∨_{x∈X} (a(x) + t_y(x)), y ∈ Y}.      (100)

The operation a ⊞ t is also known as the (backward) additive maximum transform. The forward additive maximum transform is defined as

t ⊞ a = {(y, b(y)): b(y) = ∨_{x∈X} (t_y(x) + a(x)), y ∈ Y},      (101)

where t ∈ (R_{±∞}^X)^Y. We use the symbol ⊞ instead of ⊕ in order to distinguish this operation from the lattice convolution of multiplicative maximum, which is denoted by ⊠ and described in Ritter et al. (1990). Recall that the support of a real-valued function is defined in terms of the additive identity of the group (R, +). Analogously, the support of an extended real-valued function is defined in terms of the "additive" identity −∞ of (R₋∞, ∨). Specifically, we define the (negative) infinite support S₋∞(t_y) to be the set

S₋∞(t_y) = {x ∈ X: t_y(x) ≠ −∞}.      (102)
As before, since

∨_{x∈X} (a(x) + t_y(x)) = ∨_{x∈S₋∞(t_y)} (a(x) + t_y(x)),      (103)
we can restate (100) in terms of the support as

a ⊞ t = {(y, b(y)): b(y) = ∨_{x∈S₋∞(t_y)} (a(x) + t_y(x)), y ∈ Y},      (104)
where ∨_{x∈S₋∞(t_y)} (a(x) + t_y(x)) = −∞ whenever S₋∞(t_y) = ∅. Because of the duality inherent in the structure of R_{±∞}, the operation ⊞ induces a dual operation ⊟, called the additive minimum, which is given by

a ⊟ t = (t* ⊞ a*)*.      (105)
Equivalently, we have

a ⊟ t = {(y, b(y)): b(y) = ∧_{x∈X} (a(x) + t_y(x)), y ∈ Y},      (106)

or

a ⊟ t = {(y, b(y)): b(y) = ∧_{x∈S∞(t_y)} (a(x) + t_y(x)), y ∈ Y},      (107)
where S∞(t_y) = {x ∈ X: t_y(x) ≠ ∞} is called the (positive) infinite support and ∧_{x∈S∞(t_y)} (a(x) + t_y(x)) = ∞ whenever S∞(t_y) = ∅. The forward additive minimum is defined as

t ⊟ a = (a* ⊞ t*)*.      (108)
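The additive maximum and its dual can be sketched for translation-invariant templates given as offset-to-weight maps, with the dual computed through conjugation as in Eq. (105). All names below are our own illustration, not the chapter's code:

```python
# Additive maximum (gray-scale dilation) and its dual, the additive
# minimum, over the finite support S_{-inf}(t_y); toy 2-D dict model.
NEG_INF = float("-inf")

def add_max(a, weights):
    """b(y) = max over x of (a(x) + t_y(x)), with a = -inf off its domain."""
    b = {}
    for (i, j) in a:
        b[(i, j)] = max(a.get((i + di, j + dj), NEG_INF) + w
                        for (di, dj), w in weights.items())
    return b

def add_min(a, weights):
    """Dual via conjugation, in the spirit of Eq. (105): negate and reflect."""
    conj = {p: -v for p, v in a.items()}
    cw = {(-di, -dj): -w for (di, dj), w in weights.items()}
    return {p: -v for p, v in add_max(conj, cw).items()}

w = {(0, 0): 0.0, (1, 0): 0.0, (-1, 0): 0.0}  # flat 3-point structuring element
a = {(i, 0): float(v) for i, v in enumerate([1, 5, 2])}
d = add_max(a, w)   # dilation
e = add_min(a, w)   # erosion
```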
In the definition of ⊞ and ⊟, we have assumed that the point set X is finite. However, for many commonly used value sets and specific operations γ and ∘, the above definitions have natural extensions to infinite point sets. For instance, the definitions of ⊕ and ⊞ extend to continuous functions a and t_y on compact sets X ⊂ R^n, with the exceptions that the global reduce operation ∨ in (100) stands for the sup of the function a + t_y, and the sum in (92) gets replaced by an integral, namely

b(y) = ∫_X a(x) · t_y(x) dx.      (109)
Thus, image algebra can be used to model both discrete and continuous image transformations. In order to illustrate the use of the operators ⊞ and ⊟, we present a typical application example.

Example 10 (Weighted medial axis transform (Blum, 1967)): The weighted medial axis transform skeletonizes a Boolean image by shrinking the "black" regions of the image to thin sticklike figures called the medial axis. The values (weights) associated with the medial axis pixels allow for reconstruction of the Boolean image. Let a denote the Boolean input image and t the following template:
The image algebra version of the weighted medial axis transform is then given by

i := 0
a_0 := a
DO UNTIL a_i = 0
    a_{i+1} := a_i ⊟ t
    b_{i+1} := a_i · χ_0[a_{i+1} ⊞ t]
    i := i + 1
b := Σ_{k=1}^{i} (1/2^k) b_k

Here, b denotes the transformed image.
The induced binary operation ⊞ between extended real-valued templates is derived from (87) by proper substitution of the operations + and ∨ and the value set R_{±∞}. To be precise, for s ∈ (R_{±∞}^W)^X and t ∈ (R_{±∞}^X)^Y, the substitution of + for ∘ and ∨ for γ yields the template r = s ⊞ t ∈ (R_{±∞}^W)^Y, where

r_y(w) = ∨_{x∈X} (s_x(w) + t_y(x)).      (110)

The notion of the support of a template provides for a more efficient method of computing the templates s ⊕ t and s ⊞ t. Let S(w) = {x ∈ X: x ∈ S(t_y) and w ∈ S(s_x)}, and S₋∞(w) = {x ∈ X: x ∈ S₋∞(t_y) and w ∈ S₋∞(s_x)}. Then, since t_y(x) · s_x(w) = 0 whenever x ∉ S(w) and t_y(x) + s_x(w) = −∞ whenever x ∉ S₋∞(w), Eqs. (98) and (110) are equivalent to

r_y(w) = Σ_{x∈S(w)} t_y(x) · s_x(w)      (111)

and

r_y(w) = ∨_{x∈S₋∞(w)} (s_x(w) + t_y(x)),      (112)

respectively. Here we define Σ_{x∈S(w)} t_y(x) · s_x(w) = 0 whenever S(w) = ∅, and ∨_{x∈S₋∞(w)} (s_x(w) + t_y(x)) = −∞ whenever S₋∞(w) = ∅, respectively. It follows from these definitions that S(r_y) = {w ∈ W: S(w) ≠ ∅} and S₋∞(r_y) = {w ∈ W: S₋∞(w) ≠ ∅}. Example 9 and our next example should clarify some of these concepts.
Example 11 (Template composition under ⊞): Let s and t be as in Fig. 2. Then the translation-invariant template s ⊞ t is likewise defined pictorially.
In their respective algebras, a ⊕ t is a linear transform and a ⊞ t a lattice transform (Ritter and Gader, 1987; Gader, 1986; Davidson, 1989a). The following properties of template operations follow from the isomorphisms discussed in Section III:
(a + b) ⊕ t = (a ⊕ t) + (b ⊕ t),      (113)
a ⊕ (s + t) = (a ⊕ s) + (a ⊕ t),      (114)
r ⊕ (s + t) = (r ⊕ s) + (r ⊕ t),      (115)
a ⊕ (s ⊕ t) = (a ⊕ s) ⊕ t,      (116)

and

(a ∨ b) ⊞ t = (a ⊞ t) ∨ (b ⊞ t),      (117)
a ⊞ (s ∨ t) = (a ⊞ s) ∨ (a ⊞ t),      (118)
r ⊞ (s ∨ t) = (r ⊞ s) ∨ (r ⊞ t),      (119)
a ⊞ (s ⊞ t) = (a ⊞ s) ⊞ t.      (120)
Here a and b are images, and r, s, and t are templates. The above properties are important not only as mathematical theorems but also as a tool for algorithm optimization.

III. A MEDLEY OF CONSEQUENCES
Image algebra as outlined in the previous section is an extremely rich mathematical structure. The implications of this structure, its variety of consequences, and its connections to various well-known mathematical structures far exceed the page limitation of this chapter. The various subalgebras of image algebra encompass such structures as linear algebra (Gader, 1986), polynomial algebra (Gader, 1986; Ritter and Gader, 1987), the mathematics of artificial neural networks (Davidson and Ritter, 1990; Ritter et al., 1989), mathematical morphology (Ritter et al., 1987a; Davidson, 1989a), and the minimax algebra of economics and operations research (Cuninghame-Green,
1979; Davidson, 1989a, 1989b). In order to provide the reader with a broad vista of various consequences, we will focus on several specific structures and provide some peephole examples.

A. Examples of Intermediate and Higher-Level Transforms
Most low-level image processing techniques consist of image-to-image transformations where both input and output are usually images of the same size. These techniques include such operations as local averaging, image sharpening, high- and low-pass filtering, edge detection, and thresholding. Translation of these techniques into the language or mathematical setting of image algebra is usually a straightforward affair, e.g., Examples 1 and 7, and the examples in Ritter et al. (1990). Intermediate and higher-level techniques often involve the transformation of images to the numeric or symbolic domain and employ tools from such diverse areas as topology, probability and statistics, graph theory, differential geometry, and knowledge representation. Translation of intermediate and high-level image operations, if described in terms of a sound mathematical basis and not in an ad-hoc fashion, generally does not pose a great problem. In fact, as in the case of low-level transforms and techniques, it has been our experience that high-level techniques coded in image algebra have always resulted in translucency and significant code reduction. In this section we present three examples ranging from intermediate to high-level operations.
The expression of the Euler number in the image algebra follows from the Euler number formulas given in Pratt (1978). These latter are expressed in
terms of number of quadbits. For example, the number of occurrences of the quadbit
[2 × 2 quadbit pattern]
corresponds to counting how many times the number 7 occurs in a ⊕ t. Pratt's formula is a weighted sum of the numbers of occurrences in the Boolean image of 15 special quadbit patterns. Note that only one convolution is necessary in either (121) or (122), namely a ⊕ t. Thus, the two formulas are not as involved as they may initially appear. Furthermore, we can improve the computational efficiency by defining the following lookup tables. Let f: ℝ → ℝ and g: ℝ → ℝ be defined by
f(r) = 1 if r = 1 or r = 3,   f(r) = −1 if r = 5 or r = 7,   f(r) = −2 if r = 2 or r = 6,

and

g(r) = 1 if r = 1 or r = 3,   g(r) = −1 if r = 5 or r = 7,   g(r) = 2 if r = 2 or r = 6,

with f(r) = g(r) = 0 otherwise. Then (121) and (122) have the forms

E(a) = Σ f(a ⊕ t)   and   E(a) = Σ g(a ⊕ t),
respectively. With the exception of the image histogram, all examples presented thus far have been pixel-level or local-neighborhood types of operations. These operations, as well as the image histogram, lend themselves well to parallel implementation on fine-grained architectures such as the Connection Machine. However, as the next example will show, image algebra is also quite capable of expressing sequential types of operations.

Example 13 (Octagonal chain code extraction (Gonzalez and Wintz, 1977)): The chain code is a well-known feature extraction algorithm. We present a method that locates the boundary of the "black part" of a Boolean image while simultaneously labeling the chain code directions on the boundary. The "black part" consists of all pixels having value 1. The algorithm we discuss is for black objects that are digital two-manifolds. We remark that the more complicated case of an arbitrary black object is solved in a similar way to the method given here. Let N(x₀) denote the eight neighbors of x₀, with x₀ not included. Define an elementary triangle to be a set of three distinct points {x₀, x₁, x₂} ⊂ X such that ‖xᵢ − xⱼ‖ ≤ √2, i, j = 0, 1, 2. Any pair (xᵢ, xⱼ), i ≠ j, of the elementary triangle is called an edge of the elementary triangle. We define a
G. X. RITTER
digital two-manifold as a collection of black pixels M ⊂ X satisfying the following properties:

1. M is four-connected.
2. If p, q ∈ M and ‖p − q‖ ≤ √2, then (p, q) is an edge of some elementary triangle Δ = {p, q, r}, where Δ ⊂ M.
3. For all x₀ ∈ M, N(x₀) ∩ M has at least two points.
4. For all p, q ∈ N(x₀) ∩ M, there exists an eight-path from p to q in N(x₀) ∩ M (recall x₀ ∉ N(x₀)).
This definition corresponds to the topological notion of a triangulation of a two-manifold. Conditions 2 and 4, respectively, prevent "feelers" and "pinched points" from occurring in the set M. See Figure 4. The first step in the algorithm is to identify the boundary points and label them with correct directions with respect to the usual chain code convention. Initially, we use the direction convention given in Fig. 5a, and proceed in a clockwise direction around the boundary, using the conventional image coordinate system as depicted in Fig. 5b. The labeling is done by using a census template, a template whose nonzero weights are each assigned a unique power of a prime. For example, we used the following census template:
t_y:
  1    2    4
  8   16   32
 64  128  256
For a Boolean image a, performing a ⊕ t assigns a unique value to a pixel y depending on the distribution of black pixels in the configuration of t_y. As an example, suppose we have a Boolean image as in Fig. 6a. For the pixel y circled in Fig. 6a, the 3 × 3 neighborhood has the distribution of black and white pixel values shown in Fig. 6b. In a ⊕ t this pixel will be assigned the census value 63. Here we eventually label y with the value 1, corresponding to the direction to the next pixel.
FIG. 4. (a) "Feeler." (b) "Pinched point" x₀.
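As an illustration, the census labeling can be sketched in a few lines of Python. The 3 × 3 weight layout (powers of two in row-major order) is an assumption reconstructed from the census values quoted in the text, not code from the paper.

```python
import numpy as np

# Assumed census template: a 3x3 template whose nonzero weights are
# distinct powers of two, so the weighted sum over a 3x3 neighborhood
# encodes the black/white configuration as a unique integer in {0,...,511}.
CENSUS = np.array([[1, 2, 4],
                   [8, 16, 32],
                   [64, 128, 256]])

def census_image(a):
    """Return b = a (+) t: each pixel receives the census value of its
    3x3 neighborhood (the image is zero-padded at the border)."""
    padded = np.pad(a, 1)
    b = np.zeros_like(a)
    rows, cols = a.shape
    for i in range(rows):
        for j in range(cols):
            b[i, j] = int(np.sum(padded[i:i + 3, j:j + 3] * CENSUS))
    return b

# A neighborhood whose top two rows are black and bottom row white
# sums to 1+2+4+8+16+32 = 63, the census value discussed in the text.
patch = np.array([[1, 1, 1],
                  [1, 1, 1],
                  [0, 0, 0]])
print(int(np.sum(patch * CENSUS)))  # 63
```

Because each weight is a distinct power of two, distinct neighborhood configurations can never collide on the same census value.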
FIG. 5. (a) Initial chain code directions. (b) Image coordinate system.
FIG. 6. (a) Input image. (b) Pixel y (circled) and its 3 × 3 neighborhood.
By investigating all possible distributions of black and white pixels in the 3 × 3 window, the correspondence between distributions and the eight chain code directions has been established. A pixel with census value in the set

S₁ = {41, 43, 45, 47, 49, 57, 59, 61, 63, 299, 303, 315, 319}
will be labeled with direction 1. A few possible distributions of black pixels with census values in S₁ are given in Fig. 7.
1 1 1      1 1 1      1 1 1
1 1 1      1 1 1      1 0 1
0 0 0      0 0 1      0 0 1

 63         319        303

FIG. 7. Distributions having direction 1.
Similarly, for the remaining directions 2, …, 8, we list the corresponding census values in the sets S₂, …, S₈, respectively, below:

S₂ = {97, 105, 107, 109, 111, 113, 121, 123, 125, 127, 363, 367, 379, 383},
S₃ = {161, 169, 173, 177, 185, 189, 193, 201, 225, 233, 237, 241, 249, 253},
S₄ = {385, 393, 417, 425, 429, 433, 441, 445, 449, 481, 489, 493, 497, 505, 509},
S₅ = {131, 163, 179, 195, 227, 243, 259, 387, 419, 435, 451, 483, 499},
S₆ = {7, 135, 167, 183, 199, 231, 247, 263, 391, 439, 455, 487, 503},
S₇ = {11, 13, 15, 139, 143, 203, 207, 267, 271, 395, 399, 459, 463},
S₈ = {25, 27, 29, 31, 155, 159, 219, 223, 283, 287, 411, 415, 475, 479}.
We define s: {0, …, 511} → {0, 1, …, 8} by

s(i) = j if i ∈ Sⱼ, and s(i) = 0 otherwise.

Thus, the image b defined by

b = s(a ⊕ t)
has values in the set {0, 1, …, 8}, and nonzero values correspond to the chain code directions given in Fig. 5a. In actual implementation on the image algebra FORTRAN preprocessor, the function s is evaluated via a lookup table. See Fig. 8 for an example. Following standard convention, we label the chain code array with numbers from 0 to 7 instead of 1 to 8. This is depicted in Fig. 9. The chain code of an image is a 1 × n array, where n equals the number of points in the chain code. To extract the chain code directions from the image b, we define a direction array or direction image as follows: d is an image on X = {0, 1, …, 7}, where
d(0) = (0, −1),   d(4) = (0, 1),
d(1) = (1, −1),   d(5) = (−1, 1),
d(2) = (1, 0),    d(6) = (−1, 0),
d(3) = (1, 1),    d(7) = (−1, −1).
For the value i, d(i) represents the position of the pixel in direction i relative to the present position. The entire chain code algorithm is stated below, where a is the Boolean input image and c is the one-dimensional output array of chain code values 0, …, 7.
FIG. 8. (a) Boolean input image a. (b) Image b = s(a ⊕ t).
FIG. 9. Final chain code directions.
(i)    b := s(a ⊕ t)
(ii)   n := Σ χ_{≥1}(b)
(iii)  x₀ := choice(domain(b|_{≥1}))
(iv)   c(1) := b(x₀) − 1
(v)    for i := 2 to n do
           c(i) := b(x₀ + Σ_{j=1}^{i−1} d(c(j))) − 1
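The boundary-following steps (i)-(v) can be sketched in Python. This is an illustrative reconstruction, not the paper's FORTRAN preprocessor code; the image b is represented as a dictionary from pixel coordinates to direction labels 1-8, as if already computed by b = s(a ⊕ t).

```python
# Direction array d: position of the next pixel relative to the present
# one, using the image coordinate system of Fig. 5b (x right, y down).
d = {0: (0, -1), 1: (1, -1), 2: (1, 0), 3: (1, 1),
     4: (0, 1), 5: (-1, 1), 6: (-1, 0), 7: (-1, -1)}

def chain_code(b):
    # (ii) length of the chain code = number of boundary pixels
    points = [p for p, v in b.items() if v >= 1]
    n = len(points)
    # (iii) pick an arbitrary boundary point at which to start
    x0 = points[0]
    c = [b[x0] - 1]                      # (iv) final direction for c(1)
    x, y = x0
    for _ in range(2, n + 1):            # (v) follow the labels around
        dx, dy = d[c[-1]]
        x, y = x + dx, y + dy            # x0 plus the sum of d(c(j))
        c.append(b[(x, y)] - 1)
    return c

# Example: the boundary of a 2 x 2 black square, labeled clockwise.
b = {(0, 0): 3, (1, 0): 5, (1, 1): 7, (0, 1): 1}
print(chain_code(b))  # [2, 4, 6, 0]
```

Each step simply re-reads b at the pixel reached so far, exactly as line (v) indexes b at x₀ plus the accumulated direction vectors.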
Line (i) assigns the initial chain code directions to each boundary point. Line (ii) gives the length of the chain code. Line (iii) picks an arbitrary point in the chain code at which to start. In line (iv), the final chain code direction for c(1) is assigned. Line (v) extracts the value for each c(i), i = 2, …, n, from the image b. The expression x₀ + Σ d(c(j)) is vector addition of the coordinate x₀ and the vectors d(c(j)), j = 1, …, i − 1. Running this algorithm, we obtained x₀ = (20, 15) and the resulting chain code

5555222234444445445344344444456666333344444
444444111011012221212117770101122222222222
2222122122121121112076777667666676666656666
6656667777111177777676766777774444444444444.

Edges of roads or airport runways are often detected with the use of directional edge masks (Ritter et al., 1986, 1990). The use of such masks results in edges with the property that edges on one side of a road or runway are parallel to those on the other side but have opposite (180°) direction. In order to detect and describe roads or runways, edges having the same direction, with the property that the end point of one edge is spatially close to the initial point of the other, are linked into directed straight-line segments. Two parallel line segments having opposite direction from each other are called antiparallel lines. Pairs of antiparallel line segments are examined as possible boundaries for a given road or runway (Ritter et al., 1986; Nevatia and Babu, 1980). Thus, for a given directed line segment we need to find all lines antiparallel to it. A method for doing this is described in the next example.

Example 14 (Partitioning a set of directed lines into sets of antiparallel lines): Directed line segments are commonly stored as pairs of points, namely an initial point x and an end point z, which we shall denote by e(x). In the subsequent discussion we assume that e = {(x, e(x)): e(x) is an end point for x} is a given set of directed line segments obtained by some procedure from a real-valued image on a rectangular array X = {(i, j): 1 ≤ i ≤ m, 1 ≤ j ≤ n} ⊂ ℤ². Let Y = ℤ² × ℤ² (= ℤ⁴), F = 2^{ℤ⁴}, and F₁ = ℤ². We note that e can be viewed as a subset of

Y = {(x, z): x = (x₁, x₂) ∈ ℤ², z = (z₁, z₂) ∈ ℤ²},
and that domain(e) ⊂ X. Now, if ∘: F₁ × F₁ → F is a binary operation and (F, γ) = (2^{ℤ⁴}, ∪) the semigroup, then, according to (72), we have the induced forward operator

⊛: (F₁^X)^Y × F₁^X → F^Y,

where b = t ⊛ a is defined by

b(y) = ∪_{x∈X} t_x(y) ∘ a(x)   (123)
for each y ∈ Y. For our purposes we specify the binary operation ∘ as

u ∘ v = {(u, v)} if u ≠ 0 and v ≠ 0, and u ∘ v = ∅ otherwise.   (124)

Here 0 denotes the origin 0 = (0, 0) ∈ ℤ². Note that the image b is a set-valued image on Y. Now let t(e) ∈ (F₁^X)^Y be a parametrized template defined by

t(e)_x(x′, z′) = x   if (x′, z′) ∈ e and there exists (x, z) ∈ e with (x′, z′) antiparallel to (x, z),
t(e)_x(x′, z′) = 0   otherwise.

Extend e to an image a ∈ F₁^X by setting a(x) = e(x) for x ∈ domain(e) and a(x) = 0 otherwise. Then, according to (124), we have for each y = (x′, z′) ∈ Y,

b(x′, z′) = ∪_{x∈X} t(e)_x(x′, z′) ∘ a(x)
          = ∪_{x∈domain(e)} t(e)_x(x′, z′) ∘ e(x).   (125)

The last equality in (125) follows from the facts that t(e)_x(x′, z′) ∘ a(x) = ∅ whenever x ∉ domain(e) and that e(x) = a|domain(e). Therefore,

b(x′, z′) = {(x, e(x)): (x, e(x)) is antiparallel to (x′, z′)},   (x′, z′) ∈ e.   (126)
By using the generalized version of ⊛ (Eq. (95)), we can eliminate the definition of the extension a and compute b directly as b := t(e) ⊛ e.
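A plain-Python sketch of the computation that Example 14 describes: for each directed segment, collect the set of segments antiparallel to it. The function names and the exact antiparallel test (zero cross product with negative dot product) are illustrative assumptions, not the paper's definitions.

```python
def antiparallel(seg1, seg2):
    """True if the two directed segments are parallel with opposite
    orientation: zero 2-D cross product and negative dot product."""
    (x1, y1), (x2, y2) = seg1
    (u1, v1), (u2, v2) = seg2
    dx1, dy1 = x2 - x1, y2 - y1
    dx2, dy2 = u2 - u1, v2 - v1
    return dx1 * dy2 - dy1 * dx2 == 0 and dx1 * dx2 + dy1 * dy2 < 0

def partition_antiparallel(e):
    """e: set of directed segments, each a pair (initial point, end point).
    Returns a dict mapping each segment to the set of segments of e
    antiparallel to it, mirroring the set-valued image b of Eq. (126)."""
    return {s: {t for t in e if antiparallel(s, t)} for s in e}

e = {((0, 0), (4, 0)),    # pointing east
     ((4, 2), (0, 2)),    # pointing west: antiparallel to the first
     ((0, 4), (0, 8))}    # pointing north: antiparallel to neither
b = partition_antiparallel(e)
```

The dictionary b plays the role of the set-valued image on Y: b[(x′, z′)] is exactly the set on the right-hand side of (126).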
B. Generalized Matrix Products

In this section we introduce the concept of a generalized matrix product. This concept provides a method for expressing image algebra operations as matrix operations. The definition of the generalized matrix product is similar to that of generalized image and template products, and it includes the matrix products of linear algebra and minimax algebra as special cases. For a given value set F, let F_{m×n} denote the set of all m × n matrices with values in F. We make the notational convention of setting Fⁿ = F_{1×n}, and view Fⁿ as the set of all row vectors of length n with entries from F. The set of column vectors of length m is then the set (Fᵐ)′ = F_{m×1}, where (Fᵐ)′ denotes the set of all transposes of row vectors from Fᵐ. Suppose that G and H are two value sets, (F, γ) a semigroup, and ∘: G × H → F a binary operation. Let A = (a_{ij}) ∈ G_{m×p} and B = (b_{jk}) ∈ H_{p×n}. Then the generalized matrix product C of A and B, induced by ∘ and γ, is denoted by
C = A ⊛ B

and defined by

c_{ik} = (a_{i1} ∘ b_{1k}) γ (a_{i2} ∘ b_{2k}) γ ⋯ γ (a_{ip} ∘ b_{pk}).   (127)

Thus, ⊛ is a binary operation

⊛: G_{m×p} × H_{p×n} → F_{m×n}.   (128)

Although we use the same symbol ⊛ for both the generalized matrix product and the image/template products, it should be clear from the context which one is applied. There are pertinent reasons for using the same symbol for both operations that will become evident in the subsequent discussion. If A ∈ Fᵐ and B ∈ Fⁿ, then the generalized outer product of the two vectors is defined by

A ⊗ B = A′ ⊛ B,   (129)

where A′ ∈ (Fᵐ)′ is the transpose of A. It follows from (128) that A ⊗ B ∈ F_{m×n}. The semigroup (F, γ) also induces its own matrix operation on the set of matrices with entries from F. Using the same notation γ for the induced operation, we define for A = (a_{ij}) ∈ F_{m×n} and B = (b_{ij}) ∈ F_{m×n} the product AγB by

(AγB)_{ij} = a_{ij} γ b_{ij}.   (130)

For the special case where G = F and

(a γ b) ∘ c = (a ∘ c) γ (b ∘ c)   (131)
for a, b ∈ F and c ∈ H, we obtain

(A γ B) ⊛ C = (A ⊛ C) γ (B ⊛ C),   (132)

where A, B ∈ F_{m×p} and C ∈ H_{p×n}. It becomes obvious from these observations that a general matrix algebra can be developed using induced operations. If F = G = H, then the properties of the induced algebra will reflect many of the properties of the algebraic system (F, γ, ∘). Our principal examples are obtained from the substitution of the value sets (ℝ, +, ·) and (ℝ_{±∞}, ∨, ∧, +, +′) for F. In the first case we obtain real matrix algebra, which reflects the ring structure of ℝ, while in the latter we obtain minimax algebra, which reflects the lattice structure of ℝ_{±∞}. For example, using the ring (ℝ, +, ·), Eq. (132) becomes the well-known fact of the distributivity of matrix multiplication over matrix addition:

(A + B) × C = (A × C) + (B × C).   (133)

Similarly, substituting ℝ_{±∞} and defining A ⊠ B by use of (127) as

(A ⊠ B)_{ik} = ⋁_{j=1}^{p} (a_{ij} + b_{jk}),   (134)

we obtain

(A ∨ B) ⊠ C = (A ⊠ C) ∨ (B ⊠ C).   (135)
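As a small numeric illustration (not from the text), the generalized matrix product of Eq. (127) with (γ, ∘) = (max, +), i.e. the max-plus product A ⊠ B, and the distributive law (135) can be checked directly:

```python
import numpy as np

def maxplus(A, B):
    """Max-plus matrix product: C[i, k] = max_j (A[i, j] + B[j, k]),
    an instance of Eq. (127) with gamma = max and o = +."""
    m, p = A.shape
    p2, n = B.shape
    assert p == p2
    C = np.empty((m, n))
    for i in range(m):
        for k in range(n):
            C[i, k] = np.max(A[i, :] + B[:, k])
    return C

A = np.array([[1.0, 5.0], [2.0, 0.0]])
B = np.array([[3.0, 1.0], [4.0, 2.0]])
C = np.array([[0.0, 2.0], [1.0, 3.0]])

# Distributivity (135): (A v B) [x] C = (A [x] C) v (B [x] C)
lhs = maxplus(np.maximum(A, B), C)
rhs = np.maximum(maxplus(A, C), maxplus(B, C))
print(np.array_equal(lhs, rhs))  # True
```

The identity holds because (a ∨ b) + c = (a + c) ∨ (b + c), which is exactly condition (131) for this choice of γ and ∘.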
We have already noted the similarity between generalized image/template products and generalized matrix products. We now investigate this similarity more closely. Suppose that X is finite, X = {x₁, x₂, …, x_m}. It is not difficult to show that the map v: F^X → Fᵐ defined by

v(a) = (a(x₁), a(x₂), …, a(x_m))   (136)

is one-to-one and onto. Furthermore, if ∘ is a binary operation on F, then

v(a ∘ b) = v(a) ∘ v(b),   (137)

where the product a ∘ b is the induced image product (2), and v(a) ∘ v(b) is the induced matrix (or vector) product (130) with ∘ replacing γ. Since v is a one-to-one correspondence, Eq. (137) implies that v: (F^X, ∘) → (Fᵐ, ∘) is an isomorphism. In case F = ℝ,

v: (ℝ^X, +) → (ℝᵐ, +)

is a vector space isomorphism. Therefore, addition of real-valued images is equivalent to addition of points in ℝᵐ. If Y is also a finite point set with Y = {y₁, y₂, …, yₙ}, then we define

Ψ: (F^Y)^X → F_{n×m}
by Ψ(t) = C_t = (c_{ij}), where c_{ij} = t_{x_j}(y_i). Note that the jth column of C_t is simply (v(t_{x_j}))′. Again, it is not difficult to show that Ψ is a one-to-one correspondence. Furthermore, Ψ preserves the induced operations and, hence, is an algebraic isomorphism. The isomorphisms v and Ψ establish the desired connection between image algebra and generalized matrix algebra. Specifically, the link between (55) and (129) is given by

Ψ(a ⊗ b) = v(a) ⊗ v(b).   (139)
Here we are a little imprecise by using the same symbol v to denote the two maps F^X → Fᵐ and F^Y → Fⁿ. Equation (139) states that calculating v(a) ⊗ v(b) is algebraically the same as calculating the outer image product a ⊗ b, and conversely. The generalized image/template products (62) and (73) can now be rewritten as

v(a ⊛ t) = v(a) ⊛ Ψ(t)   (140)

and

v(t ⊛ a) = [Ψ(t) ⊛ (v(a))′]′,   (141)

respectively. Note that in (140) Ψ(t) ∈ F_{m×n}, while in (141) Ψ(t) ∈ F_{n×m}. The template/template product (83) has the matrix algebra form

Ψ(s ⊛ t) = Ψ(s) ⊛ Ψ(t).   (142)
Since Ψ is an isomorphism,

Ψ(s γ t) = Ψ(s) γ Ψ(t),   (143)

where s γ t is the pixel operation (37) with γ replacing ∘, while the operation γ on the right-hand side of the equation is defined by (130). Thus, by (132), (142), and (143), we have

Ψ((s γ t) ⊛ r) = [Ψ(s) ⊛ Ψ(r)] γ [Ψ(t) ⊛ Ψ(r)].   (144)

As a direct consequence we have that if (F_{m×m}, γ, ⊛) is a ring, then ((F^X)^X, γ, ⊛) is a ring isomorphic to it. In particular, ((ℝ^X)^X, +, ⊛) is isomorphic to the ring of m × m square matrices (ℝ_{m×m}, +, ·). Similarly, the lattice ((ℝ_{±∞}^X)^X, ∨, ∧, ⊞, ⊠) is isomorphic to the minimax algebra ((ℝ_{±∞})_{m×m}, ∨, ∧, ⊞, ⊠). In view of these observations, the proofs of Eqs. (113)–(120) are now trivial. One very powerful implication of the observations made in this section is that all the tools of linear algebra and lattice theory are directly applicable to solving problems in image processing whenever image algebra operations
such as ⊕ and ⊠ are employed. In the next section we look at one particular application domain.
C. Template Decomposition

Both linear convolution and lattice transforms are widely used in image processing. One common characteristic is that both require applying a template to a given image, pixel by pixel, to yield a new image. In the case of convolution, the template is usually called a convolution window or mask, while in mathematical morphology it is referred to as a structuring element. Templates can vary from one pixel to another in their weights, sizes, and shapes, depending on the specific application. Intuitively, the problem of template decomposition is: given a template t, find a sequence of smaller templates t₁, …, tₙ such that applying t to an image is equivalent to applying the tᵢ sequentially to the image. In other words, t can be algebraically expressed in terms of t₁, …, tₙ. One reason for template decomposition is that some current image processors can only handle very small templates at a time. For example, ERIM's Cytocomputer (Sternberg, 1983) cannot deal with templates of size larger than 3 × 3 on each pipeline stage. Thus, a large template has to be decomposed into a sequence of 3 × 3 or smaller templates before it can be applied to an image on the Cytocomputer. A more important motivation for template decomposition is to speed up template operations. For a large convolution mask, the computational cost of a direct implementation can be prohibitive. However, in many instances this cost can be significantly reduced by decomposing the mask or template into a sequence of smaller templates. For instance, the linear convolution of an image with an n × n template requires n² multiplications and n² − 1 additions to compute each new image pixel value, while the same convolution computed with a 1 × n row template followed by an n × 1 column template takes only 2n multiplications and 2(n − 1) additions for each new image pixel value.
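The cost-saving identity behind this operation count can be illustrated numerically. The sketch below is an illustration rather than the paper's code: it builds a 3 × 3 mask as the outer product of a column and a row template and checks that one-pass and two-pass correlation agree.

```python
import numpy as np

def correlate2d(a, w):
    """Plain same-sized correlation with zero padding at the border."""
    wr, wc = w.shape
    padded = np.pad(a, ((wr // 2,) * 2, (wc // 2,) * 2))
    out = np.zeros_like(a, dtype=float)
    for i in range(a.shape[0]):
        for j in range(a.shape[1]):
            out[i, j] = np.sum(padded[i:i + wr, j:j + wc] * w)
    return out

rng = np.random.default_rng(0)
a = rng.random((8, 8))
s = np.array([1.0, 2.0, 1.0])       # 1 x 3 row template
t = np.array([1.0, 0.0, -1.0])      # 3 x 1 column template
r = np.outer(t, s)                  # the full 3 x 3 mask

full = correlate2d(a, r)                                        # 9 mults/pixel
two_pass = correlate2d(correlate2d(a, s[None, :]), t[:, None])  # 6 mults/pixel
print(np.allclose(full, two_pass))  # True
```

For a 3 × 3 mask the saving (6 versus 9 multiplications per pixel) is modest; for an n × n mask it is the 2n versus n² ratio quoted in the text.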
This cost saving may still hold for parallel architectures such as mesh-connected array processors (Lee and Aggarwal, 1987), where the cost of a convolution is proportional to the size of the template. The problem of template decomposition has been investigated by several researchers. Ritter and Gader (1987) presented some very efficient methods for decomposing DFT and general linear convolution templates using image algebra. Wiejak et al. (1985) proposed a method to decompose a 2-D (or higher-dimensional) Marr-Hildreth convolution operator into two 1-D convolution operators. Zhuang and Haralick (1986) gave an algorithm based on a tree search that can find an optimal two-point set decomposition of a
morphological structuring element if such a decomposition exists. In this section, the issues of template decomposition are discussed in the context of image algebra. The properties listed in Eqs. (113)–(120) turn out to be extremely useful because they help in exploring the possibilities of computing template operations in different ways. As far as efficiency is concerned, the goal of template decomposition is to find the most efficient way to implement a template operation with a given template. For instance, if we know that r = s ⊕ t, then, by the associative law, we could apply s and t sequentially to a instead of computing a ⊕ r directly, since s and t are in general much smaller than r. We begin our investigation with the following definitions.

Definition. A ⊕-decomposition of a template t ∈ (ℝ^X)^Y is a sequence of templates t₁, …, tₙ ∈ (ℝ^X)^Y such that t = t₁ ⊕ ⋯ ⊕ tₙ.

Definition. A ⊞-decomposition of a template t ∈ (ℝ_{−∞}^X)^Y is a sequence of templates t₁, …, tₙ ∈ (ℝ_{−∞}^X)^Y such that t = t₁ ⊞ ⋯ ⊞ tₙ.

By using both associative and distributive laws, a weak version of template decomposition can also be defined.

Definition. A weak ⊕-decomposition of a template t ∈ (ℝ^X)^Y is a sequence of templates t₁, …, t_{k₁}, …, t_{kₙ} ∈ (ℝ^X)^Y such that

t = (t₁ ⊕ ⋯ ⊕ t_{k₁}) + (t_{k₁+1} ⊕ ⋯ ⊕ t_{k₂}) + ⋯ + (t_{k_{n−1}+1} ⊕ ⋯ ⊕ t_{kₙ}).

For example, suppose a ∈ ℝ^X and t ∈ (ℝ^X)^Y has a weak decomposition t = t₁ ⊕ t₂ + t₃ ⊕ t₄. Then we can compute a ⊕ t as follows:

a ⊕ t = ((a ⊕ t₁) ⊕ t₂) + ((a ⊕ t₃) ⊕ t₄).

In general, the decomposition of a template t, if possible, is preferred to its weak decomposition because usually more time and space would be involved in computing and applying a weak decomposition. With the concept of template decomposition defined, we show next how to decompose some commonly used templates. In the next definition, suppose that F = ℝ, ℂ, or ℝ_{−∞}.

Definition. An invariant template t ∈ (F^X)^Y with finite support is called a rectangular template if its configuration S(t_y) (or S_{−∞}(t_y)) is of rectangular shape at each target pixel y.
Rectangular templates are the simplest and most commonly used templates in image processing. Ideally, to speed up template operations, we would like to decompose an arbitrary rectangular template into two one-dimensional templates, namely a row template and a column template. Thus, for a given m × n rectangular template, the number of arithmetic operations required for each template operation at each pixel can be reduced from mn to m + n.
Definition. A rectangular template is called separable if it can be expressed as the composition of a row template and a column template. Let r ∈ (F^X)^Y be an m × n rectangular template with weights w_{ij}, i = 1, …, m; j = 1, …, n. We say that r is ⊕-separable if there exist a 1 × n row template s ∈ (F^X)^Y and an m × 1 column template t ∈ (F^X)^Y such that r = s ⊕ t, and ⊞-separable if r = s ⊞ t. Obviously not all rectangular templates are separable. Therefore, it is natural to ask under what conditions a template is separable. Here, we give a necessary and sufficient condition for the separability of rectangular templates.
Proposition 1 (D. Li). Let r ∈ (ℝ_{−∞}^X)^Y be an m × n rectangular template with weights r_{ij}, where i = 1, …, m; j = 1, …, n. Then r is ⊞-separable if and only if for all 1 ≤ i ≤ m and 1 ≤ j ≤ n,

r_{ij} − r_{1j} = r_{i1} − r_{11}.   (144)

Proof: Suppose that r is ⊞-separable and r = s ⊞ t, where s is a 1 × n row template with weights s₁, …, sₙ and t is an m × 1 column template with weights t₁, …, t_m. By the definition of template composition, r_{ij} = t_i + s_j for i = 1, …, m and j = 1, …, n. Then

r_{ij} − r_{1j} = (t_i + s_j) − (t₁ + s_j) = t_i − t₁ = (t_i + s₁) − (t₁ + s₁) = r_{i1} − r_{11}.

Now assume that Eq. (144) holds. Define a 1 × n template s and an m × 1 template t as follows:

s_j = r_{1j},   (145)
t_i = r_{i1} − r_{11}.   (146)
Then for j = 1, …, n and i = 1, …, m,

r_{ij} = (r_{i1} − r_{11}) + r_{1j} = t_i + s_j.

Thus r = s ⊞ t and r is ⊞-separable, as desired.   Q.E.D.

This yields a straightforward method for testing and decomposing ⊞-separable templates. Given an m × n rectangular template r, it takes (m − 1)n additions and (m − 1)(n − 1) comparisons, according to Eq. (144), to see whether r is separable or not. If it is separable, then one can easily construct the corresponding row and column templates by Eqs. (145) and (146), as given above.
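A minimal Python sketch of this test and of the construction in Eqs. (145)-(146); the function name `boxplus_separate` is hypothetical, and the template is represented simply as a NumPy array of weights.

```python
import numpy as np

def boxplus_separate(r):
    """Test Eq. (144) on the weight matrix r; if it holds, return the
    row weights s and column weights t with r[i, j] == t[i] + s[j]
    (Eqs. (145)-(146)), otherwise return None."""
    m, n = r.shape
    for i in range(m):
        for j in range(n):
            if r[i, j] - r[0, j] != r[i, 0] - r[0, 0]:   # Eq. (144)
                return None
    s = r[0, :].copy()            # Eq. (145): s_j = r_1j
    t = r[:, 0] - r[0, 0]         # Eq. (146): t_i = r_i1 - r_11
    return s, t

# A [+]-separable 3 x 4 template built from known row/column weights.
t_col = np.array([0.0, 1.0, 3.0])
s_row = np.array([2.0, 5.0, 4.0, 7.0])
r = t_col[:, None] + s_row[None, :]

s, t = boxplus_separate(r)
print(np.array_equal(r, t[:, None] + s[None, :]))  # True
```

Perturbing a single weight breaks condition (144), and the function then reports the template as non-separable.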
Proposition 2. Let r ∈ (ℝ^X)^Y be an m × n rectangular template with weights r_{ij}, where i = 1, …, m; j = 1, …, n. Then r is ⊕-separable if and only if for all 1 ≤ i ≤ m and 1 ≤ j ≤ n,

r_{ij}/r_{1j} = r_{i1}/r_{11}.   (147)
The proof is similar to that of Proposition 1. Notice that condition (147) is equivalent to saying that the rank of r is 1. Sometimes, especially when defining large templates, it is rather convenient to define a rectangular template of size m × n by a function W(x, y) of two variables over an m × n grid. We call such a function a weight function. On the other hand, any real-valued function of two variables defined on a finite rectangular grid defines a real-valued rectangular template, whose configuration is the domain of the function. Thus the separability of a rectangular template can be reduced to the separability of its weight function. The following results are obvious.

Proposition 3. Let t be a real-valued rectangular template of size m × n and W_t(x, y) its weight function, where x = 1, …, m and y = 1, …, n. Then t is ⊕-separable iff W_t(x, y) = f(x)g(y) for some real-valued functions f and g. The template t is ⊞-separable iff W_t(x, y) = f(x) + g(y) for some real-valued functions f and g.
Example 15: We define a (2m + 1) × (2n + 1) paraboloid template t by the function

W(x, y) = k(x² + y²),

where k is a constant, −m ≤ x ≤ m, and −n ≤ y ≤ n. Then t is ⊞-separable.
Example 16: An n × n Marr-Hildreth template t is defined by a Laplacian-of-Gaussian weight function W(x, y), where −n ≤ x ≤ n and −n ≤ y ≤ n. Clearly t is not separable, but it still has a very efficient weak ⊕-decomposition as the sum of two separable templates:

t = t₁ ⊕ t₂ + t₃ ⊕ t₄,

where tᵢ is defined by fᵢ for i = 1, …, 4.

As another example of template decomposition, we consider the class of symmetric convex templates. Symmetric convex templates are often used in morphological filters. In this section, we show how to decompose these symmetric convex templates with respect to the ⊞ operation. The concept of a convex set is usually defined in Euclidean space ℝⁿ. In image processing, an image has to be digitized on a subset of ℤⁿ. Hence, it is not clear what a digital convex set should be. In this section, we first define what a digital convex set in ℤ² is and what a convex template is. Then we define a set of special symmetric convex templates, called cross templates, and show that every Boolean symmetric convex template can be decomposed as a sequence of Boolean cross templates. We can actually show a stronger result: a Boolean symmetric template is convex if and only if it is the composition of a sequence of Boolean cross templates. Here again we only consider translation-invariant templates. Let X be a subset of ℤ². We say that X is simply connected if it is four-connected and has no hole. The set X is symmetric if it is symmetric with respect to both the x-axis and the y-axis. If X is finite and simply connected, then a point x is called an extreme point of X if x is in X but x does not lie between any two points of X. In other words, x has at most one horizontal neighbor and at most one vertical neighbor in X.

Definition. A subset X of ℤ² is called convex if it is four-connected and satisfies the following two conditions:

1. The set of extreme points of X forms a convex polygon H in ℝ².
2. For any x ∈ ℤ², x ∈ X if and only if x ∈ H.
Definition. An invariant template t ∈ (ℝ_{−∞}^X)^Y is a convex template if its support S_{−∞}(t_y) is convex for each y ∈ Y. The template t is called a symmetric convex template if its support is both symmetric and convex.
It is clear that a convex set in ℤ² is uniquely defined by its extreme points, and so is the support of a convex template. From the definition of convex template, we immediately have the following:

Proposition 4. Let t ∈ (ℝ_{−∞}^X)^Y be a symmetric convex template and (x₁, y₁), …, (xₙ, yₙ) be the extreme points of S_{−∞}(t_y) in the first quadrant, where x₁ ≤ ⋯ ≤ xₙ and y = (0, 0). Then for i = 2, …, n,

min{xᵢ − x_{i−1}, y_{i−1} − yᵢ} = 1,

and for i = 2, …, n − 1,

(yᵢ − y_{i−1})/(xᵢ − x_{i−1}) ≥ (y_{i+1} − yᵢ)/(x_{i+1} − xᵢ).
Definition. A cross template cross(i, j) is defined as the overlap of a row template of size 2i + 1 and a column template of size 2j + 1 on their target pixels, where min{i, j} is either 0 or 1.
Some examples of cross templates are shown in Fig. 10. Clearly, all cross templates are convex templates. Furthermore, they are irreducible in the sense that they cannot be decomposed into a sequence of smaller templates (except the simple rows and columns). A far more important property is that all Boolean symmetric convex templates can be represented as the ⊞-composition of a sequence of Boolean cross templates. A template t ∈ (ℝ_{−∞}^X)^Y is called Boolean if all of the weights in its support are 0's.

Proposition 5. Any Boolean symmetric convex template t has a decomposition t = cross₁ ⊞ ⋯ ⊞ cross_m, where cross_k(i_k, j_k), k = 1, …, m, are Boolean cross templates.
FIG. 10. Examples of cross(1, 0), cross(1, 1), and cross(0, 1) templates.
Proof: We prove this by induction on n, the number of extreme points of S_{−∞}(t_y) in the first quadrant. If n = 1, let (x₁, y₁) be the extreme point in the first quadrant. Then t is a Boolean rectangular template that is separable, and t = cross(x₁, 0) ⊞ cross(0, y₁). For n ≥ 2, let (x₁, y₁), …, (xₙ, yₙ) be the extreme points of S_{−∞}(t_y) in the first quadrant such that x₁ ≤ ⋯ ≤ xₙ. We define a Boolean cross template cross₁(x₂ − x₁, y₁ − y₂). It can be shown by using Proposition 4 that (t ⊖ cross₁) ⊞ cross₁ = t and that t ⊖ cross₁ is still a Boolean symmetric convex template, whose configuration has n − 1 extreme points (x₁, y₁), (x₃ − Δx, y₃), …, (xₙ − Δx, yₙ), where Δx = x₂ − x₁. Thus, by the induction hypothesis, t ⊖ cross₁ has a ⊞-decomposition into a sequence of cross templates. Hence, so does t.   Q.E.D.

The set of all Boolean symmetric convex templates from ℤ² to ℤ² is generated by the set of Boolean cross templates and closed under the operation ⊞. In Li and Ritter (1990), an algorithm is derived from Proposition 5 that can be used to decompose an arbitrary Boolean symmetric convex template into a sequence of cross templates. In many morphological filters, a disk template or spherical template is desirable because it defines an equal-distance neighborhood.

Definition. An invariant template t ∈ (ℝ_{−∞}^X)^Y is called a disk template if it is Boolean and its support S_{−∞}(t_y) is a digital disk.

Definition. An invariant template is called a spherical template if its support S_{−∞}(t_y) is a digital disk and the weights in its support define a digital half-sphere.

Now the problem is that, given an integer r, we need to find a sequence of templates t₁, …, tₙ such that the composition t₁ ⊞ ⋯ ⊞ tₙ gives rise to a disk (or spherical) template of radius r. Note that a disk template is a special Boolean symmetric convex template, and thus can be decomposed by the algorithm given in Li and Ritter (1990). In that paper, an algorithm is also presented that decomposes a spherical template as a sequence of cross templates with various weights. Decompositions of disk and spherical templates can result in very efficient template operations. When a template is applied to an image, the amount of computation involved is proportional to the size of the support of the template at each new image pixel. Thus, the larger the template, the more computation is required.
Proposition 6. Let t = cross₁(i₁, j₁) ⊞ ⋯ ⊞ crossₙ(iₙ, jₙ) be a decomposition of some disk (or spherical) template t ∈ (ℝ_{−∞}^X)^Y of radius r. Let T(r) = card(S_{−∞}(t)), the size of the support of t, and T_D(r) = card(S_{−∞}(cross₁)) + ⋯ + card(S_{−∞}(crossₙ)). Then T_D(r) ≤ 5r + 1.

Proof: Note that n ≤ r + 1 for any r ≥ 1, and

i₁ + i₂ + ⋯ + iₙ = r,   j₁ + j₂ + ⋯ + jₙ = r.

It follows that

T_D(r) = card(S_{−∞}(cross₁)) + ⋯ + card(S_{−∞}(crossₙ))
       = (2(i₁ + j₁) + 1) + ⋯ + (2(iₙ + jₙ) + 1)
       = 2(i₁ + ⋯ + iₙ) + 2(j₁ + ⋯ + jₙ) + n
       = 4r + n
       ≤ 5r + 1.   Q.E.D.
It is easy to see that T(r) is of order r². Thus, after the decomposition, the template operation can be faster by an order of magnitude. In conclusion, template decomposition is not only necessary in the case where special image processing hardware cannot handle large templates, but also desirable when efficient computation is of importance. Note that the decomposition techniques presented in this section were based on the associative properties of template operations in image algebra. If both associative and distributive properties are used, a weak decomposition may also be derived.
D. Image Algebra and Artificial Neural Networks
In recent years there has been a resurgence in the field of artificial neural networks. This resurgence has brought new hope of achieving humanlike performance in the fields of image processing and target identification (DARPA, 1988; Grossberg, 1988; Rumelhart, 1988). Image algebra, on the other hand, was developed for the express purpose of providing a common mathematical image processing environment, as well as providing an algebraic tool for image processing algorithm development, comparison, and optimization (Ritter and Wilson, 1987a, 1987b; Ritter et al., 1987b, 1990). In this section we investigate how these two apparently independent developments have converged to a similar mathematical framework. We show how
RECENT DEVELOPMENTS IN IMAGE ALGEBRA
a subalgebra of the image algebra includes the mathematical formulations of currently popular neural network models. In addition, we provide image algebra expressions representing algorithms designed for neural network computations. These image algebra expressions are extremely simple and translucent. The neural network algorithms represented by these expressions look like their textbook formulations, with no lengthy code involved. In addition, we point out how image algebra suggests more general classes of neural networks than those that are under current investigation. Artificial neural network models are specified by the network topology, node characteristics, and training or learning rules. These rules specify an initial set of weights and indicate how weights should be adapted during use to improve performance. The two basic equations underlying the theory of computation in a neural network are
τ_i(t + 1) = Σ_{j=1}^{n} a_j(t) w_ij    (148)

and

a_i(t + 1) = f(τ_i(t + 1) − θ),    (149)
where a_j(t) denotes the value of the jth neuron at time t, n the number of neurons in the network, w_ij the synaptic connectivity value between the ith and jth neurons (at time t), τ_i(t + 1) the next or total input effect on the ith neuron, θ a threshold, and f the next-state function that usually introduces a nonlinearity into the network. Although not all current network models can be precisely described by these two equations, they can nevertheless be viewed as variations of these. If we let X denote the one-dimensional array X = {1, 2, ..., n} ⊂ R¹, a ∈ R^X the set of current values of the neurons (i.e., a = {a(1), a(2), ..., a(n)}, where a(i) denotes the value of the ith neuron), b ∈ R^X the set of next values, and define t ∈ (R^X)^X by t_i(j) = w_ij for all i, j ∈ X, then Eqs. (148) and (149) correspond to the following equivalent image algebra expressions:
τ = a ⊕ t    (150)

and

b = f(τ − θ),    (151)
where τ ∈ R^X denotes the intermediate image. Since image algebra is capable of expressing the computational methodology of neural networks, it should be obvious that expressing neural network algorithms in the language of the image algebra poses no great problem. As we shall demonstrate, neural network algorithms expressed in the
image algebra are extremely translucent and resemble their textbook formulations. Thus, image algebra is an ideal language for neural network algorithm development and comparison. In the following examples we present image algebra formulations of some popular neural network algorithms.
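Before turning to the examples, the computational core shared by Eqs. (148)–(151) — a template (matrix-vector) product followed by a thresholded next-state function — can be sketched in plain Python (the names `neural_step` and `hard_limit` are illustrative, not from the text):

```python
# Sketch of Eqs. (148)-(149): tau_i = sum_j a_j * w_ij, then a_i := f(tau_i - theta).
# The names neural_step and hard_limit are illustrative only.

def hard_limit(x):
    """Hard-limiting nonlinearity with outputs in {-1, +1}."""
    return 1 if x >= 0 else -1

def neural_step(a, w, theta=0.0, f=hard_limit):
    """One synchronous update; the image algebra form is b = f(a (+) t - theta)."""
    n = len(a)
    tau = [sum(a[j] * w[i][j] for j in range(n)) for i in range(n)]  # Eq. (148)/(150)
    return [f(tau[i] - theta) for i in range(n)]                     # Eq. (149)/(151)

w = [[0, 1], [1, 0]]              # tiny symmetric connectivity matrix
print(neural_step([1, -1], w))    # -> [-1, 1]
```

Every net in the following examples instantiates this step with its own weights, threshold, and nonlinearity.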
Example 17 (The Hopfield net algorithm): The Hopfield net can be used as an associative memory or to solve optimization problems (Hopfield, 1982). One version of the Hopfield net that can be used as a content-addressable memory is described in terms of the image algebra formalism in this section. This net and two other nets presented in this paper are normally used with binary inputs. In image processing applications, these nets are most appropriate for the classification of Boolean images.

The Hopfield net under consideration has n nodes containing hard-limiting nonlinearities and binary inputs and outputs taking the values +1 and −1. The output of each node is fed back to all other nodes with weights w_ij. The operation of this net is described below. First, weights are set using the given recipe from exemplar patterns of all classes. Then an unknown pattern is imposed on the net at time zero by forcing the output of the net to match the unknown pattern. Following this initialization, the net iterates in discrete time steps using the given formula. The net has converged when the output no longer changes. The pattern specified by the node outputs after convergence is the net output.

We need to point out that convergence is not assured, since in this description of the algorithm outputs are not updated asynchronously. However, convergence to an exemplar occurs most of the time (above 98% in 650 test cases) if the number m of classes is less than 0.12 times n, the number of nodes of the net. A detailed discussion and the mathematical formulation of the Hopfield net algorithm and the subsequent neural net algorithms presented in this section can be found in R. Lippmann's excellent survey paper on computing with neural nets (Lippmann, 1987).

Let n be the number of input elements or nodes in the net, and let m be the number of exemplar patterns. Let X = {1, 2, ..., n} ⊂ R.
The weights of the net are represented as a generalized template t ∈ (R^X)^X that is defined as follows:

t_i = {(j, t_i(j)) : j ∈ X},    ∀i ∈ X,

where

t_i(j) = Σ_{k=1}^{m} x_i^k x_j^k if i ≠ j, and t_i(i) = 0,

and x_i^k is the ith element of the exemplar for the pattern class k.
Now let c ∈ {−1, 1}^X be the unknown input pattern. Then the Hopfield net algorithm is as follows:

repeat
    a := c
    b := a ⊕ t
    c := χ_{>0}(b) − χ_{<0}(b) + a · χ_0(b)
until c = a
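A minimal stand-alone sketch of this synchronous iteration (assuming the standard sum-of-outer-products weight recipe w_ij = Σ_k x_i^k x_j^k with zero diagonal; all names are illustrative, and ties b(i) = 0 keep the previous state, mirroring the a · χ_0(b) term):

```python
# Hopfield content-addressable memory sketch (Example 17), synchronous version.
# Weight recipe (standard, assumed here): t_i(j) = sum_k x_i^k x_j^k, t_i(i) = 0.

def hopfield_weights(exemplars):
    n = len(exemplars[0])
    return [[0 if i == j else sum(x[i] * x[j] for x in exemplars)
             for j in range(n)] for i in range(n)]

def hopfield_recall(t, c, max_iter=100):
    """Iterate c := chi_{>0}(b) - chi_{<0}(b) + a * chi_0(b) until c = a."""
    for _ in range(max_iter):
        a = c
        b = [sum(a[j] * t[i][j] for j in range(len(a))) for i in range(len(a))]
        # sign of b, with zeros keeping the previous state a(i)
        c = [1 if bi > 0 else -1 if bi < 0 else ai for bi, ai in zip(b, a)]
        if c == a:      # output no longer changes: converged
            return c
    return c

x1 = [1, 1, 1, 1, -1, -1, -1, -1]
x2 = [1, -1, 1, -1, 1, -1, 1, -1]
t = hopfield_weights([x1, x2])
noisy = [-1, 1, 1, 1, -1, -1, -1, -1]   # x1 with its first bit flipped
print(hopfield_recall(t, noisy))         # -> recovers x1
```

With the two orthogonal exemplars above the synchronous iteration recovers x1 from the corrupted input; as noted in the text, synchronous updating does not guarantee convergence in general.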
If asynchronous behavior is desired in order to achieve convergence (not necessarily to an exemplar pattern), then either the template t needs to be parametrized so that at each application of a ⊕ t only one randomly chosen neuron changes state, or the following modification of the algorithm can be used.

repeat
    i := choice(X)
    a := c
    b := a
    b(i) := (a ⊕ t)(i)
    c := χ_{>0}(b) − χ_{<0}(b) + a · χ_0(b)
until c = a
It is obvious that the latter algorithm leads to a loss of parallelism and, hence, a loss of speed. Similar modifications and observations hold for the nets described in the remaining part of this section.

Example 18 (Hamming net algorithm): The Hopfield net is often tested on problems where inputs are generated by selecting exemplar patterns and then reversing the bit values randomly and independently with a given probability. The optimum minimum-error classifier calculates the Hamming distance to the exemplar for each class and selects the class with the minimum Hamming distance. The Hamming net is a neural net that implements this algorithm. The image algebra version of the Hamming net is described below. It divides the net into a lower and an upper subnet, described in steps 1 and 2, respectively.

Weights and thresholds are first set in the lower subnet such that the matching scores generated by the outputs of the middle nodes are equal to n minus the Hamming distances to the exemplar patterns. These matching scores will range from 0 to n (the number of elements in the input) and are highest for those nodes corresponding to classes with exemplars that best match the input. Thresholds and weights in the upper subnet, called MAXNET, are fixed. After weights and thresholds have been set, a binary pattern with n elements is presented at the bottom of the Hamming net. It must be presented long enough to allow the matching score outputs of the
lower subnet to settle and initialize the output values of the MAXNET. The input is then removed, and MAXNET iterates until only one node is positive (Lippmann, 1987).

Let X = {1, 2, ..., n} correspond to the domain of the lower net, Y = {1, 2, ..., m} to the domain of MAXNET, and θ = n/2. The weights of the lower subnet and the upper subnet are represented by generalized templates s ∈ (R^X)^Y and t ∈ (R^Y)^Y, respectively, which are defined as follows:

s_j(i) = x_i^j / 2,    ∀j ∈ Y and i ∈ X,

t_j(i) = 1 if i = j, and −ε if i ≠ j,    ∀i, j ∈ Y,

where x_i^j is the ith element of the exemplar for the pattern class j, and ε < 1/m. Now let a ∈ {−1, 1}^X be the unknown input pattern. Then the image algebra version of the Hamming net algorithm is as follows:

Step 1. Calculate matching scores.

c := (a ⊕ s) + θ
Step 2. Pick the maximum (MAXNET).

repeat
    b := c ⊕ t
    c := χ_{>1}(b) + b · χ_{(0,1]}(b)
until Σ(χ_{>0}(c)) = 1
class := domain(c|_{>0})

Example 19 (Carpenter/Grossberg net algorithm): We only present the image algebra interpretation of this net. The reader interested in the description and properties of this net is referred to R. P. Lippmann's paper (1987). Let X and Y be as in the Hamming net algorithm, and MAXNET the MAXNET algorithm. Let ρ be the vigilance threshold, where 0 ≤ ρ ≤ 1. The bottom-up and top-down weights are represented by generalized templates s ∈ (R^X)^Y and t ∈ (R^X)^Y, respectively, which are defined as follows:

s_j(i) = 1/(1 + n),    ∀j ∈ Y, i ∈ X,

t_j(i) = 1,    ∀j ∈ Y, i ∈ X.
Let d ∈ R^Y be an auxiliary image that will be used to temporarily disable the best matching node in the net. The Carpenter/Grossberg net algorithm can be expressed as follows:

while there is more input
    let a be the next input, and d := maxint
    repeat
        b := a ⊕ s
        b := b ∧ d
        j* := MAXNET(b)
        n_1 := Σ a
        n_2 := Σ [(a ⊕ t)|_{j*}]
        d(j*) := 0
    until n_1/n_2 ≥ ρ
    s := s · r1(j*) + t · r2(a, j*, n_2)
    t := t · w(a, j*)
end while

The parametrized templates r1, r2, w ∈ (R^X)^Y above are defined as follows:
r1_j(i) = 0 if j = j*, and 1 if j ≠ j*,    ∀i ∈ X, j ∈ Y,

r2_j(i) = a(i)/(0.5 + n_2) if j = j*, and 0 if j ≠ j*,    ∀i ∈ X, j ∈ Y,

w_j(i) = a(i) if j = j*, and 1 if j ≠ j*,    ∀i ∈ X, j ∈ Y.
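The MAXNET competition used in Examples 18 and 19 — self-excitation with weight 1 and mutual inhibition with weight −ε, ε < 1/m, iterated until a single node remains positive — can be sketched as follows (an illustrative stand-alone version, not code from the text):

```python
# MAXNET winner-take-all sketch (upper subnet of Examples 18 and 19).
# Each node keeps its own activation (weight 1) and inhibits every other
# node with weight -eps, where eps < 1/m; negative activations clamp to 0.

def maxnet(scores, eps=None, max_iter=1000):
    m = len(scores)
    if eps is None:
        eps = 1.0 / (2 * m)       # any value below 1/m will do
    c = list(scores)
    for _ in range(max_iter):
        b = [c[j] - eps * (sum(c) - c[j]) for j in range(m)]
        c = [x if x > 0 else 0.0 for x in b]
        if sum(1 for x in c if x > 0) == 1:
            break
    return max(range(m), key=lambda j: c[j])  # index of the surviving node

print(maxnet([3.0, 5.0, 4.0]))   # -> 1 (node with the largest matching score)
```

Because ε < 1/m, the node holding the largest initial score is the one that survives the lateral inhibition.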
Example 20 (Single-layer perceptron): For the description and properties of perceptron nets, we again refer the interested reader to R. P. Lippmann's work. Let X = {1, 2, ..., n}. Let a, w ∈ R^X be the input and weights, respectively, and d ∈ {−1, 1} be the desired output. Initialize w and θ to small random values. Then the single-layer perceptron is expressed as follows:

while there is more input
    y := f_h(Σ a·w − θ)
    w := w + η(d − y)·a
end while

Note that f_h is the hard-limiting function and 0 < η < 1 is a positive gain fraction. ∎
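A direct sketch of Example 20 (illustrative names; the weights and threshold are initialized to zero here, rather than to small random values, so that the run is reproducible):

```python
# Single-layer perceptron training sketch (Example 20); illustrative names.

def f_h(x):
    """Hard-limiting function."""
    return 1 if x >= 0 else -1

def train_perceptron(samples, n, eta=0.25, epochs=20):
    """samples: list of (a, d) pairs with a a length-n input, d in {-1, +1}.
    Weights and threshold start at zero (the text uses small random values)."""
    w = [0.0] * n
    theta = 0.0
    for _ in range(epochs):
        for a, d in samples:
            y = f_h(sum(ai * wi for ai, wi in zip(a, w)) - theta)
            w = [wi + eta * (d - y) * ai for wi, ai in zip(w, a)]  # w := w + eta(d - y)a
            theta -= eta * (d - y)     # threshold adapted analogously
    return w, theta

# A linearly separable rule: d = 1 iff both inputs equal 1.
data = [([1, 1], 1), ([1, -1], -1), ([-1, 1], -1), ([-1, -1], -1)]
w, theta = train_perceptron(data, 2)
print([f_h(sum(ai * wi for ai, wi in zip(a, w)) - theta) for a, _ in data])  # -> [1, -1, -1, -1]
```

For linearly separable data such as this, the classical perceptron convergence theorem guarantees that the loop settles on a separating weight vector.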
Example 21 (Three-layer perceptron with back-propagation training): Let X = {1, 2, ..., n}, Y_1 = {1, 2, ..., n_1}, Y_2 = {1, 2, ..., n_2}, and Y = {1, 2, ..., m}. Let w ∈ (R^X)^{Y_1}, w_1 ∈ (R^{Y_1})^{Y_2}, and w_2 ∈ (R^{Y_2})^Y be the weights among the three layers, which are initialized to small random numbers. Let θ ∈ R^{Y_1}, θ_1 ∈ R^{Y_2}, and θ_2 ∈ R^Y be the thresholds among the three layers, which are also set to small random numbers. The back-propagation training algorithm is as follows:

while there is more input
    let a be the next input, and d the desired output
    a_1 := f_s(a ⊕ w − θ)
    a_2 := f_s(a_1 ⊕ w_1 − θ_1)
    b := f_s(a_2 ⊕ w_2 − θ_2)
    δ_2 := b(1 − b)(d − b)
    δ_1 := a_2(1 − a_2)(δ_2 ⊕ (w_2)′)
    δ := a_1(1 − a_1)(δ_1 ⊕ (w_1)′)
    w_2 := w_2 + t_2(δ_2, a_2)
    w_1 := w_1 + t_1(δ_1, a_1)
    w := w + t(δ, a)
end while

The parameterized templates t ∈ (R^X)^{Y_1}, t_1 ∈ (R^{Y_1})^{Y_2}, and t_2 ∈ (R^{Y_2})^Y above are defined as follows:

t_j(i) = η δ(j) a(i)    ∀i ∈ X, j ∈ Y_1,

(t_1)_j(i) = η δ_1(j) a_1(i)    ∀i ∈ Y_1, j ∈ Y_2,

(t_2)_j(i) = η δ_2(j) a_2(i)    ∀i ∈ Y_2, j ∈ Y.
Note that f_s is the sigmoidal function and 0 < η < 1 is a positive gain fraction. The transpose of a template t ∈ (R^X)^Y is another template t′ ∈ (R^Y)^X, which is defined as follows:

t′_x(y) = t_y(x).

Here, S(t′_x) = {y ∈ Y : t_y(x) ≠ 0}. The thresholds θ, θ_1, and θ_2 are adapted in a similar fashion. ∎

The above discussion and examples show how common neural network models are easily expressible in the framework of image algebra. However, image algebra as a mathematical structure encompasses far more general relationships of neural connections and computing than those provided by Eqs. (148) and (149). These equations state that the basic idea underlying
artificial neural network theory consists of a vector/matrix product followed by some threshold function. Of course, the matrix values may change (because of some predefined "learning" rules) each time we multiply the output vector (i.e., at each neural firing). Nevertheless, at each stage of neural computing, (148) and (149) are reapplied. Image algebra, on the other hand, provides a much broader framework of neural computing. There is no overbearing reason why the vector/matrix product (148), or its equivalent image algebra formulation τ = a ⊕ t (150), cannot be replaced by the more general product

τ = a ⊛ t,    (152)

where the input neurons a have values from a value set that is distinct from the value set of the output t. In addition, the matrix product need not be the usual matrix product, but the generalized product discussed in Section III.B. For example, a new computational model is obtained by using

τ = a ⊞ t    (153)

and

b = f(τ − θ)    (154)
in place of (148) and (149), respectively (Davidson and Ritter, 1990; Meyer et al., 1990). Obviously, the interaction of biological neurons is a far more complicated and general process than that entailed by Eqs. (148) and (149). Applying generalized matrix algebra as defined in Section III.B may be a first step beyond the current limited neural network model. We have only begun to investigate the implications of Eqs. (153) and (154). Stability and convergence conditions involving the various models covered by these equations wait to be established. Layered neural networks using computations expressed by (148) and (149) together with those of (153) and (154) on different levels are yet to be investigated. As mixtures of these operations are of extreme importance in computer vision, there is no doubt as to their potential in neural network theory.

E. Recursive Processes
Many image transformations, such as the Fourier transform, average and median filters, and directional edge detectors, are considered as parallel operations. Each parallel image operation can be performed independently on each pixel of the given image, since the value of each pixel of the transformed image is only a function of the pixel values of the given image. The sequence in which the pixels are processed is completely irrelevant; therefore,
these parallel image operations can be applied to all pixels simultaneously if an appropriate parallel architecture is available. Parallel image transformations are also referred to as nonrecursive transformations. Along with nonrecursive transformations, a class of recursive transformations is also widely used in signal and image processing, e.g., IIR filters, sequential block labeling, predictive coding, adaptive dithering, etc. (Huang, 1981; Rosenfeld and Kak, 1982; Ballard and Brown, 1982; Ulichney, 1987). One of the characteristic properties of these recursive image transformations is that a pixel value of the transformed image may depend both on pixel values of the given image and on some pixel values of the transformed image itself. Thus, the transformed image may have to be computed recursively according to some partial order imposed on the underlying image domain. In other words, a pixel value of the transformed image may not be processed until all the pixels ordered previous to it have been processed. Some of these recursive transformations have a parallel counterpart. Yet many recursive transformations are considered to be more efficient if only conventional sequential computers are employed (Rosenfeld et al., 1966).

In this section, we introduce the notion of generalized recursive templates and recursive template operations, which are direct extensions of the generalized templates and the corresponding template operations defined in the image algebra. Recursive templates are templates where some partial order is imposed on the target point y. Recall that a partially ordered set (P, ≤) (or poset) is a set P together with a binary relation ≤ satisfying the following three axioms:

1. For all x ∈ P, x ≤ x (reflexivity).
2. If x ≤ y and y ≤ x, then x = y (antisymmetry).
3. If x ≤ y and y ≤ z, then x ≤ z (transitivity).
Let X and Y be point sets and F a value set. Also, let ≤ be a partial order imposed on the point set Y. A generalized F-valued recursive template t from Y to X is a function t = (t_#, t_<): Y → (F^X, F^Y), where t_#: Y → F^X and t_<: Y → F^Y, such that

1. y ∉ S(t_<(y)), and
2. for each z ∈ S(t_<(y)), z < y.

Thus, for each y ∈ Y, t_#(y) is an F-valued image on X, and t_<(y) is an F-valued image on Y. For notational convenience we define t_{#y} = t_#(y) and t_{<y} = t_<(y), so that t_y = (t_{#y}, t_{<y}). The support of t_y is defined as S(t_y) = (S(t_{#y}), S(t_{<y})). The set of all F-valued recursive templates from Y to X that admit the partial order ≤ on Y will be denoted by (F^X, F^Y)^Y_≤, or simply by (F^X, F^Y)^Y.
FIG. 11. Example of a recursive template.
Similarly, a recursive template t ∈ (F^X, F^Y)^Y is called translation invariant if for each triple x, y, z ∈ X with y + z ∈ X and x + z ∈ X we have t_y(x) = t_{y+z}(x + z), or equivalently, t_{#y}(x) = t_{#(y+z)}(x + z) and t_{<y}(x) = t_{<(y+z)}(x + z). An example of an invariant recursive template t is given in Fig. 11. If t is an invariant recursive template and it has only one pixel defined on the target point of its nonrecursive support S(t_{#y}), then t is called a simplified recursive template. Pictorially, a simplified recursive template can be drawn in the same way as a nonrecursive template, since the recursive part and the nonrecursive part do not overlap.

For the sake of brevity we restrict our attention to real and extended real-valued recursive templates. We define two new recursive template operations, denoted by ⊕_< and ⊞_<. Let Y, X ⊂ R^n be finite, let a ∈ R^X, and let t ∈ (R^X, R^Y)^Y admit some partial order ≤ on Y. Then we define the generalized recursive convolution c = a ⊕_< t by

c(y) = Σ_{x ∈ S(t_{#y})} a(x)·t_{#y}(x) + Σ_{z ∈ S(t_{<y})} c(z)·t_{<y}(z),    y ∈ Y.    (155)
The recursive template operations compute a new pixel value c(y) based both on the given image values a(x) and on some previously calculated new pixel values c(z), which are determined by the given partial order ≤ and the region of support of the participating template. By the definition of recursive templates, z < y for all z ∈ S(t_{<y}) and y ∉ S(t_{<y}). Thus, c(y) is always recursively computable. Some commonly used partial orders in 2-D recursive transforms are forward and backward scanning, and serpentine scanning. It follows from the definition of ⊕_< that the computation of a new pixel value c(y) can be done only after all its predecessors (ordered by ≤) have been computed. Thus recursive template operations may not be performed in a globally parallel way as nonrecursive template operations. Note that if the recursive template t is defined in such a way that S(t_{<y}) = ∅ for all y ∈ Y, then we get the usual nonrecursive template operation:

c = {(y, c(y)) : c(y) = Σ_{x ∈ S(t_{#y})} a(x)·t_{#y}(x), y ∈ Y}.
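As a concrete instance of Eq. (155) (a hypothetical sketch with dictionary-based template supports, not code from the text), a 1-D forward scan with nonrecursive weight t_#(0) = 1 and recursive weight t_<(−1) = 0.5 reproduces the familiar first-order IIR filter:

```python
# Sketch of the recursive convolution (155) on a 1-D image with forward
# (left-to-right) scanning; templates are given as {offset: weight} maps.

def recursive_convolve(a, t_nonrec, t_rec):
    """Nonrecursive part reads the input a; the recursive part reads already
    computed values of c at strictly preceding points (offsets < 0), as the
    partial-order condition z < y requires."""
    assert all(dz < 0 for dz in t_rec), "recursive support must precede y"
    n = len(a)
    c = [0.0] * n
    for y in range(n):               # the forward scan realizes the order <=
        s = sum(w * a[y + dx] for dx, w in t_nonrec.items() if 0 <= y + dx < n)
        s += sum(w * c[y + dz] for dz, w in t_rec.items() if 0 <= y + dz < n)
        c[y] = s
    return c

# First-order IIR filter: c(y) = a(y) + 0.5 * c(y - 1).
print(recursive_convolve([1, 0, 0, 0], {0: 1.0}, {-1: 0.5}))
# -> [1.0, 0.5, 0.25, 0.125]
```

The assertion enforces the condition y ∉ S(t_<y): every recursively referenced value has already been computed when c(y) is formed.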
So, the recursive template operations are natural extensions of the nonrecursive ones. Similarly, we can define the recursive additive maximum and the recursive multiplicative maximum. Let a ∈ R_{-∞}^X and t ∈ (R_{-∞}^X, R_{-∞}^Y)^Y_≤. Then the recursive additive maximum c = a ⊞_< t is given by

c(y) = [⋁_{x ∈ S(t_{#y})} (a(x) + t_{#y}(x))] ∨ [⋁_{z ∈ S(t_{<y})} (c(z) + t_{<y}(z))],    y ∈ Y,

with the recursive multiplicative maximum defined analogously.
The generalized recursive convolution is a linear operation, while the recursive lattice operations are nonlinear (Li, 1990). For the sake of template decomposition, we need to define operations between recursive templates as well. Let X, W ⊂ R^n be finite, and let t ∈ (R^X, R^X)^X_≤ and s ∈ (R^W, R^X)^X_≤ be two recursive templates. Then we define the convolution of two recursive templates r = s ⊕_< t, where r = (r_#, r_<) ∈ (R^W, R^X)^X_≤, as follows:

r_# = s_# ⊕ t_#,
r_< = 1 − (1 − s_<) ⊕ (1 − t_<),

where 1 ∈ (R^X)^X is the unit template, defined by

1_y(x) = 1 if y = x, and 0 otherwise.
We also define the addition of two recursive templates r = s +_< t by

r_# = s_# ⊕ (1 − t_<) + (1 − s_<) ⊕ t_#,
r_< = 1 − (1 − s_<) ⊕ (1 − t_<).

Note that the recursive template operations are defined in terms of nonrecursive ones. Similarly, if t ∈ (R_{-∞}^X, R_{-∞}^X)^X_≤ and s ∈ (R_{-∞}^W, R_{-∞}^X)^X_≤ are two recursive templates that admit the partial order ≤ on the point set X, then we define the recursive template composition r = s ⊞_< t, where r = (r_#, r_<) ∈ (R_{-∞}^W, R_{-∞}^X)^X_≤, as follows:

r_# = s_# ⊞ t_#,
r_< = φ ∧ [(0 ∨ s_<) ⊞ (0 ∨ t_<)],

where φ ∈ (R_{±∞}^X)^X is the template defined by

φ_y(x) = −∞ if y = x, and +∞ otherwise.
We define the maximum of two recursive templates r = s ∨_< t by

r_# = [s_# ⊞ (0 ∨ t_<)] ∨ [(0 ∨ s_<) ⊞ t_#],
r_< = φ ∧ [(0 ∨ s_<) ⊞ (0 ∨ t_<)],

where 0 ∈ (R_{-∞}^X)^X is the zero template, defined by

0_y(x) = 0 if y = x, and −∞ otherwise.
Recall our observation (Section III.B) that the map Ψ: ((F^X)^X, +, ⊕) → (F_m, +, ×) is a ring isomorphism, provided, of course, that (F_m, +, ×) is a ring. We will state a similar result for recursive templates and operations. An invariant template t ∈ (F^X)^X is said to be quarter-plane if its support S(t_0) lies in a quarter plane. Here 0 = (0, 0). In particular, t is said to be causal if its support lies in the first quarter plane. We denote the set of all causal templates in (F^X)^X by T_X. Now let T_m be the set of all m × m upper triangular block Toeplitz matrices with upper triangular Toeplitz blocks. Then (T_m, +, ×) is a commutative ring that is isomorphic to (T_X, +, ⊕) (Li, 1990).

Let P_m be the set of ordered pairs (A, B), where A, B ∈ T_m and det(B) ≠ 0. We define a relation on P_m as follows: (A, B) ≈ (C, D) if and only if

A × D = B × C.

Then ≈ is an equivalence relation on P_m. We let P̃_m denote the set of all equivalence classes [A, B], where (A, B) ∈ P_m. We define an addition and a multiplication over the elements of P̃_m as follows:

[A, B] + [C, D] = [A × D + B × C, B × D],
[A, B] · [C, D] = [A × C, B × D].
Then (P̃_m, +, ·) is a commutative ring with identity. In a similar fashion, we construct the ring (T̃_{<X}, +_<, ⊕_<). An invariant recursive template t ∈ (F^X, F^Y)^Y_≤ is said to be quarter-plane if both t_# and t_< lie in the same quarter plane. In particular, t is said to be causal if both t_# and t_< are causal. We denote the set of all causal recursive templates by T_{<X}. Define a mapping Φ: T_{<X} → P_m by Φ(t) = (Ψ(t_#), I_m − Ψ(t_<)). Also define a relation ≈ on T_{<X} by

s ≈ t if and only if Φ(s) ≈ Φ(t).

Then ≈ is an equivalence relation on T_{<X}. We let T̃_{<X} denote the set of all equivalence classes [t], where t ∈ T_{<X}.
It can be shown that (T̃_{<X}, +_<, ⊕_<) is isomorphic to the ring (P̃_m, +, ·) (Li, 1990). Now since (T̃_{<X}, +_<, ⊕_<) is a commutative ring, it easily follows that

r ⊕_< (s ⊕_< t) = (r ⊕_< s) ⊕_< t,
s ⊕_< t = t ⊕_< s,
r ⊕_< (s +_< t) = (r ⊕_< s) +_< (r ⊕_< t),
a ⊕_< (s ⊕_< t) = (a ⊕_< s) ⊕_< t,
a ⊕_< (s +_< t) = (a ⊕_< s) + (a ⊕_< t),

and

(a + b) ⊕_< t = (a ⊕_< t) + (b ⊕_< t).
These properties provide a mathematical basis for decomposition of recursive templates. Recursive transform optimization can now be based on sound mathematical methods. We conclude this section with an application example.
Example 22 (Distance and medial axis transforms): The distance transform is a nonlinear transform which may be implemented either recursively or nonrecursively (Borgefors, 1986). The recursive version is much faster than the nonrecursive one if only sequential machines are used (Rosenfeld et al., 1966). Let a be a binary image with +∞ for feature pixels and 0 for nonfeature pixels. The distance transform of a is a gray-level image b such that each pixel value b(i, j) is the distance between a(i, j) and the nearest nonfeature pixel. The distance can be measured as Euclidean distance, city block, chessboard, etc., subject to the specific application. The distance transform can be used to compute the medial axis transform (Rosenfeld et al., 1966). Let X ⊂ Z², a ∈ {0, +∞}^X, and b ∈ R^X. First, the distance transform b of a is given by

b := (a ⊞_< t) ⊞_> t′,

where the order ≤ is the forward scanning (from left to right and top to bottom) order on X and ≥ is the backward scanning (from right to left and bottom to top) order on X. The weighted medial axis W(a) of a is the set of local maxima of b, which can be computed by

W(a) := b · χ_0((b ⊞ s) − b).
Finally, the reconstruction of b from W(a) can be accomplished by setting

b := (W(a) ⊞_< (−t)) ⊞_> (−t′).

Note that the templates may be defined according to the specific distance measure being used (Borgefors, 1986).
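The two-scan structure of Example 22 can be sketched for the city block distance (an illustrative stand-alone version in which explicit min-based loops play the role of the recursive lattice operations; the feature/nonfeature encoding follows the text):

```python
# Two-pass city block distance transform sketch (Example 22): feature pixels
# start at +infinity and nonfeature pixels at 0; a forward scan followed by a
# backward scan propagates minimal distances across the image.

INF = float("inf")

def distance_transform(a):
    rows, cols = len(a), len(a[0])
    b = [row[:] for row in a]
    # Forward scan: left to right, top to bottom (the order <=).
    for i in range(rows):
        for j in range(cols):
            if i > 0:
                b[i][j] = min(b[i][j], b[i - 1][j] + 1)
            if j > 0:
                b[i][j] = min(b[i][j], b[i][j - 1] + 1)
    # Backward scan: right to left, bottom to top (the order >=).
    for i in reversed(range(rows)):
        for j in reversed(range(cols)):
            if i < rows - 1:
                b[i][j] = min(b[i][j], b[i + 1][j] + 1)
            if j < cols - 1:
                b[i][j] = min(b[i][j], b[i][j + 1] + 1)
    return b

a = [[INF, INF, INF],
     [INF, INF, INF],
     [INF, INF, 0]]           # a single nonfeature pixel in the corner
print(distance_transform(a))  # -> [[4, 3, 2], [3, 2, 1], [2, 1, 0]]
```

As the text observes, the two recursive scans compute the exact city block distance in a single pass each, which is far cheaper on a sequential machine than iterating a nonrecursive local-minimum operation to convergence.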
ACKNOWLEDGMENTS

The author wishes to thank the Air Force Armament Laboratory (AFATL) and the Defense Advanced Research Projects Agency (DARPA) for sponsoring the development of the image algebra. The author is particularly grateful to Dr. Sam Lambert (AFATL), Mr. James Kirkpatrick (AFATL), Mr. Patrick Coffield (AFATL), Ms. Karen Norris (AFATL), and Dr. Jasper Lupo (DARPA) for their continued support of this research. This work has also been supported in part by a grant from the Florida High Technology and Industrial Council (Grant #9E069).
REFERENCES

Ballard, D. H., and Brown, C. M. (1982). Computer Vision. Prentice Hall, Englewood Cliffs, New Jersey.
Batcher, K. E. (1980). "Design of a massively parallel processor," IEEE Trans. Computers 29(9), pp. 836-840.
Birkhoff, G. (1984). Lattice Theory. American Mathematical Society Colloquium Publications 25, Providence, Rhode Island.
Birkhoff, G., and Lipson, J. (1970). "Heterogeneous algebras," J. Combinatorial Theory 8, 115-133.
Blum, H. (1967). "A transformation for extracting new descriptors of shape," in Symp. Models for Perception of Speech and Visual Form (W. Whaten-Dunn, ed.). MIT Press, Cambridge, Massachusetts.
Borgefors, G. (1986). "Distance transformations in digital images," Computer Vision, Graphics, and Image Processing 34, pp. 344-371.
Cloud, E., and Holsztynski, W. (1984). "Higher efficiency for parallel processors," in Proceedings IEEE Southcon 84, Orlando, Florida, March 1984, pp. 416-422.
Crimmins, T. R., and Brown, W. M. (1985). "Image algebra and automatic shape recognition," IEEE Trans. Aerospace and Elec. Systems AES-21(1), 60-69.
Crookes, D., Morrow, P. J., and McParland, P. J. (1990). "An Implementation of Image Algebra on Transputers," Tech. Report, Dept. of Computer Science, Queen's University, Belfast, Northern Ireland.
Cuninghame-Green, R. (1979). Minimax Algebra: Lecture Notes in Economics and Mathematical Systems 166. Springer-Verlag, New York.
DARPA (1988). Neural Network Study. AFCEA International Press, Fairfax, Virginia.
Davidson, J. L. (1989a). "Lattice Structures in the Image Algebra and Applications to Image Processing." Ph.D. Dissertation, Dept. of Mathematics, University of Florida, Gainesville, Florida.
Davidson, J. L. (1989b). "Minimax techniques for non-linear image processing transforms," in Proc. of the 1989 SPIE Tech. Symp. on Optics, Elec.-Opt., and Sensors, Orlando, Florida, March 1989, pp. 110-121.
Davidson, J. L., and Ritter, G. X. (1990). "A theory of morphological neural networks," in OE/LASE 90 Optics, Electro-optics, and Laser Appl. in Sc. and Eng. (SPIE), Los Angeles, California, January 1990, pp. 378-388.
Duff, M. J. B. (1982). "CLIP4," in Special Computer Architectures for Pattern Processing (T. Ichikawa, ed.). CRC Press, Boca Raton, Florida, pp. 65-86.
Dugundji, J. (1966). Topology. Allyn & Bacon, Boston.
Fountain, T. J., Matthews, K. N., and Duff, M. J. B. (1988). "The CLIP7A image processor," IEEE Trans. Pattern Analysis and Machine Intelligence 10(3), pp. 310-319.
Gader, P. D. (1986). "Image Algebra Techniques for Parallel Computation of Discrete Fourier Transforms and General Linear Transforms." Ph.D. Dissertation, Dept. of Mathematics, University of Florida, Gainesville, Florida.
Gader, P. D., and Dunn, E. G. (1989). "Image algebra and morphological template decomposition," in Proc. of the 1989 SPIE Tech. Symp. on Optics, Elec.-Opt., and Sensors, Orlando, Florida, March 1989, pp. 134-145.
Gonzalez, R. C., and Wintz, P. (1977). Digital Image Processing. Addison-Wesley, Reading, Massachusetts.
Grossberg, S. (1988). Neural Networks and Natural Intelligence. MIT Press, Cambridge, Massachusetts.
Hadwiger, H. (1957). Vorlesungen über Inhalt, Oberfläche und Isoperimetrie. Springer-Verlag, Berlin.
Haralick, R. M., Shapiro, L., and Lee, J. (1987a). "Morphological edge detection," IEEE Journal of Robotics and Automation RA-3(1), 142-157.
Haralick, R. M., Sternberg, S. R., and Zhuang, X. (1987b). "Image analysis using mathematical morphology: Part I," IEEE Transactions on Pattern Analysis and Machine Intelligence PAMI-9(4), 532-550.
Hillis, W. D. (1985). The Connection Machine. The MIT Press, Cambridge, Massachusetts.
Hopfield, J. J. (1982). "Neural networks and physical systems with emergent collective computational abilities," Proc. Natl. Acad. Sci. USA 79, 2554-2558.
Hu, M. K. (1962). "Visual pattern recognition by moment invariants," IRE Transactions on Information Theory IT-8, 179-187.
Huang, T. S., ed. (1981). Two-Dimensional Digital Signal Processing I: Linear Filters, Vol. 42, Topics in Applied Physics. Springer-Verlag, Berlin.
IVS Inc. (1988). "Image Algebra FORTRAN Version 2.0 Language Description and Implementation Notes," IVS-TR-88-02, Gainesville, Florida.
Klein, J. C., and Serra, J. (1972). "The texture analyzer," J. Microsc. 95, pp. 349-356.
Lee, S. Y., and Aggarwal, J. K. (1987). "Parallel 2-D convolution on a mesh connected array processor," IEEE Transactions on Pattern Analysis and Machine Intelligence PAMI-9, 590-594.
Li, D. (1990). "Recursive Operations in Image Algebra and their Applications to Image Processing." Ph.D. Dissertation, Dept. of Computer and Information Science, University of Florida, Gainesville, Florida.
Li, D., and Ritter, G. X. (1990). "Decomposition of separable and symmetric templates," in Image Algebra and Morphological Image Processing, SPIE Intl. Symp. on Optical and Optoelectronic Appl. Sc. and Eng., July 1990, pp. 408-418.
Lippmann, R. P. (1987). "An introduction to computing with neural nets," IEEE ASSP Magazine 4(2), 4-22.
Maragos, P. (1985). "A Unified Theory of Translation-Invariant Systems With Applications
to Morphological Analysis and Coding of Images." Ph.D. Thesis, School of Electrical Engineering, Georgia Institute of Technology, Atlanta, Georgia.
Maragos, P., and Schafer, R. W. (1986). "Morphological skeleton representation and coding of binary images," IEEE Trans. Acoustics, Speech, and Signal Proc. ASSP-34(5), 1228-1244.
Maragos, P., and Schafer, R. W. (1987a). "Morphological filters part I: Their set-theoretic analysis and relations to linear shift-invariant filters," IEEE Trans. Acoustics, Speech, and Signal Proc. ASSP-35, 1153-1169.
Maragos, P., and Schafer, R. W. (1987b). "Morphological filters part II: Their relations to median, order-statistic, and stack filters," IEEE Trans. Acoustics, Speech, and Signal Proc. ASSP-35, 1170-1184.
Matheron, G. (1975). Random Sets and Integral Geometry. Wiley, New York.
McCubbrey, D. L., and Lougheed, R. M. (1985). "Morphological image analysis using a raster pipeline processor," in IEEE Computer Society Workshop on Computer Architecture for Pattern Analysis and Image Database Management, Miami Beach, Florida, November 1985, pp. 444-452.
Meyer, T. E., Freeman, P. M., and Davidson, J. L. (1990). "Modeling neural net chips using image algebra," in Proc. of the SPIE Conference on Image Algebra and Morphological Image Processing 1350, San Diego, California, July 1990, pp. 296-307.
Miller, P. E. (1983). "Development of a Mathematical Structure for Image Processing," Optical Division Tech. Report, Perkin-Elmer, Danbury, Connecticut.
Minkowski, H. (1903). "Volumen und Oberfläche," Mathematische Annalen 57, 447-495.
Minkowski, H. (1911). Gesammelte Abhandlungen. Teubner Verlag, Leipzig-Berlin.
Nevatia, R., and Babu, K. R. (1980). "Linear feature extraction and description," Computer Graphics and Image Processing 13, pp. 257-269.
Perry, W. K. (1987). "IAC: Image Algebra C." Master's Thesis, University of Florida CIS Department.
Pratt, W. K. (1978). Digital Image Processing. John Wiley, New York.
Ritter, G. X., and Gader, P. D. (1987). "Image algebra techniques for parallel image processing," Journal of Parallel and Distributed Computing 4(5), 7-44.
Ritter, G. X., and Wilson, J. N. (1987a). "The image algebra in a nutshell," in Proceedings of the First International Conference on Computer Vision, IEEE Computer Society, London, June 1987, pp. 641-645.
Ritter, G. X., and Wilson, J. N. (1987b). "Image algebra: A unified approach to image processing," in Proceedings of the SPIE Medical Imaging Conference, Newport Beach, California, February 1987, pp. 338-345.
Ritter, G. X., Gader, P. D., and Davidson, J. L. (1986). "Automated bridge detection in FLIR images," in Proceedings of the Eighth International Conference on Pattern Recognition, Paris, France, pp. 862-864.
Ritter, G. X., Davidson, J. L., and Wilson, J. N. (1987a). "Beyond mathematical morphology," in Proc. of SPIE Conf. Visual Communication and Image Processing II, Cambridge, Massachusetts, October 1987, pp. 260-269.
Ritter, G. X., Shrader-Frechette, M. A., and Wilson, J. N. (1987b). "Image Algebra: A Rigorous and Translucent Way of Expressing All Image Processing Operations," in Proc. of the 1987 SPIE Tech. Symp. Southeast on Optics, Elec.-Opt., and Sensors, Orlando, Florida, May 1987.
Ritter, G. X., Li, D., and Wilson, J. N. (1989). "Image algebra and its relationship to neural networks," in Proc. of the 1989 SPIE Tech. Symp. Southeast on Optics, Elec.-Opt., and Sensors, Orlando, Florida, March 1989, pp. 90-101.
Ritter, G. X., Wilson, J. N., and Davidson, J. L. (1990). "Image algebra: An overview," Computer Vision, Graphics, and Image Processing 49, pp. 297-331.
Rosenfeld, A., and Kak, A. C. (1982). Digital Picture Processing. Academic Press, New York.
.
308
G . X. RITTER
Rosenfeld, A., Waltz, J., and Otsuki, T. (1966). “On the total curvature of surfaces in Euclidean spaces,” Japan J. Math. 35(4), 61-71. Rumelhart, D. E. (1988). Parallel Distributed Processing. MITPress. Cambridge. Massachusetts. Serra. J. (1982).lmage Analysis and Mathematical Morphology. Academic Press, London. Sternberg, S. R. (1980).“Language and architecture for parallel image processing,” in Proc. of the Conf. on Pattern Rec. in Practice (Gelsema, E. and V. anal, L. ed.), North-Holland Publ. Co., Amsterdam, May 1980, pp. 21-23. Sternberg, S. R.(1983).“Biomedical image processing.” Computer 16(1), pp. 22-34. Sternberg, S . R. (1985).“Overview of image algebra and related issues,” in Integrated Technology for Parallel lmage Processing ( S . Levialdi, ed.). Academic Press, London. Uhr, L. (1 983). “Pyramid multi-computer structures, and augmented pyramids,” in Computing Structures for Image processing (M. J. B. Duff, ed.). Academic Press, London, pp. 95-102. Ulichney, R. (1987).Digital Halftoning. MIT Press, Cambridge, Massachusetts. Unger, S . H. (1958).“A computer oriented toward spatial problems,” Proc. IRE 46, pp. 17441750. von Neumann, J. (1951). “The general logical theory of automata,” in Cerebral Mechanism in Behauior: The Hixon Symposium. Wiley, New York. (Jeffress, L. A. ed.), pp. 1-41. Wiejak, J. S., Buxton, H., and Buxton, B. F. (1985).“Convolution with separable masks for early image processing,” Computer Vision. Graphics, and Image Processing 32,279-290. Wilson, J . N., Fischer, G. R., and Ritter, G. X. (1988). “Implementation and use of an image processing algebra for programming massively parallel machines,” in Proceedings of the Second Symposium on the Frontiers of Massioely Parallel Computation. Fairjax, Virginia, October. 1988, pp. 587-594. Zhuang, X., and Haralick, R. M. (1986). “Morphological structuring element decomposition,” Computer Vision, Graphics, and Image Processing 35, pp. 370-382.
ADVANCES IN ELECTRONICS AND ELECTRON PHYSICS, VOL. 80
Image Filtering and Analysis through the Wigner Distribution

GABRIEL CRISTÓBAL,¹ CONSUELO GONZALO,² AND JULIÁN BESCÓS²

¹International Computer Science Institute and Electrical Engineering-Computer Science Department, University of California, Berkeley, California
²Instituto de Óptica del CSIC, Madrid, Spain

I. Introduction
II. The Wigner Distribution
   A. Definition
   B. Properties
   C. Discrete Wigner Distribution
   D. Cohen's Unified Approach and Related Distributions
III. Wigner Distribution Representation of Images
   A. Digital Implementations
   B. Optical Implementations
   C. Hybrid Implementations
   D. VLSI Implementations
IV. Image Filtering through the Wigner Distribution
   A. WD Filtering for 1-D and 2-D Images
   B. WD Computation Using the Hartley Transform
V. Image Analysis through the Wigner Distribution
   A. Feature Extraction
   B. Classification
   C. Texture Discrimination
VI. Applications of the Space (Time)-Frequency Representations
   A. Review of Applications
   B. Trends toward Biological Image Modeling
VII. Conclusions
   Acknowledgments
   References
I . INTRODUCTION
The joint space-spatial frequency representations have received special attention in the fields of image processing, vision, and pattern recognition. This interest is due in essence to three different aspects: This class of functions
displays all image information in the conjoint domain where the representation is defined; neurophysiological studies have suggested that some cells of the primary visual cortex serve to encode a particular joint representation (Jacobson and Wechsler, 1988); and, moreover, these functions provide high pattern separability. Nevertheless, the joint representations must satisfy some conditions in order to be useful in those areas: They should contain the correct marginal distributions, that is, the projections of the representation on the conjoint domain must yield the local image power and the image spectrum; they should have high resolution in both domains; and they should be positive, in order to be interpreted as an energy distribution over the joint space-spatial frequency domain. On the other hand, the condition of bilinearity is a logical one in the context of image processing. It has been claimed that, compared with other joint representations of this kind, the Wigner distribution function (WD) has the best properties for use in image processing. It has the best resolution, matched to that of the image in both domains, and it overcomes the resolution trade-offs that have traditionally limited the utility of windowed power spectrum analysis. Besides, the WD is a joint bilinear representation, very close to being positive, invariant under linear transformations, and it contains all image information. These aspects will be explained more extensively below. The WD is a natural idea, which had already been used in spatial filtering and pseudocolor (Bescós and Strand, 1978) without being mentioned explicitly. It allows an easy and elegant interpretation of some familiar physical concepts, such as a musical score (Bartelt et al., 1980) or a ray in geometric optics (Bastiaans, 1981a). In one approach, it can be understood as a function that simultaneously shows the distribution of the image energy over the spatial and frequency domains.
The aim of this paper is to review the state of the art of practical applications of the WD in image processing (image representation, filtering, and analysis). We are conscious that many interesting works are not covered here; some of them could make up an independent body in themselves. In this sense, it is worthwhile to mention Bastiaans's theoretical works on issues such as the relation between geometric optics and Fourier optics (Bastiaans, 1978), the transport equations for the Wigner distribution function (Bastiaans, 1980), and the description of the distribution in the case of partially coherent light (Bastiaans, 1981a). In this last work, Bastiaans used the concept of generalized radiance introduced by Walther (Walther, 1968, 1973), who used the WD to establish a relation between radiometry and coherence. This theory was further elaborated by Carter and Wolf (1977). Other fields where the WD has become a useful tool, and which are not commented on in detail in our paper, are signal processing (Boashash, 1984; Boashash and Escudié, 1983), speech representation and analysis (Szu, 1982; Janse and Kaizer, 1983; Athale et al., 1983), and detection problems and the theory of communication (Kumar and Carroll, 1984; Kay and Boudreaux-Bartels, 1985). We begin this review by presenting the definition and properties of the WD for a continuous function. However, an exact evaluation of this distribution is generally impossible, because the WD is defined, as we will see, by an infinite integral. There are two ways to deal with this problem: computing the integral optically, or converting the continuous function into a discrete function and defining discrete versions of the distribution. A discrete version of this distribution is then presented, and the consequences of this discretization are discussed. At the end of the second section, the space (time)-frequency distributions are discussed using Cohen's unified approach, and the WD's interference problems, introduced by its bilinear definition, are discussed in the context of pattern recognition and image analysis applications. It has been said above that the WD can be computed analogically or digitally. In the third section, different implementations of this distribution are presented. First, digital implementations are described for images with 1-D and 2-D variations. The 2-D case could be considered a generalization of the 1-D one, but it implies an excessive number of calculations and high storage requirements. In this paper, a way of generating the 4-D Wigner distribution function through its spatial samples has been considered, which allows one to calculate the discrete Wigner distribution function in a fast and efficient form. Afterwards, we describe the evolution of optical Wigner processors from their origins in the 1970s to the present. Current optical Wigner processors compute this function for any kind of input (1-D, 2-D, real, and complex) in a very efficient way.
The WD has been computed through the different implementations. Also, the inversion property was tested by recovering the image information. From the point of view of image processing, it is very interesting that such a representation retains all image information, since in this case the information can be retrieved, and the WD meets this requirement. This suggests that it could be used in different image processing operations. Nevertheless, before using it, one must estimate the computational resources that generating and inverting an image's distribution implies. This evaluation brings out the need for another class of Wigner processors that in some way reduce those resources. Hybrid processors take advantage of the best characteristics of the optical and digital processors. Two different hybrid Wigner processors have been proposed in the literature. One of them performs all calculations optically and manipulates the information digitally in order to retrieve it. The other introduces an electronic processor, which computes the fast Fourier transform (FFT) with the purpose of avoiding the coherent
illumination. The last part of the third section presents the potential of VLSI technology for generating the WD and the other joint representations. The general techniques of spatial filtering of images are applied in almost all areas of processing. They can be used for restoration, enhancement, coding, and analysis of images. This kind of operation is also frequently applied as a preprocessing tool in different fields, such as pattern recognition or vision modeling. Image filtering operations can be carried out through different representations of the image information. The most usual representations used in image filtering are the spatial and frequency ones. Both representations are complete and equivalent. Nevertheless, in many cases, neither of them is appropriate for performing some kinds of filtering. For example, there exist many situations in which images present space-variant degradations. Spherical aberration, atmospheric turbulence, defocusing, and motion are a few examples of the most frequent distortions that can lead to space-variant degradations. In these cases, it is necessary to find another representation that plays a role similar to that of the Fourier transform in the space-invariant case. The joint space-spatial frequency representations have attracted much attention, since they could be more useful for the interpretation of information, and therefore they could improve the possibilities of information filtering. The WD's property of allowing retrieval of all image information from the distribution, and its possible interpretation as a local spectrum associated with each image point, suggest generalizing the traditional filtering defined in the Fourier domain to the Wigner domain. Thus, spatially variant filtering operations can be performed by multiplying these local spectra by different masks.
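As an illustration of this last idea, the sketch below performs space-variant low-pass filtering of a 1-D signal by computing a local (windowed) spectrum around each sample and applying a position-dependent mask. The windowed FFT here is an illustrative stand-in for the local Wigner-type spectrum, and the window size and cutoff schedule are arbitrary choices, not taken from the text:

```python
import numpy as np

def space_variant_filter(f, win=16, cutoff=lambda n, N: 0.5 - 0.4 * n / N):
    """Low-pass filter a 1-D signal with a cutoff that varies with position n.

    For each sample, a local spectrum is computed over a window of 2*win
    samples, multiplied by a position-dependent mask, and the centre sample
    of the resynthesized segment is kept."""
    N = len(f)
    out = np.zeros(N)
    fp = np.pad(f, win)                         # zero extension at the borders
    freqs = np.abs(np.fft.fftfreq(2 * win))     # normalized |frequency| in [0, 0.5]
    for n in range(N):
        seg = fp[n : n + 2 * win]               # window centred on sample n
        spec = np.fft.fft(seg)
        mask = (freqs <= cutoff(n, N)).astype(float)   # low-pass, cutoff varies with n
        out[n] = np.fft.ifft(spec * mask).real[win]    # keep the centre sample
    return out
```

For a constant input the interior samples pass through unchanged, since only the DC component is present and the mask always keeps frequency zero.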
In Section IV, space-variant filtering results obtained with some of the processors and implementations presented in Section III are shown for 1-D and 2-D images. These filtering operations allow us to compare the WD results with those of the classical Fourier methods. The application of the WD to simulate space-variant degraded images is then considered. The removal of the degradations has been carried out by filtering operations in the WD domain, through the minimum-mean-square-error Wiener restoration technique. The WD computation via a fast Hartley transform has been considered instead of the standard fast Fourier transform method in order to reduce the number of computations required. An increasing corpus of evidence suggests that mammalian biological visual systems are capable of a selective spatial frequency analysis; however, the application of the classical Fourier spectra to texture analysis has been only partially successful (Sutton and Hall, 1972; Weszka et al., 1976). The main reason for these disheartening results is the fact that each frequency component contains global information. On the other hand, the shift-invariance
property of the Fourier spectra is obtained at the expense of the loss of phase information. However, as Oppenheim and Lim have demonstrated, many features of a signal are retained in the phase information but not in the amplitude (Oppenheim and Lim, 1981). This fact led some researchers to consider local image representations obtained by computing the power spectrum over subregions of an image. Bajcsy and Lieberman were the first to work in this area, using local (windowed) measurements of the image spectrum to compute the texture gradient (Bajcsy and Lieberman, 1976). The use of space-frequency distributions overcomes these shortcomings of traditional Fourier analysis. Section V deals with the application of the WD to texture discrimination and classification by using both pairwise linear and discriminant analysis. Several textural features have been extracted from the WD, and a comparative study with the classical Fourier feature extraction methods has been performed. In Section VI, a review of the applications of the space (time)-frequency distributions is presented, emphasizing those that are vision-oriented. Afterwards, the importance of these distributions in the modeling of early vision processes is considered, and we present a brief review of the physiological findings in order to have a quantitative measure of their degree of biological plausibility. We would like this paper to be self-contained; however, it is clear that a basic background is necessary to understand the concepts and results presented here. We think that elementary knowledge of image processing and optics is sufficient for readers to understand the paper. In any case, an extensive bibliography is given on these issues. Also, some additional references have been included in the References section because of their relationship with the present work.
II. THE WIGNER DISTRIBUTION

A. Definition
The modeling of a stationary linear process, in the area of signal processing, can be done in terms of its spatial (or temporal) amplitude or its spatial (or temporal) frequency; however, the assumption that a process is stationary does not hold in many applications. In that case, it is more appropriate to define a local power spectrum that combines the advantages of both descriptions. The Wigner distribution (WD), which gives a joint representation in the space and spatial-frequency domains, provides a rigorous mathematical framework for the study of these local representations. The WD
significance and usefulness can be illustrated by means of a musical score (Bartelt et al., 1980). The musical score constitutes a fairly good representation for the musician, but from a mathematical point of view it is erroneous to try to specify a monochromatic frequency at a given point in time; one must observe the signal over at least a period of time in order to define its frequency. The Wigner distribution (WD), introduced by Wigner (1932) as a phase-space representation in quantum mechanics, gives a simultaneous representation of a signal in space and spatial-frequency variables. It belongs to a large class of bilinear distributions known as Cohen's class, in which each member is obtained by choosing a different kernel in the generalized bilinear distribution definition (Cohen, 1966; Claasen and Mecklenbrauker, 1980c). The Wigner distribution can be interpreted as a local or regional spatial frequency representation of an image. It presents two main advantages with respect to other local representations. First, the WD is a real-valued function and directly encodes the Fourier phase information. Second, the selection of an appropriate window size, which depends on the kind of information analyzed, is not required for the computation of the WD. The WD of a 2-D image is a 4-D function that involves a Fourier transformation for every point of the original image. This fact leads one to consider different alternatives for computing the WD of discrete images (see Section III). In order to overcome these problems, two possible alternatives have been proposed. First, several hybrid optical-digital image processors have been proposed in the literature to generate the WD of 1-D and 2-D signals. Second, some VLSI special-purpose architectures have recently been proposed for computing the distribution.
In this section, we consider the auto-Wigner distribution function corresponding to 1-D signals for notational simplicity; the extension to 2-D signals is straightforward. Let us suppose a continuous, integrable, and complex function f(x). The symmetric definition of the Wigner distribution W_f(x, \omega) is given by

W_f(x, \omega) = \int_{-\infty}^{\infty} f\left(x + \frac{x_0}{2}\right) f^*\left(x - \frac{x_0}{2}\right) e^{-j\omega x_0}\, dx_0,   (1)

where x and x_0 are spatial variables, \omega is the spatial frequency variable, and f^*(\cdot) denotes the complex conjugate of f(\cdot). The product function r_f(x, x_0) is given by

r_f(x, x_0) = f\left(x + \frac{x_0}{2}\right) f^*\left(x - \frac{x_0}{2}\right).   (2)

The auto-Wigner distribution gives a generalized autoconvolution at nonzero frequency (Szu, 1982). From (1), it can be observed that the WD is the
Fourier transformation, for a given point x_0, of the image product f(\cdot) f^*(\cdot). It may also be obtained from the Fourier transform F(\omega) of f(\cdot) by

W_f(x, \omega) = \frac{1}{2\pi} \int_{-\infty}^{\infty} F\left(\omega + \frac{\omega_0}{2}\right) F^*\left(\omega - \frac{\omega_0}{2}\right) e^{j\omega_0 x}\, d\omega_0.   (3)

According to (1) and (3), the following relation is observed:

W_f(x, \omega) = W_F(\omega, x),   (4)
which shows the symmetry between the two conjugate domains (space and spatial frequency). The mathematical foundations of the WD were considered by De Bruijn in the context of quantum mechanics applications (De Bruijn, 1973); however, they can in many instances be adapted to the field of signal processing. The Wigner distribution is closely related to the ambiguity function (AF) proposed by Woodward (1953) in the study of radar signals; the definition of the AF is given by

A_f(\omega, x) = \int_{-\infty}^{\infty} f\left(x_0 + \frac{x}{2}\right) f^*\left(x_0 - \frac{x}{2}\right) e^{-j\omega x_0}\, dx_0.   (5)
Both Eqs. (1) and (5) can be considered particular occurrences of a generalized local spectral signal representation, the complex spectrogram (CS), defined by

S_{fg}(x, \omega) = \int_{-\infty}^{\infty} f(x_0)\, g^*(x_0 - x)\, e^{-j\omega x_0}\, dx_0.   (6)
The CS can be interpreted as a windowing of the signal f(x) by a shifting window function g(x) (rectangular, Gaussian, etc.), followed by Fourier transformation of the product f(\cdot) g^*(\cdot). If the function f(x) itself is chosen as the window g(x), a self-windowed version S_{ff} is obtained in (6). In that case, Eqs. (5) and (6) are equivalent through the relation

A_f(\omega, x) = S_{ff}(x, \omega).   (7)
Comparing Eqs. (1) and (5) leads one to conclude that the Wigner distribution is related to the ambiguity function through a double Fourier transformation:

W_f(x, \omega) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} A_f(\omega_0, x_0)\, e^{j(\omega_0 x - \omega x_0)}\, d\omega_0\, dx_0.   (8)
The AF can be interpreted as a signal autocorrelation in the presence of Doppler shifts (Bartelt et al., 1980).
B. Properties

A complete study of the properties of the WD was formulated by Claasen and Mecklenbrauker (1980a,b). The most salient properties for image processing applications are listed below.

1. Realness
The WD of any real or complex function is always real, since it is the Fourier transformation of a Hermitian product function (see Eq. (2)). However, it is not possible to interpret this distribution as an energy density distribution, because the WD is not always positive. This implies that the phase information is implicitly encoded in the changes of sign of the distribution. The importance of the Fourier phase in image representation and analysis has been recognized by various authors (Oppenheim, 1981; Behar, 1988). The WD phase information yields the spatial dependence corresponding to the spectral frequencies.

2. Space and Spatial-Frequency Marginals

The WD integration over the spatial variable at a fixed frequency gives the spectral energy density at that frequency, and the WD integration over the frequency variable at a fixed point gives the local power at that point:

\int_{-\infty}^{\infty} W_f(x, \omega)\, dx = |F(\omega)|^2,   (9)

\int_{-\infty}^{\infty} W_f(x, \omega)\, d\omega = 2\pi\, |f(x)|^2.   (10)
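The marginal relations (9) and (10) can be checked numerically. The sketch below evaluates Eq. (1) by a Riemann sum for the Gaussian f(x) = e^{-x^2/2}, whose WD has the closed form W_f(x, \omega) = 2\sqrt{\pi}\, e^{-x^2 - \omega^2}; the grids are illustrative choices:

```python
import numpy as np

# Sample f(x) = exp(-x^2/2) and evaluate W_f(x, w) of Eq. (1) by a Riemann sum.
x = np.linspace(-6, 6, 241)
w = np.linspace(-6, 6, 241)
tau = np.linspace(-12, 12, 961)          # integration variable x0 in Eq. (1)
dx, dw, dtau = x[1] - x[0], w[1] - w[0], tau[1] - tau[0]

f = lambda u: np.exp(-u**2 / 2)
# prod[i, k] = f(x_i + tau_k/2) * f(x_i - tau_k/2)   (f is real here)
prod = f(x[:, None] + tau[None, :] / 2) * f(x[:, None] - tau[None, :] / 2)
W = (prod @ np.exp(-1j * np.outer(tau, w))).real * dtau   # W[i, j] = W_f(x_i, w_j)

# Eq. (9): integrating over x gives |F(w)|^2 = 2*pi*exp(-w^2)
marg_w = W.sum(axis=0) * dx
# Eq. (10): integrating over w gives 2*pi*|f(x)|^2
marg_x = W.sum(axis=1) * dw
```

Both sums agree closely with the closed forms, and `W` itself matches `2*np.sqrt(np.pi)*np.exp(-x**2 - w**2)` on the grid.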
3. Finite Support
In the case of space-limited signals, the WD is zero outside the range of signal definition, and the same property applies in the spatial-frequency domain:

if f(x) = 0 for |x| > N, then W_f(x, \omega) = 0 for |x| > N;   (11)

if F(\omega) = 0 for |\omega| > M, then W_f(x, \omega) = 0 for |\omega| > M.   (12)
4. Space and Frequency Shifts

Shifts in the space and frequency domains produce corresponding shifts in the distribution:

g(x) = f(x - x_0) \;\Rightarrow\; W_g(x, \omega) = W_f(x - x_0, \omega),   (13)

G(\omega) = F(\omega - \omega_0) \;\Rightarrow\; W_g(x, \omega) = W_f(x, \omega - \omega_0).   (14)
5. Inversion
The product function r_f(x, x_0) (see Eq. (2)) can be recovered by an inverse Fourier transformation:

r_f(x, x_0) = \frac{1}{2\pi} \int_{-\infty}^{\infty} W_f(x, \omega)\, e^{j\omega x_0}\, d\omega.   (15)

By introducing in (15) the variable transformation x_1 = x + \frac{x_0}{2} and x_2 = x - \frac{x_0}{2}, we obtain

f(x_1)\, f^*(x_2) = \frac{1}{2\pi} \int_{-\infty}^{\infty} W_f\!\left(\frac{x_1 + x_2}{2}, \omega\right) e^{j\omega(x_1 - x_2)}\, d\omega,   (16)

and by substituting x_1 = x and x_2 = 0 in Eq. (16),

f(x)\, f^*(0) = \frac{1}{2\pi} \int_{-\infty}^{\infty} W_f\!\left(\frac{x}{2}, \omega\right) e^{j\omega x}\, d\omega.   (17)

Equation (17) can also be interpreted as the necessary condition that a function of two variables must satisfy in order to be a WD: A function of x and \omega is a WD if and only if the right-hand side of Eq. (17) can be separated into the product expression of the left-hand side. A more general study, including the necessary and sufficient conditions, was done by de Groot (1972).
6. Product and Convolution

The WD of a convolution of two signals f(\cdot) and g(\cdot) is equal to the convolution of their distributions in the spatial variable:

h(x) = f(x) \otimes g(x) \;\Rightarrow\; W_h(x, \omega) = W_f(x, \omega) \otimes_x W_g(x, \omega).   (18)

The WD of a product of two signals f and g is given by a convolution in the frequency variable:

h(x) = f(x) \cdot g(x) \;\Rightarrow\; W_h(x, \omega) = \frac{1}{2\pi}\, W_f(x, \omega) \otimes_\omega W_g(x, \omega),   (19)

where \otimes_x and \otimes_\omega denote convolution in the variables x and \omega, respectively.
7. Interference

The WD computation of a multicomponent signal introduces spurious "cross-terms" because of its intrinsic bilinearity (Eq. (1)). As has recently been pointed out, the definition of a multicomponent signal is a controversial issue, because any signal can be split up in an infinite number of different ways (Cohen, 1989). The WD of the sum of two signals f(x) + g(x) is given by

W_{f+g}(x, \omega) = W_f(x, \omega) + W_g(x, \omega) + 2\,\mathrm{Re}[W_{f,g}(x, \omega)],   (20)

where the last term in Eq. (20), which can be interpreted as the cross-WD of f(x) and g(x), is called the "cross-term." The presence of cross-terms in the distribution is not always undesirable. The recent work of Choi and Williams (1989), which explores particular members of Cohen's class that minimize the cross-terms while retaining the basic properties, constitutes a major advance in the study of time-frequency distributions.
C. Discrete Wigner Distribution

Although the WD was initially proposed for continuous-variable functions, Claasen and Mecklenbrauker proposed at the beginning of the 1980s a first definition for discrete-variable functions (Claasen and Mecklenbrauker, 1980b). However, one of the main disadvantages of the discrete definition is that not all the properties of the continuous WD are preserved by discretization, because of aliasing effects. In this sense, several alternative definitions have been proposed in the literature in order to overcome this problem (Chan, 1982; Claasen and Mecklenbrauker, 1983; Brenner, 1983; Day and Yarlagadda, 1983; Peyrin and Prost, 1986). It has been shown that all the definitions are expressed by similar formulas and can be interpreted as smoothed (filtered) versions of the original elementary definition (Pacut et al., 1989). The discrete WD of a sampled function f(n) is defined by

W_f(n, \omega) = 2 \sum_{k=0}^{N-1} f(n+k)\, f^*(n-k)\, e^{-2j\omega k},   (21)

where n and \omega = 2\pi m/N are the spatial and frequency variables, respectively. Equation (21) can be interpreted as the N-point DFT of the image product

r_f(n, k) = f(n+k)\, f^*(n-k)   (22)
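A minimal NumPy implementation of Eq. (21) might look as follows; circular indexing is assumed here (the boundary convention is not fixed by the formula above), and the shift properties (13)-(14) carry over to this discrete version:

```python
import numpy as np

def discrete_wd(f):
    """Discrete Wigner distribution of a 1-D signal, Eq. (21):
    W_f(n, m) = 2 * sum_k f(n+k) f*(n-k) e^{-j 2 (2 pi m / N) k},
    with indices taken modulo N (circular extension assumed)."""
    N = len(f)
    i = np.arange(N)
    # r[n, k] = f(n+k) f*(n-k), the product function of Eq. (22)
    r = f[(i[:, None] + i[None, :]) % N] * np.conj(f[(i[:, None] - i[None, :]) % N])
    # N-point DFT over k with the doubled exponent; rows index n, columns index m
    return 2 * (r @ np.exp(-4j * np.pi * np.outer(i, i) / N))

# A pure complex exponential f(n) = e^{j 2 pi m0 n / N} concentrates on the
# frequency bin m0 and, because of the doubled exponent, also on m0 + N/2.
N, m0 = 8, 1
f = np.exp(2j * np.pi * m0 * np.arange(N) / N)
W = discrete_wd(f)
```

Here `W` is real up to rounding, equals 2N on columns m0 and m0 + N/2 and vanishes elsewhere; the second peak is the aliasing artifact discussed below. A spatial shift `np.roll(f, s)` shifts `W` by `s` along its first axis, in agreement with Eq. (13).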
for a given point n. The discrete WD is periodic in the frequency variable with period \pi, i.e., W_f(n, \omega) = W_f(n, \omega + \pi), whereas the signal's Fourier spectrum has period 2\pi. The discrepancy between the two periodicities could be avoided by discarding the factor 2 in the exponent of Eq. (21), but this has the drawback that components of f at \omega occur at 2\omega in the WD. This is a consequence of the intrinsic bilinearity of the definition given by Eq. (21) and means that the even and odd samples appear separately. This can result in aliasing unless analytic signals are considered, or the signal is oversampled by a factor of two. In the case of image applications, two problems arise in practice in computing the discrete WD. First, the aliasing problem, which can be reduced by smoothing the original image using low-pass filtering. An additional problem that appears in practice is the spectral dispersion or leakage due to the window size, especially important for small windows. To reduce leakage, it is necessary to introduce spatial apodization or truncation filters in order to minimize the spurious side lobes of the sinc function that result from the windowing effect. The discrete WD definition given by Eq. (21) retains the basic properties of the continuous WD for discrete-space signals; however, the main difference arises in the inversion property. Let us suppose a discrete signal f(n), n = 0, \ldots, N-1. Inserting k = n in Eq. (21) allows one to write

f(2n)\, f^*(0) = \frac{1}{2N} \sum_{m=-N/2}^{(N/2)-1} W_f(n, m)\, e^{j2(2\pi m/N)n}.   (23)

From Eq. (23), only the even samples can be recovered. Inserting k = n - 1 in Eq. (21) leads to the recovery of the odd samples:

f(2n-1)\, f^*(1) = \frac{1}{2N} \sum_{m=-N/2}^{(N/2)-1} W_f(n, m)\, e^{j2(2\pi m/N)(n-1)}.   (24)
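The even-sample recovery of Eq. (23) can be checked numerically. The sketch below assumes zero extension of f outside its support and, to keep the check exact, uses a test signal with f(N/2) = 0 so that the wrap-around alias term vanishes; these are illustrative assumptions, not conditions stated in the text:

```python
import numpy as np

N = 8
f = np.array([1.0, 2, 3, 4, 0, 5, 6, 7])     # note f[N // 2] = 0
fp = np.zeros(3 * N, dtype=complex)           # zero-extended copy of f
fp[N:2 * N] = f

n = np.arange(N)[:, None]
k = np.arange(N)[None, :]
r = fp[N + n + k] * np.conj(fp[N + n - k])    # product function, Eq. (22)
m = np.arange(N)                              # m runs over one period (mod N)
W = 2 * (r @ np.exp(-4j * np.pi * np.outer(np.arange(N), m) / N))   # Eq. (21)

# Eq. (23): inverse transform over m recovers the even samples f(2n) f*(0)
g = (W * np.exp(4j * np.pi * np.outer(np.arange(N), m) / N)).sum(axis=1) / (2 * N)
```

Here `g` equals the even samples of the zero-extended signal scaled by f*(0), i.e., `[f[0], f[2], f[4], f[6], 0, 0, 0, 0] * np.conj(f[0])`; the odd samples follow analogously from Eq. (24).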
In the case of image filtering operations, the inversion property is crucial in order to manipulate the image in the Wigner domain and return to the spatial domain (see Section III).

D. Cohen's Unified Approach and Related Distributions
It is interesting to introduce briefly some historical remarks about the origin and later advances of the joint time (space)-frequency distributions. Although the present study is based on the Wigner distribution, a more exhaustive discussion of the relations between the WD and other time-frequency distributions can be found elsewhere (Claasen and Mecklenbrauker, 1980c; Cohen, 1989). Wigner was the first to find, in the context of quantum mechanics, a function that simultaneously describes the position and momentum of a particle (Wigner, 1932). In his first paper, he showed that a bilinear distribution in the wave function \psi satisfying the
marginals (Eqs. (9) and (10)) cannot always be positive (the concept of a wave function \psi in quantum mechanics is formally identical to the concept of a signal f). This means that the WD cannot be interpreted as an energy density function. However, this interpretation has minor significance in image processing applications. When positivity is a major requirement, different smoothed distributions have been proposed in the literature in order to obtain positive distributions while maintaining the rest of the properties. More details can be found in a recent paper by Cohen (1989). Subsequently, Ville derived in the area of signal processing the distribution that Wigner had proposed in quantum mechanics (Ville, 1948); many later authors use the names Wigner-Ville or Ville distribution to denote the same distribution. In the context of time-frequency analysis, the contribution of Dennis Gabor in his famous monograph "Theory of Communication" must be considered crucial (Gabor, 1946). The basic idea of Gabor's theory is the representation of a signal in terms of quanta of information, named elementary functions or logons, introducing the joint representation of signals by using the diagram of information. He proved that this kind of simultaneous representation is limited by the uncertainty relation. Following the formalism of quantum mechanics, Gabor derived, by use of the Schwarz inequality, the family of elementary signals that minimizes the uncertainty relation between time and frequency. These elementary functions (now called Gabor functions) are Gaussians modulated by sinusoids of the form
f(x) = e^{-(x - x_0)^2/\alpha^2}\, e^{-j\omega x},   (25)
where \omega represents the frequency of the sinusoidal wave, and x_0 and \alpha represent the position and spread of the Gaussian envelope, respectively. The mathematical properties of these functions are discussed elsewhere (Bastiaans, 1981a). We will consider in Section VI the relevance of Gabor's theory in the context of image applications. Kirkwood in 1933 proposed another distribution, conceptually simpler than the WD, given by (Kirkwood, 1933)

K_f(x, \omega) = f(x)\, F^*(\omega)\, e^{-j\omega x}.   (26)
Rihaczek derived the same distribution some time later, naming it the complex energy spectrum and giving some interesting insights into its physical interpretation (Rihaczek, 1968). The Rihaczek distribution is related to the WD by a double convolution:

K_f(x, \omega) = \frac{1}{\pi} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} W_f(x_0, \omega_0)\, e^{-2j(x - x_0)(\omega - \omega_0)}\, dx_0\, d\omega_0.   (27)
The distribution given by Eq. (27) is complex, but its real part is also a distribution that satisfies the marginals. Page in 1952 proposed a new distribution named the instantaneous power spectrum (see Table I), and a further study was done by Mark (1970), who introduced the physical spectrum, essentially the same concept as the spectrogram, which is noteworthy for its extensive use in different areas of research. The physical spectrum is related to the WD by

P_f(x_0, \omega_0) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} W_f(x, \omega)\, W_w(x - x_0, \omega - \omega_0)\, dx\, d\omega,   (28)
where w(x) represents a window function. Mark's physical spectrum coincides with the square modulus of the complex spectrogram. A general formulation of the bilinear time-frequency distributions was given by Cohen (1966). Each member of the class, named Cohen's class, is given by

C_f(x, \omega) = \frac{1}{2\pi} \int \int \int f\!\left(x_0 + \frac{y}{2}\right) f^*\!\left(x_0 - \frac{y}{2}\right) \Phi(\xi, y)\, e^{-j(\xi x + \omega y - \xi x_0)}\, dy\, dx_0\, d\xi,   (29)
where the signal f(x) appears in the habitual bilinear form, and \Phi is the so-called kernel function. The different members of Cohen's class are obtained as particular instances of the kernel \Phi. For example, the WD and the ambiguity function can be obtained by taking \Phi = 1 and \Phi = \delta(x - x_0)\,\delta(\omega - \omega_0), respectively. This formulation has the advantage of providing a general framework for a more systematic study of the time-frequency distributions. An interesting consequence of this formulation is the idea that, by placing constraints on the kernel, one obtains the subset of distributions that have a particular property (Cohen, 1966). Table I presents a summary of the main distributions and their corresponding kernels. We will discuss here a remarkable work published by Choi and Williams (1989) on the study of new distributions with desirable properties that simultaneously reduce the presence of undesirable effects. Some researchers have considered the WD as a master-form distribution from which the rest of the distributions can be derived (Bartelt et al., 1980; Jacobson and Wechsler, 1988; Reed and Wechsler, 1990). Choi and Williams found that by an appropriate kernel selection one can reduce the presence of cross-terms for multicomponent signals while retaining the rest of the desirable properties (see Table I). The parameter \sigma in the kernel
Φ(ω_0, y) = exp(−ω_0² y²/σ)  (30)
regulates the presence of cross-terms. In the limit σ → ∞, the Choi-Williams distribution becomes the WD. The presence of cross-terms is reduced for small values of σ, and the rate at which the cross-terms decrease is proportional to 1/√σ. Figure 1 shows some examples of kernels of different distributions and the corresponding results of the distributions' computation. In this particular example of a multicomponent signal (two sinusoids of different frequency occurring at different times), the reduction of cross-terms in the case of the Choi-Williams distribution is quite significant. Several different applications of this new distribution have been reported in the study of speech, cortical event-related potentials (ERPs), and electromyograms (EMGs) (Williams and Jeong, 1989). The importance of this work is reflected in the different methodology used to obtain distributions with desirable properties. Instead of modifying the WD in order to reduce the cross-terms, one is interested in changing the kernel so that the spurious values of the distribution are minimal. The definition of the Choi-Williams distribution kernel (Eq. (30)) includes only one modifiable parameter, σ, but the same method can be explored in the future to find new distributions with desirable properties. The presence of cross-terms can obscure the importance of the autoterms, especially for applications in which signal discrimination and recognition is a critical issue. For instance, in the analysis of speech signals, the presence of these artifacts can lead to a misunderstanding of the presence of formants. In the case of medical applications, these artifacts can hamper the signal analysis and hence the subsequent diagnosis. Recently, Sun et al. (1989b) have proposed the elimination of the cross-terms via preprocessing filtering based on the use of averaging or median filters.
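To make the role of the kernel concrete, the following sketch applies a Choi-Williams-style weighting in a discrete 1-D setting. This is an illustrative NumPy discretization under simplifying assumptions (circular indexing, unit lags, no normalization constants), not the formulation used by Choi and Williams themselves; the function and variable names are ours.

```python
import numpy as np

def autocorr_plane(f):
    """Bilinear product r[n, j] = f(n + k) * conj(f(n - k)), with the
    lag k = j - N//2 and circular indexing."""
    N = len(f)
    n = np.arange(N)[:, None]
    k = (np.arange(N) - N // 2)[None, :]
    return f[(n + k) % N] * np.conj(f[(n - k) % N])

def pseudo_wd(f):
    """Pseudo-WD: one DFT over the lag axis per signal position n."""
    r = autocorr_plane(f)
    return np.fft.fft(np.fft.ifftshift(r, axes=1), axis=1)

def choi_williams(f, sigma):
    """Weight the ambiguity-like (xi, k) plane by the kernel of Eq. (30),
    exp(-xi^2 k^2 / sigma), then return to the (n, omega) plane."""
    r = autocorr_plane(f)
    A = np.fft.fft(r, axis=0)                        # n -> xi
    xi = 2 * np.pi * np.fft.fftfreq(len(f))[:, None]
    k = (np.arange(len(f)) - len(f) // 2)[None, :]
    A = A * np.exp(-(xi * k) ** 2 / sigma)           # sigma -> inf leaves A unchanged
    r_smooth = np.fft.ifft(A, axis=0)                # xi -> n
    return np.fft.fft(np.fft.ifftshift(r_smooth, axes=1), axis=1)
```

As σ → ∞ the weighting tends to 1 and the pseudo-WD is recovered; because the kernel magnitude never exceeds 1, the smoothed distribution can only lose energy, which is how the cross-terms are attenuated.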
The basic idea is to consider the pseudo-Wigner distribution as an image with polarity, eliminating the negative values by using classical convolution masks. Another alternative for the reduction of the cross-terms is the use of the analytical signal rather than the real signal itself. This approach has the advantage of retaining the original WD definition without introducing a smoothing factor in the kernel. Zhu et al. (1990b) have recently reported the use of a 2-D Hilbert transform for Wigner analysis of 2-D real signals. They proposed a 2-D continuous Hilbert transform based on the Read and Treitel (1973) discrete Hilbert transform. The theoretical performance of the 2-D analytical signals in the WD computation was illustrated by Zhu et al. (1990a) by using simulated and real images. Figure 2a presents a simulated test image composed of two sinusoidal gratings: The first one is a pure sinusoidal image with a constant frequency, and the other is a so-called chirp image whose frequency varies with the spatial position. Figure 2b presents the Hilbert transform of Fig. 2a. Figures 2c and 2d present the results of the WD computation corresponding to a selected point from the original real signal and from the analytical signal,
FIG. 2. (a) Simulated test image with two components; (b) 2-D Hilbert transform associated with (a); (c) 2-D WD slice corresponding to the point (47, 47); (d) smoothed 2-D slice associated with the same point. (Reprinted by permission from Zhu et al., 1990b.)
respectively. The reduction of the interference terms is quite significant in the case of using the analytical signal. However, the 2-D analytical signal, unlike the 1-D analytical case, is not unique. It is then necessary to decide which 2-D analytical signal should be used in order to reduce the various possible interference terms. In general, the appropriate choice of the 2-D analytical signal depends on the spectral signal characteristics. Recently, Zhao et al. (1990) and Atlas et al. (1990) have also proposed the use of a Gaussian kernel in order to reduce the interference terms while retaining the finite-time support and good frequency resolution. In the case of speech and other nonstationary signals, the kernel takes a cone-shaped support region. It is interesting to remark that the frequency responses of the cone-shaped kernels take the form of a lateral inhibition mechanism that can be observed in both auditory and visual processes. Further research is needed both in the study of smoothing kernels and in the 2-D analytical signals for image feature extraction and pattern recognition by using WD spectral analysis.
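In one dimension, by contrast, the analytical signal is unique and easy to construct. The sketch below is the classical FFT construction of the 1-D analytic signal (our illustrative code, not the Read and Treitel discrete transform used by Zhu et al.): suppressing the negative frequencies removes the components between which the WD would otherwise place cross-terms.

```python
import numpy as np

def analytic_signal(x):
    """Classical 1-D analytic signal via the FFT: keep DC (and Nyquist),
    double the positive frequencies, zero the negative ones."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    X = np.fft.fft(x)
    h = np.zeros(N)
    h[0] = 1.0
    if N % 2 == 0:
        h[N // 2] = 1.0
        h[1:N // 2] = 2.0
    else:
        h[1:(N + 1) // 2] = 2.0
    return np.fft.ifft(X * h)
```

The real part of the result reproduces the input, and the imaginary part is its Hilbert transform; it is this complex signal, rather than the real one, that is fed into the WD.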
III. WIGNER DISTRIBUTION REPRESENTATION OF IMAGES

It has already been mentioned that the Wigner distribution function (WD) was defined by Wigner in 1932 (Wigner, 1932), in the context of quantum mechanics. Some years later, it was applied to the theory of communications (Ville, 1948), and to optics (Bastiaans, 1978). However, this function did not receive much attention until the well-known series of papers by Claasen and Mecklenbrauker (1980a-c), where the properties and applications of the distribution were carefully studied. In fact, although some theoretical work (Walther, 1968, 1973) had been presented before these papers, most of the work on this topic followed them. The WD doubles the number of variables of the represented image. In this way, the WD of 2-D images is a 4-D function (two spatial coordinates and two spatial-frequency coordinates). This fact is shown in Fig. 3, where a sinusoidal grating of 256 × 256 pixels, with four periods per image (Fig. 3a), as well as its WD (Fig. 3b), are displayed. Since the original image varies only in the x direction, its WD is a 2-D function. This representation of the distribution is similar to the information diagram proposed by Gabor (Gabor, 1946). The computation of this distribution involves a Fourier transformation for every point of the original image. These facts lead us to consider the WD as a computationally intensive process, which can limit the range of applications. Image processing is a good example of this situation. Different alternatives have been proposed in the literature in order to overcome these problems: optical processors, VLSI special-purpose processors, and hybrid optical-electronic systems.
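The storage figures behind this remark are easy to make concrete. The short back-of-the-envelope calculation below (ours, not from the original text) estimates the cost of holding the full 4-D WD of a 256 × 256 image in double precision, together with the number of 2-D transforms a direct computation would need.

```python
# Direct 4-D discrete WD of an N x N image: one value per
# (n_x, n_y, m_x, m_y) tuple, one 2-D FFT per spatial position.
N = 256
samples = N ** 4                 # about 4.3e9 WD values
storage_gb = samples * 8 / 1e9   # 8 bytes per double-precision value
ffts_2d = N * N                  # 65,536 transforms of N x N matrices

print(f"WD samples:   {samples:.3e}")
print(f"storage (GB): {storage_gb:.1f}")
print(f"2-D FFTs:     {ffts_2d}")
```

Roughly 34 GB for the raw array alone, which makes the sectional implementations discussed below attractive.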
FIG. 3. (a) Sinusoidal grating object of 64 pixels/period; (b) WD of (a) for the x direction and the autoconvolution profile (ω = 0) with zoom ×2. (Reprinted by permission from Cristobal, Bescos, and Santamaria, 1989.)
A. Digital Implementations
Most of the applications of the WD in image processing have been carried out through digital implementations (Jacobson and Wechsler, 1982a, 1982b; Cristobal et al., 1989; Gonzalo et al., 1989; Gonzalo et al., 1990). This is a consequence of the characteristics of this kind of system, such as the flexibility offered by programmability, or the high speed and storage they have achieved in recent years. From the definition of the discrete Wigner distribution function (see Eq. (21)), it is clear that the first step in computing this distribution is to generate the corresponding product function r_f(n, k) = f(n + k) f*(n − k). Since only real and positive-valued images will be considered here, the function f*(·) is equal to f(·); then, the function r_f(n, k) is obtained, for each value of k, by shifting the image represented in the spatial domain k pixels from the origin to the left and to the right, and multiplying these two images. For a particular test image, a composite rectangular grating with 32, 16, and 8 pixels/period (Fig. 4), the corresponding product function is shown in Fig. 5; each column contains the section of the product function corresponding to the successive values of k. The WD is obtained by computing the 1-D Fourier transformation of each column of the product function. The distribution of this composite grating is represented in Fig. 6. The frequency variable is mapped along the vertical axis, and the spatial coordinate along the horizontal axis. Figure 7 shows the WD corresponding to a rectangular window test. In this case, the distribution can be interpreted as a composite of sinc(·) = sin(·)/(·) functions represented along the y-axis. Each sinc function comes from the DFT of the different image products corresponding to the different points selected in the spatial domain.
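For an image that varies in one direction only, the shift-and-multiply procedure just described can be sketched in a few lines. This is an illustrative NumPy version (our code, not the authors' program), assuming a real, positive profile so that f* = f, and circular shifts in place of the finite support of the printed figures.

```python
import numpy as np

def product_function(f):
    """r_f[n, j] = f(n + k) * f(n - k) with lag k = j - N//2, built by
    shifting the profile k pixels left and right and multiplying."""
    N = len(f)
    r = np.empty((N, N))
    for j in range(N):
        k = j - N // 2
        r[:, j] = np.roll(f, -k) * np.roll(f, k)   # f(n+k) * f(n-k)
    return r

def discrete_wd(f):
    """1-D DFT over the lag axis for every spatial position n."""
    r = product_function(f)
    # reorder so that lag k = 0 comes first, as the DFT expects
    return np.fft.fft(np.fft.ifftshift(r, axes=1), axis=1)
```

Because r_f is even in k for a real profile, each spectrum is real, and summing it over frequency returns N·f(n)², the local power used later for the inversion.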
FIG. 4. Composite rectangular grating of 32, 16, and 8 pixels/period. (Reprinted by permission from Gonzalo, Berriel-Valdos, Bescos, and Santamaria, 1989.)
FIG. 5. Representation of the function f(n + k) f*(n − k). The spatial variable n is represented along the horizontal axis, and the k parameter along the vertical axis. (Reprinted by permission from Gonzalo, Berriel-Valdos, Bescos, and Santamaria, 1989.)
FIG. 6. Discrete Wigner distribution function of Fig. 4. The spatial variable n is mapped along the horizontal axis, and the frequency variable along the vertical axis. (Reprinted by permission from Gonzalo, Berriel-Valdos, Bescos, and Santamaria, 1989.)
FIG. 7. WD for a rectangular window test in the y direction and the autoconvolution profile (ω = 0) with zoom ×2. (Reprinted by permission from Cristobal, Bescos, and Santamaria, 1989.)
The procedure described above, for generating the 2-D WD, corresponds to a direct implementation of the definition. In this way, the generation of the 4-D WD, in the case of images, implies building a matrix of N × N × N × N elements, N × N being the number of samples of the represented image, and performing N × N Fourier transformations of matrices of N × N elements. The computer time and storage that these calculations require have limited the application of the distribution in the field of image processing. Nevertheless, a different digital implementation of the 4-D distribution has recently been proposed that significantly reduces these computer requirements in numerous practical applications (Gonzalo et al., 1990). This new implementation is based on the approach used by Bamler and Glünder (1983b). Since the WD of 2-D images is a 4-D function, this distribution must be represented by two-dimensional sectional pictures obtained by appropriate sampling. These sections may be arranged either in a plane, resulting in parallel two-dimensional representations, or in a temporal sequence; but in any case, it is necessary that the samples of the product function be calculated in the space where the Fourier transformations must be performed: that is, in the domain defined by the shifting parameters, denoted by k_x and k_y in the expression

W_f(n_x, n_y, m_x, m_y) = 2 Σ_{k_x=−N/2}^{(N/2)−1} Σ_{k_y=−N/2}^{(N/2)−1} r_f(n_x, n_y, k_x, k_y) exp{−2j[(2πm_x/N)k_x + (2πm_y/N)k_y]},  (31)
which shows the definition of the 4-D WD. For this purpose, it is considered that the discrete input image, f(n_x, n_y), is represented in a (k_x, k_y) coordinate system, yielding f(k_x, k_y). The first factor of the product function, f(n_x + k_x, n_y + k_y), is obtained by shifting the input n_x pixels in the horizontal direction, and n_y in the vertical direction. The second factor, f(n_x − k_x, n_y − k_y), is produced by rotating the first factor by 180°. When these two images are multiplied, a sample of the product function is projected on the plane (k_x, k_y). Figure 8 represents the procedure described above. The Fourier transformation of each 2-D section of the product function allows us to obtain the spatial samples of the WD. Figure 9 shows an example of an aerial image with two main tree species, Eucalyptus globulus and Pinus pinea. Figure 9b shows at the top two samples of the image product r_f(n_x, n_y, k_x, k_y), and at the bottom the corresponding samples of the WD. In spite of the aliasing effects, Gonzalo has also shown that the information of real and positive digital images can be retrieved from the spatial samples. It is clear from the inversion property (see Eqs. (23) and (24)) that different procedures are necessary for recovering different groups of samples
FIG. 8. Scheme of the generation procedure of the 2-D samples of the product function. (Adapted with permission from Bamler and Glünder, 1983b.)
FIG. 9. (a) Aerial test image with two tree species: pinus (left) and eucalyptus (right), with zoom ×1.2; (b) image products at the point (−72, 0) of the pinus region (above left) and at the point (72, 0) of the eucalyptus region (above right), and WDs at the considered points (bottom). The origin is at the center of the picture. (Reprinted by permission from Cristobal, Bescos, and Santamaria, 1989.)
of the image from the product function's samples. Thus, in the case of images with one-dimensional variation, there are two different procedures: one for the even samples, and another for the odd samples. However, in the 2-D case, there are four procedures, whose application depends on the parity of the columns and rows of the image samples. Each of these groups of samples is multiplied by a different factor. In the general case, this implies that the image samples cannot be retrieved; but if only real and positive-valued images are considered, these factors can be calculated through the property of the distribution that allows recovery of the local image power. The inversion property also shows that it is not necessary to generate all spatial samples of the distribution to recover all samples of the image. In fact, only N²/4 samples of this distribution contain all the information. This is very interesting in some applications, as for example in image filtering, since the computer time is reduced to 25%. This new implementation has allowed space-variant processing operations on real images, as is shown in the next section.
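A single 2-D spatial sample of the 4-D distribution, produced by the shift and 180° rotation construction described above, can be sketched as follows. Again this is an illustrative NumPy version under simplifying assumptions (a real image, circular indexing, unit lags, and none of the constants of Eq. (31)); it is not the implementation of Gonzalo et al.

```python
import numpy as np

def wd_spatial_sample(f, nx, ny):
    """One 2-D sample W(nx, ny, ., .) of the 4-D WD of a real image f.
    shifted[kx, ky] = f(nx + kx, ny + ky); rotating it by 180 degrees
    gives f(nx - kx, ny - ky), and their product is the section of the
    product function projected on the (kx, ky) plane."""
    shifted = np.roll(np.roll(f, -nx, axis=0), -ny, axis=1)
    # reverse both axes, then roll by one sample so index k maps to -k
    rotated = np.roll(np.roll(shifted[::-1, ::-1], 1, axis=0), 1, axis=1)
    return np.fft.fft2(shifted * rotated)  # 2-D DFT over (kx, ky)
```

The sample is real, because the product is even under (kx, ky) → (−kx, −ky), and its sum over all frequencies returns the local image power at (nx, ny), which is the property used above to recover real and positive images.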
B. Optical Implementations

It has already been mentioned that the WD doubles the number of variables of the function that it represents. In this sense, the 2-D nature of optical systems is very appropriate for performing the distribution in the case of functions with only one variable (temporal signals or images with one-dimensional variation), but they are also very useful for 2-D images, as we will see. On the other hand, the ability of optical systems to perform the Fourier transform of a light distribution is well known (Goodman, 1968). This eliminates the bottleneck posed by the computation of the Fourier transformations in the digital WD generation. These facts have caused most of the implementations of the WD to be optical. From the relation between the Wigner distribution function and the ambiguity function (AF) (see Eq. (8)), we have considered it interesting to include in this section not only the Wigner optical processors, but also those of the AF. It is necessary to mention that these last processors were developed some years before those of the Wigner function. Cutrona et al. (1960), Cutrona (1965), and Preston (1972) proposed coherent processors for the computation of the ambiguity function, which use multiple channels to display this function for discrete values. Afterwards, Casasent and Casasayas (1975) presented a scheme that generates 1-D slices of the AF. Marks et al. (1977) developed a very simple processor that displays simultaneously all values of the AF corresponding to 1-D signals. This processor is shown in Fig. 10. It requires two identical 1-D transparencies of
FIG. 10. A coherent processor for ambiguity function display. Both lenses have focal length f. The Fourier transformation is performed in the horizontal direction, and imaging in the vertical direction. (Reprinted by permission from Marks et al., 1977.)
the temporal signal f(t) in the plane P1, each rotated 45° in such a manner as to form the product s(t, τ) = f[(t + τ)/√2] f*[(t − τ)/√2], where t is the time variable and τ represents a time shift. Then, the lens L1 performs the 1-D Fourier transform of the coherently illuminated transmittance in the horizontal direction, and images this transform along the vertical direction. In this way, the squared modulus of the AF is displayed in the plane P2. Similar processors have been used to generate the WD of 1-D signals (Eichman and Dong, 1982) and to obtain time-variant filtered 1-D signals (Subotic and Saleh, 1984a). The main disadvantage of this processor is the need for two identical transparencies of the signal, which must be accurately positioned and centered. The group led by Lohmann and Bartelt has presented different Wigner optical processors that avoid this drawback. In a first approach to this issue, Bartelt et al. (1980) developed two optical setups for generating this distribution in the case of one-dimensional signals. One of these systems was based on a horizontal continuous displacement of a Gaussian slit synchronized, by two coupled step motors, with a recording plate that moved vertically; in this way, the local spectrum of the signal was generated. Strictly speaking, they obtained a version of the Wigner distribution weighted through the Gaussian slit. This method was more convenient when the signal, f(x), was traveling by its own nature like a wave function f(x − vt). For real-time applications, they developed a second setup. There, a Gaussian transmittance was rotated by an angle of 45°. Then, the one-dimensional Fourier transform gives the local spectrum again. These implementations were used to generate the WD of some test signals and, as a practical example, the local spectrum of speech signals. They showed the suitability of this distribution for speech recognition and for speaker identification.
Two years later, the same group presented another optical processor for generating the distribution of real signals (Brenner and Lohmann, 1982). From our point of view, this processor is one of the most representative Wigner optical processors for one-dimensional real signals. Figure 11 shows the setup of this processor. In this system, the object is first illuminated from the right side, and its image is formed by a cylindrical lens onto a roof prism.
FIG. 11. WD processor for real signals. R: roof-top prism, BS: beam splitter. (Reprinted by permission from Brenner and Lohmann, 1982.)
This prism produces an inversion in the y coordinate and reflects the object's wave back, through the imaging lens, onto the object transparency again, where the product function is produced. The WD of the signal is obtained by a Fourier transformation of this function, with respect to the y coordinate. This system is similar to one proposed by Marks and Hall (1979), but it contains only one cylindrical lens; furthermore, the adjustment is easier to perform in Brenner's setup. In the case of complex signals, the generation of the WD is not as simple as in the real case. This point is clear from the definition of the distribution, since in this case, the complex conjugate signal f*(·) must be calculated to obtain the product function. Brenner and Lohmann (1982) proposed two different ways of generating the WD of complex signals. First, they used the same optical setup that has already been shown (Fig. 11) and assumed a hologram as the complex object. Thus, the system is arranged so that the light sees the original image on its first pass through the hologram, and the complex conjugate on its return. Therefore, suitable masks permit selection of the desired term. These authors designed another optical system for the production of the WD of a 1-D arbitrary complex object. The first aim of this new setup (Fig. 12) was the generation of the function u(x + y/2) exp(2πjαy), which can be achieved simply by rotating the object and choosing an appropriate illumination angle (sin θ = αλ). This signal is divided into two parts; one of them is inverted by a prism, with respect to the y coordinate, and the other is reflected without change. The two parts are joined again by using some recording material. In this way, the squared modulus of the input is registered. One of the terms of this squared modulus is the product function multiplied by an exponential.
Then, a Fourier transform of this term generates the WD of the signal, in the first diffraction order (Born and Wolf, 1959).
FIG. 12. WD processor for truly complex signals. M: mirror, BS: beam splitter, R: roof-top prism. The recorded intensity contains the term u(x + y/2) e^{2πjαy} u*(x − y/2). (Reprinted by permission from Brenner and Lohmann, 1982.)
Mateeva and Sharlandjiev (1986) have presented another optical processor for performing the WD of complex signals through spatial filtering. This optical setup is based on the analogy with Zernike's phase contrast method (Born and Wolf, 1959). Although the input signal is subject to certain constraints (the input phase must be small), it can be used for 1-D and 2-D complex signals. These authors have shown that under these restrictions, the product function can be obtained by using a reflective phase filter in the Fourier plane. In a second step, the WD of the input is obtained through an optical setup that had already been presented by Bamler and Glünder (1983b) for generating the WD of 2-D functions. These authors proposed, for the first time, several optical setups to obtain the 4-D distribution through its 2-D spatial samples. The processors developed by Bamler and Glünder differ in the calculation of the samples of the product function. Figure 13a shows a processor that calculates these samples by multiplying two properly shifted and rotated transparencies. In Fig. 13b, the transparencies are composed of two arrays built through periodical replications of the input image. As has already been mentioned, the need to use two different transparencies requires that the different pictures of the input image be identical, and an accurate adjustment of these transparencies is needed. Another optical system developed by these authors, which avoids the duplication of the image, is represented in Fig. 14. A collimated beam illuminates the shifted input image, and a Fourier transformer lens generates the transform of this picture on the mirror. Then, a second Fourier transformation of the reflected irradiance projects, on the object plane, the image rotated and shifted by the same value from the origin, but in the
FIG. 13. (a) Coherent-optical generation of the WD by sandwiching two properly shifted and rotated transparencies, followed by an optical Fourier transformation. (b) Parallel generation of several sections through the WD. Transparencies II and III of (a) have been replaced by arrays formed by periodical replications of the input pattern. (Reprinted by permission from Bamler and Glünder, 1983b.)
opposite direction. In this way, a sample of the product function is obtained from a single input transparency. The second lens gives the spatial sample of the WD corresponding to that shift. In these experiments, two frequency coordinates of the 4-D function are displayed sectionally on the output plane. Following this work, other methods of generating the WD of 2-D real-valued signals have been investigated, but all of them use the same sectional idea.
FIG. 14. Optical generation of the WD from a single transparency. (Reprinted by permission from Bamler and Glünder, 1983b.)
Subotic and Saleh (1984b) have developed a Wigner optical processor, employing a reflective multipath scheme, that exhibits a parallel architecture. This system permits the simultaneous generation and display of the N slices of the distribution. Moreover, it requires only a single transmittance function, no moving parts are necessary, and the display of the slices into a sectional array is performed simultaneously. A scheme of the optical setup is shown in Fig. 15. The input transmittance is coherently illuminated, reflected successively by mirrors M1, M2, and M3, and re-imaged onto itself by the lens L. A bidirectional diffraction grating D (Born and Wolf, 1959) has been used to obtain different samples of the product function. Each order of this grating contains a replica
FIG. 15. Optical system utilizing a reflective multipath scheme to generate slices of the WD. (Reprinted by permission from Subotic and Saleh, 1984b.)
of the original image; the replicas propagate away from the grating and onto the original image. The shift between different orders depends on the fringe spacing of the grating and on its distance from the input. When the replicas pass through the original input, the multiplication is performed, and some samples of the product function are produced. The lens L transforms each of these samples simultaneously, and therefore generates the corresponding samples of the WD. The number of slices that can be obtained is determined by the number of orders produced by the grating. Since the generation of the WD from a single transparency with conventional methods causes considerable loss of input light power, other methods of generating the 4-D WD have been investigated. Conner and Li (1985) presented three different optical setups, whose purpose was to avoid the waste of power embodied in the use of only one transparency of the original input. In the first setup, the shifted and rotated image is formed onto the other transparency through two lenses, and a third lens performs the Fourier transform. The second setup uses only one lens. In this case, the two transparencies are placed at the focal length of the transformer lens with a small distance between them. The product function samples are obtained through a slight tilting about the geometrical center. The authors presented a third optical setup that is a generalization of the one presented by Bartelt et al. (1980) for 1-D signals. It is based on a continuous movement of the recording plate simultaneously with the sequential generation of the product function samples. This setup allows any two variables of the distribution to be fixed, and 2-D sectional pictures to be obtained depending on the other two variables. More recently, the group led by Conner has proposed a new method for generating the WD of complex signals and images that uses optical phase conjugation via nonlinear four-wave mixing (Li et al., 1988).
Gupta and Asakura (1986) have also investigated the reduction of the loss of light produced in the double-pass Wigner optical generation. They have used a method that is usually applied to avoid this loss in the traditional interferometer (the double-pass Wigner processors can be considered as special-purpose interferometers). The proposed optical method allows more efficient utilization of the input light power, through the optical polarization of the incident coherent beam. The architecture of the optical setup is very similar to that of Bamler's Wigner double-pass optical processor. In this case, a collimated linearly polarized beam, from a He-Ne laser, illuminates the original input through a λ/2 plate (this kind of plate changes the polarization state of the light) and a polarizing beam splitter (PBS) cube. An appropriate polarization of the input beam allows the whole incident light energy to illuminate the original input. That polarization is obtained by rotating the λ/2 plate preceding the PBS cube.
C. Hybrid Implementations

The ability of optical systems to perform image processing is due, fundamentally, to their 2-D nature, since in this way, they can process 2-D information in a very short time. The main characteristics of optical processors can be summarized as follows: parallel processing, high throughput, and powerful operations. However, optical processing does not easily allow the programming of algorithms, the making of logical decisions, or the implementation of control and analysis sequences. Fortunately, these last aspects can be handled by using any digital processor. Because of this, despite what has been claimed by some, these methods of processing are not rivals, but complementary. In fact, the philosophy of hybrid systems is to use the best aspects of each of these methods. In general, the transmittance of a transparency is transformed by an optical system, and this transformation, or some similar light distribution, is analyzed through a digital algorithm. Basically, hybrid systems consist of an optical system, an electronic system, and several optoelectronic interfaces that link the different components of the system and allow the input/output operations. The idea of designing hybrid processors was proposed by Huang and Kasnitz (1967); however, these systems did not receive much attention until the 1970s. The first hybrid systems were developed by Casasent (1974). These new processors may finally bridge the gap that has separated laboratory systems from practical ones. The implementation and application of the Wigner distribution function is very well suited to the characteristics of hybrid processing. Since one of the main problems of this function is the amount of computer time required, most of the calculations can be performed optically, and the digital part is used to manipulate the information. Hybrid processors have a specific character, i.e., each problem requires a different architecture.
In fact, different architectures should be used to implement the same algorithm applied to different problems; this is the case with the WD. Thus, in 1984 Easton et al. presented a hybrid optical-electronic processor, which optically computed the product function associated with the WD through the Radon transformation; it performed the Fourier transformation with an efficient 1-D electronic processor. The Radon transformation can be considered as a set of 1-D line-integral projections of the original image. In this sense, this hybrid system might be useful in distinguishing patterns with known texture direction. When a 2-D signal is represented by its Radon transform, the 2-D operations are reduced to a series of 1-D operations on these projections. The required projection data were produced optically through a so-called flying line scanner. This system projects a line of
FIG. 16. Hybrid system to generate the WD of a real input t(x). The line of light from the flying line scanner passes through the beam splitter onto the transparency centered at x_0 + x'/2. The transmitted light is refocused onto the transparency by the lens-mirror system, but is now centered at x_0 − x'/2. The output is reflected by the beam splitter onto the PMT. The PMT output is Fourier transformed by the SAW filter as before and yields one line through the WD of t(x). (Reprinted by permission from Easton et al., 1984.)
light onto the input transparency, and an image rotator, for example a Dove prism, is used to select the azimuth of the line of light. Then the technique used by Bamler was adapted, as represented in Fig. 16, to produce the line-integral projection of the product function. The transmitted light is collected by a photomultiplier tube, and then an acoustic chirp Fourier transformation produces one line of the WD. The advantage of this system over that of Bamler is that coherent illumination is not required, since the Fourier transformation is not computed optically. Afterwards, in 1989, Cristobal et al. presented another Wigner hybrid processor, based on the optical setup developed by Bamler and Glünder (1983b) and shown in Fig. 14. Also in this case, the output of the optical system was the 2-D spatial samples of the WD associated with each point of the represented image. This local spectrum was then fed, via a TV camera, into a
IMAGE FILTERING/ANALYSIS THROUGH THE WIGNER DISTRIBUTION
[Schematic of Fig. 17: laser, spatial filter, beam splitter, object, lenses L1-L3, mirror, digital processing system, and monitor.]
FIG. 17. Wigner optical digital processor used to obtain space-variant filtered images. L1 Fourier transforms the shifted image f(x + x0/2, y + y0/2). The mirror reflects its spectrum; L2 yields f(x - x0/2, y - y0/2) and performs the Fourier transform of the product function of the two images; and L3 inverses it. (Reprinted by permission from Cristobal, Bescos, and Santamaria, 1989.)
digital image processor, where the visualization and digital processing of the spectrum was possible. This hybrid processor has recently been improved by Gonzalo et al. (1990). The system has been modified in order to retrieve the information of an image from the spatial samples of its Wigner distribution. For this purpose, a Fourier-transforming lens has been included to invert the Wigner distribution function; therefore, the output light distribution is not a spectrum, but a product-function sample. This sample is fed into the digital processor, where the software required to retrieve the original information has been implemented. The optical setup is shown in Fig. 17. The retrieval of the information has been carried out in the same way as already described for the digital implementation of the 4-D WD. In the optical case, however, problems of alignment and coherent noise hinder retrieval. In this sense, knowledge of the physical location of the product-function definition domain is of utmost importance for applying the inversion property correctly. Figures 18a and b show the retrieved information of a composite rectangular grating with three different frequencies (128 x 1 samples, δx = 40 µm) and of a computer-generated binary image (128 x 128 samples, δx = δy = 40 µm), and Fig. 18c that of a fragment of the painting Guernica by Picasso (128 x 128
FIG. 18. Retrieval of the image information from the optical WD: (a) Rectangular composite grating; 128 samples in the horizontal direction (δx = 40 µm). (b) Computer-generated 2-D test image (128 x 128 samples, δx = δy = 40 µm). (c) Fragment of the painting Guernica (128 x 128 samples, δx = δy = 40 µm).
samples, δx = δy = 40 µm). The similarity between the original and retrieved images is high; only a small loss of definition at the edges and a reduction in contrast are apparent. The differences are due to the degradation produced by the optical setup, and to the fact that all samples of the image are multiplied by the central-value sample of the original image. The hybrid processor developed by Gonzalo et al. has been used to perform the space-variant filtering operations shown in the next section.
D. VLSI Implementations

Heretofore, most applications of time-frequency analysis in which the WD is computed correspond to the case of 1-D signals. As a result, Wigner analysis of speech, radar, sonar, and biomedical signals has become a reality. Real-time implementations have recently been considered for computing the WD of discrete signals by the use of special-purpose processors. In all cases, the FFT plays a central role in the WD computation. Chester et al. (1983) proposed a hybrid system in which the entire preprocessing (data acquisition, buffering, time reversal, and windowing operations) was done by a conventional serial computer, the DEC LSI-11, whereas the FFT was computed in special-purpose hardware by using a dedicated microprocessor. The special symmetry properties of the discrete WD definition allow the computation of two WDs in one FFT cycle. Boashash and Black (1987) proposed FFT computation by the use of a cascade of many processors connected in parallel. This method is effective for real-time applications, but increases the system cost because of the large number of processors required. Sun et al. (1989b) have recently reported an alternative system using a pipeline implementation for the WD computation. The concept of pipeline processing is to divide the computational task into several subtasks, in such a way that each subtask is processed by a simple stage. Chester et al. (1989) have more recently proposed a fully programmable hybrid system capable of implementing discrete versions of Cohen's class of functions, and in particular the WD. The implementation allows the capture of wideband signals, by filtering an analytic baseband signal and decimating by a programmable rate. Once the baseband signal is generated, the WD is computed by using vector processors under the control of a RISC control processor.
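The FFT-centered organization common to all these processors can be sketched in software. The following is a minimal numpy sketch, not any of the authors' implementations; the lag convention and normalization of the discrete WD are assumptions, since the chapter's exact definition (its Eq. (21)) uses its own constants:

```python
import numpy as np

def discrete_wigner(f):
    """Row-by-row discrete (pseudo-)WD of a 1-D signal via the FFT.

    For each position n, the product function r(n, k) = f(n+k) conj(f(n-k))
    is formed over the lags k (zero-padded at the borders) and Fourier
    transformed over k, so an N-point signal costs N FFTs of length N.
    """
    f = np.asarray(f, dtype=complex)
    N = len(f)
    ks = np.arange(-(N // 2), N - N // 2)          # lag indices k
    W = np.empty((N, N))
    for n in range(N):
        ip, im = n + ks, n - ks
        valid = (ip >= 0) & (ip < N) & (im >= 0) & (im < N)
        r = np.zeros(N, dtype=complex)
        r[valid] = f[ip[valid]] * np.conj(f[im[valid]])
        # shift lag k = 0 to index 0, then FFT over the lag variable
        W[n] = np.fft.fft(np.fft.ifftshift(r)).real
    return W
```

For a real input the product function is real and even in k, so each FFT output is real; this is the kind of symmetry that allows two WDs to be packed into a single complex FFT cycle, as mentioned above.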
In the case of images, the use of special-purpose processors for time-frequency analysis has mainly been considered for the computation of the Gabor scheme of representation. Einziger (1986) proposed the use of VLSI modules for the computation of the elementary and/or auxiliary functions inherent in the Gabor scheme. Cristobal (1990) has recently proposed the use
of cellular neural networks for Gabor's receptive field computation. A machine vision system based on the Gabor scheme should be implemented in special-purpose VLSI hardware in order to take advantage of parallel architectures in computing the early vision processes, especially in the segmentation and feature extraction procedures. Hierarchical structures of computation retain the advantages of the local computations performed by cellular automata, while simultaneously providing a capability for global computations. The pyramid structures may be considered as a combination of cellular arrays and tree structures (Tanimoto et al., 1987). In the case of image processing applications, the main advantages of the pyramid schemes derive from the multiresolution data representation. The pyramids have been considered as a general framework for implementing highly efficient structures of computation, of which image compression, texture analysis, and image motion are some illustrative examples (Burt, 1984). The pyramid structures are well suited for implementation in VLSI circuits because one can concentrate on a chip design that provides a "basic building array" of a reduced number of cells (i.e., 5 x 5 or 9 x 9 arrays). The whole architecture can be constructed by the interconnection of these basic structures.
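As a generic illustration of the multiresolution pyramid idea referred to above (a Burt-style Gaussian pyramid sketch, not the VLSI cell array itself; the 5-tap kernel and the edge handling are assumptions):

```python
import numpy as np

def reduce_level(img, kernel=None):
    """One REDUCE step of a Burt-style pyramid: separable low-pass
    filtering followed by 2:1 decimation in each direction."""
    if kernel is None:
        kernel = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0  # 5-tap binomial
    pad = len(kernel) // 2
    # separable convolution: rows then columns (edge values replicated)
    tmp = np.apply_along_axis(
        lambda r: np.convolve(np.pad(r, pad, mode='edge'), kernel, 'valid'),
        1, np.asarray(img, dtype=float))
    tmp = np.apply_along_axis(
        lambda c: np.convolve(np.pad(c, pad, mode='edge'), kernel, 'valid'),
        0, tmp)
    return tmp[::2, ::2]

def gaussian_pyramid(img, levels):
    """Successively reduced copies of the image, finest level first."""
    pyr = [np.asarray(img, dtype=float)]
    for _ in range(levels - 1):
        pyr.append(reduce_level(pyr[-1]))
    return pyr
```

Each level is produced from the previous one by the same small local operator, which is why a reduced "basic building array" of cells suffices in hardware.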
IV. IMAGE FILTERING THROUGH THE WIGNER DISTRIBUTION

The filtering operations can be understood as a transformation, or an extraction, of the information of an image. In this sense, these operations can be carried out in any domain where all the image information is contained. Most frequently, filtering operations are performed by selecting different spatial frequencies. For this purpose, the Fourier transform of the image is multiplied by a filter function, which can be defined in the spatial domain or, directly, in the frequency domain. In both cases, the filtering operation has an invariant character; that is, although these filters can be used to perform space-variant filtering, in a formal sense they modify all image points in the same way. In practice, there are few systems that can be described in this form. Consequently, it is very interesting to have an image representation that allows us to carry out operations similar to those usually performed in the Fourier domain, but that can be generalized to the spatially variant case. This representation must yield local image information; thus, each point of the image could be modified in a different way. In analogy to Fourier filtering, the possibility of knowing a local spectrum associated with each point of the image is very useful, since the product of each spectrum by a different filter, and the following inversion, allow one to obtain spatially variant filtered
images. The use of the WD has already been justified for this kind of operation. Furthermore, it has been shown that the WD is a transformation that can be inverted; that is, the original image can be reconstructed from this distribution. This fact proves that all information in the image is preserved and, in principle, filtered images can be retrieved from the filtered distribution.

A. WD Filtering for 1-D and 2-D Images
The use of joint representations for performing filtering operations was proposed by the end of the 1970s. Fargetton et al. (1979) carried out different filtering operations on a time-frequency representation to isolate distinct patterns from it. They were interested in the processing of signals with high separability in the time-frequency domain. In brief, these authors worked with ULF signals of Pc1 type. This kind of signal is usually first approximated by a gliding tone with numerous echoes. They proposed to isolate the echoes by filtering different time-frequency representations. Later, Eichmann and Dong (1982) explored various 2-D optical filtering schemes for 1-D signals. These authors proposed different WDs for linear filtering operations of 1-D signals, such as convolution with a filter function, multiplication of the signal in the space domain, or weighting with a sliding window function. They also introduced a generalized spatial representation (GSF) that allows the simultaneous filtering of both the space and the spatial frequency of a 1-D signal. Special cases of the GSF are the WD, the AF, various filtered WD and pseudo-WD functions, as well as different spectral representations, such as the spectrogram, the local spectra, or the local Doppler spectra. All these representations are obtained from the GSF through different filtering operations. The 2-D nature of optical signal processors allows one to display, as well as to manipulate, all the above generalized space-spatial frequency functions (GSF). Traditional filtering in the Fourier domain was suggested by Subotic and Saleh (1984a) to modify a joint time-frequency representation of a temporal signal with a two-dimensional function that depends on the temporal and frequency variables. The modified representation is given by

    Φ₂(t, ω) = Φ₁(t, ω) H(t, ω),    (32)

where Φ₁(t, ω) is the joint representation of the original signal, and H(t, ω) is an arbitrary function (a "mask"). The nature of the resultant filtering process depends on the choice of the particular time-frequency representation. In the case of the WD, the application of its inversion property to the 2-D function Φ₂(t, ω) yields information related to the filtered signal. Nevertheless, Saleh and Subotic (1985) showed that a solution might not exist, i.e., Φ₂(t, ω) might not
FIG. 19. (a) Input square signal used in Subotic's optical processor. (b) Filters used to manipulate the mixed time-frequency representation. (c) Filtered output via CS representation. (d) Filtered output via WD representation. (Reprinted by permission from Subotic and Saleh, 1984a.)
be the joint representation of a temporal function. They presented an approximate solution for when this situation happens (Saleh and Subotic, 1985). They performed the filtering operation defined by Eq. (32) optically, in the particular cases of the WD and the complex spectrogram (CS). The optical setup is based on that presented by Marks et al. (1977) (Fig. 10). Since in this case the filtered image is retrieved, the replication of that scheme is required. Some of the results obtained with this setup are shown in Fig. 19. The input test is a square-wave signal; the filters used are shown in the middle part of the figure, and, in the lower part, the two filtered images are presented. These retrieved images have been obtained by applying the inversion property in the output plane. This method cannot be generalized to 2-D images. In spite of the disadvantages of this kind of filtering, the group led by Saleh has
FIG. 20. (a) Rectangular grating test; (b) recovered image from the discrete WD; (c) results of local digital filtering operations: low-pass filtering (left half) and high-pass filtering (right half). (Reprinted by permission from Cristobal, Bescos, and Santamaria, 1989.)
continued working on the design of time-variant filters based on the different time-frequency representations (Asi and Saleh, 1990). Cristobal et al. (1989) applied the kind of filtering defined in Eq. (32) in the image processing field. They carried out some digital space-variant filtering of images with one-dimensional variation, and recovered the information of the filtered image through the local spectrum power. The results of the operations, after coming back to the original domain, are shown in Fig. 20. The test image is shown in Fig. 20a, and the local image power of the filtered image is displayed in Fig. 20b; a smoothing is obtained on the left-hand side and an edge sharpening on the right-hand side, since the samples of the WD corresponding to the left half of the image were filtered with a low-pass filter, and the rest of the WD with a high-pass filter. More recently, Gonzalo et al. (1989) have developed the line of work initiated by Cristobal et al. They have performed, digitally and optically, space-variant filtering of images with one- and two-dimensional variation,
and the filtered images have been retrieved. In the first step of the work (Gonzalo et al., 1989), a digital implementation was used to obtain space-variant filtered 1-D images. Previously, the effect of a multiplication factor H(n, m) on the discrete Wigner distribution function of digital images was analyzed. The expression equivalent to Eq. (32) in the case of digital images is

    W_g(n, m) = W_f(n, m) H(n, m),    (33)
where g(·) and f(·) are the filtered and original images, respectively. The continuous time and temporal frequency variables of Eq. (32) have been changed to discrete space and spatial frequency variables. Under the approach mentioned above, the inversion of the filtered function W_g(·) yields

    g(n + k) g(n − k) = Σ_{m=0}^{2N−1} W_f(n, m) H(n, m) e^{j2(2πm)k/N}.    (34)

Substituting the discrete WD's definition (see Eq. (21)) into Eq. (34) yields

    g(n + k) g(n − k) = Σ_{m=0}^{2N−1} Σ_{r=0}^{2N−1} f(n + r) f(n − r) H(n, m) e^{j2(2πm)(k−r)/N}.    (35)
In the special, and interesting, case that H(n, m) is a separable filter,

    H(n, m) = H₀(n) H₁(m),    (36)

where h₁(n) is the Fourier transformation of H₁(m), substitution into Eq. (35) gives

    g(n + k) g(n − k) = (2/N) H₀(n) Σ_{r=0}^{2N−1} f(n + r) f(n − r) h₁(2(k − r)).    (37)

The summation in the last expression can be interpreted as the 1-D convolution of the product function with a function related to h₁(n) by a scale factor. Therefore, Eq. (37) can be expressed as

    g(n + k) g(n − k) = (2/N) H₀(n) r_f(n, k) ⊛ u(k),    (38)

where

    r_f(n, k) = f(n + k) f(n − k),    u(k) = h₁(2k).
Equation (38) shows that the filter is not linear in the spatial domain defined by the n variable. Table II shows a diagram of the relation between linear space-invariant filtering, carried out in the Fourier domain, and nonlinear space-variant filtering, defined in the Wigner domain. The symbol ℱ means Fourier transformation, and G(m) and F(m) are the Fourier transformations of g(m) and f(m), respectively.
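The masking of Eq. (33) and the row-by-row inversion that the retrieval relies on can be mimicked numerically. The sketch below is an assumption-laden simplification, not the authors' algorithm: it indexes the product function circularly instead of over a finite support, and uses an unnormalized row FFT:

```python
import numpy as np

def wd_rows(f):
    """Rows of a discrete WD: FFT over the lag k of r(n, k) = f(n+k) f(n-k),
    with circular indexing (a simplification of the finite-support case)."""
    f = np.asarray(f, dtype=float)
    N = len(f)
    ks = np.arange(N)
    R = np.zeros((N, N))
    for n in range(N):
        R[n] = f[(n + ks) % N] * f[(n - ks) % N]   # product function r(n, k)
    return np.fft.fft(R, axis=1), R

def filter_in_wigner(f, H):
    """Eq.-(33)-style filtering: W_g(n, m) = W_f(n, m) H(n, m); inverting
    each row recovers the filtered product function g(n+k) g(n-k)."""
    W, _ = wd_rows(f)
    return np.real(np.fft.ifft(W * H, axis=1))
```

With H = 1 the original product function is recovered exactly, which is the inversion property that the retrieval of the filtered image samples depends on.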
TABLE II
COMPARISON BETWEEN FILTERING OPERATIONS IN THE FOURIER AND WIGNER DOMAINS
The filtered image samples are retrieved by setting n = k and n = k − 1 in Eq. (38); then,

    g(2n) g(0) = (2/N) H₀(n) [r_f(n, k) ⊛ u(k)]_{k=n},    (39a)
    g(2n − 1) g(1) = (2/N) H₀(n) [r_f(n, k) ⊛ u(k)]_{k=n−1}.    (39b)
Spatially invariant filtering was carried out to compare the similarities and differences in the performance of the same filter acting on the Fourier and Wigner spaces. These filtering operations allow comparison of the WD filtering effects with the well-known classic Fourier filtering and, likewise, provide information about the performance of spatially variant operations in the Wigner domain. Figure 21 shows some experimental results of this comparison. The masks used to filter the WD of the composite rectangular grating (Fig. 22a) are shown in the first row of Fig. 21. One-dimensional sections of these masks were used to filter the Fourier transformation of the image. Each column of this figure shows, from top to bottom, the filter, the filtered image in the Fourier space, and the filtered image in the Wigner domain. In the first column, for the low-pass filter, and the second column, for the band-pass filter, the effects on the filtered image are stronger when the filtering is done in Fourier space than in Wigner space; but, as can be expected, for the high-pass filter, in the third column, the selection of the high frequencies is better in the Wigner domain than in the Fourier space. A composition of the filters presented in the last figure was used to transform the composite rectangular grating (Fig. 22) into a composite sinusoidal grating (Fig. 22b). Since the rectangular grating had different fundamental frequencies, the transformation could not be made by filtering the spectrum. It was necessary to filter the local spectra corresponding to each region of the image in a different way. Figure 22c shows the filter used, and Fig. 22d displays the filtered WD, where only the fundamental frequency of each region is present. The use of the WD for simulating space-variant defocused images has also been considered. In this case, the defocusing masks were generated through defocused OTFs (Hopkins, 1955). This function (OTF) describes the frequency
FIG. 21. First row, left to right: low-pass, band-pass, and high-pass filters. Second row: images filtered in Fourier space with the filters of the first row. Last row: images filtered in the Wigner space. (Reprinted by permission from Gonzalo, Berriel-Valdos, and Santamaria, 1989.)
response of an optical system. It is the Fourier transformation of the point spread function (PSF) of the system, whose amplitude is calculated by Fourier transformation of the pupil function. Therefore, space-invariant defocused images can be modeled by multiplying the Fourier transformation of the original image by a defocused OTF, and inverse Fourier-transforming the product. The degree of the degradation is determined by the coefficient w20, which expresses the optical path difference introduced by the shift of focus at the edge of the aperture. In analogy to the space-invariant case, space-variant defocused images have been simulated by multiplying each local spectrum of the WD by a defocused OTF with a different defocusing coefficient. The removal
FIG. 22. (a) Composite rectangular grating. (b) Composite sinusoidal grating obtained by filtering the WD of the rectangular grating of (a) with the space-variant filter of (c). (c) Space-variant filter applied in the Wigner space. (d) WD of the object of (a) after filtering. (Reprinted by permission from Gonzalo, Bescos, Berriel-Valdos, and Santamaria, 1989.)
of defocusing was carried out through a method used extensively in the Fourier domain, the Wiener filter (Gonzalez and Wintz, 1987). Basically, this method consists of an inversion of the degradation, and it is derived by minimizing the difference between the original and the restored image. The generalization of this method to the Wigner domain yields a restoration mask each column of which is the Wiener filter associated with each defocusing value. Figure 23 shows the results obtained for a particular example. The test image is a rectangular grating of 32 pixels/period. The defocusing and restoration masks used are shown in the lower part of this figure (c and d). The defocusing coefficient varies from zero, on the left-hand side, to 16λ, on the right-hand side (λ is the wavelength). The image defocused with this filter is shown in Fig. 23a, and the restored image in Fig. 23b. In the last section, it was mentioned that Gonzalo et al. have also implemented digitally the WD of conventional images, that is, the 4-D WD,
FIG. 23. (a) Space-variant defocused object obtained with the defocused OTFs of (c). (b) Restored object from the defocused image, by the mask shown in (d). (c) Space-variant defocusing filter generated by defocused OTFs, displayed with the spatial frequency along the vertical axis. The w20 coefficient varies from zero to 16.0λ along the x axis. (d) Wiener restoration filter. (Reprinted by permission from Gonzalo, Bescos, Berriel-Valdos, and Santamaria, 1989.)
and retrieved the image information in an accurate form. This allows space-variant degraded images to be simulated and restored. This kind of operation is also described through the generalization of Eq. (33) to four variables, two spatial variables and two frequency variables. Figure 24 shows some of the results obtained from a particular test image. It is a fragment of the painting Guernica by Pablo Picasso, similar to that shown in the previous section. Figure 24a shows the image retrieved from the spatial samples of its WD. Since the differences between this image and the original one are negligible, only the retrieved image is displayed here. The original image has been degraded by multiplying the spatial samples of its distribution, associated with different regions of the image, by different Gaussian filters. Four different regions have been considered; they are shown in Fig. 24b. The central part is a square of 32 x 32 pixels, and the other regions are concentric frames, whose
FIG. 24. (a) Digitally retrieved information of a fragment of the painting Guernica. (b) The different space-variant-degraded regions. (c) Space-variant-degraded image. (d) Restored image.
dimensions are displayed in the pictures. The intensity of the degradation increases from the center to the edges. It can be appreciated in the degraded image (Fig. 24c) that the resolution of the details is higher in the center than in the external frame. This degradation has also been removed through filters based on Wiener filtering. A different restoration filter has been generated for each Gaussian filter employed in the simulation of the degraded image. After the application of these filters, the inversion property was used in order to obtain the restored image, which is shown in Fig. 24d. The noise present in this last image could be due to aliasing effects, and to the fact, already noted, that this kind of operation is not an exact filtering.
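The space-invariant building block of these restorations, a Wiener filter applied in the Fourier domain, can be sketched as follows. The function and variable names and the regularization constant alpha are illustrative, not the authors':

```python
import numpy as np

def wiener_restore(g, otf, alpha=1e-3):
    """Space-invariant Wiener restoration: the degradation G = F * OTF is
    inverted by conj(OTF) / (|OTF|^2 + alpha), which minimizes the mean
    squared difference between the original and the restored image."""
    G = np.fft.fft2(g)
    W = np.conj(otf) / (np.abs(otf) ** 2 + alpha)   # Wiener filter
    return np.real(np.fft.ifft2(W * G))

# example: blur a grating with a Gaussian PSF, then restore it
N = 32
f = np.tile(np.sin(2 * np.pi * 4 * np.arange(N) / N), (N, 1))
x = np.arange(N)
X, Y = np.meshgrid(np.minimum(x, N - x), np.minimum(x, N - x))
psf = np.exp(-(X ** 2 + Y ** 2) / (2 * 1.5 ** 2))   # wrapped Gaussian PSF
psf /= psf.sum()
otf = np.fft.fft2(psf)
g = np.real(np.fft.ifft2(np.fft.fft2(f) * otf))     # defocused image
restored = wiener_restore(g, otf)
```

In the space-variant scheme described above, a different such filter is associated with each region (each defocusing coefficient), and the products are applied to the local spectra of the WD instead of to one global spectrum.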
Different filtering results have been obtained by Gonzalo et al. (1990) with the hybrid optical-digital processor shown in Fig. 17. The test images filtered are the same ones whose retrieved information was displayed in the previous section (Fig. 18). In the first case, a rectangular composite grating (1-D), the samples of the WD associated with the low frequencies in the image (right region) have been filtered with a high-pass filter, and the sharpening of the edges is quite evident; low-pass filters have been applied to the rest of the samples of the distribution. The filtered image is displayed in Fig. 25a. The 2-D computer-generated image (Fig. 18b) has been filtered with the purpose of selecting edges with different orientations in different regions of the image. Three regions have been considered to obtain the filtered image shown in Fig. 25b: a rectangle of 50 (horizontal) x 45 (vertical) pixels in the upper left-hand corner (region I); another rectangle of 78 x 80 pixels in the upper right-hand corner (region II); and the rest of the samples of the image (region III). The 2-D spatial samples of the WD corresponding to region I have been processed to extract the horizontal edges of the images; in region II, the vertical ones have been selected; and in region III, all edges have been extracted. The fragment of Guernica (Fig. 18c) has been spatially variant defocused in three different regions of the image. The defocusing value is zero in the rectangle of the upper left-hand corner (43 x 43 pixels), and progressive defocusing has been produced in the other two regions.
B. WD Computation Using the Hartley Transform

The Hartley transform (HT) has some interesting advantages for its application in image processing. Traditionally, it has been used to obtain the image spectrum at less cost in computing time. Berriel-Valdos et al. (1988) showed that this transformation not only allows faster computation of the WD, but also has some properties that are very appropriate for computing this distribution. The properties of the Hartley transformation (Hartley, 1942) and of the discrete Hartley transformation (DHT) have already been reported (Bracewell, 1983). The DHT of a discrete real function f(n) is given by

    H_f(m) = (1/N) Σ_{n=0}^{N−1} f(n) cas(2πnm/N),    (40)

where

    cas(2πnm/N) = cos(2πnm/N) + sin(2πnm/N)    (41)

is called Hartley's abbreviation. H_f(m) is a real function. The inverse
FIG. 25. Space-variant filtering operations on the images displayed in Fig. 18. (a) The low-frequency region of the composite rectangular grating has been filtered with a high-pass filter, and the rest of the image with a low-pass filter. (b) Sharpening of the edges with different orientations. (c) Space-variant degradation. (Reprinted by permission from Gonzalo, Berriel-Valdos, and Artal, 1990.)
relation has the following expression:

    f(n) = Σ_{m=0}^{N−1} H_f(m) cas(2πnm/N).    (42)
From these definitions, it can be shown that the WD can be computed through the DHT of a function related to the product function. For notational simplicity, let us consider 1-D signals. Let r_f(l, n) be the product function corresponding to the function f(n). Since this is a hermitian function, it can be expressed as

    r_f(l, n) = e_f(l, n) + j o_f(l, n),    (43)

where e_f(l, n) and o_f(l, n) are the real-even and imaginary-odd parts of the product function. Then the WD of f(n) is given by

    W_f(l, m) = (1/N) Σ_{n=0}^{N−1} r_f(l, n) e^{−j2πnm/N}.    (44)
By using the definition of the exponential function, the next expression is obtained:

    W_f(l, m) = (1/N) Σ_{n=0}^{N−1} [e_f(l, n) + j o_f(l, n)][cos(2πnm/N) − j sin(2πnm/N)].    (45)

Since the cross terms vanish by parity on summation, therefore

    W_f(l, m) = (1/N) Σ_{n=0}^{N−1} s_f(l, n) cas(2πnm/N),    (46)

where

    s_f(l, n) = e_f(l, n) + o_f(l, n).    (47)

From Eq. (46), it is clear that the WD of f(n) can be expressed as the Hartley transform of the function s_f(l, n), that is,

    W_f(l, m) = H_{s_f}(l, m).    (48)
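A compact numerical sketch of this scheme follows, assuming the kernel 2πnm/N and the 1/N normalization used above (the chapter's own discrete WD definition, its Eq. (21), may carry different constants). For a real signal the product function r_f(l, n) = f(l+n) f(l−n) is real and even in n, so s_f reduces to r_f itself and each row of the WD is simply its DHT:

```python
import numpy as np

def cas_matrix(N):
    """Hartley kernel cas(2*pi*n*m/N) = cos + sin, Eq. (41)."""
    theta = 2 * np.pi * np.outer(np.arange(N), np.arange(N)) / N
    return np.cos(theta) + np.sin(theta)

def dht(f):
    """Discrete Hartley transform, Eq. (40)."""
    f = np.asarray(f, dtype=float)
    return cas_matrix(len(f)) @ f / len(f)

def idht(H):
    """Inverse DHT, Eq. (42); the same cas kernel inverts the transform."""
    H = np.asarray(H, dtype=float)
    return cas_matrix(len(H)) @ H

def wd_via_dht(f):
    """WD of a real 1-D signal, one DHT per row (Eq. (48)).

    The product function is zero-padded outside the signal support."""
    f = np.asarray(f, dtype=float)
    N = len(f)
    n = np.arange(N)
    W = np.empty((N, N))
    for l in range(N):
        ip, im = l + n, l - n
        valid = (ip >= 0) & (ip < N) & (im >= 0) & (im < N)
        r = np.zeros(N)
        r[valid] = f[ip[valid]] * f[im[valid]]   # r_f(l, n), real and even
        W[l] = dht(r)                            # s_f = r_f for a real f
    return W
```

Since the DHT is real-to-real, each row needs only a real transform, which is the source of the computational saving over a complex FFT; in practice the DHT is also obtainable from an FFT as (Re F − Im F)/N.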
The inverse relation is given by

    s_f(l, n) = Σ_{m=0}^{N−1} W_f(l, m) cas(2πnm/N).    (49)

The function s_f(l, n) contains all the spatial information of f(n). The real-even and imaginary-odd parts of the product function are obtained from s_f(l, n) through the expressions

    e_f(l, n) = [s_f(l, n) + s_f(l, −n)]/2,    o_f(l, n) = [s_f(l, n) − s_f(l, −n)]/2.    (50)
Thus, the product function can be obtained and the original function, f(n), can be retrieved through the inversion property of the WD. This new implementation of the distribution was used to simulate defocused images and for restoration purposes. If h(n) is the irradiance defocusing point spread function, the defocused space-invariant incoherent image can be expressed through the convolution

    g(n) = f(n) ⊛ h(n).    (51)

The WDs corresponding to these functions are related, in the n variable, by the expression (see Eq. (18))

    W_g(n, m) = W_f(n, m) ⊛ W_h(n, m).    (52)
The DHT in the n variable of the previous equation yields

    HT_n(W_g(n, m)) = HT_n(W_f(n, m) ⊛ W_h(n, m)),    (53)

and applying the results of Bracewell (1983) about convolution to the last equation, the next expression is obtained:

    HT_n(W_g(n, m)) = [E_f(m′, m) + O_f(m′, m)] E_h(m′, m) + [E_f(m′, m) − O_f(m′, m)] O_h(m′, m),    (54)
where Ef(m’,m)and Of(m’, m) are the even and odd parts in m’ (frequency varim)),and &(m’, m) and O,(m’,m)have the same able) of the function HT,(Wf(n, meaning but for HK(Wh(m‘,m)).If h(n) meets any of the following properties: (i) real-even, (ii) imaginary-odd, (iii) hermitian, (iv) antihermitian, it is possible to decrease the total number of numerical operations (additions and multiplications) to perform the computation of the filtered image information. Since in these cases OJm, m’)is zero, Eq. (54) can be expressed as
    HT_n(W_g(n, m)) = HT_n(W_f(n, m)) HT_n(W_h(n, m)).    (55)
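The even/odd decomposition behind Eq. (54) is Bracewell's convolution theorem for the DHT, which can be checked numerically. With the 1/N-normalized DHT assumed here, the theorem reads H_z = N(E_x H_y + O_x H_y(−m)) for a circular convolution z = x ⊛ y (the factor N and the names below follow from that normalization choice, not from the chapter):

```python
import numpy as np

def dht(f):
    """DHT with the chapter's normalization: H(m) = (1/N) sum f(n) cas(2*pi*n*m/N)."""
    f = np.asarray(f, dtype=float)
    N = len(f)
    th = 2 * np.pi * np.outer(np.arange(N), np.arange(N)) / N
    return (np.cos(th) + np.sin(th)) @ f / N

def reverse(H):
    """H(-m), i.e. H((N - m) mod N)."""
    return np.roll(H[::-1], 1)

rng = np.random.default_rng(0)
x, y = rng.standard_normal(16), rng.standard_normal(16)
z = np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(y)))   # circular x (*) y
Hx, Hy = dht(x), dht(y)
Ex, Ox = (Hx + reverse(Hx)) / 2, (Hx - reverse(Hx)) / 2   # even/odd parts
lhs = dht(z)
rhs = len(x) * (Ex * Hy + Ox * reverse(Hy))               # convolution theorem
```

When one factor has a purely even Hartley transform (the cases (i)-(iv) above), the odd-part products drop out and the theorem collapses to the simple product of Eq. (55).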
The product function of the defocused image is obtained from HT_n(W_g(n, m)) by a double Hartley transform,

    g(l + n) g*(l − n) = HT_m[HT_{m′}[(E_f(m′, m) + O_f(m′, m)) E_h(m′, m)]],    (56)

and therefore the local values of the blurred image can be obtained. A similar procedure allows the blurred image to be restored with an adequate filter. Since one of the most commonly accepted linear techniques for image restoration is Wiener filtering, it has also been considered here to remove the defocusing degradation. Therefore, if h(n) is the defocusing PSF, the
FIG. 26. (a) Object test: two steps with three different levels of irradiance. (b) WD of (a) computed by a fast Hartley algorithm. (c) WD of a defocused PSF. (d) Convolution of the WDs shown in (b) and (c). (e) Defocused image recovered from (d). (f) Restored image by convolution of the defocused WD with a restoration filter. (Reprinted by permission from Berriel-Valdos, Gonzalo, and Bescos, 1988.)
filter will be defined in the (m, m′) domain as

    E_h(m′, m) / [E_h²(m′, m) + α],    (57)

where m is the frequency variable, m′ is the shifted spatial frequency parameter, and α is considered a constant value. The next step in the restoration process is to multiply Eq. (57) by Eq. (55):

    HT_n(W_{f′}(n, m)) = HT_n(W_g(n, m)) E_h(m′, m) / [E_h²(m′, m) + α],    (58)

where f′(n) represents the restored version of f(n). Again a double Hartley transform, in the m and m′ variables, allows the local irradiance values of the restored image to be obtained. Figure 26 shows some numerical results calculated by using the DHT algorithm. The computation time was slightly more than one-half of the time required when the same WD was evaluated using an FFT algorithm. The WD was used to generate a defocused image through Eq. (54), the irradiance defocusing PSF being symmetrical around the origin. Figure 26a shows a test image, with three different levels of irradiance, where the irradiance has been normalized to one. Its WD is shown in Fig. 26b as irradiance contour graphs. Figure 26c shows the WD associated with the irradiance point spread function due to a defocusing error of 3λ. The convolution, in the variable n, of the WDs of Figs. 26b and 26c is given in Fig. 26d, which represents the WD of the defocused image. The local irradiance of the defocused image, recovered from the defocused WD, is shown in Fig. 26e. Finally, the restoration of the defocused image using the filter of Eq. (57) is shown in Fig. 26f.

V. IMAGE ANALYSIS THROUGH THE WIGNER DISTRIBUTION
Texture is one of the attributes employed to characterize the surface of an object. However, there is no unique definition of the notion of texture. Its definition has been formulated in terms of an enumeration of properties such as fine, coarse, etc. Conceptually, texture could be defined as the arrangement or spatial distribution of intensity variations in an image (Jernigan and D'Astous, 1984). The two major characteristics of a texture are its coarseness and its directionality. Since the spatial frequency domain representation contains explicit information about the spatial distribution of an image, one could expect to obtain useful textural features from the spatial frequency domain. However, the texture methods that embody spatial frequency information have met with mediocre results (Sutton and Hall, 1972; Weszka et al., 1976), mainly because the Fourier transform is an intrinsically global transformation, i.e., each frequency component contains global information
about the whole image. Even reducing the window size to contain homogeneous texture subimages was not an advantage in comparison with the spatial methods based on the use of second-order probabilities (co-occurrence methods). Here we have considered the use of the space-frequency representations, namely the WD, in the case of image analysis applications. A recent review of the use of the joint representations, mainly in the areas of signal and speech processing, can be found in Cohen (1989). In Section VI, a summary of these applications is presented, considering also the use of these representations in image and vision modeling research. Recent studies have suggested that the visual texture discrimination ability is achieved locally (Gagalowicz, 1981; Julesz and Bergen, 1983). The WD entails a rigorous framework for the use of local representations, in such a way that it embodies the local spectral variations that can improve the texture discrimination and segmentation processes. In this way, we have considered the use of the WD for texture discrimination and classification, in the statistical approach. Texture features can be described by generalized filtering techniques through image transforms. Fourier spectral analysis has been applied for discrimination of terrain types (Lendaris and Stanley, 1977) and for detection and classification of lung diseases by comparing the normal and abnormal textural features (Kruger et al., 1974). In the present study, the WD-based method for texture classification has been applied to four Brodatz texture field examples (Brodatz, 1966). A complete archive of digitized images, including the Brodatz textures, can be found in Weber (1989). Figure 27a represents the four textures selected (clockwise from top left): sand, straw, raffia, and cotton canvas. The texture samples from Brodatz (1966) were digitized and converted into 256 x 256 picture arrays with 8 bits/pixel.
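The ring/wedge sampling of a power spectrum that such spectral methods use can be sketched as follows. This is a generic illustration applied to the global Fourier power spectrum; the authors apply an analogous annular/sector scheme to the local auto-WD, and the counts and names below are illustrative:

```python
import numpy as np

def ring_wedge_features(img, n_rings=16, n_wedges=8):
    """Spectral texture features: energy of the power spectrum integrated
    over annular rings (coarseness) and angular wedges (directionality)."""
    P = np.abs(np.fft.fftshift(np.fft.fft2(img))) ** 2
    N = img.shape[0]
    y, x = np.mgrid[:N, :N] - N // 2          # centred frequency coordinates
    r = np.hypot(x, y)                         # radial frequency
    th = np.mod(np.arctan2(y, x), np.pi)       # orientation in [0, pi)
    r_edges = np.linspace(0, N // 2, n_rings + 1)
    t_edges = np.linspace(0, np.pi, n_wedges + 1)
    rings = np.array([P[(r >= r_edges[i]) & (r < r_edges[i + 1])].sum()
                      for i in range(n_rings)])
    wedges = np.array([P[(th >= t_edges[i]) & (th < t_edges[i + 1]) & (r > 0)].sum()
                       for i in range(n_wedges)])
    return rings, wedges
```

Applying the same sampling to the auto-WD evaluated at a point, rather than to the global spectrum, is what restores the locality that the co-occurrence comparison above shows the plain Fourier features to lack.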
One of the advantages in the use of the WD is that the window size selection is not required. However, in order to reduce the computational requirements, it is necessary to relax the definition and to define a specific window size. Different window sizes have been proposed in the literature to obtain a good statistical resolution (Ashjari and Pratt, 1980; Pratt, 1980). Here we have used N = 16 texture fields of 64 x 64 pixels each.
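The relaxation described above (fixing a finite window rather than the WD's unbounded support) can be sketched for a 1-D signal. This is a minimal illustration under a common discrete pseudo-WD convention; the function name `pseudo_wigner_1d` and the `half` window parameter are ours, not the chapter's:

```python
import numpy as np

def pseudo_wigner_1d(x, n, half=8):
    """Windowed (pseudo-) Wigner distribution of x at sample n."""
    m = np.arange(-half + 1, half)           # symmetric lag axis
    r = x[n + m] * np.conj(x[n - m])         # local autocorrelation product
    r = np.fft.ifftshift(r)                  # put lag m = 0 first for the DFT
    return np.fft.fft(r).real                # r is conjugate-symmetric, so the DFT is real

# A chirp: the local spectrum should change with the evaluation point.
t = np.linspace(0.0, 1.0, 256)
x = np.cos(2 * np.pi * (10 * t + 40 * t ** 2))
W_early = pseudo_wigner_1d(x, 64)
W_late = pseudo_wigner_1d(x, 192)
```

The window of 2·half - 1 lags trades frequency resolution for locality and tractable computation, which is exactly the compromise discussed above.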
A. Feature Extraction

The first-order statistics of all textures have been normalized to uniform distributions using histogram equalization, so that differences in luminance and contrast are eliminated in the discrimination process. The histogram of an image represents the frequency of occurrence of the gray levels along the image.
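A minimal sketch of this equalization step for 8-bit images (the helper name `equalize` is ours): the gray levels are remapped through the normalized cumulative histogram.

```python
import numpy as np

def equalize(img, levels=256):
    """Histogram equalization: remap gray levels through the cumulative histogram."""
    hist = np.bincount(img.ravel(), minlength=levels)
    cdf = np.cumsum(hist) / img.size                  # cumulative distribution in [0, 1]
    lut = np.round(cdf * (levels - 1)).astype(img.dtype)
    return lut[img]

# A narrow-histogram (low-contrast) image gets stretched over the full range:
rng = np.random.default_rng(0)
img = rng.integers(100, 140, size=(64, 64)).astype(np.uint8)
out = equalize(img)
```

After equalization the first-order statistics are approximately uniform, so any remaining difference between textures is due to their structure rather than to luminance or contrast biases, as the discrimination process requires.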
IMAGE FILTERING/ANALYSIS THROUGH THE WIGNER DISTRIBUTION
By histogram equalization, we can obtain an output image in which the gray levels of the picture are more uniformly distributed than in the original image. Figure 27 shows an example of the resulting histogram equalization of the intensity-mapped image. The histogram-modeling techniques have been considered a powerful approach to image-enhancement applications. An example of the utility of these techniques is the case of images with narrow histograms (low-contrast images). By histogram equalization one can increase the image contrast as a consequence of the histogram stretching (Gonzalez and Wintz, 1987; Jain, 1989). After performing histogram equalization on the textures, a complete homogeneity of the data is reached, and any result from the computation is solely due to the intrinsic texture structure and not to undesired luminance biases. Figure 28a shows the four preprocessed Brodatz textures selected from Brodatz (1966). Figure 28b shows the corresponding Fourier spectra. The evaluation of the WD's categorization capabilities requires the selection of the most discriminant features. The method proposed here is based on the computation of the auto-WD at N = 16 different points of the texture separated by two pixels, and the corresponding selection of features from such distributions to obtain the texture feature vector. The auto-WD was sampled in I = 16 annular regions and H = 8 angular sectors, using a sampling scheme similar to the Fourier power spectrum methods (Stark, 1982). The features extracted from the WD were the following (Cristobal et al., 1989):
FIG.27. (a) Original Girl (Lenna) histogram. (b) Equalized histogram corresponding to (a).
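The ring/wedge sampling described above (I = 16 annular regions and H = 8 angular sectors per local spectrum) can be sketched as follows. The function name `ring_wedge_features` and the partition conventions (half-open bins, orientations folded onto [0, π)) are our assumptions, not the chapter's:

```python
import numpy as np

def ring_wedge_features(P, n_rings=16, n_wedges=8):
    """Sum a 2-D local power spectrum P over annular rings and angular sectors."""
    h, w = P.shape
    y, x = np.mgrid[0:h, 0:w]
    cy, cx = h // 2, w // 2
    rho = np.hypot(y - cy, x - cx)                       # radial frequency
    theta = np.mod(np.arctan2(y - cy, x - cx), np.pi)    # orientation folded onto [0, pi)
    r_edges = np.linspace(0, rho.max() + 1e-9, n_rings + 1)
    t_edges = np.linspace(0, np.pi + 1e-9, n_wedges + 1)
    rings = np.array([P[(rho >= r_edges[i]) & (rho < r_edges[i + 1])].sum()
                      for i in range(n_rings)])
    wedges = np.array([P[(theta >= t_edges[i]) & (theta < t_edges[i + 1])].sum()
                       for i in range(n_wedges)])
    return rings, wedges
```

Ring sums respond to radial frequency content (coarseness), wedge sums to directionality, matching the interpretation of the w features in the text.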
FIG. 28. (a) Preprocessed Brodatz textures (clockwise from top left): sand, straw, raffia, and cotton canvas; (b) Fourier spectra associated with each texture of Fig. 28a. (Reprinted by permission from Cristobal, Bescos, and Santamaria, 1989.)
where W̄(0,0) is the mean value of the WD at the N selected points, and ρ, θ are the polar coordinates in each local WD spectrum. The physical interpretation of the proposed features is as follows. The feature w₁ gives a measure of the mean spatial frequency content of the image. It will have low values for images with a limited degree of detail, increasing in the case of sharper images. The feature w₂ is a variance estimation of the feature w₁ and gives a measure of the contrast or the amount of the local luminance variations of the image. The features w₃ and w₄ give a measure of the mean directionality and its variance, respectively. The feature w₅ is a measure of the homogeneity of the image. This measure is based on one of the properties of the Fourier transformation: the value at the origin represents the mean value of the original image (Bracewell, 1986). These features have been arranged as a 5-D pattern vector w that constitutes the input to the classification process (see Fig. 29). For comparison purposes, the same procedure has been applied to the same texture test data by computing the Fourier transform directly on the windowed images. The results are illustrated in the next section.

B. Classification
The evaluation of the statistical texture measures can be done by using direct methods through classification errors, or by indirect methods that involve classification-error estimation via the use of a figure of merit. In this study, the texture features proposed in Section V.A have been evaluated, together with other methods, according to their Bhattacharyya distance (B-distance). The B-distance is a scalar function of the probability densities of the features of two
FIG. 29. The scheme shows the different processes involved in the image analysis application through the WD, from the raw data f(x, y) to the categorization process.
classes, and is given by

B(S₁, S₂) = -ln ∫ [p(x | S₁) p(x | S₂)]^{1/2} dx,    (64)

where x and p(x | Sᵢ) represent a feature vector and the conditional probability density for class Sᵢ. In the case of bayesian classifiers, the B-distance is monotonically related to the Chernoff bound on the probability of classification error. This upper bound is given by (Fukunaga, 1972)

Pₑ ≤ [P(S₁) P(S₂)]^{1/2} e^{-B(S₁, S₂)},    (65)
where P(Sᵢ) represents the a priori probability associated with the class i. For Gaussian probability densities, the B-distance between a pair of texture classes is given by

B(Sᵢ, Sⱼ) = (1/8)(mⱼ - mᵢ)ᵀ [(Σᵢ + Σⱼ)/2]⁻¹ (mⱼ - mᵢ) + (1/2) ln { |(Σᵢ + Σⱼ)/2| / (|Σᵢ| |Σⱼ|)^{1/2} },    (66)

where mᵢ and Σᵢ represent the feature mean vector and feature covariance matrix of the class i (Fukunaga, 1972). For equally likely texture field pairs, i.e., P(Sᵢ) = P(Sⱼ), a B-distance of 4 or greater corresponds to a classification error of about 1% (see Table III). The selection of the image windowing process was done as follows. Figure 30a shows 16 product images corresponding to the cotton canvas texture image at points (128, 48), (128, 50), ..., (128, 80), and Figure 31a shows the corresponding product images for the straw texture at these points. The associated Fourier transform at each point, i.e., its WD, is displayed in Figs. 30b and 31b respectively, which show the effects of the object periodicity and directionality. Similar results would be obtained in the case of the sand and raffia textures.

TABLE III
B-DISTANCE VS. THE ERROR PROBABILITY, ACCORDING TO EQ. (65)

B(Sᵢ, Sⱼ)    Error bound (×10⁻³)
2            61.6
3            24.9
4            9.16
5            3.37
6            1.23
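Equation (65) and the standard Gaussian form of the B-distance (Fukunaga, 1972) translate directly into code; the function names in this sketch are ours:

```python
import numpy as np

def bhattacharyya_gaussian(m1, S1, m2, S2):
    """B-distance between two Gaussian classes (Fukunaga, 1972)."""
    S = (S1 + S2) / 2.0
    dm = m2 - m1
    quad = dm @ np.linalg.solve(S, dm) / 8.0           # mean-separation term
    logdet = 0.5 * np.log(np.linalg.det(S) /
                          np.sqrt(np.linalg.det(S1) * np.linalg.det(S2)))
    return quad + logdet                               # plus covariance-mismatch term

def chernoff_bound(B, p1=0.5, p2=0.5):
    """Eq. (65): Pe <= sqrt(P1 * P2) * exp(-B)."""
    return np.sqrt(p1 * p2) * np.exp(-B)

# Equally likely classes with B = 4: the error is bounded near 1%, as in Table III.
bound = chernoff_bound(4.0)
```

With identical classes the distance is zero, and the bound at B = 4 reproduces the roughly 1% error figure quoted in the text.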
FIG. 30. (a) Product images corresponding to cotton canvas texture at points (128, 112), (128, 114), ..., (128, 142) (origin at top left); (b) WD associated with these points. (Reprinted by permission from Cristobal, Bescos, and Santamaria, 1989.)
FIG. 31. (a) Product images corresponding to straw texture at points (128, 112), (128, 114), ..., (128, 142) (origin at top left); (b) WD associated with these points. (Reprinted by permission from Cristobal, Bescos, and Santamaria, 1989.)
The texture feature extraction method here is based on the features previously proposed in Eqs. (59)-(63). The ability to discriminate texture pairs using the WD features was compared with the classic Fourier spectral energy method. In the Fourier method, the sampling scheme used was the summation within five concentric rings centered on each 64 x 64 spectrum, giving the feature vector f = [f₁, f₂, f₃, f₄, f₅]ᵀ. The results of the pairwise B-distance computation have been tabulated in Table IV. In the Fourier method, the sand-straw pair is the most similar, i.e., the error probability is the largest, and the cotton-raffia pair is the least similar. In contrast, in the WD method, the sand-raffia pair is the most similar and
TABLE IV
TEXTURAL B-DISTANCE IN THE FOURIER AND WIGNER FEATURE EXTRACTION METHODS

Texture pair       Fourier method    Wigner method
Sand-straw         0.9               104.74
Sand-cotton        1.86              48.44
Sand-raffia        0.99              4.84
Straw-raffia       2.18              24.31
Straw-cotton       0.76              71.92
Cotton-raffia      2.59              34.17
Mean pair          1.55              48.07
the sand-straw pair the least similar. From Table IV one can argue that the WD discrimination of natural textures (generally random) outperforms the Fourier methods. In contrast, the artificial textures (often periodic and deterministic) are well discriminated by the Fourier methods, as one would expect because of their intrinsic harmonic expansion. The B-distance has been computed using a conventional statistical package (IMSL, 1982). In addition to the pairwise linear discrimination, we have considered multiple discriminant analysis in order to obtain the 2-D subspace that maximizes the ratio of between-class scatter to within-class scatter (Fisher's discriminant) (Duda and Hart, 1973). If w is the direction that defines the linear discriminant plane, the Fisher ratio is given by

J(w) = |m̃₁ - m̃₂|² / (σ̃₁² + σ̃₂²),

where m̃₁ and m̃₂ correspond to the mean projections along the discriminant direction defined by w, i.e., m̃ᵢ = mᵢᵀw; mᵢ represents the mean vector associated with each class, and σ̃₁² and σ̃₂² are the summed squared differences between the projected samples and the projected class mean, i.e., σ̃ᵢ² = Σ_{x ∈ Sᵢ} (wᵀx - m̃ᵢ)².
Projections of the samples from the 5-D space onto this plane give a scatter diagram of the texture classes, which constitutes a suitable discriminant representation of each class. Figure 32a shows the best scatter plane for the Fourier feature extraction method, and Figs. 32b and 32c show the corresponding scatter diagrams for the co-occurrence method and the WD method, respectively (the scales have been adjusted to aid the comparison). In these plots we can
FIG. 32. Two-dimensional scatter diagrams associated with (a) the Fourier spectral energy measures, (b) the co-occurrence method, and (c) the WD method. (Reprinted by permission from Cristobal, Bescos, and Santamaria, 1989.)
observe a higher interclass discrimination in the WD method in comparison with the other methods used; in contrast, the clusters are less compact (worse intra-class discrimination) than in the Fourier or co-occurrence methods. The co-occurrence methods are based on the estimation of the second-order joint conditional probability density functions p(i, j | d, θ), where each p represents the joint probability of gray levels i and j, given the intersample distance d and orientation θ. The estimated values can be written in a matrix form that constitutes a second-order histogram. From this histogram several textural features can be extracted. Conners and Harlow (1980) have proposed the use of five descriptors extracted from the co-occurrence matrices (energy, inertia, entropy, correlation, and local homogeneity), originally proposed by Haralick (1979). For a discussion about the texture analysis methods see Van Gool et al. (1985).*
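A minimal sketch of the co-occurrence approach just described: the second-order histogram p(i, j | d, θ) for a horizontal offset, and four of the five descriptors named above (correlation is omitted for brevity, and the function names are ours):

```python
import numpy as np

def cooccurrence(img, d=(0, 1), levels=8):
    """Second-order histogram: joint frequency of gray-level pairs at offset d = (dy, dx)."""
    dy, dx = d
    a = img[:img.shape[0] - dy, :img.shape[1] - dx]   # reference pixels
    b = img[dy:, dx:]                                  # neighbours at the given offset
    C = np.zeros((levels, levels))
    np.add.at(C, (a.ravel(), b.ravel()), 1)
    return C / C.sum()

def descriptors(P):
    """Energy, inertia, entropy, and local homogeneity of a co-occurrence matrix."""
    i, j = np.indices(P.shape)
    nz = P[P > 0]
    return {
        "energy": (P ** 2).sum(),
        "inertia": ((i - j) ** 2 * P).sum(),
        "entropy": -(nz * np.log(nz)).sum(),
        "homogeneity": (P / (1 + np.abs(i - j))).sum(),
    }

# A 2 x 2 checkerboard: every horizontal pair is a 0-1 transition.
feats = descriptors(cooccurrence(np.array([[0, 1], [1, 0]]), levels=2))
```

For the checkerboard, all the probability mass sits off-diagonal, giving maximal inertia and low homogeneity, which is the qualitative behavior these descriptors are meant to capture.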
C. Texture Discrimination

In this section we consider the WD's texture discrimination abilities. Describing an image requires segmenting it into regions that are homogeneously textured. One of the most common methodologies in image segmentation involves edge detection processes, where a transition occurs from one uniform region to another. An alternative approach to segmentation is region growing, starting from small uniform regions and expanding the process until the uniformity is broken. The segmentation processes can be evaluated by the use of texture pairs. The significance of texture pairs comes from the fact that any texture analysis problem can be split into an equivalent texture pair problem (Davis and Mitchie, 1981). A simple 1-D texture discrimination mechanism is proposed here, based on the use of the WD. The pixel categorization of the texture pixels was formulated by Davis and Mitchie (1981) in terms of edge pixels, near-edge pixels, and interior pixels. Here, the texture edge detection is based on the computation of differences between the WD at adjacent points along a selected direction and the WD mean. Figure 33a shows a texture pair (cotton canvas-raffia), and Fig. 33b shows, from left to right and top to bottom, the WD at the 16 points selected along the x-axis. The window size was 64 x 64 pixels in order to obtain a good statistical resolution. The evaluation of the edge detection is realized by introducing a scalar parameter e_w for a given point (i₀, j₀). It is
* A complete bibliography about vision is published annually by A. Rosenfeld in the journal Computer Vision, Graphics, and Image Processing (Academic Press). An on-line version can be obtained via ftp at the host ads.com.
FIG. 33. (a) Cotton canvas-raffia Brodatz texture pair; (b) (from left to right and top to bottom) WD associated with the points (128, 112), (128, 114), ..., (128, 142). (Reprinted by permission from Cristobal, Bescos, and Santamaria, 1989.)
given by
where W̄ is the WD mean associated with the N points selected.
Figure 34 shows the plots of e_w vs. d (interpixel distance) for four texture pairs. A pronounced minimum can be observed in the four examples, denoting the presence of a textural border. This fact permits the conclusion that the WD method might give good discriminability in the case of edge pixels and near-edge pixels. Jau and Chin (1988) have proposed the use of the change in texture density (texture gradient) as an estimator of surface shape. The method is based on measuring the high-frequency local WD contribution at each location of the image. From these measurements, a map description can be used to estimate the surface orientation. The method was implemented in the case of planar surfaces. Recently, Reed and Wechsler (1990) have proposed the use of a relaxation method for boundary edge detection in the case of synthetic textures and Brodatz textures. The region labeling was performed by a double process of averaging and squashing transformations. Afterwards, they presented segmentation results in the case of synthetic textures differing in phase only (identical power spectra), showing that, because the WD encodes both magnitude and phase information, the formulation of features to take the phase into account is not required. Finally, they have
FIG. 34. Results of the edge detection mechanism by using the WD. The plots represent the edge operator versus the interpixel distance along a line perpendicular to the texture border: (a) sand-straw pair; (b) straw-raffia pair; (c) cotton canvas-raffia pair; (d) sand-cotton canvas pair. The range of e_w has been normalized. (Reprinted by permission from Cristobal, Bescos, and Santamaria, 1989.)
applied the relaxation mechanism to gestalt (proximity) clustering. The results reported here on the use of the WD for image segmentation require further study and extension to a large image database. Related works on the use of space-frequency distributions for image segmentation purposes have been reported in the literature. Most of them are related to the use of the Gabor representation. Porat and Zeevi (1989) have proposed texture feature extraction and segmentation based on the use of Gabor expansion coefficients. Perry and Lowe (1989) have proposed the use of an iterative edge segmentation mechanism by local Gabor filters. Malik and Perona (1989) have considered the use of radial and directional
gaussian derivative filters in the fitting of the receptive field data according to Young's model (Young, 1985). They applied the model to texture boundary detection in the case of natural and artificial scenes, and evaluated the degree of discriminability for different textured pairs used in psychophysical experiments. On the other hand, a computational approach for analyzing textures based on multiresolution schemes of representation (also known as multichannel or pyramidal schemes) through the use of space-frequency distributions has recently been considered. Bovik et al. (1990) have proposed a multichannel scheme of representation, based on Gabor filters, encoding the images using multiple narrow band-pass and orientation channels. By comparing the magnitude of the response of different channels, textural border information can be extracted. The scheme was applied in the case of natural and synthetic textures for segmentation purposes. Tan and Constantinides (1990) have proposed a similar multichannel system based on the use of Gabor filters for texture segmentation of natural and artificial textures, reporting very satisfactory segmentation results. The importance of the local phase in feature detection and texture segmentation has been considered recently in different works. It is well known that the symmetry of edges and lines is reflected in the phase spectrum. Concetta and Burr (1988) have proposed a biologically plausible model of feature detection based on the use of the local phase through filters in quadrature, and they report some remarkable experiments on the prediction of the position of perceived features. Zeevi and Porat (1988) and Behar et al. (1988) have reported the use of the localized phase by using Gabor filtering for image reconstruction. Finally, Bovik et al. (1990), in the work already discussed, have considered the use of the local phase for image segmentation.
The use of the localized phase is especially useful for detecting boundaries between textures having identical amplitude spectra. A pair of mirrored images constitutes a good example of this situation. In these cases the amplitude-based methods fail to discriminate between the different regions with the same amplitude spectra.
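This failure mode is easy to verify numerically: a 1-D profile and its mirror image have identical Fourier amplitude spectra but different phase spectra, so amplitude-only features cannot separate them. A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.random(64)          # a random "texture" profile
x_mirror = x[::-1]          # its mirror image

X = np.fft.fft(x)
Xm = np.fft.fft(x_mirror)

same_amplitude = np.allclose(np.abs(X), np.abs(Xm))           # amplitude spectra coincide
different_phase = not np.allclose(np.angle(X), np.angle(Xm))  # phase spectra do not
```

Time reversal of a real signal conjugates its spectrum (up to a linear phase ramp), leaving |X| untouched; only phase-sensitive representations such as the WD or local-phase Gabor features see the difference.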
VI. APPLICATIONS OF THE SPACE (TIME)-FREQUENCY REPRESENTATIONS

A. Review of Applications
In this section we will consider the different areas of research in which the space (time)-frequency representations have been applied. Obviously, the use of such representations is quite appropriate for every field where nonstationary signals appear. As Cohen has pointed out in a recent review (Cohen, 1989), the applications can be broadly categorized according to two basic ideas: the extraction of relevant information from the distribution, or the use of a particular property that clearly represents the space (time)-frequency contents. Table V presents an updated survey of the application areas in which the use of these representations has been considered. The classification has been done according to the main topics of interest in which the distributions have been applied: signal, speech, and vision. The vision and speech applications are considered separately from the rest of the applications because of their importance in the study of perceptual signals. Here, the vision applications have been emphasized, and the next section is concerned with the importance of these distributions in the modeling of the early visual processes. In relation to the signal processing applications (see Table V), perhaps the two main domains considered up to now are seismic signal exploration and biological signal processing. Boashash was the first to use features extracted from the Wigner distribution in the computation of the dispersion and attenuation of seismic signals (Bouachache, 1978). In the case of biological signal processing, these distributions have been applied in the analysis and design of ultrasonic transducers (Marinovic and Smith, 1986) for medical imaging purposes. The time-frequency representations constitute an excellent aid to transducer design, by means of which the desired response can be modified and visualized. One of the main difficulties in the analysis of speech signals derives from their intrinsically nonstationary character, i.e., the signal's frequency content varies with time. This fact justifies the considerable interest in the use of time-frequency representations, especially in the earlier stages of speech processing.
From the pioneering work of Gabor on the use of joint representations in the analysis of hearing (Gabor, 1946), different speech applications have been considered in the literature. Table V summarizes the applications reported in the area of speech processing. Recently, interest in the use of space-frequency representations has been extended to vision research through the use of the Wigner distribution (Jacobson and Wechsler, 1982a, 1982b, 1984; Cristobal et al., 1986, 1987, 1989; Gonzalo et al., 1989, 1990; Zhu et al., 1990b). The WD gives an image joint representation suitable for the study of nonstationarities such as spatial-variant degradations: progressive blurring, motion, texture, etc. Therefore, the WD entails a powerful framework for image analysis because of its intrinsic local nature (see Table V). In a recent study, Jacobson and Wechsler (1988) have concluded that the resolution (uncertainty) attained by the WD cannot be improved by any other joint representation derived from Cohen's class, as a consequence of the smoothing effect derived from the kernel's selection. However, a considerable amount of work has been reported
TABLE V
SURVEY OF TIME (SPACE)-FREQUENCY APPLICATIONS

Area I: Signal Processing

Subarea                     Comments                                                 Sources
Seismic exploration         Absorption and dispersion measurements by the WD         (Bouachache, 1978), (Bazelaire and Viallix, 1987)
Pattern recognition         Classification of linear FM signals by the WD            (Kumar and Carroll, 1983)
Loudspeaker design          Extraction of optimization criteria by the WD            (Janse and Kaizer, 1983)
Turbulence microstructure   Analysis of temperature gradient records by WD           (Imberger and Boashash, 1986)
Ultrasonic transducers                                                               (Marinovic and Smith, 1986)
Machine noise               Signaturing, detection and identification by WD          (Boashash and O'Shea, 1988)
Muscle sounds               Signal analysis by using the Choi-Williams distribution  (Choi and Williams, 1989)
Temporomandibular sounds    WD-based noninvasive diagnosis technique                 (Zheng et al., 1989)
Radar imaging                                                                        (Boashash, 1990)
ECG analysis                Detection of P-waves by the WD; body surface             (Abeysekera and Boashash, 1989),
                            potential mapping series of ECGs                         (Usui and Araki, 1990)
Sonar                       Use of cone-kernel for range-Doppler estimation by WD    (Atlas et al., 1990)
Area II: Speech Processing

Subarea                      Comments                                                  Sources
Speech discrimination        Analysis of hearing by joint representations              (Gabor, 1946)
                             Study of combined representations                         (Preis, 1982)
                             Time-delay representations                                (Chester et al., 1983)
Speech recognition           Time-varying filtering by short-time Fourier transform    (Waibel, 1989)
                             Survey of bilinear representations                        (Asi and Saleh, 1990)
Speech and musical analysis  Analysis of hearing by joint representations              (Szu, 1982)
                             Formants' visualization by wavelet transforms             (Kronland-Martinet et al., 1987)
Formants structure           Speech formant segmentation by the WD                     (Riley, 1987, 1989)
                             Analysis of consonant-vowel transitions                   (Velez, 1989)
                             Cone-kernel definition to reduce cross-terms by the WD    (Atlas et al., 1990), (Zhao et al., 1990)
TABLE V (Cont.)

Area III: Vision Processing

Subarea                          Comments                                                 Sources
Conformal mapping                Invariant pattern recognition through the WD             (Jacobson and Wechsler, 1982a,b, 1984)
Image analysis                   Texture classification and discrimination through the WD (Cristobal et al., 1986, 1987, 1989)
                                 Analysis of local spectra by using analytic signals      (Zhu et al., 1990a,b)
Speckle imaging                  Discrimination using Gabor filters                       (Paler and Bowler, 1986)
Optical flow                     Spatiotemporal velocity analysis through the WD          (Jacobson and Wechsler, 1987)
                                 Spatiotemporal motion-energy Gabor filters               (Heeger, 1987)
Image restoration                Spatial-variant filtering using the WD                   (Berriel-Valdos et al., 1988), (Gonzalo et al., 1989, 1990a,b)
Texture analysis and clustering  Segmentation through the WD                              (Reed and Wechsler, 1988, 1990)
                                 Feature extraction using Gabor filtering                 (Porat and Zeevi, 1989)
                                 Segmentation by Gabor and wavelet filtering              (Perry and Lowe, 1989)
Shape from texture               Surface orientation by texture gradient using WD         (Jau and Chin, 1988)
Stereo                           Local disparity information by Gabor filtering           (Jenkin, 1988), (Sanger, 1988)
Motion                           Spatiotemporal energy filtering using Gabor filters      (Adelson and Bergen, 1985)
Image compression                Gabor filtering encoding by neural network relaxation    (Daugman, 1988)
                                 Multiresolution techniques by wavelet filtering          (Mallat, 1989a)
Face recognition                 Graph matching by Gabor filtering                        (Buhmann et al., 1989)
on the application of the Gabor transform in image applications; some of this has been summarized in Table V. Perhaps the principal reason for this interest is the fact that the Gabor scheme of representation is biologically plausible, and hence is able to serve as a suitable vision model at the retinal ganglion cells, the lateral geniculate nucleus, and the primary cortex (De Valois and De Valois, 1988; Van Essen and Anderson, 1990). The same argument can be applied in the case of speech signals, in which some neurophysiological (Kay and Matthews, 1972) and psychophysical (Møller, 1978) studies have demonstrated that a large population of the auditory cells in the mammalian cochlear
nucleus do not respond optimally to continuous tones, but instead respond to different preferred modulation slopes (directional selectivity). By the time this review appears, a monograph on recent time-frequency applications will have been edited by Boashash (1991).

B. Trends toward Biological Image Modeling
One of the trends in vision research is concerned with learning as much as possible from biological visual systems, for implementation in artificial systems. From the pioneering experimental work of Hubel and Wiesel (1962) on neurons in the visual cortex, different biologically based models have been considered in the literature (Marcelja, 1980; Daugman, 1980; Young, 1985). In most of them, the main interest is centered on the specification of the basic primitives of early vision that can be inferred from experimental biological recordings. The receptive field profiles in the retina and the visual cortex can be approximated by Gabor functions. However, it is necessary to remark that alternative mathematical descriptions of receptive fields are possible, with equal or even better results, as far as the physiological data modeling is concerned. The Gabor approach is related to the representation of time-varying signals in terms of elementary functions ("logons") that are simultaneously localized in time and frequency. Following a quantum mechanics formalism, Gabor proved, by the use of the Schwarz inequality, which family of functions achieves the lower bound of uncertainty in the joint time-frequency domain. In the frequency domain, the Gabor functions constitute a family of band-pass filters that capture the most salient properties of spatial frequency and orientation selectivity. The compactness of Gabor functions in the frequency domain also implies that Gabor's original scheme should be nearly locally complete (i.e., close to encoding all the input information with negligible aliasing) (Geisler and Hamilton, 1986). The Gabor scheme of representation is defined by a filtering process based on the use of complex-valued weighting functions given by
h(x, y) = f(x, y) + jg(x, y),    (71)

where f(·), g(·) are real-valued functions, and g(x, y) represents the Hilbert transform of f(·). In signal analysis and optics, the complex function h(·) is known as the analytic signal associated with f(·). The Hilbert transform g(·) is referred to as the quadrature function of f(·). This scheme of complex filtering can be implemented in a real visual system by a pair of receptive fields in quadrature. The family of Gabor filters is given by gaussians tapered by
sinusoids:

f(x, y) = exp{-π[((x - x₀)/σₓ)² + ((y - y₀)/σᵧ)²]} sin[2π(u₀x + v₀y)],    (72)

g(x, y) = exp{-π[((x - x₀)/σₓ)² + ((y - y₀)/σᵧ)²]} cos[2π(u₀x + v₀y)],    (73)

where σₓ and σᵧ represent the gaussian spreads at 1/e along the x-axis and y-axis, respectively. The parameters u₀ and v₀ define the tapering sinusoidal modulation of the gaussian envelope, i.e., the frequency magnitude f₀ = (u₀² + v₀²)^{1/2} and the direction θ₀ = tan⁻¹(v₀/u₀). If a circularly symmetric condition is imposed (σ = σₓ = σᵧ), the Gabor filter is uniquely defined by the three parameters σ, f₀, and θ₀. The Fourier transform of the complex filter h(·), defined by Eq. (71), is
ℋ(u, v) = ℱ(u, v) U(u),    (74)

for a given u, where ℱ(u, v) represents the Fourier transform associated with f(x, y) and U(·) is the unit step function. From Eq. (74), it is obvious that for positive spatial frequencies ℋ(·) is identical to ℱ(·) and for negative frequencies it is equal to 0. This means that the complex filter h(·) transmits the same information as its real part f(·). Similar relations could be obtained in Eq. (74) by using the function sgn(u). If f(·) is a Gabor filter (a gaussian tapered by a sinusoid), then g(·) is given by the quadrature filter (a gaussian modulated by a co-sinusoid). Figure 35 shows a pair of Gabor filters in quadrature in the case of 1-D signals. Figure 36 represents in a 3-D space a parametric plot of the analytic filter associated with the same quadrature filters defined in Fig. 35. Once the parameters of the filtering process have been defined, the extraction of the local energy and local phase can be done by
e(x, y) = (i * f)(x, y),    (75)

o(x, y) = (i * g)(x, y),    (76)

A(x, y) = [e²(x, y) + o²(x, y)]^{1/2},    (77)

φ(x, y) = tan⁻¹[o(x, y)/e(x, y)],    (78)

FIG. 35. A pair of 1-D Gabor filters in quadrature: (a) odd (sine) component; (b) even (cosine) component.
FIG. 36. Analytic signal associated with the Gabor functions represented in Fig. 35a, b. This helicoidal pattern resembles the momentum states' description in quantum mechanics, giving a harmonic description in terms of pure tones corresponding to the different momentum values that a particle might have. The projections of this helicoidal pattern onto the xy and yz planes correspond to the even and odd Gabor functions, respectively.
where i(·) represents the input image, and f(·) and g(·) are the filters in quadrature. Equation (77) gives the amplitude of the analytic signal and provides information about the local energy that is independent of the phase. This operation embodies a half-wave rectification mechanism. The rectification process has a biological foundation in the fact that neurons can give only a nonnegative response. A mechanism that computes the squared outputs of a quadrature pair of filters is known as an energy mechanism. The importance of the remaining information encoded in the phase (Eq. (78)) has been pointed out by Zeevi and Porat (1988). In the cited work, they demonstrated that the local phase mechanism preserves most of the edge information content of an image (in a similar manner to Fourier phase analysis).
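A 1-D sketch of this quadrature scheme, following Eqs. (77)-(78); the parameter values and function names below are illustrative, not from the original:

```python
import numpy as np

def gabor_pair_1d(sigma=8.0, f0=0.1, size=65):
    """Odd/even Gabor pair in quadrature: a gaussian tapered by sine and cosine."""
    n = np.arange(size) - size // 2
    env = np.exp(-np.pi * (n / sigma) ** 2)
    return env * np.sin(2 * np.pi * f0 * n), env * np.cos(2 * np.pi * f0 * n)

def local_energy_phase(signal, f, g):
    """Eq. (77): amplitude of the analytic response; Eq. (78): its local phase."""
    e = np.convolve(signal, f, mode="same")   # odd-symmetric channel
    o = np.convolve(signal, g, mode="same")   # even-symmetric channel
    return np.hypot(e, o), np.arctan2(o, e)

f, g = gabor_pair_1d()
s = np.cos(2 * np.pi * 0.1 * np.arange(256))  # tone at the filters' preferred frequency
energy, phase = local_energy_phase(s, f, g)
```

Away from the borders the energy is nearly constant, i.e., independent of the local phase of the stimulus: this is the phase-invariant "energy mechanism" described in the text.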
In this section, we summarize some experimental results in order to have a quantitative measure of the degree of biological plausibility. In the physiological experiments, recordings are generally reported from cat and monkey retinal ganglion cells, lateral geniculate cells, and cortical cells (area V1) because of the high degree of similarity with the human visual system. As has already been mentioned, Hubel and Wiesel (1962), studying the cat's visual cortex, reported that most cortical cells have orientation and frequency selectivity. From a qualitative approach, they described the receptive field response of the cortical simple cells as a composition of excitatory and inhibitory responses. De Valois and De Valois recently reported that the bandwidth of macaque cortical cells ranges from 0.5 to 8 octaves, the median spatial frequency bandwidth being about 1.4 octaves (De Valois and De Valois, 1988). Quantitative measures of the receptive fields have been obtained recently by Movshon et al. (1978), Webster and De Valois (1985), Field (1986), Jones and Palmer (1987), Hawken and Parker (1987), Emerson et al. (1987), and De Valois and De Valois (1988). Figure 37 shows an example of the
FIG. 37. (a) Receptive field profiles in space (top) and frequency (bottom) domains of a fairly narrowly tuned cat simple cell. (b) Cross-sections corresponding to the receptive field profiles shown in (a) (solid lines), and the best-fitting Gabor function (dashed lines). (Reprinted by permission from Webster and De Valois, 1985).
GABRIEL CRISTOBAL et al.
responses of cells to gratings of a wide range of spatial frequencies and orientations. It can be seen that the receptive field response closely approximates a Gabor function in both coordinates, x and y. Elsewhere, Field and Tolhurst (1986) and Jones and Palmer (1987) have reported different statistical measures in order to validate the quality of fit of the Gabor approach. Next, we give a short historical perspective on receptive field modeling. Mach in 1868 was the first to suggest that retinal interactions can be described in terms of second differential operators (laplacian operators) (Ratliff, 1965). Kovasznay and Joseph (1953) were the first to apply the laplacian operator to image processing. Marr and Hildreth (1980) proposed the use of the laplacian of a gaussian for early visual edge detection, showing that a simple difference of gaussians (Mexican-hat filter) can approximate the cat ganglion cell receptive fields. Marcelja (1980) and Daugman (1980) proposed the use of Gabor filters for 1-D and 2-D signals, respectively. One alternative to the use of Gabor functions is the use of directional gaussian derivatives proposed by Young (1985). The receptive field description is basically the same in both models, the main difference being the location of the zero-crossings. The similarities between both models are not surprising because, in the limit, the two theories become the same. More recently, Canny has proposed an edge detection method based on the use of directional derivative gaussian functions (Canny, 1986). Another important characteristic is related to receptive field symmetry. Hubel and Wiesel also reported the presence of even-symmetric and odd-symmetric cells, responding optimally in phase quadrature (Hubel and Wiesel, 1962). Pollen and Ronner (1981) obtained recordings in the cat striate cortex from two adjacent simple cells; they found one member of the pair to be even-symmetric and the other to be odd-symmetric.
More recently, Field and Tolhurst (1986) have found that pairs of adjacent cells differ in phase by π/2 but appear in a variety of different forms (not necessarily in even- and odd-symmetric categories). A more detailed psychophysical study concerning the importance of phase can be found in Concetta and Burr (1988). The experimental work reported here leads to the conclusion that neither the Gabor scheme nor the gaussian derivative model necessarily provides the best possible fit to all the recordings registered. In fact, other mathematical functions can be tested. Young tested many other mathematical functions (Bessel, sinc, parabolic cylinder, etc.), and he found that the gaussian derivative as well as the Gabor functions provided the best fits to the recordings registered (Young, 1985). However, one can say that the main advantage of the Gabor/gaussian-derivative models comes from their effectiveness in providing a good fit to the receptive field shapes with a limited number of free parameters (three in the case of the Gabor models).
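As a concrete illustration of the Gabor receptive-field model discussed above, the following sketch builds an even/odd quadrature pair of 2-D Gabor functions (an oriented sinusoid under an elliptical gaussian envelope). The parameter values are hypothetical, chosen for illustration rather than fitted to any recorded cell:

```python
import numpy as np

def gabor_2d(half, sigma_x, sigma_y, freq, theta, phase=0.0):
    """2-D Gabor receptive-field model. phase=0 gives an even-symmetric
    field, phase=pi/2 an odd-symmetric one (the quadrature pair)."""
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    # Rotate coordinates so the sinusoidal carrier runs along theta.
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr**2 / (2 * sigma_x**2) + yr**2 / (2 * sigma_y**2)))
    return envelope * np.cos(2 * np.pi * freq * xr + phase)

even = gabor_2d(32, sigma_x=6.0, sigma_y=9.0, freq=0.08, theta=0.0)
odd = gabor_2d(32, sigma_x=6.0, sigma_y=9.0, freq=0.08, theta=0.0,
               phase=np.pi / 2)
```

The three essential free parameters mentioned in the text correspond here to the envelope widths, the carrier frequency, and the orientation; the phase offset selects the even- or odd-symmetric member of the pair.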
Mallat has recently pointed out that the use of the Gabor transform in computer vision presents several drawbacks when applied to image analysis (Mallat, 1989a,b). The main difficulty comes from the constant resolution in the spatial and spatial-frequency domains. This fixed resolution introduces some trouble, especially if the image has important features of very different sizes. In other words, it is difficult to analyze the fine and coarse structure simultaneously. In order to overcome these inconveniences, Grossmann and Morlet defined a decomposition scheme based on expansions in terms of translations and dilations of a unique function named a "wavelet," ψ(x) (Grossmann and Morlet, 1984). The wavelet transform of a function f(·) is given by

W_f(a, b) = |a|^{-1/2} ∫ f(x) ψ*((x − b)/a) dx,
where ψ(·) is the basic wavelet. The parameters a and b can be chosen to vary continuously or discretely. The wavelet transform and the Gabor transform have many features in common. Both transforms analyze the frequency content of a signal locally in space. But the wavelet transform provides different resolutions for high- vs. low-frequency wavelets. That is, the basic functions ψ(·) have variable width and are adapted to their frequency range: the higher the range, the narrower they are (Daubechies, 1989). The different wavelet functions can be generated from a basic one through the following expression: ψ_{m,n}(x) = a_0^{-m/2} ψ(a_0^{-m}x − nb_0), where a_0 and b_0 are constants, and m and n define the size and position of the new function. Some particular examples of wavelets have been obtained through the previous equation (Fig. 38). By using a multiresolution representation, Mallat has applied the wavelet transform to image compression, texture discrimination, and fractal analysis (Mallat, 1989a). This kind of representation is especially well suited for evaluating the self-similarity of a signal and its fractal properties (West, 1990). However, one of the main drawbacks of this approach comes from the fact that it is not invariant under translations, and therefore the interpretation in the case of pattern recognition applications might be more difficult. The wavelet transform is an example of the coherent state decompositions used in quantum mechanics and renormalization group theory. The basic idea is to decompose a function into building blocks of constant shape but different size (Daubechies, 1989, 1990). An interesting recent approach to receptive field modeling proposed by Poggio and Girosi (1989) uses gaussian radial basis functions. The radial basis function (RBF) method is well known in statistics as a possible solution to the real multivariate interpolation problem.
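The dilation-translation scheme ψ_{m,n}(x) = a_0^{-m/2} ψ(a_0^{-m}x − nb_0) above can be sketched directly. This minimal NumPy sketch uses the Mexican-hat (second derivative of a gaussian) as the basic wavelet, an assumption made here for illustration; any admissible zero-mean wavelet would do:

```python
import numpy as np

def mexican_hat(x):
    """Second derivative of a gaussian (up to sign and scale):
    a classic zero-mean basic wavelet."""
    return (1.0 - x**2) * np.exp(-x**2 / 2.0)

def wavelet_member(psi, x, m, n, a0=2.0, b0=1.0):
    """psi_{m,n}(x) = a0**(-m/2) * psi(a0**(-m) * x - n * b0).
    a0=2, b0=1 gives the usual dyadic grid; m sets the size (scale),
    n the position."""
    return a0 ** (-m / 2.0) * psi(a0 ** (-m) * x - n * b0)

x = np.linspace(-20.0, 20.0, 4001)
psi_00 = wavelet_member(mexican_hat, x, m=0, n=0)  # basic wavelet
psi_20 = wavelet_member(mexican_hat, x, m=2, n=0)  # dilated by a0**2 = 4
```

The prefactor a_0^{-m/2} keeps the L2 norm of every family member equal to that of the basic wavelet, so dilation changes the analyzed frequency range but not the energy of the analyzing function.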
By using a factorizable radial basis function scheme (in the case of gaussian functions), receptive fields can be
[FIG. 38. Examples of wavelet functions generated through dilations and translations of a basic wavelet.]
readily implemented. The RBF method is closely related to such pattern recognition methods as Parzen windows and potential functions (Duda and Hart, 1973), and to several neural network algorithms. In some sense, the use of RBFs in neural network research has changed the classical perspective of computation, performing the computations with gaussian RBFs instead of threshold functions.
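A minimal 1-D sketch of gaussian RBF interpolation, the multivariate-interpolation view mentioned above: the weights are obtained by solving a linear system over the kernel matrix. The data and the width sigma are hypothetical, not taken from Poggio and Girosi:

```python
import numpy as np

def rbf_fit(centers, values, sigma):
    """Solve for weights w so that sum_j w_j exp(-(x - c_j)^2 / (2 sigma^2))
    interpolates `values` at `centers` (1-D gaussian RBF interpolation)."""
    d2 = (centers[:, None] - centers[None, :]) ** 2
    return np.linalg.solve(np.exp(-d2 / (2 * sigma**2)), values)

def rbf_eval(x, centers, weights, sigma):
    """Evaluate the gaussian RBF expansion at the points x."""
    d2 = (x[:, None] - centers[None, :]) ** 2
    return np.exp(-d2 / (2 * sigma**2)) @ weights

centers = np.linspace(0.0, 1.0, 9)
values = np.sin(2 * np.pi * centers)   # hypothetical samples to interpolate
w = rbf_fit(centers, values, sigma=0.15)
```

Because the gaussian kernel matrix is symmetric positive definite for distinct centers, the system always has a unique solution, and the expansion reproduces the data at the centers exactly.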
VII. CONCLUSIONS
The Wigner distribution function (WD) is always real, directly encodes the phase information of the Fourier transform, has high resolution in both the space and spatial-frequency domains, and is invariant under linear transformations. These are only some of the WD's characteristics that have motivated the use of this distribution in several areas of the image-processing field, such as image filtering and analysis. Moreover, the WD embodies a simultaneous space/spatial-frequency representation very suitable for encoding the main low-level image characteristics, including local spectral variation. The application of the WD in several image-processing tasks has been considered, specifically for filtering and analysis purposes. The step prior to obtaining results in these areas is the generation of the distribution itself. In this work, we have tried to present an extended view of the different strategies for generating the WD, depending on the requirements of a particular problem. Thus, the optical Wigner processors yield a useful tool for processing where a decrease in computing time is the most important aspect. In other cases, it may be more important to work without spurious noise; then digital Wigner implementations can be a good solution. In most situations, a trade-off between these aspects must be found, and the hybrid Wigner processor is the best solution. More recently, VLSI special-purpose processors have been proposed for generating the WD and other joint representations. Also, the reduction of the cross-terms introduced by the bilinear nature of the definition has been considered, taking into account the recent results reported on this issue. Some of the Wigner implementations allow different image-filtering operations to be carried out. The interpretation of the WD as a local spectrum associated with each image point suggests performing space-variant filtering inspired by traditional Fourier filtering.
Therefore, each local spectrum is multiplied by a different filter function in order to retrieve space-variant filtered images. In particular, this operation can be used to model space-variant degradations and to restore the degraded images. This kind of filtering can be carried out optically or digitally.
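The local-spectrum view can be sketched in one dimension. The following is a minimal discrete pseudo-Wigner sketch (a rectangular-windowed local product followed by a DFT), not the specific optical or digital implementations discussed in the text; the signal, window size, and filter mask are hypothetical:

```python
import numpy as np

def pseudo_wigner_1d(x, window_half=16):
    """Discrete pseudo-Wigner distribution of a 1-D signal: for each
    position n, the DFT over m of the windowed local product
    r_n(m) = x[n+m] * conj(x[n-m]).  Since r_n is Hermitian in m,
    each local spectrum is real, as the WD should be."""
    N, M = len(x), window_half
    xp = np.pad(np.asarray(x, dtype=complex), M)   # zero-pad the borders
    m = np.arange(-M, M + 1)
    W = np.empty((N, 2 * M + 1))
    for n in range(N):
        r = xp[M + n + m] * np.conj(xp[M + n - m])
        W[n] = np.fft.fft(np.fft.ifftshift(r)).real
    return W

# A complex tone at f0 = 4/33 concentrates at bin k = 2*f0*33 = 8
# (note the factor 2 on the frequency axis from the x(n+m)x*(n-m) product).
f0 = 4.0 / 33.0
sig = np.exp(2j * np.pi * f0 * np.arange(128))
W = pseudo_wigner_1d(sig, window_half=16)

# Space-variant filtering sketch: each local spectrum W[n] gets its own
# position-dependent mask (hypothetical: suppress one component on the
# right half only). The reconstruction step is omitted here.
mask = np.ones_like(W)
mask[64:, 8] = 0.0
W_filtered = W * mask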
The application of the WD to texture classification and discrimination has been considered, in particular by using pairwise and multiple discriminant analysis. Several textural features have been extracted from the local spectra generated by the WD in the case of the Brodatz textures. The results have been compared with the canonical Fourier spectral methods. On the other hand, the WD's texture discrimination capabilities have also been evaluated by using several pairwise texture edge detection tests. A review of the different areas of application of space (time)-frequency representations has been presented, emphasizing in particular the vision-oriented ones and detailing the specific areas in which they have been considered. The importance of the use of these distributions in the modeling of early visual processes has been noted in the context of the physiological and psychophysical experiments reported in the literature. Although there have been many contributions considering both theoretical and applicability issues, further research is necessary for a better understanding of the space (time)-frequency distributions. As Cohen has recently summarized, some problems still remain, such as consistency (i.e., being useful in a broad range of different situations), defining the "best" distribution, and the use of nonbilinear functionals (Cohen, 1989). However, these distributions constitute an excellent tool for the analysis and modeling of neural systems, especially in the case of vision and speech applications.
ACKNOWLEDGMENTS
We would like to thank Profs. L. Cohen and W. Williams for agreeing to comment on the original manuscript and for permitting reproduction of some material related to their respective research. The first author wishes to acknowledge Prof. J. Feldman for providing the ICSI's facilities to support his work. Research at the ICSI by G.C. is supported by a fellowship from the Spanish Ministry of Education and Science. We thank all the staff members at the Instituto de Optica del CSIC (Madrid, Spain) who have contributed to the realization of this work, and especially Ana Plaza. This collaboration would have been impossible without Internet/CSnet, although their daemons are sometimes frisky and unforeseeable.
REFERENCES
Abeysekera, R. M. S. S., and Boashash, B. (1989). "Time-frequency domain features of ECG signals: their application in P wave detection using the cross Wigner-Ville distribution," IEEE Int. Conf. on Acoust., Speech and Signal Proc., Glasgow, Scotland, pp. 1524-1527.
Adelson, E. H., and Bergen, J. R. (1985). "Spatiotemporal energy models for the perception of motion," J. Opt. Soc. Am. A 2(2), 284-299.
Ashjari, B., and Pratt, W. K. (1980). "Supervised classification with singular value decomposition texture measurement," USC-IPI TR No. 860, Image Processing Institute, University of Southern California, pp. 52-62.
Asi, M. K., and Saleh, B. E. A. (1990). "Time-scale modification of speech based on the short-time Fourier transform," IEEE Trans. on Acoust., Speech and Signal Proc. (in press).
Athale, R. A., Lee, J. N., Robinson, E. L., and Szu, H. H. (1983). "Acousto-optic processors for real-time generation of time-frequency representations," Opt. Lett. 8, 166-168.
Atlas, L. E., Loughlin, P. J., Pitton, J. W., and Fox, W. L. J. (1990). "Applications of cone-shaped kernel time-frequency representations to speech and sonar signals," Int. Symposium on Signal Processing and its Applications, Gold Coast, Queensland, Australia (in press).
Bajcsy, R., and Lieberman, L. (1976). "Texture gradient as a depth cue," Comput. Graphics and Image Proc. 5, 52-67.
Bamler, R., and Glünder, H. (1983a). "Coherent-optical generation of the Wigner distribution function of real-valued 2-D signals," IEEE Proc. 10th Int. Optical Computing Conf., pp. 117-121.
Bamler, R., and Glünder, H. (1983b). "The Wigner distribution function of two-dimensional signals. Coherent-optical generation and display," Optica Acta 30(12), 1789-1803.
Bartelt, H. O., Brenner, K. H., and Lohmann, A. W. (1980). "The Wigner distribution function and its optical production," Opt. Comm. 32(1), 32-38.
Bastiaans, M. J. (1978). "The Wigner distribution function applied to optical signals and systems," Opt. Comm. 25, 26-30.
Bastiaans, M. J. (1980). "Wigner distribution function and its application to first-order optics," J. Opt. Soc. Am. 69, 1710-1716.
Bastiaans, M. J. (1981a). "The Wigner distribution function of partially coherent light," Optica Acta 28, 1215-1224.
Bastiaans, M. J. (1981b). "Signal description by means of a local frequency spectrum," Proc. SPIE 373, Transformations in Optical Signal Processing, pp. 49-62.
Bazelaire, E., and Viallix, J. R. (1987). "Theory of seismic noise," Proc. 49th Eur. Ass. Explor. Geophys. Meeting (Belgrade, Yugoslavia, 1987), 1-2.
Behar, J., Porat, M., and Zeevi, Y. Y. (1988). "The importance of localized phase in vision and image representation," SPIE 1001, Visual Communications and Image Processing, pp. 61-68.
Berriel-Valdos, L. R., Gonzalo, C., and Bescos, J. (1988). "Computation of the Wigner distribution function by the Hartley transform. Application to image restoration," Opt. Comm. 68(5), 339-344.
Bescos, J., and Strand, T. C. (1978). "Optical pseudocolor encoding of spatial frequency information," Applied Optics 17, 2524-2531.
Boashash, B. (1984). "High resolution signal analysis in the time-frequency domain," IEEE Int. Conf. on Computers, Systems and Signal Processing, Bangalore, India, pp. 345-348.
Boashash, B. (1990a). "Time-frequency signal analysis," in Advances in Spectral Analysis, S. Haykin (ed.), Prentice Hall, Englewood Cliffs, New Jersey.
Boashash, B. (ed.) (1991). "Time frequency methods and applications," Longman Cheshire, Melbourne, Australia.
Boashash, B., and Escudie, B. (1985). "Wigner-Ville analysis of asymptotic signals and applications," Signal Processing 8, 315-327.
Boashash, B., and Black, P. J. (1987). "An efficient real-time implementation of the Wigner-Ville distribution," IEEE Trans. on Acoust., Speech and Signal Processing 35(11), 1611-1618.
Boashash, B., and O'Shea, P. (1988). "Application of the Wigner-Ville distribution to the identification of machine noise," SPIE Conference, San Diego, California, Vol. 975, pp. 209-220.
Bouachache, B. (1978). "Representation temps-frequence," Soc. Nat. ELF Aquitaine, Pau, France, Publ. Recherches, 373-378.
Born, M., and Wolf, E. (1959). "Principles of Optics," Pergamon Press, London.
Bovik, A. C., Clark, M., and Geisler, W. S. (1990). "Multichannel texture analysis using localized spatial filters," IEEE Trans. Pattern Anal. Machine Intell. 12(1), 55-73.
Bracewell, R. N. (1983). "Discrete Hartley transform," J. Opt. Soc. Am. 73, 1832-1835.
Bracewell, R. N. (1986). "The Fourier Transform and Its Applications," McGraw-Hill, New York, 2nd ed.
Brenner, K. H. (1983). "A discrete version of the Wigner distribution function," Proc. EURASIP, Signal Processing II: Theories and Applications, pp. 307-309.
Brenner, K. H., and Lohmann, A. W. (1982). "Wigner distribution function display of complex 1-D signals," Opt. Comm. 42, 310-314.
Brodatz, P. (1966). "Textures: A Photographic Album for Artists and Designers," Dover, New York.
Buhmann, J., Lange, J., and von der Malsburg, C. (1989). "Distortion invariant object recognition by matching hierarchically labeled graphs," IEEE Int. Conf. on Neural Networks, Washington D.C., pp. 155-159.
Burt, P. (1984). "The pyramid as a structure for efficient computation," in Multiresolution Image Processing and Analysis, A. Rosenfeld (ed.), Springer, New York.
Canny, J. (1986). "A computational approach to edge detection," IEEE Trans. on Pattern Anal. Machine Intell. 8(6), 679-698.
Carter, W. H., and Wolf, E. (1977). "Coherence and radiometry with quasihomogeneous sources," J. Opt. Soc. Am. 67, 785-796.
Casasent, D. (1974). "A hybrid digital/optical computer system," IEEE Trans. on Computers 22, 852-858.
Casasent, D., and Casasayas, F. (1975). "Optical processing of pulsed Doppler and FM stepped radar signals," Applied Optics 14, 1364-1372.
Chan, D. S. K. (1982). "A non-aliased discrete-time Wigner distribution for time-frequency signal analysis," Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, Paris, pp. 1333-1336.
Chester, D. B., Carney, R. R., Damerow, D. H., and Riley, C. A. (1989). "Hybrid implementation of the Wigner distribution and other time-frequency analysis techniques," IEEE Proc. Int. Symposium on Circuits and Systems, Portland, Oregon, pp. 1252-1255.
Chester, D., Taylor, F. J., and Doyle, M. (1983). "On the Wigner distribution," Proc. IEEE Int. Conf. on Acoust., Speech and Signal Processing, Boston, pp. 491-494.
Choi, H., and Williams, W. J. (1989). "Improved time-frequency representation of multicomponent signals using exponential kernels," IEEE Trans. on Acoust., Speech, Signal Processing 37(6), 862-871.
Claasen, T. A. C. M., and Mecklenbrauker, W. F. G. (1983). "The aliasing problem in discrete-time Wigner distributions," IEEE Trans. on Acoust., Speech and Signal Processing 31(5), 1067-1072.
Claasen, T. A. C. M., and Mecklenbrauker, W. F. G. (1980a). "The Wigner distribution: A tool for time-frequency signal analysis; Part I: Continuous-time signals," Philips J. Res. 35, 217-250.
Claasen, T. A. C. M., and Mecklenbrauker, W. F. G. (1980b). "The Wigner distribution: A tool for time-frequency signal analysis; Part II: Discrete time signals," Philips J. Res. 35, 276-300.
Claasen, T. A. C. M., and Mecklenbrauker, W. F. G. (1980c). "The Wigner distribution: A tool for time-frequency signal analysis; Part III: Relations with other time-frequency signal transformations," Philips J. Res. 35, 372-389.
Cohen, L. (1989). "Time-frequency distributions: A review," Proc. IEEE 77, 941-981.
Cohen, L. (1966). "Generalized phase-space distribution functions," J. Math. Phys. 7, 781-786.
Concetta, M. C., and Burr, D. C. (1988). "Feature detection in human vision: A phase-dependent energy model," Proc. R. Soc. Lond. B 235, 221-245.
Conner, M., and Li, Y. (1985). "Optical generation of the Wigner distribution of signals," Applied Optics 24, 3825-3829.
Conners, R. W., and Harlow, C. A. (1980). "A theoretical comparison of texture algorithms," IEEE Trans. Pattern Anal. Machine Intell. 2, 204.
Cristobal, G. (1990). "Receptive field image modeling through cellular neural networks," Summer Workshop on Analysis and Modeling of Neural Systems, Clark Kerr Campus, Berkeley, CA.
Cristobal, G., Bescos, J., and Santamaria, J. (1986). "Application of the Wigner distribution for image representation and analysis," IEEE Int. Conf. Pattern Recognition, Paris, pp. 998-1000.
Cristobal, G., Bescos, J., and Santamaria, J. (1989). "Image analysis through the Wigner distribution function," Appl. Opt. 28, 262-271.
Cristobal, G., Bescos, J., Santamaria, J., and Montes, J. (1987). "Wigner distribution representation of digital images," Patt. Rec. Lett. 5, 215-221.
Cutrona, L. J., Leith, E. N., Palermo, C. J., and Porcello, L. J. (1960). "Optical data processing and filtering systems," IRE Trans. Inf. Theory IT-6, 386-400.
Cutrona, L. J. (1965). "Recent developments in coherent optical technology," in Optical and Electro-Optical Information Processing (J. T. Tippett, D. A. Berkowitz, L. C. Clapp, C. J. Koester, and A. Vanderburg Jr., eds.), Chapter 6, MIT Press, Cambridge, Massachusetts.
Daubechies, I. (1990). "The wavelet transform, time-frequency localization and signal analysis," IEEE Trans. Inform. Theory 36(5), 961-1005.
Daubechies, I. (1989). "Orthonormal bases of wavelets with finite support: connection with discrete filters," in Wavelets: Time-Frequency Methods and Phase Space (J. M. Combes, A. Grossmann, and Ph. Tchamitchian, eds.), Springer Verlag, New York.
Daugman, J. G. (1988). "Complete discrete 2-D Gabor transforms by neural networks for image analysis and compression," IEEE Trans. on Acoust., Speech and Signal Processing 36, 1169-1179.
Daugman, J. G. (1980). "Two-dimensional spectral analysis of cortical receptive field profiles," Vision Research 20, 847-856.
Davis, L. S., and Mitchie, A. (1981). "Edge detection in textures," in Image Modeling (A. Rosenfeld, ed.), Academic Press, New York.
De Bruijn, N. G. (1973). "A theory of generalized functions, with applications to Wigner distribution and Weyl correspondence," Nieuw Archief voor Wiskunde 3(21), 205-280.
De Valois, R. L., and De Valois, K. K. (1988). Spatial Vision, Oxford University Press, New York.
Duda, R. O., and Hart, P. E. (1973). Pattern Classification and Scene Analysis, Wiley, New York.
Easton, R. L., Ticknor, A. J., and Barrett, H. H. (1984). "Application of the Radon transform to optical production of the Wigner distribution function," Optical Eng. 23(6), 738-744.
Eichmann, G., and Dong, B. Z. (1982). "Two-dimensional optical filtering of 1-D signals," Applied Optics 21, 3152-3156.
Einziger, P. D., and Hertzberg, Y. (1986). "On the Gabor representation and its digital implementation," Dept. Elec. Eng., Technion, Israel, EE Publ. 587.
Emerson, R. C., Citron, M. C., Vaughn, W. J., and Klein, S. A. (1987). "Nonlinear directionally selective subunits in complex cells of cat striate cortex," J. Neurophysiol. 58(1), 33-65.
Fargetton, H., Glandeaud, F., and Jourdain, G. (1979). "Filtrage dans le plan temps-frequence. Caracterisation de signaux UBF et du milieu magnetospherique," Ann. Telecommunic. 34, 1/10-10/10.
Field, D. J., and Tolhurst, D. J. (1986). "The structure and symmetry of simple-cell receptive-field profiles in the cat's visual cortex," Proc. R. Soc. Lond. B 228, 379-400.
Fukunaga, K. (1972). Introduction to Statistical Pattern Recognition, Academic Press, New York.
Gabor, D. (1946). "Theory of communication," J. IEE (London) 93(III), 429-457.
Gagalowicz, A. (1981). "A new method for texture field synthesis: some applications to the study of human vision," IEEE Trans. Pattern Anal. Machine Intell. 3(5), 520-533.
Geisler, W. S., and Hamilton, D. B. (1986). "Sampling-theory analysis of spatial vision," J. Opt. Soc. Am. A 3(1), 62-70.
Gonzalez, R. C., and Wintz, P. (1987). Digital Image Processing, Addison-Wesley, Reading, Massachusetts, 2nd ed.
Gonzalo, C. (1990). "Use of the 4-D discrete Wigner distribution function in simulation and restoration of space variant degraded images," Appl. Opt. (in press).
Gonzalo, C., Bescos, J., Berriel-Valdos, L. R., and Artal, P. (1990). "Optical-digital implementation of the Wigner distribution function: Use in space variant filtering of real images," Appl. Opt. 29(17), 2569-2575.
Gonzalo, C., Bescos, J., Berriel-Valdos, L. R., and Santamaria, J. (1989). "Space-variant filtering through the Wigner distribution function," Appl. Opt. 28(4), 730-736.
Goodman, J. W. (1968). Introduction to Fourier Optics, McGraw-Hill, New York.
Grossmann, A., and Morlet, J. (1984). "Decomposition of Hardy functions into square integrable wavelets of constant shape," SIAM J. Math. Anal. 15, 723-736.
Gupta, A. K., and Asakura, T. (1986). "New optical system for the efficient display of Wigner distribution functions using a single object transparency," Optics Communications 60, 265-268.
Haralick, R. M. (1979). "Statistical and structural approaches to texture," Proc. IEEE 67(5), 786-804.
Hartley, R. V. L. (1942). "A more symmetrical Fourier analysis applied to transmission problems," Proc. IRE 30, 144-150.
Hawken, M. J., and Parker, A. J. (1987). "Spatial properties of neurons in the monkey striate cortex," Proc. R. Soc. Lond. B 231, 251-288.
Heeger, D. (1987). "Model for the extraction of image flow," J. Opt. Soc. Am. A 4(8), 1455-1471.
Hopkins, H. H. (1955). "The frequency response of a defocused optical system," Proc. R. Soc. London Ser. A 231, 91-103.
Huang, T. S., and Kasnitz, H. L. (1967). Proc. Soc. Photo and Instru. Engrs., Seminar on Computerized Imaging Techniques.
Hubel, D., and Wiesel, T. (1962). "Receptive fields, binocular interaction, and functional architecture in the cat's visual cortex," J. Physiol. (London) 160, 106-154.
Imberger, J., and Boashash, B. (1986). "Application of the Wigner-Ville distribution to temperature gradient microstructure: A new technique to study small-scale variations," J. Phys. Oceanogr. 16(12), 1997-2012.
IMSL (1982). International Mathematical and Statistical Libraries, IMSL Inc., Houston, Texas.
Jacobson, L., and Wechsler, H. (1988). "Joint spatial/spatial-frequency representations," Signal Proc. 14(1), 37-68.
Jacobson, L., and Wechsler, H. (1987). "Derivation of optical flow using a spatiotemporal-frequency approach," Comp. Vision, Graphics and Image Proc. 38, 29-65.
Jacobson, L., and Wechsler, H. (1984). "A theory for invariant object recognition in the frontoparallel plane," IEEE Trans. Pattern Anal. Machine Intell. 6, 325-331.
Jacobson, L., and Wechsler, H. (1983). "The composite pseudo Wigner distribution (CPWD): A computable and versatile approximation to the Wigner distribution (WD)," Proc. Int. Conf. on Acoustics, Speech and Signal Proc., Boston, pp. 254-256.
Jacobson, L., and Wechsler, H. (1982a). "The Wigner distribution as a tool for deriving an invariant representation of 2-D images," Proc. Int. Conf. on Pattern Recognition and Image Processing, Las Vegas, Nevada, pp. 218-220.
Jacobson, L., and Wechsler, H. (1982b). "The Wigner distribution and its usefulness for 2-D image processing," Proc. Int. Conf. on Pattern Recognition, Munich, Germany, pp. 538-541.
Jain, A. K. (1989). Fundamentals of Digital Image Processing, Prentice Hall, Englewood Cliffs, New Jersey.
Janse, C. P., and Kaizer, A. J. M. (1983). "Time-frequency distributions of loudspeakers: The application of the Wigner distribution," J. Audio Eng. Soc. 31(4), 198-223.
Jau, Y. C., and Chin, R. T. (1988). "Shape from texture using the Wigner distribution," Proc. IEEE Int. Conf. on Computer Vision and Pattern Recognition, Ann Arbor, Michigan, pp. 515-523.
Jenkin, M. R. M. (1988). "Visual stereoscopic computation," Ph.D. Thesis, Dept. of Computer Science, University of Toronto, Toronto, Ontario, Canada.
Jernigan, M. E., and D'Astous, F. (1984). "Entropy-based texture analysis in the spatial frequency domain," IEEE Trans. on Pattern Anal. Machine Intell. 6(2), 237-243.
Jones, J., and Palmer, L. (1987). "An evaluation of the two-dimensional Gabor filter model of simple receptive fields in cat striate cortex," J. Neurophys. 58, 1233-1258.
Julesz, B., and Bergen, J. R. (1983). "Textons, the fundamental elements in preattentive vision and perception of textures," Bell Syst. Tech. J. 62(2), 1619-1645.
Kay, R., and Matthews, D. (1972). "On the existence in human auditory pathways of channels selectively tuned to the modulation present in frequency-modulated tones," J. Physiol. 225, 657-677.
Kay, S., and Boudreaux-Bartels, G. F. (1985). "On the optimality of the Wigner distribution for detection," IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, Tampa, Florida, pp. 1017-1020.
Kirkwood, J. G. (1933). "Quantum statistics of almost classical ensembles," Phys. Rev. 44, 31-37.
Kovasznay, L. S. G., and Joseph, H. M. (1953). "Processing of two-dimensional patterns by scanning techniques," Science 118, 475-477.
Kronland-Martinet, R., Morlet, J., and Grossmann, A. (1987). "Analysis of sound patterns through wavelet transforms," Int. J. of Pattern Recognition and Artificial Intelligence 1(2), 273-302.
Kruger, R. P., Thompson, W. B., and Turner, A. F. (1974). "Computer diagnosis of pneumoconiosis," IEEE Trans. Sys. Man Cyber. 4, 40-49.
Kumar, B. V. K. V., and Carroll, C. W. (1983). "Pattern recognition using Wigner distribution function," IEEE Proc. 10th Int. Optical Computing Conf., MIT, Cambridge, Massachusetts, pp. 130-135.
Kumar, B. V. K. V., and Carroll, C. W. (1984). "Effects of sampling on signal detection using the cross-Wigner distribution function," Applied Optics 23, 4090-4094.
Lendaris, G. G., and Stanley, G. L. (1977). "Diffraction pattern sampling for automatic pattern recognition," in Computer Methods in Image Analysis (J. K. Aggarwal, R. O. Duda, and A. Rosenfeld, eds.), IEEE Computer Society, Los Angeles.
Li, Y., Eichmann, G., and Conner, M. (1988). "Optical Wigner distribution and ambiguity function for complex signals and images," Optics Communications 67, 177-179.
Malik, J., and Perona, P. (1989). "A computational model of texture segmentation," IEEE Int. Conf. on Computer Vision and Pattern Recognition, San Diego, CA, pp. 326-332.
Mallat, S. G. (1989a). "A theory for multiresolution signal decomposition: The wavelet representation," IEEE Trans. Pattern Anal. Machine Intell. 11(7), 674-693.
Mallat, S. G. (1989b). "Multifrequency channel decompositions of images and wavelet models," IEEE Trans. Acoust., Speech, Signal Processing 37(12), 2091-2110.
Marcelja, S. (1980). "Mathematical description of the responses of simple cortical cells," J. Opt. Soc. Am. 70(11), 1297-1300.
Margenau, H., and Hill, R. N. (1961). "Correlation between measurements in quantum theory," Prog. Theor. Phys. 26, 722-738.
Marinovic, N. M., and Smith, W. A. (1986). "Application of joint time-frequency distributions to ultrasonic transducers," Proc. IEEE Int. Symp. Circuits and Systems, San Jose, California, pp. 50-54.
Mark, W. D. (1970). "Spectral analysis of the convolution and filtering of non-stationary stochastic processes," J. Sound Vib. 11, 19-63.
Marks, R. J., and Hall, M. W. (1979). "Ambiguity function display using a single input," Applied Optics 18, 2539-2540.
Marks, R. J., Walkup, J. F., and Krile, T. F. (1977). “Ambiguity function display: An improved coherent processor,” Applied Optics 16.746-750. Marr, D., and Hildred, E.(1980). “Theory of edge detection,” Proc. Royal SOC.of London B 207, 187-217.
Mateeva, T., and Sharlandjiev, P. (1986). “The generation of a Wigner distribution function of complex signals by spatial filtering,” Opt. Comm. 57, 153-155. MBller, A. (1978). “Coding of time-varying sounds in the cochlear nucleus,” Audiology 17,446468.
Movshon, J. A., Thompson, I. D., and Tolhurst, D. J. (1978).“Spatial summation in the receptive fields of simple cells in the cat’s striate cortex,” J. Physiol. London 283, 53-77. Oppenheim, A. V., and Lim, J. S. (1981). “The importance of phase in signals,” Proc. IEEE 69(5), 529 - 54 1. Pacut, A., Kolodziej, W. J., and Said, A. (1989). “Discrete domain Wigner distributions-a comparison and an implementation,” Proc. IEEE Int. Symposium on Circuits and Systems. Portland, Oregon, 1264-1267. Page, C. H. (1952). “Instantaneous power spectra,” J. Appl. Phys. 23, 103-106. Paler, K.,and Bowler I. W. (1986). “Gabor filters applied to electronic speckle pattern interferometer images,” IEE Int. Con$ on Image Processing and Its Applications, Imperial College, U. K.,pp. 258-262. Perry, A., and Lowe, D. G . (1989). “Segmentation of textured images,” IEEE Int. Conf. on Computer Vision and Patt. Recogn., San Diego, California, pp. 319-325. Peyrin, F., and Prost, R. (1986). “A unified definition for the discrete-time, discrete-frequency, and discrete-time/frequency Wigner distributions,” IEEE Trans. on Acoust., Speech and Signal Processing 34(4), 858-867. Poggio, T., and Girosi, F. (1989). “A theory of networks for approximation and learning,” M. 1. T. A. I. Memo No. 1140. Pollen, D., and Ronner, S. (1981), “Visual cortical neurons as localized spatial frequency filters,” IEEE Trans. on Systems, Man and Cybernetics 13(5), 907-916. Porat, M., and Zeevi, Y.Y.(1989). “Localized texture processing in vision: analysis and synthesis in the Gaborian space,” IEEE Trans. on Biomedical Eng. 36(1), 115-129. Pratt, W. K. (1980). “Decorrelation methods of texture feature extraction,” USC-IPI Report 860, Image Processing Institute, University of Southern California, pp. 3-17. Preis, D. (1982). “Phase distortion and phase equalization in audio signal processing-a tutorial review,” J. Audio Eng. Soc. 30,pp. 774-794 Preston, K. (1972). Coherent Optical Computers. McGraw Hill, New York. RatliK, F. 
(1965). Mach Bands: Quantitative Studies on Neural Networks in the Retina. Holden-Day, San Francisco.
Read, P. R., and Treitel, S. (1973). "The stabilization of two-dimensional recursive filters via the discrete Hilbert transform," IEEE Trans. Geosci. Electron. 11, 153-207.
Reed, T., and Wechsler, H. (1988). "Texture analysis and clustering using the Wigner distribution," Proc. 9th Int. Conf. on Pattern Recognition, Rome, pp. 770-772.
Reed, T. R., and Wechsler, H. (1990). "Segmentation of textured images and Gestalt organization using spatial/spatial-frequency representations," IEEE Trans. Pattern Anal. Machine Intell. 12(1), 1-12.
Rihaczek, A. W. (1968). "Signal energy distributions in time and frequency," IEEE Trans. Inform. Theory 14, 369-374.
Riley, M. D. (1987). "Beyond quasi-stationarity: designing time-frequency representations for speech signals," Proc. IEEE Int. Conf. on Acoust., Speech and Signal Processing, Dallas, Texas, pp. 657-660.
IMAGE FILTERING/ANALYSIS THROUGH THE WIGNER DISTRIBUTION
Riley, M. D. (1989). Speech Time-Frequency Representations. Kluwer Academic Publishers, Boston, Massachusetts.
Saleh, B. E. A., and Subotic, N. S. (1985). "Time-variant filtering of signals in the mixed time-frequency domain," IEEE Trans. ASSP 33(6), 1479-1485.
Sanger, T. D. (1988). "Stereo disparity computation using Gabor filters," Biol. Cybern. 59, 405-418.
Stark, H. (1982). Applications of Optical Fourier Transforms. Academic Press, New York.
Subotic, N. S., and Saleh, B. E. A. (1984a). "Optical time-variant processing of signals in the mixed time-frequency domain," Opt. Comm. 52(4), 259-264.
Subotic, N. S., and Saleh, B. E. A. (1984b). "Generation of the Wigner distribution function of two-dimensional signals by a parallel optical processor," Optics Letters 9, 471-473.
Sun, M., Li, C. C., Sekhar, L. N., and Sclabassi, R. J. (1989a). "Efficient computation of the discrete pseudo-Wigner distribution," IEEE Trans. on Acoustics, Speech and Signal Processing 37(11), 1735-1742.
Sun, M., Li, C. C., Sekhar, L. N., and Sclabassi, R. J. (1989b). "Elimination of cross-components of discrete pseudo Wigner distribution via image processing," IEEE Int. Conf. on Acoust., Speech and Signal Processing, Glasgow, Scotland, pp. 2230-2233.
Sutton, R. N., and Hall, E. L. (1972). "Texture measures for automatic classification of pulmonary disease," IEEE Trans. Computers 21, 667-676.
Szu, H. H. (1982). "Two-dimensional optical processing of one-dimensional acoustic data," Optical Engineering 21, 804-813.
Tan, T. N., and Constantinides, A. G. (1990). "Texture analysis based on a human visual model," IEEE Int. Conf. on Acoustics, Speech and Signal Processing, Albuquerque, New Mexico, pp. 2137-2140.
Tanimoto, S. L., Ligocki, T. L., and Ling, R. (1987). "A prototype pyramid machine for hierarchical cellular logic," in Parallel Computer Vision (L. Uhr, ed.). Academic Press, Orlando, Florida.
Usui, S., and Araki, H. (1990). "Wigner distribution analysis of BSPM for optimal sampling," IEEE Engin. in Medicine and Biology 9(1), 29-32.
Van Essen, D. C., and Anderson, C. (1990). "Information processing strategies and pathways in the primate retina and visual cortex," in Introduction to Neural and Electronic Networks (S. F. Zornetzer, J. L. Davis, and C. Lau, eds.). Academic Press, Orlando, Florida.
Van Gool, L., Dewaele, P., and Oosterlinck, A. (1985). "Texture analysis anno 1983," Computer Vision, Graphics and Image Processing 29, 336-357.
Velez, E. F., and Absher, R. G. (1989). "Transient analysis of speech signals using the Wigner time-frequency representation," Proc. IEEE Int. Conf. Acoust., Speech and Signal Processing, Glasgow, Scotland, pp. 2242-2245.
Ville, J. (1948). "Théorie et applications de la notion de signal analytique," Câbles et Transmission 2A, 61-74.
Waibel, A. (1989). "Modular construction of time-delay neural networks for speech recognition," Neural Computation 1, 39-46.
Walther, A. (1968). "Radiometry and coherence," J. Opt. Soc. Am. 58, 1256-1259.
Walther, A. (1973). "Radiometry and coherence," J. Opt. Soc. Am. 63, 1622-1623.
Weber, A. G. (1989). "Image data base," Signal and Image Processing Institute, University of Southern California, Los Angeles, CA.
Webster, M. A., and de Valois, R. L. (1985). "Relationship between spatial-frequency and orientation tuning of striate-cortex cells," J. Opt. Soc. Am. A 2, 1124-1132.
West, B. J. (1990). "Sensing scaled scintillations," Special Issue on Fractals in the Imaging Sciences, J. Opt. Soc. Am. A 7(6), 1074-1100.
GABRIEL CRISTOBAL et al.
Weszka, J. S., Dyer, C. R., and Rosenfeld, A. (1976). "A comparative study of texture measures for terrain classification," IEEE Trans. Syst. Man Cybern. 6, 269-285.
Wigner, E. (1932). "On the quantum correction for thermodynamic equilibrium," Phys. Rev. 40, 749-759.
Williams, W. J., and Jeong, J. (1989). "New time-frequency distributions: theory and applications," Proc. Int. Symp. on Circuits and Systems, Portland, Oregon, 1243-1247.
Woodward, P. M. (1953). Probability and Information Theory with Application to Radar. Pergamon, London.
Young, R. A. (1985). "The gaussian derivative theory of spatial vision: analysis of cortical cell receptive field line-weighting profiles," General Motors Rep. No. GMR-4920, Warren, Michigan.
Zeevi, Y. Y., and Porat, M. (1988). "Computer image generation using elementary functions matched to human vision," in Theoretical Foundations of Computer Graphics (R. E. Earnshaw, ed.). NATO ASI Series Vol. 40, pp. 1197-1241, Springer, Berlin.
Zhao, Y., Atlas, L., and Marks, R. (1990). "The use of cone-shaped kernels for generalized time-frequency representations of nonstationary signals," IEEE Trans. on Acoustics, Speech and Signal Processing 38(7), 1084-1091.
Zheng, C., Widmalm, S. E., and Williams, W. J. (1989). "New time-frequency analysis of EMG and TMJ sound signals," IEEE Int. Conf. on Engineering in Medicine and Biology, pp. 741-
Zhu, Y. M., Goutte, R., and Peyrin, F. (1990a). "On the use of 2-D analytic signals for Wigner analysis of 2-D real signals," IEEE Int. Conf. Acoust., Speech and Signal Proc., Albuquerque, New Mexico, pp. 1989-1992.
Zhu, Y. M., Peyrin, F., and Goutte, R. (1990b). "The use of a two-dimensional Hilbert transform for Wigner analysis of two-dimensional real signals," Signal Proc. 19, 205-220.
ADDITIONAL GENERAL REFERENCES

Bastiaans, M. J. (1984). "Use of the Wigner distribution in optical problems," Proc. ECOOSA, Amsterdam, The Netherlands, pp. 251-262.
Boashash, B. (1983). "Wigner analysis of time-varying signals: An application to seismic prospecting," in Proc. Signal Proc. II: Theories and Applications (E. W. Schüssler, ed.), pp. 703-706. Elsevier Science Publishers (North-Holland), Amsterdam.
Bracewell, R. N., Bartelt, H., Lohmann, A. W., and Streibl, N. (1985). "Optical synthesis of the Hartley transform," Applied Optics 24, 1401-1402.
Brousil, J. K., and Smith, D. R. (1967). "A threshold logic network for shape invariance," IEEE Trans. on Elec. Computers 16, 818-828.
Casasent, D., and Psaltis, D. (1976). "Position, rotation, and scale invariant optical correlation," Applied Optics 15, 1795-1797.
Castleman, K. R. (1979). Digital Image Processing. Prentice-Hall, Englewood Cliffs, New Jersey.
Cohen, L., and Posch, T. E. (1985). "Positive time-frequency distribution functions," IEEE Trans. Acoustics, Speech and Signal Processing 33(1), 31-37.
Combes, J. M., Grossmann, A., and Tchamitchian, Ph., eds. (1989). Wavelets: Time-Frequency Methods and Phase Space. Springer, Berlin.
Escudié, B. (1979). "Représentation en temps et fréquence des signaux d'énergie finie: analyse et observation des signaux," Ann. Télécommunic. 34(3-4), 101-111.
Faugeras, O., and Pratt, W. K. (1980). "Decorrelation methods of texture feature extraction," IEEE Trans. Pattern Anal. Machine Intelligence 2, 323.
Field, D. J. (1987). "Relation between the statistics of natural images and the response properties of cortical cells," J. Opt. Soc. Am. A 4(12), 2379-2394.
Iizuka, K. (1983). Engineering Optics. Optical Sciences Series, Springer-Verlag, Berlin.
Martin, W., and Flandrin, P. (1985). "Detection of changes of signal structure using the Wigner-Ville spectrum," Signal Proc. 8, 215-233.
Porat, M., and Zeevi, Y. Y. (1988). "The generalized scheme of image representation in biological and machine vision," IEEE Trans. Pattern Anal. Machine Intell. 10(4), 452-467.
Watson, A. B. (1987). "The cortex transform: rapid computation of simulated neural images," Computer Vision, Graphics and Image Processing 39, 311-327.
Young, R. A. (1987). "The gaussian derivative model for spatial vision: I. Retinal mechanisms," Spatial Vision 2(4), 273-293.
Index

A
Adder
  binary bit-level systolic, 157-158
  finite ring bit-level systolic, 158
Additive conjugate, 266
Agarwal-Cooley, 16
Algebra, see also Finite algebras
  Boolean algebra, 253
  heterogeneous or many-valued algebra, 246
  homogeneous or single-valued algebra, 246
  image algebra, 246-273
  minimax algebra, 273, 284
Algebraic structure, residue number systems, 122-124
Aliasing, 318-319, 330, 353
Allocation matrix, 195
Ambiguity function, 315, 332
Analytical signal, 323, 376-377, 380
Arithmetic elements, modular, 140-141
Associated mixed radix number system, 126-127
Auto-sort, 16
B

Backward additive maximum, 270-271
Backward convolution operator, 266
Backward transform, 263
Band-pass filter, 349
Base extension, residue number systems, 129-131
Bhattacharyya distance, 363
Bit reversal, 13
C

Cardinality, 254
Characteristic function, 251
Chernoff bound, 365
Chinese remainder theorem, 24, 121-122, 124-126
Choice function, 254
Choi-Williams distribution, 318, 321-322, 324
Cohen general distribution, 321
Coherent
  noise, 341
  processor, 332
Commutation theorem, 7
Complete preordering, 190
Complex
  energy spectrum, 320
  signal, 334, 338
  spectrogram, 346
Compound experiment, 190
Co-occurrence matrices, 369
Cooley-Tukey algorithm, 11, 21
  mixed-radix, 14, 16
  multidimensional, 16-25
  radix-two, 12
Covariance matrix, 365
CRT mapping, 145
Cylindrical lens, 333
D

Decimated function, 21
Decimation
  in frequency, 12
  in time, 12
    algorithm, 107-108
    FFT algorithm, 77-79
Decoding, binary, 147-148
Degradation, 343, 350
Diagram of information, 320, 326
Diffraction
  grating, 337
  order, 334
Digital filtering, finite impulse response using DFTs, 73-74
Digital signal processing, 69-160
  computational intensity, 70
  indirect computation of convolution, 79-80
  inner product form, 72
  number theoretic transforms, see Number theoretic transforms
  residue number systems, 121-131
Digital two-manifold, 276
Discrete Fourier transform, see Fourier transform
Distance transform, 304
Division algorithm, 82-83
D-optimal criterion, 197
Dual operations, 263
Dual subgroup, 20
E

Encoding, binary to residue, 143-144
Euclidean isometries, 256
Euler-Fermat theorem, 83
Euler number, 274
Extreme point, 298
F

Feature
  extraction, 326, 360, 367
    band-wedge sampling, 361
  vector, 366
Fermat number transforms, 85-92
  forward transform, 89-90
  integer sequence convolution, 91-92
  over Galois field, 85-86
  quadratic residue rings, 110-115
Field, finite algebras, 80-84
Filtering
  images, 310
  operations, 344
  television pictures, 70-71
Finite algebras
  rings and fields, 80-84
  VLSI implementations, see VLSI
FIR filter, bit-parallel, 148-151
Fisher ratio, 367
Forward additive maximum transform, 270-271
Forward convolution operator, 266
Fourier transform
  discrete, 73
    finite impulse response digital filtering, 73-74
    inverse and convolution property, 75-77
    transformation matrix, 87
  fast (FFT), 73, 311, 343
    algorithms, 77-80
    decimation in time, 77-79
  inverse, 317
  phase, 313, 316, 372, 377
  pointed, 29
  spectrum, 318, 361
Fuzzy
  information, 224
  system, 225
G
Gabor transform, 320, 375
Galois field, 83-84, 96-97
  arithmetic rules, 105
  convolution over, 117-118
  matrix multiplication over, 113-115
  multiplicative group over, 100-101, 103
  multiplicative subgroups over, 105-106
  quadratic nonresidue over, 97-99
Gaussian
  derivative filters, 372, 382
  filter, 352
Generalized additivity, 166
Generalized product
  image, 261
  matrix, 282-285
Gentleman-Sande, 13, 16
Global reduce operation, 252, 260
Good-Thomas, 21, 24
H

Hartley transform
  discrete, 354
  fast, 312
Hilbert transform, 323, 325
Histogram, 260
  equalization, 360
Hologram, 334
Hybrid processor, 311, 314, 326, 339, 341, 343, 354
Hyperreal number, 182
I
Idempotents, 25
Image, 247
  classification, 363
  constant, 249
  domain, 253
  extension, 251
  F-valued, 247
  magnification, 268
  processing, 309
  representation, 310, 312
  restoration, 359
  restriction, 250
  unit, 249
  value, 247
  zero, 249
Index of diversity, 218
Indirect convolution, number theoretic transforms, 92-95
Induced image operations, 247-253
  addition, 247
  exponentiation, 248
  global reduce operation, 252-253
  maximum, 247-248
  minimum, 249
  multiplication, 247
  scalar operations, 249-250
Information energy
  Bayesian context, 189
  continuous random variables, 177
    conditional, 179
    joint, 178
    properties, 179
  difference probability distributions
    beta, 179
    Cauchy, 179
    chi-square, 180
    double exponential, 180
    Erlang, 180
    exponential, 180
    F-Snedecor, 180
    gamma, 180
    generalized normal, 180
    Gumbel, 181
    Laplace, 180
    logistic, 180
    lognormal, 180
    Maxwell-Boltzmann, 180
    normal, 180
    normal multivariate, 181
    Pareto, 180
    Rayleigh, 181
    student-t, 181
    triangular, 181
    uniform, 181
    Weibull, 181
  discrete random variables, 167
    characterization, 169
    conditional, 171
    joint, 171
    properties, 167-169
  divergence, 176
  in field of hyperreal numbers, 181
  FIS comparison criterion, 226-227
  as index of diversity, 218
  point process, 213
Inner product step processor, finite ring, 141-143
Inverse filter, 351
Inverse Fourier transform, 317
Inversion property, 332, 353
Isomorphic mapping function, 124
J

Joint representation, 309, 313, 360, 373
  analysis, 343
  generalized, 345
K

Korn-Lambiotte, 13
Kraft inequality, 236
L

Latch circuits, dynamic, 152-153
Lateral inhibition, 326
Lattice transform, 265
Leakage, 319
Leibowitz approach, 120-121
Line algorithm, 21, 26
Linear transform, 265
Local
  averaging, 267
  image representation, 313
  power, 310, 313, 316, 330, 333, 347
  spectra, 333, 345, 349
    Doppler, 345
Logic problems, 174
Logons, 320
M

Marginal distribution, 310, 316
Markov chains, 223
  information energy gain, 223-224
McClellan approach, 120
Mean
  code word length, 237
  image, 256
Measure of uncertainty, 166
Median filter, 323
Mersenne number transforms, 94-95
Moments, as descriptors of regions, 255
Multicomponent signal, 318, 323
Multidimensional mapping, 115-118
Multiple discriminant analysis, 367
Multiplicative group, 81
  over Galois field, 100-101, 103
Multiresolution representation, 372
N

Negative infinite support, 270
Neural networks, 292-299
  Carpenter-Grossberg net, 296
  Hamming net, 295
  Hopfield net, 294-295
  MAXNET, 296
Nonadditivity, 166
Nonstandard analysis, 182
Null experiment, 190
Number theoretic transforms, 84-121
  binary implementations, 119-121
  decimation-in-time algorithm, 107-108
  decomposition of complex arithmetic to real operations, 96
  dynamic range extension, 118-119
  fast, 108-110
  Fermat number transforms, 85-92
  forward table, 135
  forward transform, 100-102
    look-up results, 137-138
  Galois field, 96-97
  implementation using residue number systems, 131-140
  indirect convolution, 92-95
  inverse table, 135
  inverse transform look-up results, 137, 139
  multidimensional mapping, 115-118
  multiplication using index calculus, 133-140
  multiplicative group over Galois field, 100-101, 103
  over extension fields, 95-110
  parallel computations, 132
  quadratic nonresidue over Galois field, 97-99
  quadratic residue rings, 110-115
  ring and field selection, 94
  two-dimensional, 116-118
O
Octagonal chain code, 275
Optical
  filtering, 345
  processor, 311, 326, 332-333
  transfer function (OTF), 349
  Wigner processor, 311, 333, 338
Overflow detection, residue number systems, 127-128
P

Page distribution, 321-322
Partially ordered set, 300
Pearson chi-square divergence, 173
Pease, 13
Perceptron, 297-298
  single layer perceptron, 297
  three layer perceptron, 298
Periodic function, 20
Phase
  contrast method, 335
  filter, 335
  Fourier transform, 313, 316, 372, 377
Pixel, 247
  color, representation, 72
  level operation, 254
  location, 247
  value, 247
Pointed Fourier transform, 29
Point set, 254
Point spread function (PSF), 350, 357, 359
Posterior distribution, 189
Power spectrum, 321
Prior distribution, 189
Probability of error, 216
  and information energy, 216
Q

Quadratic entropy, 172
Quadratic residue rings, number theoretic transforms, 110-115
Quadrature filters, 372, 376
Quantification of fuzzy information, 226
R

Radon transformation, 339
Receptive field, 381
Regression experiment, 195
Residue classes, ring, 82-83
Residue number systems, 121-131
  algebraic structure, 122-124
  associated mixed radix number system, 126-127
  base extension, 129-131
  Chinese Remainder Theorem, 121-122, 124-126
  nonredundant, 123
  number theoretic transform implementation, 131-140
  overflow detection, 127-128
  scaling, 128-129
Rihaczek distribution, 320
Ring
  finite algebras, 80-84
  residue classes, 82-83
ROM
  block diagram, 153
  generic cells, VLSI implementations, 151-159
    circuit operation, 151-154
    comparative study, 157-159
    simulations, 154-156
    storage circuitry, 153-154
Row-column method, 17
S
Sampling, 330
Scaling, residue number systems, 128-129
Self-sorting, 14
Semithresholding, 255
Shifting parameter, 380
Simply connected, 289
Singleton, 13
Slice, 22
Small Winograd, 17
Space variant
  defocusing, 349, 354
  degradation, 312, 352
  filtering, 312, 343-344, 347-348
Spatial
  domain, 310, 327, 344
  filtering, 310, 335
  representation, 312
  samples, 330, 335, 341, 348, 352
  variables, 313-314, 326-327
Spectral energy density, 316
Spectrogram, 315, 324, 345
Speech processing, 333
SPRT, 205
Standard
  measure, 194
  part, 182
Standard deviation, image, 256
Stockham, 14
Stride permutation, 6
Sufficiency
  Blackwell, 192, 195
  Lehmann, 191
Sufficient
  FIS, 213
  statistic, 191
T

Target point, 257
Television pictures, filtering, 70-71
Temperton, 14
Template, 257
  composition, 272-273
    weak, 286
  constant, 259
  convex, 289
  cross, 290
  decomposition, 290-292, 302
  disk, 291
  F-valued, 257
  operations, 257-260
  parameterized, 260
  quarter-plane, 303
  rectangular, 286
  recursive, 300
  scalar, 259
  separable, 286
  spherical, 291
  support, 257
  symmetric, 289
  translation invariant, 258
  transpose, 262
  variant, 258
  weights, 257
Tensor product
  factor
    mixed type, 6
    parallel, 6
    vector, 6
  identities, 5
  matrices, 5
  multidimensional, 9
  vectors, 5
Terminal information energy, 234
Texture
  classification, 363, 367
  edge detection, 369
  feature extraction, 361
  properties, 359
Transmittance, 333
Twiddle factor, 11, 135
V

Value set, 247
VLSI
  implementations of finite algebraic systems, 140-159
    binary to residue encoding, 143-144
    bit-parallel FIR filter, 148-151
    CRT mapping, 145
    finite ring IPSP, 141-143
    generic residue processing cell, 141-151
    modular arithmetic elements, 140-141
    ROM generic cells, 151-159
    scale/binary decode example, 147-148
  processor, 326, 344
W

Wavelet transform, 383
Weighted
  branching property, 235
  conditional information energy, 235
  information energy, 234
  mean code word length, 237
  medial axis, 271
  probabilistic experiment, 235
Wiener filter, 312, 351, 353, 357
Wigner distribution
  aliasing, 319
  continuous definition, 314
  digital implementations, 327
  discrete definition, 318, 330
  hybrid implementations, 339
  interference terms, 318, 321, 323
  inversion property, 317, 319
  marginals, 316
  optical implementations, 332
  VLSI implementations, 343