Scaling, Fractals and Wavelets
Edited by Patrice Abry Paulo Gonçalves Jacques Lévy Véhel
First published in France in 2 volumes in 2002 by Hermes Science/Lavoisier entitled: Lois d’échelle, fractales et ondelettes © LAVOISIER, 2002

First published in Great Britain and the United States in 2009 by ISTE Ltd and John Wiley & Sons, Inc.

Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms and licenses issued by the CLA. Enquiries concerning reproduction outside these terms should be sent to the publishers at the undermentioned address:

ISTE Ltd, 27-37 St George’s Road, London SW19 4EU, UK
John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA
www.iste.co.uk
www.wiley.com
© ISTE Ltd, 2009

The rights of Patrice Abry, Paulo Gonçalves and Jacques Lévy Véhel to be identified as the authors of this work have been asserted by them in accordance with the Copyright, Designs and Patents Act 1988.

Library of Congress Cataloging-in-Publication Data
Lois d’échelle, fractales et ondelettes. English
Scaling, fractals and wavelets/edited by Patrice Abry, Paulo Gonçalves, Jacques Lévy Véhel.
p. cm.
Includes bibliographical references.
ISBN 978-1-84821-072-1
1. Signal processing--Mathematics. 2. Fractals. 3. Wavelets (Mathematics) I. Abry, Patrice. II. Gonçalves, Paulo. III. Lévy Véhel, Jacques, 1960- IV. Title.
TK5102.9.L65 2007
621.382'20151--dc22
2007025119

British Library Cataloguing-in-Publication Data
A CIP record for this book is available from the British Library
ISBN: 978-1-84821-072-1

Printed and bound in Great Britain by CPI Antony Rowe Ltd, Chippenham, Wiltshire.
Table of Contents
Preface
Chapter 1. Fractal and Multifractal Analysis in Signal Processing
Jacques LÉVY VÉHEL and Claude TRICOT
1.1. Introduction
1.2. Dimensions of sets
1.2.1. Minkowski-Bouligand dimension
1.2.2. Packing dimension
1.2.3. Covering dimension
1.2.4. Methods for calculating dimensions
1.3. Hölder exponents
1.3.1. Hölder exponents related to a measure
1.3.2. Theorems on set dimensions
1.3.3. Hölder exponent related to a function
1.3.4. Signal dimension theorem
1.3.5. 2-microlocal analysis
1.3.6. An example: analysis of stock market price
1.4. Multifractal analysis
1.4.1. What is the purpose of multifractal analysis?
1.4.2. First ingredient: local regularity measures
1.4.3. Second ingredient: the size of point sets of the same regularity
1.4.4. Practical calculation of spectra
1.4.5. Refinements: analysis of the sequence of capacities, mutual analysis and multisingularity
1.4.6. The multifractal spectra of certain simple signals
1.4.7. Two applications
1.4.7.1. Image segmentation
1.4.7.2. Analysis of TCP traffic
1.5. Bibliography
Chapter 2. Scale Invariance and Wavelets
Patrick FLANDRIN, Paulo GONÇALVES and Patrice ABRY
2.1. Introduction
2.2. Models for scale invariance
2.2.1. Intuition
2.2.2. Self-similarity
2.2.3. Long-range dependence
2.2.4. Local regularity
2.2.5. Fractional Brownian motion: paradigm of scale invariance
2.2.6. Beyond the paradigm of scale invariance
2.3. Wavelet transform
2.3.1. Continuous wavelet transform
2.3.2. Discrete wavelet transform
2.4. Wavelet analysis of scale invariant processes
2.4.1. Self-similarity
2.4.2. Long-range dependence
2.4.3. Local regularity
2.4.4. Beyond second order
2.5. Implementation: analysis, detection and estimation
2.5.1. Estimation of the parameters of scale invariance
2.5.2. Emphasis on scaling laws and determination of the scaling range
2.5.3. Robustness of the wavelet approach
2.6. Conclusion
2.7. Bibliography
Chapter 3. Wavelet Methods for Multifractal Analysis of Functions
Stéphane JAFFARD
3.1. Introduction
3.2. General points regarding multifractal functions
3.2.1. Important definitions
3.2.2. Wavelets and pointwise regularity
3.2.3. Local oscillations
3.2.4. Complements
3.3. Random multifractal processes
3.3.1. Lévy processes
3.3.2. Burgers’ equation and Brownian motion
3.3.3. Random wavelet series
3.4. Multifractal formalisms
3.4.1. Besov spaces and lacunarity
3.4.2. Construction of formalisms
3.5. Bounds of the spectrum
3.5.1. Bounds according to the Besov domain
3.5.2. Bounds deduced from histograms
3.6. The grand-canonical multifractal formalism
3.7. Bibliography
Chapter 4. Multifractal Scaling: General Theory and Approach by Wavelets
Rudolf RIEDI
4.1. Introduction and summary
4.2. Singularity exponents
4.2.1. Hölder continuity
4.2.2. Scaling of wavelet coefficients
4.2.3. Other scaling exponents
4.3. Multifractal analysis
4.3.1. Dimension based spectra
4.3.2. Grain based spectra
4.3.3. Partition function and Legendre spectrum
4.3.4. Deterministic envelopes
4.4. Multifractal formalism
4.5. Binomial multifractals
4.5.1. Construction
4.5.2. Wavelet decomposition
4.5.3. Multifractal analysis of the binomial measure
4.5.4. Examples
4.5.5. Beyond dyadic structure
4.6. Wavelet based analysis
4.6.1. The binomial revisited with wavelets
4.6.2. Multifractal properties of the derivative
4.7. Self-similarity and LRD
4.8. Multifractal processes
4.8.1. Construction and simulation
4.8.2. Global analysis
4.8.3. Local analysis of warped FBM
4.8.4. LRD and estimation of warped FBM
4.9. Bibliography
Chapter 5. Self-similar Processes
Albert BENASSI and Jacques ISTAS
5.1. Introduction
5.1.1. Motivations
5.1.2. Scalings
5.1.2.1. Trees
5.1.2.2. Coding of R
5.1.2.3. Renormalizing Cantor set
5.1.2.4. Random renormalized Cantor set
5.1.3. Distributions of scale invariant masses
5.1.3.1. Distribution of masses associated with Poisson measures
5.1.3.2. Complete coding
5.1.4. Weierstrass functions
5.1.5. Renormalization of sums of random variables
5.1.6. A common structure for a stochastic (semi-)self-similar process
5.1.7. Identifying Weierstrass functions
5.1.7.1. Pseudo-correlation
5.2. The Gaussian case
5.2.1. Self-similar Gaussian processes with r-stationary increments
5.2.1.1. Notations
5.2.1.2. Definitions
5.2.1.3. Characterization
5.2.2. Elliptic processes
5.2.3. Hyperbolic processes
5.2.4. Parabolic processes
5.2.5. Wavelet decomposition
5.2.5.1. Gaussian elliptic processes
5.2.5.2. Gaussian hyperbolic process
5.2.6. Renormalization of sums of correlated random variable
5.2.7. Convergence towards fractional Brownian motion
5.2.7.1. Quadratic variations
5.2.7.2. Acceleration of convergence
5.2.7.3. Self-similarity and regularity of trajectories
5.3. Non-Gaussian case
5.3.1. Introduction
5.3.2. Symmetric α-stable processes
5.3.2.1. Stochastic measure
5.3.2.2. Ellipticity
5.3.3. Censov and Takenaka processes
5.3.4. Wavelet decomposition
5.3.5. Process subordinated to Brownian measure
5.4. Regularity and long-range dependence
5.4.1. Introduction
5.4.2. Two examples
5.4.2.1. A signal plus noise model
5.4.2.2. Filtered white noise
5.4.2.3. Long-range correlation
5.5. Bibliography
Chapter 6. Locally Self-similar Fields
Serge COHEN
6.1. Introduction
6.2. Recap of two representations of fractional Brownian motion
6.2.1. Reproducing kernel Hilbert space
6.2.2. Harmonizable representation
6.3. Two examples of locally self-similar fields
6.3.1. Definition of the local asymptotic self-similarity (LASS)
6.3.2. Filtered white noise (FWN)
6.3.3. Elliptic Gaussian random fields (EGRP)
6.4. Multifractional fields and trajectorial regularity
6.4.1. Two representations of the MBM
6.4.2. Study of the regularity of the trajectories of the MBM
6.4.3. Towards more irregularities: generalized multifractional Brownian motion (GMBM) and step fractional Brownian motion (SFBM)
6.4.3.1. Step fractional Brownian motion
6.4.3.2. Generalized multifractional Brownian motion
6.5. Estimate of regularity
6.5.1. General method: generalized quadratic variation
6.5.2. Application to the examples
6.5.2.1. Identification of filtered white noise
6.5.2.2. Identification of elliptic Gaussian random processes
6.5.2.3. Identification of MBM
6.5.2.4. Identification of SFBMs
6.6. Bibliography
Chapter 7. An Introduction to Fractional Calculus
Denis MATIGNON
7.1. Introduction
7.1.1. Motivations
7.1.1.1. Fields of application
7.1.1.2. Theories
7.1.2. Problems
7.1.3. Outline
7.2. Definitions
7.2.1. Fractional integration
7.2.2. Fractional derivatives within the framework of causal distributions
7.2.2.1. Motivation
7.2.2.2. Fundamental solutions
7.2.3. Mild fractional derivatives, in the Caputo sense
7.2.3.1. Motivation
7.2.3.2. Definition
7.2.3.3. Mittag-Leffler eigenfunctions
7.2.3.4. Fractional power series expansions of order α (α-FPSE)
7.3. Fractional differential equations
7.3.1. Example
7.3.1.1. Framework of causal distributions
7.3.1.2. Framework of fractional power series expansion of order one half
7.3.1.3. Notes
7.3.2. Framework of causal distributions
7.3.3. Framework of functions expandable into fractional power series (α-FPSE)
7.3.4. Asymptotic behavior of fundamental solutions
7.3.4.1. Asymptotic behavior at the origin
7.3.4.2. Asymptotic behavior at infinity
7.3.5. Controlled-and-observed linear dynamic systems of fractional order
7.4. Diffusive structure of fractional differential systems
7.4.1. Introduction to diffusive representations of pseudo-differential operators
7.4.2. General decomposition result
7.4.3. Connection with the concept of long memory
7.4.4. Particular case of fractional differential systems of commensurate orders
7.5. Example of a fractional partial differential equation
7.5.1. Physical problem considered
7.5.2. Spectral consequences
7.5.3. Time-domain consequences
7.5.3.1. Decomposition into wavetrains
7.5.3.2. Quasi-modal decomposition
7.5.3.3. Fractional modal decomposition
7.5.4. Free problem
7.6. Conclusion
7.7. Bibliography
Chapter 8. Fractional Synthesis, Fractional Filters
Liliane BEL, Georges OPPENHEIM, Luc ROBBIANO and Marie-Claude VIANO
8.1. Traditional and less traditional questions about fractionals
8.1.1. Notes on terminology
8.1.2. Short and long memory
8.1.3. From integer to non-integer powers: filter based sample path design
8.1.4. Local and global properties
8.2. Fractional filters
8.2.1. Desired general properties: association
8.2.2. Construction and approximation techniques
8.3. Discrete time fractional processes
8.3.1. Filters: impulse responses and corresponding processes
8.3.2. Mixing and memory properties
8.3.3. Parameter estimation
8.3.4. Simulated example
8.4. Continuous time fractional processes
8.4.1. A non-self-similar family: fractional processes designed from fractional filters
8.4.2. Sample path properties: local and global regularity, memory
8.5. Distribution processes
8.5.1. Motivation and generalization of distribution processes
8.5.2. The family of linear distribution processes
8.5.3. Fractional distribution processes
8.5.4. Mixing and memory properties
8.6. Bibliography
Chapter 9. Iterated Function Systems and Some Generalizations: Local Regularity Analysis and Multifractal Modeling of Signals
Khalid DAOUDI
9.1. Introduction
9.2. Definition of the Hölder exponent
9.3. Iterated function systems (IFS)
9.4. Generalization of iterated function systems
9.4.1. Semi-generalized iterated function systems
9.4.2. Generalized iterated function systems
9.5. Estimation of pointwise Hölder exponent by GIFS
9.5.1. Principles of the method
9.5.2. Algorithm
9.5.3. Application
9.6. Weak self-similar functions and multifractal formalism
9.7. Signal representation by WSA functions
9.8. Segmentation of signals by weak self-similar functions
9.9. Estimation of the multifractal spectrum
9.10. Experiments
9.11. Bibliography
Chapter 10. Iterated Function Systems and Applications in Image Processing
Franck DAVOINE and Jean-Marc CHASSERY
10.1. Introduction
10.2. Iterated transformation systems
10.2.1. Contracting transformations and iterated transformation systems
10.2.1.1. Lipschitzian transformation
10.2.1.2. Contracting transformation
10.2.1.3. Fixed point
10.2.1.4. Hausdorff distance
10.2.1.5. Contracting transformation on the space H(R²)
10.2.1.6. Iterated transformation system
10.2.2. Attractor of an iterated transformation system
10.2.3. Collage theorem
10.2.4. Finally contracting transformation
10.2.5. Attractor and invariant measures
10.2.6. Inverse problem
10.3. Application to natural image processing: image coding
10.3.1. Introduction
10.3.2. Coding of natural images by fractals
10.3.2.1. Collage of a source block onto a destination block
10.3.2.2. Hierarchical partitioning
10.3.2.3. Coding of the collage operation on a destination block
10.3.2.4. Contraction control of the fractal transformation
10.3.3. Algebraic formulation of the fractal transformation
10.3.3.1. Formulation of the mass transformation
10.3.3.2. Contraction control of the fractal transformation
10.3.3.3. Fisher formulation
10.3.4. Experimentation on triangular partitions
10.3.5. Coding and decoding acceleration
10.3.5.1. Coding simplification suppressing the research for similarities
10.3.5.2. Decoding simplification by collage space orthogonalization
10.3.5.3. Coding acceleration: search for the nearest neighbor
10.3.6. Other optimization diagrams: hybrid methods
10.4. Bibliography
Chapter 11. Local Regularity and Multifractal Methods for Image and Signal Analysis
Pierrick LEGRAND
11.1. Introduction
11.2. Basic tools
11.2.1. Hölder regularity analysis
11.2.2. Reminders on multifractal analysis
11.2.2.1. Hausdorff multifractal spectrum
11.2.2.2. Large deviation multifractal spectrum
11.2.2.3. Legendre multifractal spectrum
11.3. Hölderian regularity estimation
11.3.1. Oscillations (OSC)
11.3.2. Wavelet coefficient regression (WCR)
11.3.3. Wavelet leaders regression (WL)
11.3.4. Limit inf and limit sup regressions
11.3.5. Numerical experiments
11.4. Denoising
11.4.1. Introduction
11.4.2. Minimax risk, optimal convergence rate and adaptivity
11.4.3. Wavelet based denoising
11.4.4. Non-linear wavelet coefficients pumping
11.4.4.1. Minimax properties
11.4.4.2. Regularity control
11.4.4.3. Numerical experiments
11.4.5. Denoising using exponent between scales
11.4.5.1. Introduction
11.4.5.2. Estimating the local regularity of a signal from noisy observations
11.4.5.3. Numerical experiments
11.4.6. Bayesian multifractal denoising
11.4.6.1. Introduction
11.4.6.2. The set of parameterized classes S(g, ψ)
11.4.6.3. Bayesian denoising in S(g, ψ)
11.4.6.4. Numerical experiments
11.4.6.5. Denoising of road profiles
11.5. Hölderian regularity based interpolation
11.5.1. Introduction
11.5.2. The method
11.5.3. Regularity and asymptotic properties
11.5.4. Numerical experiments
11.6. Biomedical signal analysis
11.7. Texture segmentation
11.8. Edge detection
11.8.1. Introduction
11.8.1.1. Edge detection
11.9. Change detection in image sequences using multifractal analysis
11.10. Image reconstruction
11.11. Bibliography
Chapter 12. Scale Invariance in Computer Network Traffic
Darryl VEITCH
12.1. Teletraffic – a new natural phenomenon
12.1.1. A phenomenon of scales
12.1.2. An experimental science of “man-made atoms”
12.1.3. A random current
12.1.4. Two fundamental approaches
12.2. From a wealth of scales arise scaling laws
12.2.1. First discoveries
12.2.2. Laws reign
12.2.3. Beyond the revolution
12.3. Sources as the source of the laws
12.3.1. The sum or its parts
12.3.2. The on/off paradigm
12.3.3. Chemistry
12.3.4. Mechanisms
12.4. New models, new behaviors
12.4.1. Character of a model
12.4.2. The fractional Brownian motion family
12.4.3. Greedy sources
12.4.4. Never-ending calls
12.5. Perspectives
12.6. Bibliography
Chapter 13. Research of Scaling Law on Stock Market Variations
Christian WALTER
13.1. Introduction: fractals in finance
13.2. Presence of scales in the study of stock market variations
  13.2.1. Modeling of stock market variations
    13.2.1.1. Statistical apprehension of stock market fluctuations
    13.2.1.2. Profit and stock market return operations in different scales
    13.2.1.3. Traditional financial modeling: Brownian motion
  13.2.2. Time scales in financial modeling
    13.2.2.1. The existence of characteristic time
    13.2.2.2. Implicit scaling invariances of traditional financial modeling
13.3. Modeling postulating independence on stock market returns
  13.3.1. 1960–1970: from Pareto’s law to Lévy’s distributions
    13.3.1.1. Leptokurtic problem and Mandelbrot’s first model
    13.3.1.2. First emphasis of Lévy’s α-stable distributions in finance
  13.3.2. 1970–1990: experimental difficulties of iid-α-stable model
    13.3.2.1. Statistical problem of parameter estimation of stable laws
    13.3.2.2. Non-normality and controversies on scaling invariance
    13.3.2.3. Scaling anomalies of parameters under iid hypothesis
  13.3.3. Unstable iid models in partial scaling invariance
    13.3.3.1. Partial scaling invariances by regime switching models
    13.3.3.2. Partial scaling invariances as compared with extremes
13.4. Research of dependency and memory of markets
  13.4.1. Linear dependence: testing of H-correlative models on returns
    13.4.1.1. Question of dependency of stock market returns
    13.4.1.2. Problem of slow cycles and Mandelbrot’s second model
    13.4.1.3. Introduction of fractional differentiation in econometrics
    13.4.1.4. Experimental difficulties of H-correlative model on returns
  13.4.2. Non-linear dependence: validating H-correlative model on volatilities
    13.4.2.1. The 1980s: ARCH modeling and its limits
    13.4.2.2. The 1990s: emphasis of long dependence on volatility
13.5. Towards a rediscovery of scaling laws in finance
13.6. Bibliography

Chapter 14. Scale Relativity, Non-differentiability and Fractal Space-time
Laurent NOTTALE
14.1. Introduction
14.2. Abandonment of the hypothesis of space-time differentiability
14.3. Towards a fractal space-time
  14.3.1. Explicit dependence of coordinates on spatio-temporal resolutions
  14.3.2. From continuity and non-differentiability to fractality
  14.3.3. Description of non-differentiable process by differential equations
  14.3.4. Differential dilation operator
14.4. Relativity and scale covariance
14.5. Scale differential equations
  14.5.1. Constant fractal dimension: “Galilean” scale relativity
  14.5.2. Breaking scale invariance: transition scales
  14.5.3. Non-linear scale laws: second order equations, discrete scale invariance, log-periodic laws
  14.5.4. Variable fractal dimension: Euler-Lagrange scale equations
  14.5.5. Scale dynamics and scale force
    14.5.5.1. Constant scale force
    14.5.5.2. Scale harmonic oscillator
  14.5.6. Special scale relativity – log-Lorentzian dilation laws, invariant scale limit under dilations
  14.5.7. Generalized scale relativity and scale-motion coupling
    14.5.7.1. A reminder about gauge invariance
    14.5.7.2. Nature of gauge fields
    14.5.7.3. Nature of the charges
    14.5.7.4. Mass-charge relations
14.6. Quantum-like induced dynamics
  14.6.1. Generalized Schrödinger equation
  14.6.2. Application in gravitational structure formation
14.7. Conclusion
14.8. Bibliography

List of Authors
Index
Preface
It is a common scheme in many sciences to study systems or signals by looking for characteristic scales in time or space. These are then used as references for expressing all measured quantities. Physicists may for instance employ the size of a structure, while signal processors are often interested in correlation lengths: (blocks of) samples whose distance is several times the correlation length are considered statistically independent.

The concept of scale invariance may be considered to be the converse of this approach: it means that there is no characteristic scale in the system. In other words, all scales contribute to the observed phenomenon. This “non-property” is also loosely referred to as scaling law or scaling behavior. Note that we may reverse the perspective and consider scale invariance as the signature of a strong organization in the system. Indeed, it is well known in physics that invariance laws are associated with fundamental properties.

It is remarkable that phenomena where scaling laws have been observed cover a wide range of fields, both in natural and artificial systems. In the first category, these include for instance hydrology, in relation to the variability of water levels, hydrodynamics and the study of turbulence, statistical physics with the study of long-range interactions, electronics with the so-called 1/f noise in semiconductors, geophysics with the distribution of faults, biology, physiology and the variability of human body rhythms such as the heart rate. In the second category, we may mention geography with the distribution of population in cities or in continents, Internet traffic and financial markets.

From a signal processing perspective, the aim is then to study transfer mechanisms between scales (also called “cascades”) rather than to identify relevant scales. We are thus led to forget about scale-based models (such as Markov models), and to focus on models allowing us to study correspondences between many scales.
The central notion behind scaling laws is that of self-similarity. Loosely speaking, this means that each part is (statistically) the same as the whole object. In particular, information gathered from observing the data should be independent of the scale of observation.
There is considerable variety in observed self-similar behaviors. They may for instance appear through scaling laws in the Fourier domain, either at all frequencies or in a finite but large range of frequencies, or even in the limit of high or low frequencies. In many cases, studying second-order quantities such as spectra will prove insufficient for describing scaling laws. Higher-order moments are then necessary.

More generally, the fundamental model of self-similarity has to be adapted in many settings, and to be generalized in various directions, so that it becomes useful in real-world situations. These include self-similar stochastic processes, 1/f processes, long memory processes, multifractal and multifractional processes, locally self-similar processes and more. Multifractal analysis, in particular, has developed as a method allowing us to study complex objects which are not necessarily “fractal”, by describing the variations of local regularity. The recent change of paradigm consisting of using fractal methods rather than studying fractal objects is one of the reasons for the success of the domain in applications.

We are delighted to invite our reader for a promenade in the realm of scaling laws, its mathematical models and its real-world manifestations. The 14 chapters have all been written by experts. The first four chapters deal with the general mathematical tools allowing us to measure fractional dimensions, local regularity and scaling in its various disguises. Wavelets play a particular role for this purpose, and their role is emphasized. Chapters 5 and 6 describe advanced stochastic models relevant in our area. Chapter 7 deals with fractional calculus, and Chapter 8 explains how to synthesize certain fractal models. Chapter 9 gives a general introduction to IFS, a powerful tool for building and describing fractals and other complex objects, while Chapter 10, of applied nature, considers the application of IFS to image compression.
The four remaining chapters also deal with applications: various signal and image processing tasks are considered in Chapter 11. Chapter 12 deals with Internet traffic, and Chapter 13 with financial data analysis. Finally, Chapter 14 describes a fractal space-time in the frame of cosmology. It is a great pleasure for us to thank all the authors of this volume for the quality of their contribution. We believe they have succeeded in exposing advanced concepts with great pedagogy.
Chapter 1
Fractal and Multifractal Analysis in Signal Processing
1.1. Introduction

The aim of this chapter is to describe some of the fundamental concepts of fractal analysis in view of their application. We will thus present a simple introduction to the concepts of fractional dimension, regularity exponents and multifractal analysis, and show how they are used in signal and image processing. Since we are interested in applications, most theoretical results are given without proofs. These are available in the references mentioned where appropriate. In contrast, we will pay special attention to the practical aspects. In particular, almost all the notions explained below are implemented in the FracLab toolbox. This toolbox is freely available from the following site: http://complex.futurs.inria.fr/FracLab/, so that interested readers may perform hands-on experiments.

Before we start, we wish to emphasize the following point: recent successes of fractal analysis in signal and image processing do not generally stem from the fact that they are applied to fractal objects (in a more or less strict sense). Indeed, most real-world signals are neither self-similar nor display the characteristics usually associated with fractals (except for the irregularity at each scale). The relevance of fractal analysis instead results from the progress made in the development of fractal methods. Such methods have lately become more general and reliable, and they now allow us to describe precisely the singular structure of complex signals,
Chapter written by Jacques LÉVY VÉHEL and Claude TRICOT.
without any assumption of “fractality”: as a rule, performing a fractal analysis will be useful as soon as the considered signal is irregular and this irregularity contains meaningful information. There are numerous examples of such situations, ranging from image segmentation (where, for instance, contours are made of singular points; see section 1.4.7 and Chapter 11) to vocal synthesis [DAO 02] or financial analysis.

This chapter roughly follows the chronological order in which the various tools have been introduced. We first describe several notions of fractional dimensions. These provide a global characterization of a signal. We then introduce Hölder exponents, which supply local measures of irregularity. The last part of the chapter is devoted to multifractal analysis, a most refined tool that describes the local as well as the overall singular structure of signals. All the concepts presented here are more fully developed in [TRI 99, LEV 02].

1.2. Dimensions of sets

The concept of dimension applies to objects more general than signals. To simplify, we shall consider sets in a metric space, although the notion of dimension makes sense for more complex entities such as measures or classes of functions [KOL 61]. Several interesting notions of dimension exist. This might look like a drawback for the mathematical analysis of fractal sets. However, it is actually an advantage, since each dimension emphasizes a different aspect of an object. It is thus worthwhile to determine the specificity of each dimension. As a general rule, none of these tools outperforms the others. Let us first give a general definition of the notion of dimension.
DEFINITION 1.1.– We call dimension a mapping $d$ defined on the family of bounded sets of $\mathbb{R}^n$ and taking values in $\mathbb{R}^+ \cup \{-\infty\}$, such that:
1) $d(\emptyset) = -\infty$, $d(\{x\}) = 0$ for any point $x$;
2) $E_1 \subset E_2 \Rightarrow d(E_1) \le d(E_2)$ (monotonicity);
3) if $E$ has non-zero $n$-dimensional volume, then $d(E) = n$;
4) if $T$ is a diffeomorphism of $\mathbb{R}^n$ (such as, in particular, a similarity with non-zero ratio, or a non-singular affine mapping), then $d(T(E)) = d(E)$ (invariance).

Moreover, we will say that $d$ is stable if $d(E_1 \cup E_2) = \max\{d(E_1), d(E_2)\}$. It is said to be σ-stable if, for any countable collection of sets:
\[ d(\cup_n E_n) = \sup_n d(E_n). \]
σ-stable dimensions may be extended in a natural way to characterize unbounded sets of $\mathbb{R}^n$.
1.2.1. Minkowski-Bouligand dimension

The Minkowski-Bouligand dimension was invented by Bouligand [BOU 28], who named it the Cantor-Minkowski order. It is now commonly referred to as the box dimension. Let us cover a bounded set $E$ of $\mathbb{R}^n$ with cubes of side $\varepsilon$ and disjoint interiors. Let $N_\varepsilon(E)$ be the number of these cubes. When $E$ contains an infinite number of points (i.e. if it is a curve, a surface, etc.), $N_\varepsilon(E)$ tends to $+\infty$ when $\varepsilon$ tends to 0. The box dimension $\Delta$ characterizes the rate of this growth. Roughly speaking, $\Delta$ is the real number such that:
\[ N_\varepsilon(E) \simeq \left(\frac{1}{\varepsilon}\right)^{\Delta}, \]
assuming this number exists. More generally, we define, for all bounded $E$, the number:
\[ \Delta(E) = \limsup_{\varepsilon \to 0} \frac{\log N_\varepsilon(E)}{|\log \varepsilon|} \tag{1.1} \]
A lower limit may also be used:
\[ \delta(E) = \liminf_{\varepsilon \to 0} \frac{\log N_\varepsilon(E)}{|\log \varepsilon|} \tag{1.2} \]
Note that some authors refer to the box dimension only when both indices coincide, that is, when the limit exists. Both indices $\Delta$ and $\delta$ are dimensions in the sense previously defined. However, $\Delta$ is stable, unlike $\delta$, so that $\Delta$ is more commonly used. Let us mention an important property: if $\bar{E}$ denotes the closure of $E$ (the set of all limit points of sequences in $E$), then:
\[ \Delta(\bar{E}) = \Delta(E) \]
This property shows that $\Delta$ is not sensitive to the topological type of $E$. It only characterizes the density of a set. For example, the (countable) set of the rational numbers of the interval $[0, 1]$ has dimension 1, which is the dimension of the interval itself. Even discrete sequences may have non-zero dimension: let, for instance, $E$ be the set of numbers $n^{-\alpha}$ with $\alpha > 0$. Then $\Delta(E) = 1/(\alpha + 1)$.

Equivalent definitions

It is not mandatory to use cubes to calculate $\Delta$. The original definition of Bouligand is as follows:
– in $\mathbb{R}^n$, let us consider the Minkowski sausage:
\[ E(\varepsilon) = \cup_{x \in E} B_\varepsilon(x) \]
which is the union of all the balls of radius $\varepsilon$ centered on $E$. Denote its volume by $\mathrm{Vol}_n(E(\varepsilon))$. This volume is approximately of the order of $N_\varepsilon(E)\,\varepsilon^n$. This allows us to give the equivalent definition:
\[ \Delta(E) = \limsup_{\varepsilon \to 0} \left( n - \frac{\log \mathrm{Vol}_n(E(\varepsilon))}{\log \varepsilon} \right); \tag{1.3} \]
– we may also define $N'_\varepsilon(E)$, which is the smallest number of balls of radius $\varepsilon$ covering $E$; or $N''_\varepsilon(E)$, the largest number of disjoint balls of radius $\varepsilon$ centered on $E$. Replacing $N_\varepsilon(E)$ by any of these values in equation (1.1) still gives $\Delta(E)$.

Discrete values of ε

In these definitions, the variable $\varepsilon$ is continuous. The results remain the same if we use a discrete sequence such as $\varepsilon_n = 2^{-n}$. More generally we may replace $\varepsilon$ with any sequence which does not converge too quickly towards 0. More precisely, we require that:
\[ \lim_{n \to \infty} \frac{\log \varepsilon_n}{\log \varepsilon_{n+1}} = 1. \]
This remark is important, as it allows us to perform numerical estimations of $\Delta$. Let us now give some well-known examples of calculating dimensions.

EXAMPLE 1.1.– Let $(a_n)$ be a sequence of real numbers such that $0 < 2a_{n+1} < a_n < a_0 = 1$. Let $E_0 = [0, 1]$. We construct by induction a sequence of sets $(E_n)$ such that $E_n$ is made of $2^n$ closed disjoint intervals of length $a_n$, each containing exactly two intervals of $E_{n+1}$. The sets $E_n$ are nested, and the sequence $(E_n)$ converges to a compact set $E$ such that:
\[ E = \cap_n E_n. \]
Let us consider a particular case. When all the endpoints of the intervals of $E_n$ are also endpoints of intervals of $E_{n+1}$, $E$ is called a perfect symmetric set [KAH 63] or sometimes, more loosely, a Cantor set. Assume that the ratio $\log a_n / \log a_{n+1}$ tends to 1. According to the previous comment on discrete sequences, we obtain the following values:
\[ \delta(E) = \liminf_{n \to \infty} \frac{n \log 2}{|\log a_n|}, \qquad \Delta(E) = \limsup_{n \to \infty} \frac{n \log 2}{|\log a_n|}. \]
However, these results are true for any sequence $(a_n)$. Even more specifically, consider the case where $a_n = a^n$, with $0 < a < \frac{1}{2}$. The ratios $a_n/a_{n+1}$ are then constant and dimensions take the common value $\log 2/|\log a|$. This is the case of the self-similar set which satisfies the following relation:
\[ E = f_1(E) \cup f_2(E) \]
with $f_1(x) = a x$ and $f_2(x) = a x + 1 - a$. This set is the attractor of the iterated function system $\{f_1, f_2\}$ (see Chapters 9 and 10). It is also called a perfect symmetric set with constant ratio.

EXAMPLE 1.2.– We construct a planar self-similar curve with extremities $A$ and $B$, $A \ne B$, as follows: take $N + 1$ distinct points $A_1 = A, A_2, \ldots, A_{N+1} = B$, such that $\mathrm{dist}(A_i, A_{i+1}) < \mathrm{dist}(A, B)$. For each $i = 1, \ldots, N$, define a similarity $f_i$ (that is, a composition of a homothety, an orthogonal transformation and a translation), such that $f_i(AB) = A_i A_{i+1}$. The ratio of $f_i$ is $a_i = \mathrm{dist}(A_i, A_{i+1})/\mathrm{dist}(A, B)$. Starting from the segment $\Gamma_0 = AB$, define by induction the polygonal curves $\Gamma_n = \cup_i f_i(\Gamma_{n-1})$. This sequence $(\Gamma_n)$ converges to a curve $\Gamma$ which satisfies the following relation:
\[ \Gamma = \cup_i f_i(\Gamma). \]
In other words, $\Gamma$ is the attractor of the IFS $\{f_1, \ldots, f_N\}$. When $\Gamma$ is simple, the dimensions $\delta$ and $\Delta$ assume a common value, which is also the similarity dimension, i.e. the unique solution $x$ of the equation
\[ \sum_{i=1}^{N} a_i^{x} = 1. \]
In the particular case where all distances $\mathrm{dist}(A_i, A_{i+1})$ are the same, the ratios $a_i$ are equal to a value $a$ such that $N a > 1$ (necessary condition for the continuity of $\Gamma$) and $N a^2 < 1$ (necessary condition for the simplicity of $\Gamma$). Clearly, $\delta(\Gamma) = \Delta(\Gamma) = \log N/|\log a|$.
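When the ratios are unequal, the similarity dimension of Example 1.2 has no closed form, but it can be found numerically because the left-hand side of the equation decreases with $x$. The following is a minimal sketch using bisection; the function name `similarity_dimension` and its tolerance parameter are illustrative assumptions, not part of FracLab or of the text.

```python
import math

def similarity_dimension(ratios, tol=1e-12):
    """Solve sum(a_i ** x) == 1 for x by bisection.

    Assumes at least two ratios, each strictly between 0 and 1, so that
    the left-hand side is strictly decreasing in x and exceeds 1 at x = 0.
    """
    if len(ratios) < 2 or not all(0.0 < a < 1.0 for a in ratios):
        raise ValueError("need at least two ratios, each in (0, 1)")

    def g(x):
        return sum(a ** x for a in ratios) - 1.0

    lo, hi = 0.0, 1.0
    while g(hi) > 0.0:          # enlarge the bracket until g changes sign
        hi *= 2.0
    while hi - lo > tol:        # standard bisection on a decreasing function
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Four similarities of ratio 1/3 (the von Koch curve): compare with log 4 / log 3.
koch = similarity_dimension([1 / 3] * 4)
print(koch, math.log(4) / math.log(3))
```

With equal ratios the result should agree with $\log N/|\log a|$ up to the tolerance, which gives a quick consistency check of the solver.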
Figure 1.1. Von Koch curve, the attractor of a system of four similarities with common ratio 1/3
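The remark on discrete values of ε can be tried on the constant-ratio set of Example 1.1 with $a = 1/3$. The sketch below is an illustration under stated assumptions: the set is represented by the left endpoints of its level-`depth` intervals, stored as integers in units of $3^{-\text{depth}}$, and boxes are the triadic intervals of side $3^{-k}$. The helper names are hypothetical.

```python
import math

def cantor_left_endpoints(depth):
    """Left endpoints of the 2**depth intervals of the middle-thirds
    construction, as integers in units of 3**(-depth)."""
    points = [0]
    for level in range(depth):
        step = 2 * 3 ** (depth - level - 1)
        points = points + [p + step for p in points]
    return points

def box_count(points, depth, k):
    """Number of triadic boxes of side 3**(-k) containing at least one point."""
    cell = 3 ** (depth - k)
    return len({p // cell for p in points})

depth = 10
pts = cantor_left_endpoints(depth)

# Ratio log N / |log eps| along the discrete sequence eps_k = 3**(-k).
estimates = []
for k in range(1, depth + 1):
    n_boxes = box_count(pts, depth, k)
    estimates.append(math.log(n_boxes) / (k * math.log(3)))

print(estimates[-1], math.log(2) / math.log(3))
```

Integer arithmetic is used so that box membership is exact; with floating-point coordinates, points lying on box boundaries could be assigned to the wrong cell.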
Function scales

The previous definitions all involve ratios of logarithms. This is an immediate consequence of the fact that a dimension is defined as an order of growth related to the scale of functions $\{t^\alpha, \alpha > 0\}$. In general, a scale of functions $\mathcal{F}$ in the neighborhood of 0 is a family of functions which are all comparable in the Hardy sense, that is, for any $f$ and $g$ in $\mathcal{F}$, the ratio $f(x)/g(x)$ tends to a limit (possibly $+\infty$ or $-\infty$) when $x$ tends to 0. Function scales are defined in a similar way in the neighborhood of $+\infty$. Scales other than $\{t^\alpha\}$ will yield other types of dimensions. A dimension must be considered as a Dedekind cut in a given scale of functions. The following expressions will make this clearer:
\[ \Delta(E) = \inf\{\alpha \text{ such that } \varepsilon^\alpha N_\varepsilon(E) \to 0\} \tag{1.4} \]
\[ \delta(E) = \sup\{\alpha \text{ such that } \varepsilon^\alpha N_\varepsilon(E) \to +\infty\} \tag{1.5} \]
These are equivalent to equations (1.1) and (1.2) (see [TRI 99]).

Complementary intervals on the line

In the particular case where the compact set $E$ lies in an interval $J$ of the line, the complementary set of $E$ in $J$ is a union of disjoint open intervals, whose lengths will be denoted by $c_n$. Let $|E|$ be the Lebesgue measure of $E$ (which means, for an interval, its length). The dimension of $E$ may be written as:
\[ \Delta(E) = \limsup_{\varepsilon \to 0} \left( 1 - \frac{\log |E(\varepsilon)|}{\log \varepsilon} \right) \]
If $|E| = 0$, the sum of the $c_n$ is equal to the length of $J$. The dimension is then equal to the convergence exponent of the series $\sum c_n$:
\[ \Delta(E) = \inf\Big\{\alpha \text{ such that } \sum_n c_n^\alpha < +\infty\Big\} \tag{1.6} \]
Proof. This result may be obtained by calculating an approximation of the length of the Minkowski sausage $E(\varepsilon)$. Let us assume that the complementary intervals are ranked by decreasing length:
\[ c_1 \ge c_2 \ge \cdots \ge c_n \ge \cdots \]
If $|E| = 0$ and if $c_n \ge \varepsilon > c_{n+1}$, then:
\[ |E(\varepsilon)| \simeq n\varepsilon + \sum_{i > n} c_i \]
thus
\[ \varepsilon^{\alpha-1} |E(\varepsilon)| \simeq n\varepsilon^{\alpha} + \varepsilon^{\alpha-1} \sum_{i > n} c_i. \]
It may be shown that both values
\[ \inf\{\alpha \text{ such that } n\varepsilon^{\alpha} < +\infty\} \quad \text{and} \quad \inf\Big\{\alpha \text{ such that } \varepsilon^{\alpha-1} \sum_{i > n} c_i < +\infty\Big\} \]
are equal to the convergence exponent. It is therefore equal to $\Delta(E)$.

EXERCISE 1.1.– Verify formula (1.6) for the perfect symmetric sets of Example 1.1.

If $|E| \ne 0$, then the convergence exponent of $\sum c_n$ still makes sense. It characterizes a degree of proximity of the exterior with the set $E$. More precisely, we obtain
\[ \inf\Big\{\alpha \text{ such that } \sum_n c_n^\alpha < +\infty\Big\} = \limsup_{\varepsilon \to 0} \left( 1 - \frac{\log |E(\varepsilon) - E|}{\log \varepsilon} \right) \tag{1.7} \]
where the set $E(\varepsilon) - E$ refers to the Minkowski sausage of $E$ deprived of the points of $E$.

How can we generalize the study of the complementary set in $\mathbb{R}^n$ with $n \ge 2$? The open intervals must be replaced with an appropriate paving. The results connecting the elements of this paving to the dimension depend both on the geometry of the tiles and on their respective positions. The topology of the complementary set must be investigated more deeply [TRI 87]. The index that generalizes (1.7) (replacing the 1 of the space dimension by $n$) is the fat fractal exponent, studied in [GRE 85, TRI 86b]. In the case of a zero area curve in $\mathbb{R}^2$, this also leads to the notion of lateral dimension. Note that the dimensions corresponding to each side of the curve are not necessarily equal [TRI 99].
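Exercise 1.1 can be checked by hand in the constant-ratio case $a_n = a^n$ of Example 1.1. The following is a sketch under that assumption only, counting the gaps removed at each construction step; the general perfect symmetric case is left as in the exercise.

```latex
% Step n removes one open middle gap from each of the 2^{n-1} intervals of E_{n-1}:
%   2^{n-1} gaps, each of length a^{n-1} - 2a^{n} = a^{n-1}(1 - 2a),  n \ge 1.
\sum_k c_k^{\alpha}
  = (1-2a)^{\alpha} \sum_{n \ge 1} 2^{\,n-1} a^{(n-1)\alpha}
  = (1-2a)^{\alpha} \sum_{m \ge 0} \left( 2a^{\alpha} \right)^{m}

% Geometric series: finite iff 2a^{\alpha} < 1, i.e. iff \alpha > \log 2 / |\log a|, hence
\inf\Big\{ \alpha : \sum_k c_k^{\alpha} < +\infty \Big\}
  = \frac{\log 2}{|\log a|} = \Delta(E)

% Formula (1.6) applies because |E| = \lim_n 2^{n} a^{n} = 0 when 2a < 1.
```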
1.2.2. Packing dimension

The packing dimension is, to some extent, a regularization of the box dimension [TRI 82]. Indeed, $\Delta$ is not σ-stable, but we may derive a σ-stable dimension from any index thanks to the operation described below.

PROPOSITION 1.1.– Let $\mathcal{B}$ be the family of all bounded sets of $\mathbb{R}^n$ and $\alpha : \mathcal{B} \to \mathbb{R}^+$. Then, the function $\hat{\alpha}$ defined for any subset of $\mathbb{R}^n$ as:
\[ \hat{\alpha}(E) = \inf\Big\{ \sup_i \alpha(E_i) : E = \cup_i E_i,\ E_i \in \mathcal{B} \Big\} \]
is monotonic and σ-stable.
Proof. Any subset $E$ of $\mathbb{R}^n$ is a union of bounded sets. If $E_1 \subset E_2$, then any covering of $E_1$ may be completed into a covering of $E_2$. This entails monotonicity. Now, let $\varepsilon > 0$ and let $(E_k)_{k \ge 1}$ be a sequence of sets whose union is $E$. For any $k$, there exists a decomposition $(E_{i,k})$ of $E_k$ such that $\sup_i \alpha(E_{i,k}) \le \hat{\alpha}(E_k) + \varepsilon 2^{-k}$. Since $E = \cup_{i,k} E_{i,k}$, we deduce that:
\[ \hat{\alpha}(E) \le \sup_k \hat{\alpha}(E_k) + \varepsilon \sum_k 2^{-k} = \sup_k \hat{\alpha}(E_k) + \varepsilon \]
Thus, the inequality $\hat{\alpha}(E) \le \sup_k \hat{\alpha}(E_k)$ holds. The converse inequality stems from monotonicity.

The packing dimension is the result of this operation on $\Delta$. We set
\[ \mathrm{Dim} = \hat{\Delta} \]
The term packing will be explained later. The new index Dim is indeed a dimension, and it is σ-stable. Therefore, unlike $\Delta$, it vanishes for countable sets. The inequality:
\[ \mathrm{Dim}(E) \le \Delta(E) \]
is true for any bounded set. This becomes an equality when $E$ presents a homogeneous structure in the following sense:

THEOREM 1.1.– Let $E$ be a compact set such that, for all open sets $U$ intersecting $E$, $\Delta(E \cap U) = \Delta(E)$. Then $\Delta(E) = \mathrm{Dim}(E)$.

Proof. Let $(E_i)$ be a decomposition of $E$. Since $E$ is compact, a Baire theorem entails that the $E_i$ are not all nowhere dense in $E$. Therefore, there exist an index $i_0$ and an open set $U$ intersecting $E$ such that $\overline{E_{i_0}} \cap U = \bar{E} \cap U$, which yields:
\[ \Delta(E_{i_0}) = \Delta(\overline{E_{i_0}}) \ge \Delta(\overline{E_{i_0}} \cap U) = \Delta(\bar{E} \cap U) \ge \Delta(E \cap U) = \Delta(E) \]
As a result, $\Delta(E) \le \sup_i \Delta(E_i)$, and thus $\Delta(E) \le \mathrm{Dim}(E)$. The converse inequality is always true.

EXAMPLE 1.3.– All self-similar sets are of this type, including those presented above: Cantor sets and curves. For these sets, the packing dimension has the same value as $\Delta(E)$.
EXAMPLE 1.4.– Dense sets in $[0, 1]$, when they are not compact, do not necessarily have a packing dimension equal to 1. Let us consider, for any real $p$, $0 < p < 1$, the set $E_p$ of $p$-normal numbers, that is, those numbers whose frequency of zeros in their dyadic expansion is equal to $p$. Any dyadic interval of $[0, 1]$, however small it may be, contains points of $E_p$, so $E_p$ is dense in $[0, 1]$. As a consequence, $\Delta(E_p) = 1$. In contrast, the value of $\mathrm{Dim}(E_p)$ is:
\[ \mathrm{Dim}(E_p) = \frac{1}{\log 2} \left| p \log p + (1 - p) \log(1 - p) \right|. \]
This result will be derived in section 1.3.2.

1.2.3. Covering dimension

The covering dimension was introduced by Hausdorff [HAU 19]. Here we adopt the traditional approach through Hausdorff measures; a direct approach, using Vitali’s covering convergence exponent, may be used to calculate the dimension without using measures [TRI 99].

Covering measures

Originally, the covering measures were defined to generalize and, most of all, to precisely define the concepts of length, surface, volume, etc. They constitute an important tool in geometric measure theory. Firstly, let us consider a determining function $\phi : \mathbb{R}^+ \to \mathbb{R}^+$, which is increasing and continuous in the neighborhood of 0, and such that $\phi(0) = 0$. Let $E$ be a set in a metric space (that is, a space where a distance has been defined). For every $\varepsilon > 0$, we consider all the coverings of $E$ by bounded sets $U_i$ of diameter $\mathrm{diam}(U_i) \le \varepsilon$. Let
\[ H^\phi_\varepsilon(E) = \inf\Big\{ \sum_i \phi(\mathrm{diam}(U_i)) : E \subset \cup_i U_i,\ \mathrm{diam}(U_i) \le \varepsilon \Big\}. \]
When $\varepsilon$ tends to 0, this quantity (possibly infinite) cannot decrease. The limit corresponds to the φ-Hausdorff measure:
\[ H^\phi(E) = \lim_{\varepsilon \to 0} H^\phi_\varepsilon(E) \]
In this definition, the covering sets $U_i$ can be taken in a more restricted family. If we suppose that the $U_i$ are open, or convex, the result remains unchanged. The main properties are those of any Borel measure:
– $E_1 \subset E_2 \Longrightarrow H^\phi(E_1) \le H^\phi(E_2)$;
– if $(E_i)$ is a countable collection of sets, then
\[ H^\phi(\cup_i E_i) \le \sum_i H^\phi(E_i); \]
– if $E_1$ and $E_2$ are at non-zero distance from each other, any ε-covering of $E_1$ is disjoint from any ε-covering of $E_2$ when $\varepsilon$ is sufficiently small. Then $H^\phi(E_1 \cup E_2) = H^\phi(E_1) + H^\phi(E_2)$.

This implies that $H^\phi$ is a metric measure. The Borel sets are $H^\phi$-measurable and for any countable collection $(E_i)$ of disjoint Borel sets, $H^\phi(\cup_i E_i) = \sum_i H^\phi(E_i)$.

The scale of functions $t^\alpha$

In the case where $\phi(t) = t^\alpha$ with $\alpha > 0$, we use the simple notation $H^\phi = H^\alpha$. Consider the case $\alpha = 1$. For any curve $\Gamma$ the value $H^1(\Gamma)$ is equal to the length of $\Gamma$. Therefore $H^1$ is a generalization of the concept of length: it may be applied to any subset of the metric space. Now let $\alpha = 2$. For any plane surface $S$, the value of $H^2(S)$ is proportional to the area of $S$. For non-plane surfaces, $H^2$ provides an appropriate mathematical definition of area, whereas using a triangulation of $S$ is not acceptable from a theoretical point of view. More generally, when $\alpha$ is an integer, $H^\alpha$ is proportional to the α-dimensional volume. However, $\alpha$ can also take non-integer values, which makes it possible to define the dimension of any set. The use of the term dimension is justified by the following property: if $aE$ is the image of $E$ by a homothety of ratio $a$, then
\[ H^\alpha(aE) = a^\alpha H^\alpha(E) \]

Measures estimated using boxes

If we want to restrict the class of sets from which coverings are taken even more, one option would be to cover $E$ with centered balls or dyadic boxes. In each case, the result is a measure $H^{*\alpha}$ which is generally not equal to $H^\alpha$; nevertheless, it is an equivalent measure in the sense that we can find two non-zero constants, $c_1$ and $c_2$, such that for any $E$:
\[ c_1 H^\alpha(E) \le H^{*\alpha}(E) \le c_2 H^\alpha(E) \]
Clearly the $H^{*\alpha}$ measures give rise to the same dimension.
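Dimension

For every $E$, there exists a unique critical value $\alpha$ such that:
– $H^\alpha(E) > 0$ and $\beta < \alpha \Longrightarrow H^\beta(E) = +\infty$;
– $H^\alpha(E) < +\infty$ and $\beta > \alpha \Longrightarrow H^\beta(E) = 0$.

The dimension is defined as
\[ \dim(E) = \inf\{\alpha \text{ such that } H^\alpha(E) = 0\} = \sup\{\alpha \text{ such that } H^\alpha(E) = +\infty\} \tag{1.8} \]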
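The two implications that single out the critical value follow from a one-line estimate on coverings. The following is a sketch of the standard argument, writing $H^\alpha_\varepsilon$ for the quantity $H^\phi_\varepsilon$ with $\phi(t) = t^\alpha$; this shorthand is an assumption of the sketch rather than notation introduced in the text.

```latex
% Let \beta > \alpha and let (U_i) be any covering of E with diam(U_i) \le \varepsilon.
\sum_i \operatorname{diam}(U_i)^{\beta}
  \le \varepsilon^{\beta-\alpha} \sum_i \operatorname{diam}(U_i)^{\alpha}
\quad\Longrightarrow\quad
H^{\beta}_{\varepsilon}(E)
  \le \varepsilon^{\beta-\alpha}\, H^{\alpha}_{\varepsilon}(E)
  \le \varepsilon^{\beta-\alpha}\, H^{\alpha}(E)

% Letting \varepsilon \to 0: if H^{\alpha}(E) < +\infty and \beta > \alpha, then H^{\beta}(E) = 0.
% Exchanging the roles of \alpha and \beta and taking the contrapositive:
% if H^{\alpha}(E) > 0 and \beta < \alpha, then H^{\beta}(E) = +\infty.
```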
NOTE 1.1.– This approach is not very different from the one which leads to the box dimension. Compare equation (1.8) with equations (1.4) and (1.5). Once again, it may be generalized by using function scales other than $t^\alpha$.

Properties

The properties of dim directly stem from those of the $H^\alpha$ measures. It is a σ-stable dimension, like Dim. To compare all these dimensions, let us observe that $\delta$ can be defined in the same manner as the covering dimension, by using coverings made up of sets of equal diameter. This implies the inequality $\dim(E) \le \delta(E)$ for any $E$. The σ-stability property then implies the following:
\[ \dim(E) \le \hat{\delta}(E) \le \hat{\Delta}(E) = \mathrm{Dim}(E) \le \Delta(E). \]
These inequalities may be strict. However, the equality $\dim(E) = \mathrm{Dim}(E)$, and even $\dim(E) = \Delta(E)$, occur in cases where $E$ is sufficiently regular. Examples include rectifiable curves and self-similar sets.

Packing measures

By considering packings of $E$, that is, families of disjoint sets at zero distance from $E$, and by switching inf and sup in the definitions, it is possible to define packing measures which are symmetric to the covering measures and whose critical index is precisely equal to Dim. This explains why Dim is called a packing dimension.

1.2.4. Methods for calculating dimensions

Since a dimension is an index of irregularity, it may be used for the classification of irregular sets and for characterizing phenomena showing erratic behaviors. Here we focus on signals. In practice, we may assume that the signal $\Gamma$ is given in axes $Oxy$ by discrete data $(x_i, f_i)$, with $x_i < x_{i+1}$: for example, a time series. Notice that other sets, such as geographical curves obtained by aerial photography, are not of this type, so that the analysis tools can be different.
The algorithms used for estimating a dimension rely on the theoretical formula which defines the dimension, with the usual limitation of a minimal scale, roughly equal to the distances $x_{i+1} - x_i$. Indeed, it is impossible to go further into the structure of the dataset without a reconstruction at very small scales, which amounts to adding new data arbitrarily. The evaluation of a dimension whose theoretical definition requires the finest possible coverings is therefore difficult to justify. This is why we do not propose algorithms for the estimation of σ-stable dimensions such as the Hausdorff or packing dimensions.

Calculations may be performed to estimate $\Delta$ or related dimensions that are naturally adapted to signal analysis. They are usually carried out using logarithmic diagrams. A quantity $Q(f, \varepsilon)$, which characterizes the irregularity, is estimated for a certain number of values of the resolution $\varepsilon$ between two limit values $\varepsilon_{\min}$ and $\varepsilon_{\max}$. If $Q(f, \varepsilon)$ follows a power law of the type $c\,\varepsilon^{\Delta}$, then $\log Q(f, \varepsilon)$ is an affine function of $\log \varepsilon$, with slope $\Delta$. The idea is to seek functions $Q(f, \varepsilon)$ which provide appropriate logarithmic diagrams, i.e. diagrams from which the slope can be estimated precisely. Here are some examples.

Boxes and disks

After counting the number $N_\varepsilon$ of squares of a grid of side $\varepsilon$ that meet $\Gamma$, we draw the diagram $(|\log \varepsilon|, \log N_\varepsilon)$. Although it is very easy to program, the method presents an obvious disadvantage: the quantities $N_\varepsilon$ are integers, which makes the diagram chaotic for large values of $\varepsilon$. We could try a method using the Minkowski sausage of $\Gamma$:
$$\left(|\log \varepsilon|,\ \log\left(\frac{1}{\varepsilon^{2}}\,\mathrm{Area}\bigl(\Gamma(\varepsilon)\bigr)\right)\right)$$
However, this method is more difficult to program than the previous one and also lacks precision: the diagram shows, in general, a strong concavity, even for a curve as simple as a straight line segment.

These methods are very popular, in spite of numerical difficulties. Unfortunately, there is a major drawback for signals. The coordinates $(x_i, f_i)$ are not, in general, of the same nature. If they refer, for example, to stock exchange data, $x_i$ is a time value and $f_i$ an exchange value. In this case, it makes no sense to give the units the same value on the axes $Ox$ and $Oy$. The covering of $\Gamma$ by squares or disks is therefore meaningless. It is preferable to use algorithms which provide a slope independent of changes of unit. For this purpose, the calculated quantity $Q(f, \varepsilon)$ should satisfy the following property: for any real $a$, there exists $c(a)$ such that, for any $\varepsilon$:
$$Q(af, \varepsilon) = c(a)\, Q(f, \varepsilon) \tag{1.9}$$
as is the case for the methods described below.
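The logarithmic-diagram idea can be illustrated with a minimal sketch. The function name `loglog_slope` and the synthetic values below are assumptions made for illustration only; the sketch fits a least-squares line to $(\log \varepsilon, \log Q)$ and checks that a rescaling of $Q$ by a constant, as in (1.9), leaves the slope unchanged.

```python
import math

def loglog_slope(eps_values, q_values):
    """Least-squares slope of log Q against log eps (hypothetical helper)."""
    xs = [math.log(e) for e in eps_values]
    ys = [math.log(q) for q in q_values]
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

# Synthetic data: an exact power law Q = c * eps**Delta (illustrative values).
eps_values = [2.0 ** -k for k in range(2, 10)]
delta_true = 1.5
q_values = [3.0 * e ** delta_true for e in eps_values]
slope = loglog_slope(eps_values, q_values)

# Multiplying Q by a constant c(a) shifts the diagram vertically
# and does not change the fitted slope.
slope_rescaled = loglog_slope(eps_values, [7.0 * q for q in q_values])
```

On real data the points are only approximately aligned, so the choice of the range $[\varepsilon_{\min}, \varepsilon_{\max}]$ used in the fit matters as much as the fit itself.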
Variation method

Here we anticipate section 1.3.4. The oscillation of a function $f$ on any set $F$ is defined as
$$\beta(f, F) = \sup\{f(t) - f(t') : t, t' \in F\}$$
The $\varepsilon$-variation on an interval $J$ is the arithmetic mean of the oscillation over intervals of length $2\varepsilon$:
$$\operatorname{var}_\varepsilon(f) = \frac{1}{|J|}\int_J \beta\bigl(f, [t - \varepsilon, t + \varepsilon] \cap J\bigr)\,dt$$
The variation method [DUB 89, TRI 88] consists of finding the slope of the diagram:
$$\left(|\log \varepsilon|,\ \log\left(\frac{1}{\varepsilon^{2}}\operatorname{var}_\varepsilon(f)\right)\right)$$
Since $\beta(af, [x - \varepsilon, x + \varepsilon]) = |a|\,\beta(f, [x - \varepsilon, x + \varepsilon])$ for all $x$ and $\varepsilon$, we obtain through integration $\operatorname{var}_\varepsilon(af) = |a|\operatorname{var}_\varepsilon(f)$, so equation (1.9) is satisfied. In this case we obtain diagrams which present an almost perfect alignment for a large class of known signals. Furthermore, this method is easy to program and fast to execute.

Lp norms method

Let us define the $L^p$ norm of a function $f : D \subset \mathbb{R}^n \to \mathbb{R}$ by the relation:
$$L^p(f) = \left(\frac{1}{\operatorname{Vol}_n(D)}\int_D |f(x)|^p\,dx\right)^{1/p}$$
It is a functional norm when $p \ge 1$. When $p \to +\infty$, the expression $L^p(f)$ tends to the norm:
$$L^\infty(f) = \sup_{x \in D} |f(x)|$$
Given a signal $f$ defined on $[a, b]$ and the values $x \in [a, b]$ and $\varepsilon > 0$, we apply this tool at any $x$ to the local difference $f(x) - f(x')$, where $|x - x'| \le \varepsilon$. Using the norm $L^\infty$, this gives $\sup_{x' \in [x-\varepsilon, x+\varepsilon]} |f(x) - f(x')|$. This quantity is equivalent to the $\varepsilon$-oscillation, since
$$\frac{1}{2}\,\beta\bigl(f, [x - \varepsilon, x + \varepsilon]\bigr) \le \sup_{x' \in [x-\varepsilon, x+\varepsilon]} |f(x) - f(x')| \le \beta\bigl(f, [x - \varepsilon, x + \varepsilon]\bigr)$$
It is therefore possible to replace the $\varepsilon$-variation of $f$ by the integral over $J$ of $\sup_{x' \in [x-\varepsilon, x+\varepsilon]} |f(x) - f(x')|$, without altering the theoretical result for $\Delta$. However, it is also possible to use $L^p$ norms. Indeed, the oscillation (or the local norm $L^\infty$) only takes into account the peaks of the function. In practice, these peaks may be measured with a significant error, or even destroyed during data acquisition (profiles of rough surfaces, for example). It is then preferable to use all the intermediate values and replace the $\varepsilon$-variation with the quantity:
$$\int_J \left(\frac{1}{2\varepsilon}\int_{x-\varepsilon}^{x+\varepsilon} |f(x) - f(x')|^p\,dx'\right)^{1/p} dx$$
In this expression, large values of $p$ emphasize the effect of local peaks, whereas if $p = 1$, all the values of the function $f$ have equal importance. These integrals make it possible to straighten the corresponding logarithmic diagram and to calculate the slope with precision. We can also replace the above integral on $J$ by a norm $L^q$, with $q > 1$. If $q$ is large, this will take into account the more irregular parts of the signal. We can also change the integral in the window $[x - \varepsilon, x + \varepsilon]$ into a convolution product with a kernel of type $K(x'/\varepsilon)$, so that the results are even smoother. However, it should be noted that, except for particular cases (Weierstrass functions, for example), we do not exactly calculate the dimension $\Delta$ with these methods, but rather an index smaller than $\Delta$ [TRI 99], which nevertheless remains relevant to the signal irregularity.

Let us develop an example of the index just referred to. Let $K$ be a kernel belonging to the Schwartz class, with integral equal to 1. Let $K_a(t) = \frac{1}{a} K\left(\frac{t}{a}\right)$ for $a > 0$. For a function $f$ defined on a compact set, let $f_a$ be the convolution of $f$ with $K_a$. Since $f_a$ is regular, the length $\Lambda_a$ of its graph is finite. We define the regularization dimension $\dim_R(f)$ of the graph of $f$ as:
$$\dim_R(f) = 1 + \lim_{a \to 0} \frac{\log(\Lambda_a)}{-\log a} \tag{1.10}$$
This dimension measures the speed at which the lengths of less and less regularized versions of the graph of $f$ tend to infinity. It is easily proved that if $f$ is continuous, the inequality $\dim_R(f) \le \Delta$ is always true. An interesting aspect of the regularization dimension is that it is well suited to estimation. Results obtained on usual signals (Weierstrass function, iterated function systems and fractional Brownian motion) are generally satisfactory, even for small samples (a few hundred points). Moreover, the simple analytical form of $\dim_R$ allows us to easily obtain an estimator for data corrupted by additive noise, which is particularly useful in signal processing (see [ROU 98] and the FracLab manual for more details).
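Definition (1.10) can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not the FracLab estimator: the kernel is a Gaussian truncated at four standard deviations, the limit $a \to 0$ is replaced by a regression over a few scales, the graph length is measured on a fixed interior window to avoid boundary effects, and all function names are hypothetical.

```python
import math
import random

def gaussian_kernel(sigma_samples):
    """Discrete Gaussian weights, truncated at 4 sigma and normalized to sum 1."""
    half = max(1, int(math.ceil(4 * sigma_samples)))
    w = [math.exp(-0.5 * (j / sigma_samples) ** 2) for j in range(-half, half + 1)]
    total = sum(w)
    return [v / total for v in w], half

def graph_length(values, dx):
    """Length of the polygonal line through the points (i*dx, values[i])."""
    return sum(math.hypot(dx, b - a) for a, b in zip(values, values[1:]))

def regularization_dimension(samples, dx, scales):
    """Estimate dim_R as 1 + slope of log(Lambda_a) against -log(a)."""
    # Same interior window for every scale, so that lengths are comparable.
    margin = int(math.ceil(4 * max(scales) / dx))
    xs, ys = [], []
    for a in scales:
        kernel, half = gaussian_kernel(a / dx)
        smooth = [
            sum(w * samples[i + j - half] for j, w in enumerate(kernel))
            for i in range(margin, len(samples) - margin)
        ]
        xs.append(-math.log(a))
        ys.append(math.log(graph_length(smooth, dx)))
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    return 1.0 + sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx

n = 1024
dx = 1.0 / n
scales = [0.005, 0.01, 0.02]

# A straight line: smoothing does not change its length, so dim_R is 1.
line = [i * dx for i in range(n + 1)]
dim_line = regularization_dimension(line, dx, scales)

# A +/- sqrt(dx) random walk, used here as a rough stand-in for Brownian motion.
rng = random.Random(0)
walk = [0.0]
for _ in range(n):
    walk.append(walk[-1] + rng.choice((-1.0, 1.0)) * math.sqrt(dx))
dim_walk = regularization_dimension(walk, dx, scales)
```

For the rough signal the smoothed lengths grow as $a$ decreases, so the estimate exceeds 1; how close it comes to the theoretical value depends on the sample size and on the range of scales.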
1.3. Hölder exponents

1.3.1. Hölder exponents related to a measure

The dimensional analysis of a set is related to its local properties. To go further into this study, it is convenient to use a measure $\mu$ supported by the set. In many cases (self-similar sets, for example), $E$ is defined at the same time as $\mu$. If $E$ is a curve, constructing a measure on $E$ is called a parameterization. Without a parameterization it is impossible to analyze the curve. However, a given set can support very different measures. Particularly interesting ones are the well-balanced measures, in a sense we will explain.

Given a measure $\mu$ of $\mathbb{R}^n$, let us first define the Hölder exponent of $\mu$ over any measurable set $F$ by
$$\alpha_\mu(F) = \frac{\log \mu(F)}{\log \operatorname{diam}(F)}$$
By convention, $0/0 = 1$ and $1/0 = +\infty$. Given a set $E$, we use this notion in a local manner, i.e. on arbitrarily small intervals or cubes intersecting $E$. A pointwise Hölder exponent is then defined using centered balls $B_\varepsilon(x)$ whose radius tends to 0:
$$\alpha_\mu(x) = \liminf_{\varepsilon \to 0} \alpha_\mu\bigl(B_\varepsilon(x)\bigr)$$
The symmetric exponent can also be useful:
$$\overline{\alpha}_\mu(x) = \limsup_{\varepsilon \to 0} \alpha_\mu\bigl(B_\varepsilon(x)\bigr)$$
In addition, the geometric context sometimes induces a specific analysis framework. If a measure is defined by its value on the dyadic cubes, it will be easier to use the following Hölder exponents:
$$\alpha_\mu^{*}(x) = \limsup_{n \to +\infty} \alpha_\mu\bigl(u_n(x)\bigr), \qquad \alpha_{*\mu}(x) = \liminf_{n \to +\infty} \alpha_\mu\bigl(u_n(x)\bigr)$$
where $u_n(x)$ is the cube of the form $\prod_{i=1}^{N} [k_i 2^{-n}, (k_i + 1) 2^{-n}[$ that contains $x$. Other covering nets can obviously be used, but dyadic cubes are well suited for calculations.

1.3.2. Theorems on set dimensions

The first theorem can be used as a basis for understanding the subsequent more technical results.
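THEOREM 1.2.– Let $\mu$ be a finite measure such that $\mu(E) > 0$. Assume that there exists a real $\alpha$ such that, for any $x \in E$:
$$\alpha_\mu^{*}(x) = \alpha_{*\mu}(x) = \alpha$$
Then $\alpha = \dim(E) = \operatorname{Dim}(E)$.

Proof. Let $\varepsilon > 0$. Let $E_n$ be the subset of $E$ consisting of all points $x$ such that, for $k \ge n$:
$$\alpha - \varepsilon \le \alpha_\mu\bigl(u_k(x)\bigr) \le \alpha + \varepsilon$$
If $\rho < 2^{-n}$, any $\rho$-covering $\{u_i\}$ of $E_n$ by dyadic cubes of rank $\ge n$ is such that
$$|u_i|^{\alpha + \varepsilon} \le \mu(u_i) \le |u_i|^{\alpha - \varepsilon}$$
for all $i$. Therefore $\mu(E_n) \le \sum_i |u_i|^{\alpha - \varepsilon}$ and $\sum_i |u_i|^{\alpha + \varepsilon} \le \|\mu\|$, where $\|\mu\|$ denotes the total mass of $\mu$.

First, we deduce that $H^{*(\alpha - \varepsilon)}(E_n) \ge \mu(E_n)$. If $n$ is large enough then $\mu(E_n) > 0$. This gives $\dim(E_n) \ge \alpha - \varepsilon$. Since $E_n \subset E$, then $\dim(E) \ge \alpha - \varepsilon$.

Secondly, by using the covering of $E_n$ formed by dyadic cubes of rank $k \ge n$, we obtain:
$$N_{2^{-k}}(E_n)\, 2^{-k(\alpha + \varepsilon)} \le \|\mu\|$$
Therefore $N_{2^{-k}}(E_n) \le \|\mu\|\, 2^{k(\alpha + \varepsilon)}$, which implies $\Delta(E_n) \le \alpha + \varepsilon$. As a consequence, $\operatorname{Dim}(E) \le \alpha + \varepsilon$ by σ-stability. By making $\varepsilon$ tend to 0, we obtain the desired result.

An analogous theorem can be stated with balls $B_\varepsilon(x)$ centered at $E$. We can develop the arguments of the preceding proof to obtain more general results as follows.

THEOREM 1.3.– Assume that $\mu$ is a finite measure such that $\mu(E) > 0$. Then:
$$\inf_{x \in E} \alpha_\mu(x) \le \dim(E) \le \sup_{x \in E} \alpha_\mu(x) \tag{1.11}$$
$$\inf_{x \in E} \overline{\alpha}_\mu(x) \le \operatorname{Dim}(E) \le \sup_{x \in E} \overline{\alpha}_\mu(x) \tag{1.12}$$
$$\inf_{x \in E} \alpha_\mu(x) \le \Delta(E) \le \limsup_{\varepsilon \to 0}\, \sup_{x \in E} \alpha_\mu\bigl(B_\varepsilon(x)\bigr) \tag{1.13}$$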
Inequality (1.13) seems more complex than the others. Nevertheless, we can derive from it a simple result: if $0 < \mu(E) < +\infty$ and if $\alpha_\mu(B_\varepsilon(x))$ converges uniformly on $E$ to a number $\alpha$, then $\alpha = \Delta(E)$. The same results hold if, for example, we replace the network of balls centered at $E$ with that of dyadic cubes.

EXAMPLE 1.5.– A perfect symmetric set (see Example 1.1) is the support of a natural or canonical measure: each of the $2^n$ covering intervals of rank $n$ is associated with the weight $2^{-n}$. In the case where the set has constant ratio $a$, the intervals of rank $n$ have length $a^n$ and their Hölder exponent assumes the value $\log 2/|\log a|$ uniformly. This grid of intervals allows the computation of dimensions. Indeed
$$\dim(E) = \operatorname{Dim}(E) = \Delta(E) = \frac{\log 2}{|\log a|}$$
By making the successive ratios $a_n$ vary, it is also possible to construct a set such that:
$$\dim(E) = \liminf_{n \to \infty} \frac{n \log 2}{|\log a_n|}, \qquad \operatorname{Dim}(E) = \Delta(E) = \limsup_{n \to \infty} \frac{n \log 2}{|\log a_n|}$$
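As a concrete instance of the constant-ratio case of Example 1.5, the following worked computation takes $a = 1/3$, i.e. the triadic Cantor set. This choice of ratio is an illustrative assumption and does not come from the text.

```latex
% Illustrative assumption: constant ratio a = 1/3 (triadic Cantor set).
% Rank-n intervals: 2^n of them, each of length 3^{-n} and weight 2^{-n}.
\[
\alpha_\mu(u_n)
  = \frac{\log \mu(u_n)}{\log \operatorname{diam}(u_n)}
  = \frac{\log 2^{-n}}{\log 3^{-n}}
  = \frac{\log 2}{\log 3} \approx 0.6309 ,
\]
\[
\dim(E) = \operatorname{Dim}(E) = \Delta(E)
  = \frac{\log 2}{|\log (1/3)|} \approx 0.6309 .
\]
```

The exponent does not depend on the rank $n$, which is what "uniformly" means in the example.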
EXAMPLE 1.6.– The set $E_p$ of $p$-normal numbers (see Example 1.4) supports a measure which makes it possible to estimate its dimension and which is known as the Besicovitch measure. It is defined on the dyadic intervals of $[0, 1]$. Set $\mu([0, \frac{1}{2}]) = p$ and $\mu([\frac{1}{2}, 1]) = 1 - p$, with $p \in (0, 1)$. The weights $p$ and $1 - p$ are then distributed similarly at the following stages: since each dyadic interval $u_n$ of rank $n$ is the union of the intervals $v_n$ (on the left) and $v_n'$ (on the right) of rank $n + 1$, we put
$$\mu(v_n) = p\,\mu(u_n), \qquad \mu(v_n') = (1 - p)\,\mu(u_n)$$
It is easy to calculate the exact measure of each dyadic interval $u_n(x)$ containing the point $x$ by using the base 2 expansion of $x$. Denote:

$N_0(x, n)$ = number of 0 in the expansion of $x$ between ranks 1 and $n$
$N_1(x, n)$ = number of 1 in the expansion of $x$ between ranks 1 and $n$

Thus, $N_0(x, n) + N_1(x, n) = n$ and:
$$\mu\bigl(u_n(x)\bigr) = p^{N_0(x,n)}\,(1 - p)^{N_1(x,n)} \tag{1.14}$$
First, let us show that $E_p$ has full measure. The easiest way to proceed is to use the language of probability. Each point $x$ can be viewed as the result of a process which is a sequence of independent Bernoulli random variables $X_i$ taking the values 0 and 1 with probabilities $p$ and $1 - p$. The frequency $N_1(x, n)/n = \bigl(\sum_{i=1}^{n} X_i\bigr)/n$ has mean $1 - p$ and variance $p(1 - p)/n$. We may apply here the strong law of large numbers: with probability 1, $N_1(x, n)/n$ tends to $1 - p$ when $n \to +\infty$, and $N_0(x, n)/n$ tends to $p$. Coming back to the language of measure, this result tells us that the set of $x$ for which $N_0(x, n)/n$ tends to $p$ has measure 1. Such $x$ are in $E_p$. Thus $\mu(E_p) = 1$.

Secondly, to compute the dimension, we first need to determine the value of the Hölder exponent. Equation (1.14) implies the following result:
$$\alpha\bigl(u_n(x)\bigr) = \frac{1}{n} N_0(x, n)\,\frac{|\log p|}{\log 2} + \frac{1}{n} N_1(x, n)\,\frac{|\log(1 - p)|}{\log 2}$$
for any $x$ of $[0, 1]$. If $x \in E_p$, then $\alpha(u_n(x))$ tends to the value:
$$\alpha_\mu^{*}(x) = \alpha_{*\mu}(x) = p\,\frac{|\log p|}{\log 2} + (1 - p)\,\frac{|\log(1 - p)|}{\log 2} \tag{1.15}$$
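Formulas (1.14) and (1.15) can be checked on a finite binary expansion. The sketch below is illustrative only: the function names are hypothetical, and the digit sequence is an artificial one in which the digit 0 appears with frequency exactly $p$, so that the coarse exponent of $u_n(x)$ coincides with the limit value.

```python
import math

def log_measure(bits, p):
    """log of mu(u_n(x)) from (1.14); bits are the first n binary digits of x."""
    n0 = bits.count(0)
    n1 = bits.count(1)
    return n0 * math.log(p) + n1 * math.log(1.0 - p)

def coarse_exponent(bits, p):
    """alpha(u_n(x)) = log mu(u_n(x)) / log diam(u_n(x)), with diam = 2**-n."""
    n = len(bits)
    return log_measure(bits, p) / (-n * math.log(2.0))

def limit_exponent(p):
    """Right-hand side of (1.15)."""
    return (p * abs(math.log(p)) + (1.0 - p) * abs(math.log(1.0 - p))) / math.log(2.0)

p = 0.25
bits = [0, 1, 1, 1] * 50   # digit 0 appears with frequency exactly p
alpha_n = coarse_exponent(bits, p)
alpha_limit = limit_exponent(p)
```

Working with the logarithm of the measure avoids underflow for long expansions. For $p = 1/2$ the measure is the Lebesgue measure and `limit_exponent(0.5)` returns 1.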
Thus, according to equations (1.11) and (1.12), $\dim(E_p)$ and $\operatorname{Dim}(E_p)$ are both equal to this value. We observe that in equation (1.13), the left-hand side is also equal to this value. Moreover, $\sup_{x \in E_p} \alpha(u_n(x))$ is equal to the largest Hölder exponent of dyadic intervals of rank $n$. If, for example, $p < \frac{1}{2}$, this largest exponent is equal to $|\log p|/\log 2$, which is larger than 1. Therefore, the right-hand side of (1.13) is larger than 1. In fact, equation (1.13) gives no indication of the value of $\Delta(E_p)$. A density argument yields $\Delta(E_p) = 1$ for any value of $p$.
1.3.3. Hölder exponent related to a function

The Hölder exponents of a function give much more information than those of measures. Firstly, let us generalize the notion of measure of a set $F$, by using the notion of oscillation of the function $f$ in $F$:
$$\beta(f, F) = \sup\{f(t) - f(t') : t, t' \in F\}$$
This allows us to define a Hölder exponent:
$$\alpha(F) = \frac{\log \beta(f, F)}{\log \operatorname{diam}(F)}$$
Given an interval $J$ and a function $f : J \to \mathbb{R}$, we may use this notion locally in arbitrarily small neighborhoods of $t \in J$. The pointwise Hölder exponent of $f$ at $t$ is obtained as
$$\alpha_p^f(t) = \liminf_{\varepsilon \to 0} \alpha\bigl([t - \varepsilon, t + \varepsilon] \cap J\bigr)$$
According to this definition, the exponent of an increasing function $f$ is the same as that of the measure $\mu$ defined on any $[c, d] \subset J$ by $\mu([c, d]) = f(d) - f(c)$. Indeed, $f(d) - f(c)$ is also the oscillation of $f$ in $[c, d]$. However, in general, $f$ is not monotonic and it is therefore necessary to carry out a more accurate analysis, as we will see below. As in the case of measures, we may also consider the "symmetric" exponent defined with an upper limit, and also the exponents obtained as lower and upper limits by using particular grids of intervals, such as the dyadic intervals.

Oscillation considered as a measurement of the local variability of a function possesses many advantages. In particular, it is closely related to the box dimension. However, there are some drawbacks: it is not simple to use in a theoretical context, it is sometimes difficult to estimate with precision under experimental conditions and, finally, it is sensitive to various disturbances (discretization, noise, etc.). It is possible to replace the oscillation with other set functions $v(f, F)$ showing more robustness. However, most alternatives no longer satisfy the important triangle inequality: $v(f, F_1 \cup F_2) \le v(f, F_1) + v(f, F_2)$ for all $F_1, F_2 \subset J$ (see [TRI 99]). We can simplify the analysis by restricting the general $F$ to the class of intervals and by setting $v(f, [a, b]) = |f(b) - f(a)|$.

We may even consider only dyadic intervals, and take
$$v\bigl(f, [k\,2^{-j}, (k + 1)\,2^{-j}]\bigr) = |c_{j,k}|$$
where $c_{j,k}$ is the wavelet coefficient of $f$ at scale $j$ and position $k$ (see also Chapters 2, 8 and 9). Let us now give an alternative and useful definition of the Hölder exponent.
DEFINITION 1.2.– Let $f : \mathbb{R} \to \mathbb{R}$ be a function, $s > 0$ and $x_0$ a real number. Then $f \in C^s(x_0)$ if and only if there exist a real number $\eta > 0$, a polynomial $P$ of degree $\le s$ and a constant $C$ such that
$$\forall x \in B(x_0, \eta), \qquad |f(x) - P(x - x_0)| \le C\,|x - x_0|^s \tag{1.16}$$
The pointwise Hölder exponent of $f$ at $x_0$, denoted $\alpha_p^f(x_0)$ or $\alpha_p(x_0)$, is given by
$$\alpha_p(x_0) = \sup\{s : f \in C^s(x_0)\}$$
If $0 < s < 1$, then the polynomial $P$ is simply the constant $f(x_0)$ and the increments of $f$ over $[x_0 - \varepsilon, x_0 + \varepsilon]$ are indeed compared to $\varepsilon$. Considering $P$ allows us to take into account the higher order singularities, i.e. those in the derivatives of $f$. We remove the "regular part" of $f$ to exhibit the singular behavior. Consider for example $f(x) = x + |x|^{3/2}$ at $x_0 = 0$. Using $P$ allows us to find $\alpha_p = \frac{3}{2}$, whereas the simple increment would give $\alpha_p = 1$, a non-significant value in this case.

The pointwise exponent, whose definition is natural, has a geometric interpretation. To begin with, let us remove the regular part of the signal by forming the difference $f(x) - P(x - x_0)$. Around $x_0$, the signal thus obtained is entirely contained in a hull of the form $C|x - x_0|^{\alpha_p - \varepsilon}$ for any $\varepsilon > 0$, and this hull is optimal, i.e. any smaller hull $C|x - x_0|^{\alpha_p + \varepsilon}$ leaves infinitely many points of the signal outside. We observe that the smaller $\alpha_p$ is, the more irregular $f$ is in the neighborhood of $x_0$, and vice versa. In addition, $\alpha_p > 1$ implies that $f$ is differentiable at $x_0$, and a function which is discontinuous at $x_0$ is such that $\alpha_p = 0$.

In many applications, the regularity of a signal is as important as, or more important than, its amplitude. For example, the edges of an image do not vary if an affine transformation of the gray-levels is carried out. This will modify the amplitude, but not the exponents. The pointwise Hölder exponent will thus be one of the main ingredients for the fractal analysis of a signal or image. Generally speaking, measuring and studying the pointwise exponent is particularly useful in the processing of strongly irregular signals whose irregularity is expected to contain relevant information. Examples include biomedical signals and images (ECG, EEG, echography, scintigraphy), Internet traffic logs, radar images, etc.

The pointwise exponent, however, presents some drawbacks. For example, it is neither stable under integro-differentiation nor, more generally, under the action of pseudo-differential operators. This means that it is not possible to predict the exponent at $x$ of a primitive $F$ of $f$ knowing $\alpha_p^f(x)$. It is only guaranteed that $\alpha_p^F(x) \ge \alpha_p^f(x) + 1$. In the same way, the exponent of the analytic signal associated with $f$ is not necessarily the same as that of $f$. This is a problem in signal processing, since
this type of operator is often used. The second disadvantage, related to the first, is that $\alpha_p(x)$ does not provide the whole information on the regularity of $f$ at $x$. A common example is that of the chirp $|x|^\gamma \sin(1/|x|^\delta)$, with $\gamma$ and $\delta$ positive. We verify that the pointwise exponent at 0 is equal to $\gamma$. In particular, it is independent of $\delta$: $\alpha_p(0)$ is not sensitive to "oscillating singularities", i.e. to situations where the local frequency of the signal tends to infinity in the neighborhood of 0. It is therefore necessary to introduce at least one more exponent to fully describe the local irregularity. During the last few years, several quantities have been proposed, including the chirp exponent, the oscillation exponent, etc. Here we will focus on the local Hölder exponent, which we now define (for more details on the properties of this exponent, see [LEV 98a, SEU 02]).

Let us first recall that, given a function $f : \Omega \to \mathbb{R}$, where $\Omega \subset \mathbb{R}$ is an open set, we say that $f$ belongs to the global Hölder space $C_l^s(\Omega)$, with $0 < s < 1$, if there is a constant $C$ such that, for any $x, y$ in $\Omega$:
$$|f(x) - f(y)| \le C\,|x - y|^s \tag{1.17}$$
If $m < s < m + 1$ ($m \in \mathbb{N}$), then $f \in C_l^s(\Omega)$ means that there exists a constant $C$ such that, for any $x, y$ in $\Omega$:
$$|\partial^m f(x) - \partial^m f(y)| \le C\,|x - y|^{s - m} \tag{1.18}$$
Now let $\alpha_l(\Omega) = \sup\{s : f \in C_l^s(\Omega)\}$. Notice that if $\Omega' \subset \Omega$, then $\alpha_l(\Omega') \ge \alpha_l(\Omega)$. To define the local Hölder exponent, we will use the following lemma.

LEMMA 1.1.– Let $(O_i)_{i \in I}$ be a family of decreasing open sets (i.e. $O_i \subset O_j$ if $i > j$), such that:
$$\bigcap_i O_i = \{x_0\} \tag{1.19}$$
Let:
$$\alpha_l(x_0) = \sup_{i \in I}\,\{\alpha_l(O_i)\} \tag{1.20}$$
Then, $\alpha_l(x_0)$ does not depend on the choice of the family $(O_i)_{i \in I}$.
This result makes it possible to define the local exponent by using any family of intervals containing $x_0$.
DEFINITION 1.3.– Let $f$ be a function defined in a neighborhood of $x_0$. Let $\{I_n\}_{n \in \mathbb{N}}$ be a decreasing sequence of open intervals converging to $x_0$. By definition, the local Hölder exponent of $f$ at $x_0$, denoted $\alpha_l(x_0)$, is:
$$\alpha_l(x_0) = \sup_{n \in \mathbb{N}} \alpha_l(I_n) = \lim_{n \to +\infty} \alpha_l(I_n) \tag{1.21}$$
Let us briefly note that the local exponent is related to a notion of critical exponent of fractional derivation [KOL 01]. We may understand the difference between $\alpha_p$ and $\alpha_l$ as follows: let us suppose that there exists a single pair $(y, z)$ such that $\beta(f, B(x, \varepsilon)) = f(y) - f(z)$. Then $\alpha_p$ results from the comparison between $\beta(f, B(x, \varepsilon))$ and $\varepsilon$, whereas for $\alpha_l$, we compare $\beta(f, B(x, \varepsilon))$ to $|y - z|$. This is particularly clear in the case of the chirp, where the distance between the points $(y, z)$ realizing the oscillation tends to zero much faster than the size of the ball around 0. Accordingly, it is easy to demonstrate that $\alpha_l(0) = \gamma/(1 + \delta)$ for the chirp. The exponent $\alpha_l$ thus "sees" oscillations around 0: for fixed $\gamma$, the chirp is more irregular (in the sense of $\alpha_l$) when $\delta$ is larger.

The local exponent possesses an advantage over $\alpha_p$: it is stable under the action of pseudo-differential operators. However, like $\alpha_p$, $\alpha_l$ cannot by itself completely characterize the irregularity around a point. Moreover, $\alpha_l$ is, in a certain sense, less "precise" than $\alpha_p$. The results presented below support this assertion.

PROPOSITION 1.2.– For any continuous $f$ and for all $x$:
$$\alpha_l^f(x) \le \min\Bigl(\alpha_p^f(x),\ \liminf_{t \to x} \alpha_p^f(t)\Bigr)$$
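Returning to the chirp $|x|^\gamma \sin(1/|x|^\delta)$ discussed before Proposition 1.2, the value $\alpha_l(0) = \gamma/(1 + \delta)$ can be motivated by the following heuristic computation. It is a sketch that ignores constants and is not a proof; the phase notation $\theta$ is introduced here for illustration only.

```latex
% Heuristic only; constants are not tracked.
% Phase \theta(x) = x^{-\delta} for x > 0, so |\theta'(x)| = \delta x^{-(1+\delta)}.
% Two consecutive extrema near \varepsilon are separated by a phase shift of \pi:
\[
\beta\bigl(f, B(0,\varepsilon)\bigr) \approx 2\,\varepsilon^{\gamma},
\qquad
|y - z| \approx \frac{\pi}{|\theta'(\varepsilon)|}
        = \frac{\pi}{\delta}\,\varepsilon^{1+\delta}.
\]
% Comparing the oscillation with \varepsilon gives the pointwise exponent;
% comparing it with |y - z| gives the local exponent:
\[
\beta \approx \varepsilon^{\gamma}
  \;\Longrightarrow\; \alpha_p(0) = \gamma,
\qquad
\beta \approx C\,|y - z|^{\gamma/(1+\delta)}
  \;\Longrightarrow\; \alpha_l(0) = \frac{\gamma}{1+\delta}.
\]
```

Since $\gamma/(1 + \delta) < \gamma$, this is consistent with the inequality $\alpha_l \le \alpha_p$ of Proposition 1.2.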
The following two theorems describe the structure of the Hölder functions, i.e. the functions which associate with any $x$ the exponents of $f$ at $x$.

THEOREM 1.4.– Let $g : \mathbb{R} \to \mathbb{R}^+$ be a function. The two assertions below are equivalent:
– $g$ is the lower limit of a sequence of continuous functions;
– there exists a continuous function $f$ whose pointwise Hölder function $\alpha_p(x)$ satisfies $\alpha_p(x) = g(x)$ for all $x$.

THEOREM 1.5.– Let $g : \mathbb{R} \to \mathbb{R}^+$ be a function. The following two assertions are equivalent:
– $g$ is a lower semi-continuous (LSC) function;
– there exists a continuous function $f$ whose local Hölder function $\alpha_l(x)$ satisfies $\alpha_l(x) = g(x)$ for any $x$.
NOTE 1.2.– Let us recall that a function $f : D \subset \mathbb{R} \to \mathbb{R}$ is LSC if, for any $x \in D$ and for any sequence $(x_n)$ in $D$ tending to $x$:
$$\liminf_{n \to \infty} f(x_n) \ge f(x) \tag{1.22}$$
Figure 1.2 shows a generalization of the Weierstrass function defined on $[0, 1]$ for which $\alpha_p(x) = \alpha_l(x) = x$ for any $x$. This function is defined as $W_g(x) = \sum_{n=0}^{\infty} \omega^{-nx} \cos(\omega^n x)$, with $\omega > 1$.

Figure 1.2. Generalized Weierstrass function for which $\alpha_p(x) = \alpha_l(x) = x$
Since the class of lower limits of continuous functions is much larger than that of lower semi-continuous functions, we observe that $\alpha_p$ generally supplies more information than $\alpha_l$. For example, $\alpha_p$ can vary much "faster" than $\alpha_l$. In particular, it is possible to construct a continuous function whose pointwise Hölder function coincides with the indicator function of the set of rational numbers. This indicator function is everywhere discontinuous, and the local Hölder function of such a function is thus constantly equal to 0. The following results describe more precisely the relations between $\alpha_l$ and $\alpha_p$.

PROPOSITION 1.3.– Let $f : I \to \mathbb{R}$ be a continuous function, and assume that there exists $\gamma > 0$ such that $f \in C^\gamma(I)$. Then, there exists a subset $D$ of $I$ such that:
– $D$ is dense, uncountable and has Hausdorff dimension zero;
– for any $x \in D$, $\alpha_p(x) = \alpha_l(x)$.
Moreover, this result is optimal, i.e. there exists a function of global regularity $\gamma > 0$ such that $\alpha_p(x) \ne \alpha_l(x)$ for all $x$ outside a set of zero Hausdorff dimension.

THEOREM 1.6.– Let $0 < \gamma < 1$ and $f : [0, 1] \to [\gamma, 1]$ be a lower limit of continuous functions. Let $g : [0, 1] \to [\gamma, 1]$ be a lower semi-continuous function. Assume that for all $t \in [0, 1]$, $f(t) \ge g(t)$. Then, there exists a continuous function $F : [0, 1] \to \mathbb{R}$ such that:
– for all $x$, $\alpha_l(x) = g(x)$;
– for all $x$ outside a set of zero Hausdorff dimension, $\alpha_p(x) = f(x)$.

This theorem shows that, when the "compatibility condition" $f(t) \ge g(t)$ is satisfied, we can simultaneously and independently prescribe the local and pointwise regularity of a function outside a "small" set. These two measures of irregularity are thus to some extent independent and provide complementary information.

1.3.4. Signal dimension theorem

Let us investigate the relationships between the dimension of a signal and its Hölder exponents. There is no general result concerning the Hausdorff dimension, apart from obvious upper bounds resulting from the inequalities $\dim(\Gamma) \le \operatorname{Dim}(\Gamma) \le \Delta(\Gamma)$. Here is a result for $\operatorname{Dim}(\Gamma)$ [TRI 86a].

THEOREM 1.7.– If $\Gamma$ is the graph of a continuous function $f$, then:
$$2 - \sup_{x \in J} \alpha_\mu(x) \le \operatorname{Dim}(\Gamma) \le 2 - \inf_{x \in J} \alpha_\mu(x) \tag{1.23}$$
The same inequalities are true if we use the grid of the dyadic intervals:
$$2 - \sup_{x \in J} \alpha_{*\mu}(x) \le \operatorname{Dim}(\Gamma) \le 2 - \inf_{x \in J} \alpha_{*\mu}(x) \tag{1.24}$$
We do not provide the proof of these results, which requires an evaluation of the packing measure of the graph. In the same context, we could show that if the local Hölder exponents $\alpha(u_n(x))$ tend uniformly to a real $\alpha$, then $2 - \alpha$ is also equal to $\Delta(\Gamma)$. However, a much more interesting equality, which is both simple and general, may be given for the Minkowski-Bouligand dimension of the graph.

THEOREM 1.8.– Let $f$ be a continuous function defined on an interval $J$, and non-constant on $J$. For any $\varepsilon > 0$, let us call $\varepsilon$-variation of $f$ on $J$ the arithmetic mean of the $\varepsilon$-oscillations:
$$\operatorname{var}_\varepsilon(f) = \frac{1}{|J|}\int_J \beta\bigl(f, [x - \varepsilon, x + \varepsilon] \cap J\bigr)\,dx$$
then:
$$\Delta(\Gamma) = \limsup_{\varepsilon \to 0}\left(2 - \frac{\log \operatorname{var}_\varepsilon(f)}{\log \varepsilon}\right) \tag{1.25}$$
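Formula (1.25) suggests a simple estimator on sampled data, sketched below under stated assumptions: the signal is uniformly sampled with step `dx`, the window radius is a whole number of samples, and the upper limit is replaced by a least-squares slope over a few radii. The function names are hypothetical and this is not presented as the implementation used in [DUB 89, TRI 88].

```python
import math

def eps_variation(samples, k):
    """Mean oscillation over windows of radius k samples, clipped to the signal."""
    n = len(samples)
    total = 0.0
    for i in range(n):
        window = samples[max(0, i - k): min(n, i + k + 1)]
        total += max(window) - min(window)
    return total / n

def variation_dimension(samples, dx, radii):
    """Slope of log(var_eps / eps**2) against |log eps|, with eps = k * dx."""
    xs, ys = [], []
    for k in radii:
        eps = k * dx
        var = eps_variation(samples, k)
        if var <= 0.0:
            # Theorem 1.8 assumes a non-constant function.
            raise ValueError("constant signal: the variation is zero")
        xs.append(abs(math.log(eps)))
        ys.append(math.log(var / eps ** 2))
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx

n = 512
dx = 1.0 / n
ramp = [i * dx for i in range(n + 1)]        # a straight line, dimension 1
radii = [1, 2, 4, 8, 16]
delta_ramp = variation_dimension(ramp, dx, radii)

# Changing the unit on the vertical axis leaves the estimate unchanged.
delta_scaled = variation_dimension([5.0 * v for v in ramp], dx, radii)
```

The estimate for the straight line is slightly above 1 because windows near the ends of the interval are clipped; the effect shrinks as the radii become small compared with $|J|$.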
The assumption that $f$ is not constant is necessary, as otherwise the oscillations are all zero and $\operatorname{var}_\varepsilon(f) = 0$. In this case, the graph is a horizontal segment and the value of its dimension is 1.

Proof. A proof [TRI 99] using geometric arguments consists of estimating the area of the Minkowski $\varepsilon$-sausage $\Gamma(\varepsilon)$. We show that this area is equivalent to that of the union of the horizontal segments $\bigcup_{t \in J} [t - \varepsilon, t + \varepsilon] \times \{f(t)\}$ centered on the graph. The latter area is, up to the constant factor $|J|$, the variation $\operatorname{var}_\varepsilon(f)$.

EXAMPLE 1.7.– The graph of a self-affine function defined on $J = [a, b]$ may be obtained as the attractor of an iterated function system, like the self-similar curves of Example 1.2 (see also Chapters 9 and 10). For this, it is sufficient to define:
– an integer $N \ge 2$;
– $N + 1$ points in the plane $A_1 = (x_1, y_1), \ldots, A_{N+1} = (x_{N+1}, y_{N+1})$, such that $x_1 = a < \cdots < x_i < \cdots < x_{N+1} = b$;
– $N$ affine triangular maps of the plane $T_1, \ldots, T_N$, such that, for each $T_i$, the image of the segment $A_1 A_{N+1}$ is the segment $A_i A_{i+1}$.

These may be written as:
$$T_i = \begin{pmatrix} \rho_i & 0 \\ h_i & \delta_i \end{pmatrix} + \begin{pmatrix} \xi_i \\ \eta_i \end{pmatrix}$$
where $0 < \rho_i = (x_{i+1} - x_i)/(b - a) < 1$ and $|\delta_i| < 1$. The five parameters of $T_i$ are related by the relations $T_i(A_1) = A_i$ and $T_i(A_{N+1}) = A_{i+1}$. In the particular case where $\rho_i = 1/N$ for any $i$ and $\sum_i |\delta_i| \ge 1$, we may verify that if $\varepsilon = N^{-k}$, the quantity $\operatorname{var}_\varepsilon(f)$ is of the order of $\bigl(\bigl(\sum_i |\delta_i|\bigr)/N\bigr)^k$. We then obtain:
$$\Delta(\Gamma) = 2 - \frac{\bigl|\log\bigl(\bigl(\sum_i |\delta_i|\bigr)/N\bigr)\bigr|}{\log N} = 1 + \frac{\log \sum_i |\delta_i|}{\log N}$$
(see the classical example of Figure 1.3 where $N = 4$ and $\delta_i = \frac{1}{2}$ for any $i$). The Hölder exponent is calculated using 4-adic intervals. Its uniform value is $\frac{1}{2}$. Therefore $\Delta(\Gamma) = \operatorname{Dim}(\Gamma) = \frac{3}{2}$. Let us note that the Hausdorff dimension, strictly lower than $\frac{3}{2}$, is much more difficult to estimate [MCM 84].
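The closed formula of Example 1.7 is easy to evaluate. The helper below is a hypothetical illustration restricted to the particular case stated in the example ($\rho_i = 1/N$, $|\delta_i| < 1$, and a sum of the $|\delta_i|$ at least equal to 1, which is assumed here); it reproduces the value $3/2$ of Figure 1.3.

```python
import math

def self_affine_box_dimension(deltas):
    """Delta(Gamma) = 1 + log(sum |delta_i|) / log N, for rho_i = 1/N."""
    n_maps = len(deltas)
    total = sum(abs(d) for d in deltas)
    # Conditions of the particular case; the bound on the sum is an assumption.
    if n_maps < 2 or any(abs(d) >= 1 for d in deltas) or total < 1:
        raise ValueError("need N >= 2, |delta_i| < 1 and sum |delta_i| >= 1")
    return 1.0 + math.log(total) / math.log(n_maps)

# Figure 1.3: N = 4 and delta_i = 1/2 for every i.
dim_fig_1_3 = self_affine_box_dimension([0.5, 0.5, 0.5, 0.5])

# Limiting case sum |delta_i| = 1: the formula gives dimension 1.
dim_limit = self_affine_box_dimension([0.5, 0.5])
```

Only the sum of the $|\delta_i|$ and the number of maps enter the formula, so the signs of the vertical contraction ratios do not affect the box dimension.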
EXAMPLE 1.8.– Lacunary series, such as the Weierstrass function, provide other examples of signals:
$$f(x) = \sum_{n=0}^{\infty} \omega^{-nH} \cos(\omega^n x + \varphi_n)$$
where $\omega > 1$ and $0 < H < 1$. The values of the "phases" $\varphi_n$ are arbitrary. We can directly prove [TRI 99] that, restricted to any bounded interval $J$, the box dimension of the graph $\Gamma$ is equal to $2 - H$. By homogeneity, $\operatorname{Dim}(\Gamma) = 2 - H$. This is consistent with the fact, which is more difficult to prove (see [BOU 00]), that the $\varepsilon$-oscillations are uniformly of the order of $\varepsilon^H$. In other words, there exist two constants $c_1$ and $c_2 > 0$ such that, for any $t \in J$ and for any $\varepsilon$, with $0 < \varepsilon < 1$:
$$c_1\,\varepsilon^H \le \beta\bigl(f, [t - \varepsilon, t + \varepsilon]\bigr) \le c_2\,\varepsilon^H$$
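The series of Example 1.8 can be evaluated numerically by truncation. The sketch below is illustrative: the function names, the number of terms and the default parameters ($\omega = 3$, $H = 1/2$, zero phases) are assumptions, and the only property checked is the elementary geometric-series bound on $|f|$.

```python
import math

def weierstrass(x, omega=3.0, hurst=0.5, phases=None, n_terms=30):
    """Partial sum of sum_n omega**(-n*H) * cos(omega**n * x + phi_n)."""
    if phases is None:
        phases = [0.0] * n_terms   # assumption: all phases set to zero
    return sum(
        omega ** (-n * hurst) * math.cos(omega ** n * x + phases[n])
        for n in range(n_terms)
    )

def sup_bound(omega=3.0, hurst=0.5):
    """Geometric-series bound on |f|: 1 / (1 - omega**(-H))."""
    return 1.0 / (1.0 - omega ** (-hurst))

# Sample the truncated series on [0, 1].
values = [weierstrass(i / 200.0) for i in range(201)]
```

Terms whose frequency $\omega^n$ exceeds the sampling resolution contribute mainly numerical noise, so in practice the number of terms is chosen according to the sampling step.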
Figure 1.3. Graph of a nowhere differentiable function, attractor of a system of four affine maps, such that $\rho_i = \frac{1}{4}$ and $\delta_i = \frac{1}{2}$. Dimensions $\Delta$ and Dim are equal to $\frac{3}{2}$. The Hausdorff dimension lies between 1 and $\frac{3}{2}$
Finding the value of the Hausdorff dimension of such graphs is still an open problem. In the case where the $\varphi_n$ are independent random variables with the same distribution, it is known that $\dim(\Gamma) = 2 - H$ with probability 1.

Figure 1.4. Graph of the Weierstrass function for $\omega = 3$ and $H = \frac{1}{2}$. Phases $\varphi_n$ are independent and identically distributed random variables. Dimensions $\Delta$ and Dim are equal to $\frac{3}{2}$. The Hausdorff dimension is almost surely equal to $\frac{3}{2}$

1.3.5. 2-microlocal analysis

In this section, we briefly present an analysis of local regularity which is much finer than the Hölder exponents described above. As was observed previously, neither $\alpha_p$ nor $\alpha_l$ allows us to completely describe the local irregularity of a function. To obtain an exhaustive description, we need to consider an infinite number of exponents. This is the purpose of 2-microlocal analysis. This powerful tool was defined in [BON 86], where it is introduced in the framework of PDEs by means of Littlewood-Paley analysis. A definition based on wavelets is developed in [JAF 96]. We present here a time-domain characterization [KOL 02].

DEFINITION 1.4.– A function $f : I \subset \mathbb{R} \to \mathbb{R}$ belongs to the 2-microlocal space $C_{x_0}^{s,s'}$ if there exist $0 < \delta < \frac{1}{4}$ and $C > 0$ such that, for all $(x, y)$ satisfying $|x - x_0| < \delta$ and $|y - x_0| < \delta$:
$$|f(x) - f(y)| \le C\,|x - y|^{s + s'}\,\bigl(|x - y| + |x - x_0|\bigr)^{-s'/2}\,\bigl(|x - y| + |y - x_0|\bigr)^{-s'/2} \tag{1.26}$$
This definition is valid only for $0 < s < 1$ and $0 < s + s' < 1$. The general case is slightly more complex and will not be dealt with here. 2-microlocal spaces, as opposed to Hölder spaces, require two exponents $(s, s')$ in their definition. While $\alpha_p$ is defined as the sup of the exponents $\alpha$ such that $f$ belongs to $C_{x_0}^{\alpha}$, we cannot proceed in the same way to define "2-microlocal exponents". Instead, we define in the abstract plane $(s, s')$ the 2-microlocal frontier of $f$ at $x_0$ as the curve $\bigl(s, \sup\{s' : f \in C_{x_0}^{s,s'}\}\bigr)$. It is not hard to show that this curve is well defined, concave and decreasing. Its intersection with the $s$ axis is exactly $\alpha_l$. Moreover, under the hypothesis that $f$ possesses a minimum global regularity, $\alpha_p$ is the intersection of the frontier with the line $s + s' = 0$. The 2-microlocal frontier thus allows us to re-interpret the two exponents within a unified framework. The main advantage of the frontier is that it completely describes the evolution of $\alpha_p$ under integro-differentiation of arbitrary order: indeed, an integro-differentiation of order $\varepsilon$ simply shifts the frontier by $\varepsilon$ along the $s$ axis. Thus, 2-microlocal analysis provides extremely rich information on the regularity of a function around a point.

To conclude this brief presentation, let us mention that algorithms exist which make it possible to numerically estimate the 2-microlocal frontier. They often allow us to calculate the values of $\alpha_p$ and $\alpha_l$ more precisely than a direct method (various estimation methods for the exponents and the 2-microlocal frontier are proposed in FracLab). Furthermore, it is possible to develop a 2-microlocal formalism [LEV 04a] which presents strong analogies with the multifractal formalism (see below).

1.3.6. An example: analysis of stock market price

As an illustration of some of the notions introduced above, we use this section to detail a simplified example of the fractal analysis of a signal based on Hölder exponents.
Our purpose is not to develop a complete application (this would require a whole chapter) but instead to demonstrate how we calculate and use the information provided by local regularity analysis in a practical scenario. The signal is a stock market log. Financial analysis offers an interesting area of application for fractal methods (see Chapter 13 for a detailed study). We will consider the evolution of the Nikkei index between January 1, 1980 and November 5, 2000. This signal comprises 5,313 values and is presented in Figure 1.5. As with many stock market records, it is extremely irregular. We calculate the local Hölder exponent of the logarithm of this signal, which is the quantity on which financial analysts work. The exponent is calculated by a direct application of Definition 1.3: at each point $x$, we find, for increasing values of $\varepsilon$, one pair $(y, z)$ for which the signal oscillation in a ball centered at $x$ of radius $\varepsilon$ is attained. A log-log regression between the vector of the values found for the oscillation and the distances $|y - z|$ is then performed (see the FracLab manual for more details on the procedure). As Figure 1.6 shows, most local exponents lie between 0 and 1, with some peaks above 1 and up to 3. The exponent values in $[0, 1]$, which imply that the signal is continuous but not differentiable, confirm the visual impression of great irregularity.

Figure 1.5. The Nikkei index between 1st January 1980 and 5th November 2000

We can go beyond this qualitative comment by observing that the periods where "something occurs" have a definite signature in terms of the exponents: they are characterized by a dramatic increase of $\alpha_l$ followed by very small values, below 0.2. Let us examine some examples. The most remarkable point of the local regularity graph is its maximum at abscissa 2,018, with an amplitude of 3. The most singular points, i.e. those with the smallest exponent, are situated just after this maximum: the exponent is around 0.2 for the abscissae between 2,020 and 2,050, and of the order of 0.05 between points 2,075 and 2,100. These two values are distinctly below the exponent average, which is 0.4 (the variance being 0.036). Calculations show that less than 10% of the log points possess an exponent smaller than 0.2. This remarkable behavior corresponds to the crash of October 19, 1987, which occurs at abscissa 2,036, in the middle of the first zone of points with low regularity after the maximum: the most "irregular" days of the entire signal are thus, as expected, situated in the weeks which followed the crash. It is worth noting that this fact is much more apparent on the regularity signal than on the original log, where only the crash appears clearly, with the subsequent period not displaying remarkable features.

Let us now consider another area which contains many points of low regularity along with some isolated regular points (i.e. having $\alpha_l > 1$).
It corresponds to the zone between abscissae 4,450 and 4,800: this period approximately corresponds to the Asian crisis that took place between January 1997 and June 1998 (analysts do not agree upon the exact dates of the beginning and the ending of this crisis: some of them date its beginning in mid-1997 or its end towards the end of 1999, or much later). On the graph of the log, we can observe that this period seems slightly more
Figure 1.6. Local Hölder function of the Nikkei index
irregular than others. In terms of exponents, we notice that it contains two maxima, with values greater than 1, both followed by low regularity points: this area comprises a high proportion of irregular points, since 12% of its points have an exponent lower than 0.15. This proportion is three times higher than that observed in the whole log. The analysis just performed has remained at an elementary level. However, it has allowed us to show that major events have repercussions on the evolution of the local Hölder exponent and that the graph of αl emphasizes features not easily visible on the original log.
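The oscillation-based estimate used in this example can be sketched as follows. This is only a minimal illustration under stated assumptions, not the FracLab procedure: the function name `local_holder_exponent`, the integer window radii and the use of an ordinary least-squares fit are all choices made here for the sake of the example.

```python
import numpy as np

def local_holder_exponent(signal, index, radii):
    """Regress log(oscillation) on log|y - z| over growing windows around `index`.

    Hypothetical helper: `radii` are window half-widths in samples (an assumption).
    """
    signal = np.asarray(signal, dtype=float)
    log_dist, log_osc = [], []
    for r in radii:
        lo, hi = max(0, index - r), min(signal.size, index + r + 1)
        window = signal[lo:hi]
        # (y, z): positions where the oscillation in the window is attained
        y, z = int(np.argmax(window)), int(np.argmin(window))
        osc = window[y] - window[z]
        if osc > 0:
            log_dist.append(np.log(abs(y - z)))
            log_osc.append(np.log(osc))
    if len(set(log_dist)) < 2:
        # the regression needs at least two distinct distances
        return float("nan")
    slope, _ = np.polyfit(log_dist, log_osc, 1)
    return float(slope)
```

For a price series stored in a hypothetical array `prices`, the function would be applied to `np.log(prices)` at each index, since the analysis above works on the logarithm of the signal.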
1.4. Multifractal analysis

1.4.1. What is the purpose of multifractal analysis?

In the previous section, it has been observed that the Hölder functions provide precise information on the regularity at each point of a signal. In applications, this information is often useful as such, but there exist many cases where it is necessary to go further. Here are three examples that highlight this necessity. In image segmentation, we expect that edges correspond to low regularity points and hence to small exponents. However, the precise value of the Hölder exponents of contour points cannot be universal: non-linear transformations of an image, for instance, might preserve edges while modifying the exponent value. In this first situation, we see that the pointwise regularity does not provide all the relevant information and that it is necessary to supplement it with structural information. Let us now consider the issue of image texture classification. It is clear that classifying a pixel based on its
local exponent would not give satisfactory results. A more relevant approach would be to use the statistical distribution of exponents within zones. The same comment applies to the characterization of Internet traffic according to its degree of sporadicity. In this second situation, the Hölder function provides information that is too rich, and which we would like to balance in a certain sense. The last situation is when the exponents are too difficult to calculate: there exist, in particular, continuous signals, easy to synthesize, whose Hölder function is everywhere discontinuous. In this case, the pointwise regularity information is too complex to be used in its original form. In all these examples, we would like to use “higher level” information, which would be extracted from the Hölder function or would sum up, in some sense, its relevant features. Several ways of doing this exist. The idea that comes immediately to mind is simply to calculate histograms of exponents. This approach, however, is not suitable, both for mathematical reasons that go beyond the scope of this chapter and because, in fractal analysis, we always try to deal with quantities that are scale-independent. The most relevant way to extract information from the Hölder function and to describe it globally is to perform a multifractal analysis. There are many variants and we will concentrate on two popular examples: the first is geometric and consists of calculating the dimension of the set of points possessing the same exponent. The second is statistical: we study the probability of finding, at a fixed resolution, a given exponent and how this probability evolves when the resolution tends to infinity. The following two sections are devoted to developing these notions.

1.4.2. First ingredient: local regularity measures

Before giving a detailed description of the local regularity variations of a signal, it is necessary to determine what method will be used to measure this regularity. 
It has been explained in section 1.3 that many characterizations are equally relevant and that the choice of one or the other is dictated by practical considerations and the type of applications chosen. Likewise, we may base multifractal analysis on various measures of local regularity. However, there are a certain number of advantages in using the pointwise exponent, which is reasonably simple while leading to a rich enough analysis. Therefore, the following text will use this measure of regularity, which corresponds to the most common choice. Grain exponents For reasons which will be explained below, we need to define a new class of exponents called grain exponents. These are simply approximations, at each finite resolution, of the usual exponents. For simplicity, let us assume that our signal X is defined on [0, 1], and let u denote an interval in [0, 1]. We first choose a descriptor VX (u) of the relevant information of X in u: if X is a measure μ, we will most often take Vμ (u) = μ(u). If X is a function f , then Vf (u) may for instance be
the absolute value of the increment of f in u, that is $|f(u_{\max}) - f(u_{\min})|$ (where $u = [u_{\min}, u_{\max}]$). A more precise descriptor is obtained by considering the oscillation instead, and setting $V_f(u) = \beta(f, u)$. Finally, in cases where the intervals u are dyadic, of the form $I_n^k = [(k-1)\,2^{-n}, k\,2^{-n}]$, a third possibility is to choose $V_f(I_n^k) = |c_n^k|$, where $c_n^k$ is the wavelet coefficient of f on scale n and in position k (note that, in this case, most quantities defined below will additionally depend on the wavelet). Once $V_X(u)$ has been defined, the grain exponent may be calculated as follows:
$$\alpha(u) = \frac{\log V_X(u)}{\log |u|}$$
Filtration

It is then necessary to define a sequence of partitions $(I_n^k)_{n \ge 0,\; k = 1, \dots, \nu_n}$ of [0, 1]. For each fixed n, the collection of the $\nu_n$ intervals $(I_n^k)_k$ constitutes a partition of [0, 1] and we require that, when n tends to infinity, $\max_k |I_n^k|$ tends to 0 (this implies, of course, that $\nu_n$ has to tend to infinity). A common (but not neutral) choice is to consider the dyadic intervals (and thus $\nu_n = 2^n$). The grain exponent $\alpha_n^k$ is then defined by:
$$\alpha_n^k = \frac{\log V_X(I_n^k)}{\log |I_n^k|}$$
Intuitively, $\alpha_n^k$ does indeed measure the “singularity” of X in the (small) interval $I_n^k$: the smaller $\alpha_n^k$, the greater the variation of X in $I_n^k$, and vice versa.
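As an illustration of the grain exponents on the dyadic filtration, here is a minimal sketch for the case where X is a measure sampled on a fine dyadic grid. The function name `dyadic_grain_exponents` and the input convention (an array of cell masses of length $2^N$) are assumptions made for this example only.

```python
import numpy as np

def dyadic_grain_exponents(masses, n):
    """Grain exponents alpha_n^k = log mu(I_n^k) / log |I_n^k| at dyadic level n.

    `masses` holds the measure of the 2**N finest dyadic cells of [0, 1], N >= n >= 1
    (hypothetical input convention). Empty intervals get the exponent +inf.
    """
    if n < 1:
        raise ValueError("n must be at least 1 (log|I| vanishes for n = 0)")
    masses = np.asarray(masses, dtype=float)
    # aggregate the fine cells into the 2**n intervals of level n
    coarse = masses.reshape(2 ** n, -1).sum(axis=1)
    with np.errstate(divide="ignore"):
        # base-2 logarithms: log2 |I_n^k| = -n
        return np.log2(coarse) / (-n)
```

With the Lebesgue measure (all cells of equal mass) every grain exponent equals 1, while intervals carrying more mass than their length receive exponents below 1, in line with the remark that small exponents correspond to large variations.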
Abstract function A It is important to observe at this point that we can carry out a more general version of multifractal analysis by replacing the grain exponent α(u) with a function A defined on the metric space of closed intervals of [0, 1], and ranging in R+ ∪{+∞} [LEV 04b]. In this context, more general results can be obtained. 1.4.3. Second ingredient: the size of point sets of the same regularity In the same way as the local regularity can be defined through different approaches, there exist many ways of extracting high-level information. Geometric method Conceptually, the simplest method consists of considering the sets Eα of those points of [0, 1] possessing a given exponent α and then describing the size of Eα (to simplify notations and as the focus of this section is on the pointwise exponent,
we write α instead of $\alpha_p$). In many cases of practical and theoretical interest, these sets have a zero Lebesgue measure for most of the values of α. In addition, they are often dense in [0, 1], with a box dimension equal to 1. To distinguish them, it is thus necessary to measure them either by their Hausdorff or packing dimension. Here, only the first of these dimensions will be considered. We set:
$$f_h(\alpha) = \dim_H(E_\alpha)$$
where $E_\alpha = \{x : \alpha(x) = \alpha\}$. The function $f_h$ is called the Hausdorff multifractal spectrum of X. Since the dimension of the empty set is −∞, we see that $f_h$ will take values in {−∞} ∪ [0, n] for an n-dimensional signal. Even though a strict definition of a multifractal object does not exist, it seems reasonable to talk of multifractality when $f_h(\alpha)$ is positive or zero for several values of α: indeed, this means that X will display different singular behaviors on different subsets of [0, 1]. Often, we will require that $f_h$ be strictly positive on an interval in order to consider X as truly multifractal. While the Hausdorff spectrum is simple to define, it requires an extremely delicate calculation in theoretical as well as numerical studies. The next subsection presents another multifractal spectrum, which is easier to calculate and which also serves to give an approximation by excess of $f_h$.

Statistical method

The second method used to globally describe the variations of regularity is to adopt a statistical approach (as opposed to $f_h$, which is a geometric spectrum): we first choose a value of α, and count the number of intervals $I_n^k$, at a given resolution n, where X possesses a grain exponent approximately equal to α. We then let the resolution tend to infinity and observe how this number evolves. More precisely, the large deviation spectrum $f_g$ is defined as:
$$f_g(\alpha) = \lim_{\varepsilon \to 0} \liminf_{n \to \infty} \frac{\log N_n^\varepsilon(\alpha)}{\log \nu_n}$$
where:
$$N_n^\varepsilon(\alpha) = \#\{k : \alpha - \varepsilon \le \alpha_n^k \le \alpha + \varepsilon\}$$
We may heuristically understand this spectrum by letting $P_n$ denote the uniform probability law on $\{1, \dots, \nu_n\}$, i.e. $P_n(k) = 1/\nu_n$ for $k = 1, \dots, \nu_n$. Then, neglecting ε and assuming that the lower limit is a limit:
$$P_n(\alpha_n^k \simeq \alpha) \simeq \nu_n^{f_g(\alpha) - 1} \tag{1.27}$$
In other words, if an interval $I_n^k$ at resolution n is drawn randomly (for a sufficiently large value of n), then the probability of observing a singularity exponent approximately equal to α is proportional to $\nu_n^{f_g(\alpha)-1}$. From the definition, it is clear that $f_g$, as $f_h$, ranges in {−∞} ∪ [0, 1] (in one dimension). As a consequence, whenever $f_g(\alpha)$ is not equal to one (and thus is strictly smaller than 1), the probability of observing a grain exponent close to α decays exponentially fast to 0 when the resolution tends to infinity: only those α such that $f_g(\alpha) = 1$ will occur in the limit of infinite resolution. The study of this type of behavior and the determination of the associated exponential convergence speed (here, $f_g(\alpha) - 1$) are the topic of the branch of probability called “large deviation theory”, from which we derive the denomination of $f_g$.

Nature of the variable ε

Let us consider the role of ε: at each finite resolution, the number of intervals $I_n^k$ is finite. As a consequence, $N_n^0(\alpha)$ will be zero except for a finite number of values of α. The variable ε then represents a “tolerance” on the exponent value, which is made necessary by the fact that we work at finite resolutions. Once the limit on n has been taken, this tolerance is no longer needed and we can let ε tend to 0 (note that the limit in ε always exists, as we are dealing with a monotonic function of ε). From a more general point of view, we may understand the difference between $f_h$ and $f_g$ as an inversion of the limits: in the case of $f_h$, we first let the resolution tend to infinity to calculate the exponents and then “count” how many points are characterized by a given exponent. In the case of $f_g$, we start by counting, at a given resolution, the number of intervals having a prescribed exponent, and then let the resolution tend to infinity. This second procedure renders the calculation of the spectrum easier, but in general it will obviously lead to a different result. 
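At a fixed resolution n and a fixed tolerance ε, the quantity inside the double limit can be evaluated directly from a list of grain exponents. The sketch below does only that; it does not attempt the limits themselves. The function name `coarse_large_deviation_spectrum` is hypothetical, and the value −∞ is returned when no interval falls within the tolerance.

```python
import numpy as np

def coarse_large_deviation_spectrum(exponents, alphas, eps):
    """Finite-resolution ratio log N_n^eps(alpha) / log nu_n for each alpha.

    `exponents` are the nu_n grain exponents at one resolution (assumed given).
    """
    exponents = np.asarray(exponents, dtype=float)
    nu_n = exponents.size
    # N_n^eps(alpha): number of intervals whose exponent lies in [alpha - eps, alpha + eps]
    counts = np.array([np.sum(np.abs(exponents - a) <= eps) for a in alphas])
    with np.errstate(divide="ignore"):
        return np.log(counts) / np.log(nu_n)
```

For instance, if a quarter of the exponents out of 16 cluster around 0.5, the returned value at α = 0.5 is log 4 / log 16 = 0.5, which is the finite-resolution counterpart of the heuristic (1.27).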
In section 1.4.7, two examples are given that illustrate the difference between fh and fg . NOTE 1.3.– Historically, the large deviation spectrum was introduced as an easy means to calculate fh through arguments developed in section 1.4.4. It was then gradually realized that fg contains information sometimes more relevant than fh , particularly in signal processing applications. For more details on this topic, see [LEV 98b], where the denominations “Hausdorff spectrum”, “large deviation spectrum” and “Legendre spectrum” were introduced. The next section tackles the problem of calculating the multifractal spectra. 1.4.4. Practical calculation of spectra Let us begin with the Hausdorff spectrum. It is clear that the calculation of the exponents in each point, and then of all the associated dimensions of the Hausdorff spectrum, is an extremely difficult task. It is thus desirable to look for indirect methods to evaluate fh . Two of them are described below.
Multifractal formalism

In some cases, $f_h$ may be obtained as the Legendre transform of a function that is easily calculated. When this is the case, we say that the (strong) multifractal formalism holds. Define:
$$S_n(q) = \sum_{k=1}^{2^n} V_X(I_n^k)^q$$
and set:
$$\tau_n(q) = -\frac{1}{n} \log_2 S_n(q), \qquad \tau(q) = \liminf_{n \to \infty} \tau_n(q)$$
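The partition function $S_n(q)$ and the finite-resolution exponent $\tau_n(q)$ are straightforward to compute. The following sketch is given under two assumptions that are not part of the text: the test signal is a binomial cascade built by the hypothetical helper `binomial_masses`, and intervals with zero mass are skipped in the sum (a convention that matters for $q \le 0$).

```python
import numpy as np

def binomial_masses(p, n):
    """Masses of the 2**n dyadic intervals of level n for a binomial cascade
    (left child receives the fraction p, right child 1 - p). Illustrative only."""
    masses = np.array([1.0])
    for _ in range(n):
        masses = np.stack([p * masses, (1 - p) * masses], axis=1).ravel()
    return masses

def tau_n(masses, q):
    """tau_n(q) = -(1/n) log2 S_n(q), with S_n(q) the sum of V(I_n^k)**q.

    Assumes len(masses) == 2**n; empty intervals are ignored (assumption).
    """
    masses = np.asarray(masses, dtype=float)
    n = int(round(np.log2(masses.size)))
    positive = masses[masses > 0]
    return -np.log2(np.sum(positive ** q)) / n
```

For this particular cascade the sum factorizes, $S_n(q) = (p^q + (1-p)^q)^n$, so `tau_n` returns $-\log_2(p^q + (1-p)^q)$ at every level n, which makes it a convenient sanity check before applying the function to real data.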
To understand the link between $f_h$ and τ heuristically, let us evaluate $\tau_n$ by grouping together the terms that have the same order of magnitude in the definition of $S_n$. To that end, fix a “Hölder exponent” α and consider those intervals for which, when n is sufficiently large, $V_X(I_n^k) \sim |I_n^k|^\alpha$. Assume that this approximation is true uniformly in k. We may then roughly estimate the number of such intervals by $2^{n f_h(\alpha)}$. Then:
$$S_n(q) = \sum_\alpha \sum_k \{2^{-nq\alpha} : V_X(I_n^k) \sim |I_n^k|^\alpha\} = \sum_\alpha 2^{-n(q\alpha - f_h(\alpha))}$$
Factorizing $2^{-n \inf_\alpha (q\alpha - f_h(\alpha))}$, we obtain for $\tau_n$:
$$\tau_n(q) = \inf_\alpha \big(q\alpha - f_h(\alpha)\big) - \frac{1}{n} \log_2 \sum_\alpha 2^{-n[q\alpha - f_h(\alpha) - \inf_\alpha (q\alpha - f_h(\alpha))]}$$
and, taking the “limit” when n tends to infinity:
$$\tau(q) = \inf_\alpha \big(q\alpha - f_h(\alpha)\big) =: f_h^*(q)$$
The transformation $f_h \to f_h^*$ is called the “Legendre transform” (for the concave functions). Provided we can justify all the manipulations above, we thus expect that τ will be the Legendre transform of $f_h$. If, in addition, $f_h$ is concave, a well-known property of the Legendre transform guarantees that $f_h = \tau^*$. As announced above, $f_h$ is then obtained at once from τ. The important point is that τ is itself much easier to evaluate than $f_h$: there is no need to calculate any Hausdorff dimension and furthermore the definition of τ involves only average quantities, and no evaluations of pointwise Hölder exponents. The only difficulty lies in the estimation of the limit. Therefore, the estimation of τ will be in general much easier and more robust than that of $f_h$. For this reason, even though the multifractal formalism does not hold in
general, it is interesting to consider a new spectrum, called the Legendre spectrum and defined as¹ $f_l = \tau^*$. The study of the validity domain of the equality $f_h = f_l$ is probably one of the most examined issues in multifractal analysis, for obvious theoretical and practical reasons. However, this is an extremely complex problem, which has only been partially answered to this day, even in “elementary” cases such as the one of multiplicative processes (see section 1.4.6). It is easy to find counter-examples to the equality $f_h = \tau^*$. For instance, since a Legendre transform is always concave, the formalism certainly fails as soon as $f_h$ is not concave. There is no reason to expect that the spectrum of real signals will have this property. In particular, it is not preserved by addition, so that it will not be very stable in practice.

Other dimensional spectra

Instead of using the Legendre transform approach, another path consists of defining spectra which are close in spirit to $f_h$, but are easier to calculate. Since the major difficulty arises from the estimation of the Hausdorff dimension, we may try to replace it with the box dimension, which is simpler to evaluate. Unfortunately, merely replacing the Hausdorff dimension by Δ in the definition of the spectrum leads to uninteresting results. Indeed, as mentioned above, the sets $E_\alpha$ have, in many cases of interest, a box dimension equal to 1 (see section 1.4.6 for examples). In this situation, the spectrum obtained by replacing dim by Δ, being constant, will not supply any information. A more promising approach consists of defining dimension spectra as in [LEV 04b, TRI 99]. To do so, let us consider a general function of intervals A, as in section 1.4.2. For any x in [0, 1], set:
$$\alpha_n(x) = A\big(u_n(x)\big)$$
where $u_n(x)$ is the dyadic interval of length $2^{-n}$ containing x (take for instance the right interval if two such intervals exist). Then let:
$$E_\alpha(\varepsilon, N) = \{x : n \ge N \Rightarrow |\alpha_n(x) - \alpha| \le \varepsilon\}$$
$E_\alpha(\varepsilon, N)$ is an increasing function of N, and thus $\cup_N E_\alpha(\varepsilon, N) = \sup_N E_\alpha(\varepsilon, N)$. Let:
$$E_\alpha(\varepsilon) = \sup_N E_\alpha(\varepsilon, N) = \{x : \exists N \text{ such that } n \ge N \Rightarrow |\alpha_n(x) - \alpha| \le \varepsilon\}$$
1. The inequalities $f_h \le f_g \le f_l$ are always true (see below). Thus, it would be more accurate to say that the Legendre spectrum is an approximation (by excess) of the large deviation spectrum, rather than that of the Hausdorff spectrum.
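Since the sets $E_\alpha(\varepsilon)$ decrease with ε, we may define:
$$E_\alpha = \lim_{\varepsilon \to 0} E_\alpha(\varepsilon) = \{x : \alpha_n(x) \to \alpha \text{ when } n \to \infty\}$$
DEFINITION 1.5.– Let d be any dimension. We define the following spectra:
$$f_d(\alpha) = d(E_\alpha) = d\Big(\lim_{\varepsilon \to 0} \sup_N E_\alpha(\varepsilon, N)\Big) \tag{1.28}$$
$$f_d^{\lim}(\alpha) = \lim_{\varepsilon \to 0} d\big(E_\alpha(\varepsilon)\big) = \lim_{\varepsilon \to 0} d\Big(\sup_N E_\alpha(\varepsilon, N)\Big) \tag{1.29}$$
$$f_d^{\lim\sup}(\alpha) = \lim_{\varepsilon \to 0} \sup_N d\big(E_\alpha(\varepsilon, N)\big) \tag{1.30}$$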
When d is the Hausdorff dimension, then $f_d$ is just the Hausdorff spectrum $f_h$. Let $D = \overline{\mathrm{Im}(A)}$ be the closure of the image of A. Then D is the support of the spectra. Indeed, for any $\alpha \notin D$, $E_\alpha(\varepsilon, N) = \emptyset$, and thus $f_d(\alpha) = f_d^{\lim}(\alpha) = f_d^{\lim\sup}(\alpha) = -\infty$ (obviously this also applies to $f_g$). The following inequalities are easily proved:
$$f_d \le f_d^{\lim} \tag{1.31}$$
$$f_d^{\lim\sup} \le f_d^{\lim} \tag{1.32}$$
There is no relation in general between $f_d$ and $f_d^{\lim\sup}$. However, if d is σ-stable:
$$f_d(\alpha) \le f_d^{\lim}(\alpha) = f_d^{\lim\sup}(\alpha) \tag{1.33}$$
Besides, if $d_1$ and $d_2$ are two dimensions such that, for any E ⊂ [0, 1]:
$$d_1(E) \le d_2(E)$$
then:
$$f_{d_1} \le f_{d_2}, \qquad f_{d_1}^{\lim} \le f_{d_2}^{\lim}, \qquad f_{d_1}^{\lim\sup} \le f_{d_2}^{\lim\sup}$$
It is not hard to prove the following sequence of inequalities.

PROPOSITION 1.4.– For any A:
$$f_h(\alpha) \le f_h^{\lim\sup}(\alpha) = f_h^{\lim}(\alpha) \le f_\Delta^{\lim\sup}(\alpha) \le \min\big\{f_\Delta^{\lim}(\alpha), f_g(\alpha)\big\} \tag{1.34}$$
This is an improvement on the traditional result $f_h(\alpha) \le f_g(\alpha)$. In particular, as soon as $f_h(\alpha) = f_g(\alpha)$, all the above spectra coincide, except perhaps $f_\Delta^{\lim}$. If, on the contrary, $f_h$ is smaller than $f_g$, then we may hope that $f_\Delta^{\lim\sup}(\alpha)$ will be a better approximation of $f_h$ than $f_g$. The important point here is that the calculation of $f_\Delta^{\lim\sup}(\alpha)$ only involves box dimensions and that it is of the same order of complexity as that of $f_g$. The spectrum $f_\Delta^{\lim\sup}(\alpha)$ is thus a good substitute when we want to numerically estimate $f_h$. For example, the practical calculation of $f_\Delta^{\lim\sup}(\alpha)$ on a multinomial measure (see section 1.4.6) yields good results. Since d([0, 1]) = 1, all the spectra have a maximum lower than or equal to 1. In certain cases, a more precise result concerning $f_d^{\lim\sup}$, $f_d^{\lim}$ and $f_g$ is available:

PROPOSITION 1.5.– Let K be the set of x in [0, 1] such that the sequence $(\alpha_n(x))$ converges. Let us suppose that |K| > 0. Then, there exists $\alpha_0$ in D such that:
$$f_d^{\lim\sup}(\alpha_0) = f_d^{\lim}(\alpha_0) = f_g(\alpha_0) = 1 \tag{1.35}$$
Let us note that such a constraint is typically not satisfied by $f_h$. This shows that $f_h$ contains, in general, more information. A more precise and more general result may be found in [LEV 04b]. The three spectra $f_d^{\lim\sup}$, $f_d^{\lim}$ and $f_g$ also obey a structural constraint, expressed by the following proposition.

PROPOSITION 1.6.– The functions $f_g$, $f_d^{\lim}$ and $f_d^{\lim\sup}$ are upper semi-continuous.

NOTE 1.4.– Recall that a function f : D ⊂ R → R is upper semi-continuous (USC) if, for any x ∈ D and for any sequence $(x_n)$ of D converging to x:
$$\limsup_{n \to \infty} f(x_n) \le f(x) \tag{1.36}$$
Let us define the upper semi-continuous envelope of a function f by:
$$\tilde{f}(\alpha) = \lim_{\varepsilon \to 0} \sup\{f(\beta) : |\beta - \alpha| \le \varepsilon\}$$
Then, the above results imply that:
$$\tilde{f}_d(\alpha) \le f_d^{\lim}(\alpha) \tag{1.37}$$
Let us also mention the following result.

PROPOSITION 1.7.– Let A and B be two interval functions and C = max{A, B}. Let $f_g(\alpha, A)$, $f_g(\alpha, B)$ and $f_g(\alpha, C)$ denote the corresponding spectra. Then, for any α:
$$f_g(\alpha, C) \le \max\{f_g(\alpha, A), f_g(\alpha, B)\} \tag{1.38}$$
PROPOSITION 1.8.– If d is stable, then inequality (1.38) is true for $f_d$, $f_d^{\lim}$ and $f_d^{\lim\sup}$.

A significant result concerning the “inverse problem” for the spectra serves as a conclusion for this section.

PROPOSITION 1.9.– Let f be a USC function ranging in [0, 1] ∪ {−∞}. Then, there exists an interval function A whose $f_d^{\lim\sup}$ or $f_d^{\lim}$ spectrum is exactly f as soon as d is σ-stable or d = Δ.

Let us note that $f_d$, when d is σ-stable, is not necessarily USC. This shows once more that this spectrum is richer than the other ones (see [LEV 98b] for a study of the structural properties of $f_d$ with d = h).

Weak multifractal formalism

Let us now consider the numerical estimation of $f_g$. As in the case of $f_h$, two approaches exist: either we resort to a multifractal formalism with $f_g$ as the Legendre transform of a simple function, or we analyze in detail the definition of $f_g$ and deduce estimation methods from this. In the first case, the heuristic justification is the same as for $f_h$ and we expect that, under certain conditions, $f_g = \tau^*$. Since we avoid an inversion of limits as compared to the case of $f_h$, this formalism (sometimes called weak multifractal formalism, as opposed to the strong formalism which ensures the equality of $\tau^*$ and $f_h$) will be satisfied more often. However, the necessary condition that $f_g$ be concave, associated once again with the lack of stability, still limits its applicability. An important difference between the strong and weak formalisms is that, in the latter case, a precise and reasonably general criterion ensuring its validity is known. We are referring to a version of the Ellis theorem, one of the fundamental results in the theory of large deviations, which is recalled below in a simplified form.

THEOREM 1.9.– If $\tau(q) = \lim_{n \to \infty} \tau_n(q)$ exists as a limit (rather than a lower limit), and if it is differentiable, then $f_g = f_l$.

When $f_g$ is not concave, it cannot equal $f_l$, but the following result holds. 
THEOREM 1.10.– If fg is equal to −∞ outside a compact set, then fl is its concave hull, i.e.: fl = (fg )∗∗ This theorem makes it possible to measure precisely the information which is lost when fg is replaced by fl . See [LEV 98b] for related results.
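Numerically, the Legendre spectrum $f_l = \tau^*$ is obtained by taking, for each α, the infimum of $q\alpha - \tau(q)$ over q. The sketch below approximates that infimum by a minimum over a finite grid of q values, which is an assumption of the example: with a bounded grid the result is an upper bound on the true infimum and is only accurate for those α whose optimal q lies inside the grid. The function name `legendre_spectrum` is hypothetical.

```python
import numpy as np

def legendre_spectrum(qs, taus, alphas):
    """Discrete Legendre transform: f_l(alpha) ~ min over sampled q of (q*alpha - tau(q)).

    `qs` and `taus` are matching samples of q and tau(q) (assumed precomputed).
    """
    qs = np.asarray(qs, dtype=float)
    taus = np.asarray(taus, dtype=float)
    return np.array([np.min(qs * a - taus) for a in alphas])
```

As a check, for a binomial measure of parameter p one has $\tau(q) = -\log_2(p^q + (1-p)^q)$, and the transform returns 1 at $\alpha = -\frac{1}{2}\log_2(p(1-p))$. Whatever the input, the output is concave in α, being a minimum of affine functions; this is the numerical counterpart of the fact that $f_l$ can only recover the concave hull of $f_g$.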
Continuous spectra

It is possible to prove that the previous relation is still valid in the more subtle case of continuous spectra [LEV 04b, TRI 99]. These continuous spectra constitute a generalization of $f_g$ that allows us to avoid choosing a partition. As already mentioned, this choice is not neutral and different partitions will, in general, lead to different spectra. To begin with, interval families are defined:
$$R_\eta = \{u \text{ interval of } [0, 1] \text{ such that } |u| = \eta\}$$
$$R_\eta(\alpha) = \{u : |u| = \eta,\ A(u) = \alpha\}$$
$$R_\eta^\varepsilon(\alpha) = \{u : |u| = \eta,\ |A(u) - \alpha| \le \varepsilon\}$$
DEFINITION 1.6 (continuous large deviation spectra).–
$$f_g^c(\alpha) = \lim_{\varepsilon \to 0} \limsup_{\eta \to 0} \frac{\log\big(\frac{1}{\eta} |\cup R_\eta^\varepsilon(\alpha)|\big)}{|\log \eta|}$$
$$\tilde{f}_g^c(\alpha) = \limsup_{\eta \to 0} \frac{\log\big(\frac{1}{\eta} |\cup R_\eta(\alpha)|\big)}{|\log \eta|}$$
Note that $f_g^c(\alpha)$ is defined similarly to $f_g$, except that all the intervals of a given length η are considered where the variation of X is of the order of $\eta^{\alpha \pm \varepsilon}$, rather than only dyadic intervals. Since the number of these intervals may be infinite, we replace $N_n^\varepsilon(\alpha)$ with a measure of the average length, i.e. $\frac{1}{\eta} |\cup R_\eta^\varepsilon(\alpha)|$. Within this continuous framework, $R_\eta(\alpha)$ will, in general, be non-empty for infinitely many values of α and not only for at most $2^n$ values, as is the case for $N_n^0(\alpha)$: it is thus possible to get rid of ε and define the new spectrum $\tilde{f}_g^c$.

Legendre transform of continuous spectra

For any family R of intervals, ∪R denotes the union of all the intervals of R. A packing of R is a sub-family composed of disjoint intervals.

DEFINITION 1.7 (Legendre continuous spectrum).– For any real q, let:
$$H^q(R) = \sup\Big\{\sum_{u \in R'} |u|^{qA(u)} : R' \text{ is a packing of } R\Big\}$$
and:
$$J_\eta^q = \sup_\alpha H^q\big(R_\eta(\alpha)\big)$$
Set:
$$\tau^c(q) = \liminf_{\eta \to 0} \frac{\log H^q(R_\eta)}{\log \eta}$$
$$\tilde{\tau}^c(q) = \liminf_{\eta \to 0} \frac{\log J_\eta^q}{\log \eta}$$
and finally: $f_l^c = (\tau^c)^*$ and $\tilde{f}_l^c = (\tilde{\tau}^c)^*$. Here are the main properties of $f_g^c$, $\tilde{f}_g^c$, $f_l^c$ and $\tilde{f}_l^c$.

PROPOSITION 1.10.–
– $f_l^c$ and $\tilde{f}_l^c$ are concave functions;
– for any α, $\tilde{f}_g^c(\alpha) \le f_g^c(\alpha)$ and $f_g(\alpha) \le f_g^c(\alpha)$;
– if μ is a multinomial measure (see section 1.4.6), then $f_g^c(\alpha) = \tilde{f}_g^c(\alpha) = f_l^c(\alpha) = \tilde{f}_l^c(\alpha) = f_g(\alpha)$.

THEOREM 1.11.– If $f_g^c$ (respectively $\tilde{f}_g^c$) is equal to −∞ outside a compact interval, then, for any α:
$$f_l^c(\alpha) = (f_g^c)^{**}(\alpha)$$
$$\tilde{f}_l^c(\alpha) = (\tilde{f}_g^c)^{**}(\alpha)$$
PROPOSITION 1.11.–
– $\tau^c$ and $\tilde{\tau}^c$ are increasing and concave functions;
– $\tau^c(0) = \tilde{\tau}^c(0) = -\Delta(\mathrm{supp}(X))$;
– if X is a probability measure, then $\tau^c(1) = \tilde{\tau}^c(1) = 0$;
– $\tau^c(q) = \liminf_{n \to \infty} \log H^q(R_{\eta_n}) / \log \eta_n$, where $(\eta_n)$ is a sequence tending to zero such that $\log \eta_n / \log \eta_{n+1} \to 1$ when n → ∞. The same is true for $\tilde{\tau}^c$.

The last property is important in numerical applications: it means that $\tau^c$ and $\tilde{\tau}^c$ may be estimated by using discrete sequences of the type $\eta_n = 2^{-n}$.

Kernel method

A second method to estimate $f_g$, which does not assume that the weak formalism is true (and thus in particular allows us to obtain non-concave spectra), is based on the following. Let K denote the “rectangular kernel”, i.e. K(x) = 1 for x ∈ [−1, 1] and K(x) = 0 elsewhere. Let $K_\varepsilon(t) = \frac{1}{\varepsilon} K\big(\frac{t}{\varepsilon}\big)$. Then, by definition, $N_n^\varepsilon(\alpha) = 2^{n+1} \varepsilon\, K_\varepsilon * \rho_n(\alpha)$, where the symbol ∗ represents convolution. It is not hard to check that replacing K by any positive kernel with compact support whose
integral equals 1 in the definition of $N_n^\varepsilon(\alpha)$ will not change the value of $f_g$. A basic idea is then to use a more regular kernel than the rectangular one to improve the estimation. A more elaborate approach is to use ideas from density estimation to try and remove the double limit in the definition of $f_g$: this is performed by choosing ε to be a function of n in such a way that appropriate convergence properties are obtained [LEV 96b]. We may, for instance, show the following result:

PROPOSITION 1.12.– Assume that the studied signal is a finite sum of multinomial measures (see section 1.4.6). Let:
$$f_n^\varepsilon(\alpha) = \frac{\log N_n^\varepsilon(\alpha)}{\log \nu_n}$$
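Then, if $\varepsilon_n$ is a sequence such that $\lim_{n \to \infty} \frac{\varepsilon_n}{\log(\nu_n)} = c$, where c is a positive constant:
$$\lim_{n \to \infty} \sup_\alpha \big|f_n^{\varepsilon_n}(\alpha) - f_g(\alpha)\big| = 0$$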
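The idea of replacing the rectangular kernel by a smoother one can be sketched at a single resolution as follows. This is an illustration under assumptions, not the estimator of [LEV 96b]: the function name `kernel_spectrum` is hypothetical, the tolerance `eps` is supplied by the caller rather than tied to n, and the kernels are normalized so that K(0) = 1. A constant factor c in the normalization only adds $\log c / \log \nu_n$ to the result, which vanishes as the resolution increases.

```python
import numpy as np

def kernel_spectrum(exponents, alphas, eps, kernel="rect"):
    """Kernel-weighted version of log N_n^eps(alpha) / log nu_n at one resolution.

    kernel="rect" reproduces the plain count; kernel="triangle" weights each
    grain exponent by max(0, 1 - |alpha - alpha_n^k| / eps).
    """
    exponents = np.asarray(exponents, dtype=float)
    alphas = np.asarray(alphas, dtype=float)
    # u[i, k]: distance between alphas[i] and exponents[k], in units of eps
    u = (alphas[:, None] - exponents[None, :]) / eps
    if kernel == "rect":
        weights = (np.abs(u) <= 1).astype(float)
    elif kernel == "triangle":
        weights = np.clip(1.0 - np.abs(u), 0.0, None)
    else:
        raise ValueError("kernel must be 'rect' or 'triangle'")
    counts = weights.sum(axis=1)
    with np.errstate(divide="ignore"):
        return np.log(counts) / np.log(exponents.size)
```

Compared with the rectangular kernel, the triangular one produces an estimate that varies continuously with α, at the price of a slightly different finite-resolution bias.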
Even without making such a strong assumption on the signal structure, it is still possible, in certain cases, to obtain convergence results, as above, with ε = ε(n), using more sophisticated statistical techniques. To conclude our presentation of multifractal spectra, let us emphasize that no spectrum is “better” than the others in all respects. All the spectra give similar but different information. As was already observed for the local regularity measures, each one has advantages and drawbacks, and the choice has to be made in view of the considered application. If we are interested in the geometric aspects, a dimension spectrum should be favored. The large deviation spectrum will be used in statistical and most signal processing applications. When the amount of data is small or if it appears that the estimations are not reliable, we will resort to the Legendre spectrum. To be able to compare the different information and also to assess the quality of the estimations, it is important to have general theoretical relations between the spectra at our disposal. It is remarkable that, as seen above, such relations exist under rather weak hypotheses.

1.4.5. Refinements: analysis of the sequence of capacities, mutual analysis and multisingularity

In this section, some refinements of multifractal analysis, useful in applications, will be briefly discussed. The first refinement stems from the following consideration. Assume we are interested in the analysis of road traffic and that the signal X(k) at hand is the flow
per hour, i.e. the number of vehicles crossing a given small section of road during a fixed time interval (often of the order of a minute).² In this case, each point of the signal is not a pointwise value but corresponds to a space-time integral. We would thus be inclined to model such data by a measure and carry out multifractal analysis of this measure. However, it appears that, if we want to anticipate congestions, the relevant quantity to consider for a given time interval $T_n$ whose length n is large as compared to one minute is not the sum of the individual flows X(k) for k in $T_n$, but the maximum of these flows. Hence, instead of one signal, we would rather consider a sequence of signals, $X_n$, each yielding the maximum flow at the time scale n: $X_n(j) = \max_{k \in T_n} X(k)$. At each scale n, the signal $X_n$ is a set function (i.e. it does not give pointwise values but averages). However, the $X_n$ are not measures, as they are not additive: the maximum on the union of two disjoint intervals $T_1$ and $T_2$ is not, in general, the sum of the maxima on $T_1$ and $T_2$. Nonetheless, each $X_n$ possesses some regularity properties which allow us to model it as a Choquet capacity. Thus, we are led to generalize multifractal analysis so as to no longer process one signal, which would be a pointwise function or a measure, but a sequence of signals which are Choquet capacities. A number of other examples exist, particularly in image analysis, where such a generalization is found necessary (see [LEV 96a]). We will not give here the definition of a Choquet capacity, but we stress that the generalization of multifractal analysis to sequences of Choquet capacities, at first seemingly abstract, is in fact very simple. Indeed, nothing in the definition of multifractal analysis implies that the same signal has to be considered at each resolution, nor that the signal must be a function or a measure. In particular, the relations between the spectra are preserved in this generalization [LEV 98b]. 
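The sequence of maximum flows is easy to build from a sampled signal. The sketch below assumes, for illustration only, that the intervals $T_n$ are consecutive non-overlapping blocks of n samples and that an incomplete trailing block is discarded; the function name `max_flow_sequence` is hypothetical.

```python
import numpy as np

def max_flow_sequence(flows, n):
    """X_n(j) = max of X(k) over the j-th block of n consecutive samples.

    Blocks are non-overlapping; a trailing incomplete block is dropped (assumption).
    """
    flows = np.asarray(flows, dtype=float)
    usable = (flows.size // n) * n
    return flows[:usable].reshape(-1, n).max(axis=1)
```

With the flows 3, 1, 4, 1, 5, 9, 2, 6, the scale-2 maxima are 3, 4, 9, 6 and the scale-4 maxima are 4, 9. The first scale-4 value is the maximum, not the sum, of the first two scale-2 values, which is exactly the lack of additivity that prevents $X_n$ from being a measure.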
A different generalization consists of noting that, in the usual formulation of multifractal analysis, the Lebesgue measure L plays a particular role which may not be desirable. Let us for instance consider the definition of the grain exponent. The logarithm of the measure of the interval $I_n^k$ is compared to the logarithm of $|I_n^k|$, which is nothing but the Lebesgue measure of $I_n^k$. In the same way, when we define $f_h$, the Hausdorff dimension is calculated with respect to L. However, it is a classical fact that we may define an s-dimensional Hausdorff measure with respect to an arbitrary non-atomic³ measure μ by replacing the sum $\sum |U_j|^s$ with $\sum \mu(U_j)^s$. As a matter of fact, once a non-atomic reference measure μ has been chosen, we may rewrite all the definitions of multifractal analysis (Hölder exponent, grain exponent, spectra, etc.) by replacing L with μ. If this analysis is applied to a signal X (which may be a function, a measure or a capacity), we obtain the multifractal analysis of X with respect to μ. This type of analysis is called mutual multifractal analysis. There are several benefits to this
2. A multifractal analysis of traffic in Paris is presented in [VOJ 94]. 3. A measure is non-atomic if it attributes zero mass to all singletons. The reasons for which one has to restrict to such measures go beyond the scope of this chapter; see [LEV 98b] for more details.
generalization. Let us briefly illustrate some of them through two examples. Assume that the signal to be studied X is equal to Y + Z, where Y and Z are two measures supported respectively on $[0, \frac{1}{2}]$ and on $[\frac{1}{2}, 1]$. Suppose that $f_{h,L}^Y(\alpha) \le f_{h,L}^Z(\alpha)$ for all α, where $f_{h,L}^T$ is the Hausdorff spectrum of T with respect to L. Then, it is easy to see that $f_{h,L}^X(\alpha) = f_{h,L}^Z(\alpha)$ for any α. The Y component is not detected by the analysis. On the contrary, if we calculate the spectra with respect to the reference measure Z, then, in general, $f_{h,Z}^X(\alpha) \ne f_{h,Z}^Z(\alpha)$, since $f_{h,Z}^Z(\alpha)$ will be equal to 1 for α = 1 and to −∞ everywhere else. A change in the reference measure thus allows us to carry out a more accurate analysis, by shedding light on possibly hidden components. As a second example, consider an application in image analysis. It is possible to use mutual multifractal analysis to selectively detect certain types of changes in image sequences (for example, the appearance of manufactured objects in sequences of aerial images [CAN 96]). The idea consists of choosing the first image of the sequence as the reference measure and then analyzing each following image with respect to it. In the absence of any change, the mutual spectrum will be equal to 1 for α = 1 and to −∞ everywhere else. A change will lead to a spreading of this spectrum and it is possible to classify changes according to the corresponding values of the couple (α, f(α)) (see Chapter 11). Finally, a third refinement consists of carrying out a multifractal analysis “at a point”, using 2-microlocal analysis: the so-called 2-microlocal formalism allows us to define a function, corresponding to $f_g$, which completely describes the singular behavior of a signal in the vicinity of any point. Such an analysis provides in particular ways to modify pointwise regularity in a powerful manner.

1.4.6. The multifractal spectra of certain simple signals

The paradigmatic example of multifractal measures is the Besicovitch measure – often called, in this context, the “binomial measure”. For x in [0, 1], write:
$$x = \sum_{i=1}^{\infty} x_i 2^{-i} \quad \text{where } x_i = 0, 1$$
and let:
$$\Phi_n^0(x) = \frac{1}{n} \sum_{i=1}^{n} (1 - x_i), \qquad \Phi_n^1(x) = \frac{1}{n} \sum_{i=1}^{n} x_i$$
According to (1.13):
$$\alpha\big(u_n(x)\big) = -\Phi_n^0(x) \log_2 p - \Phi_n^1(x) \log_2 (1 - p)$$
Write $\liminf_{n \to \infty} \Phi_n^0(x) =: \Phi_0(x)$ and $\Phi_1(x) := 1 - \Phi_0(x)$. We obtain:
$$\alpha_*(x) = -\Phi_0(x) \log_2 p - \Phi_1(x) \log_2 (1 - p)$$
The sets $E_\alpha$ are the sets of points of $[0, 1]$ having a given proportion $\Phi_0(x)$ of 0 in their base-2 expansion. Calculations similar to those in Example 1.6 lead to:
\[ \dim E_\alpha = -\Phi_0 \log_2 \Phi_0 - \Phi_1 \log_2 \Phi_1 \]
The Hausdorff spectrum is thus given in parametric form by:
\[ \alpha(\Phi_0) = -\Phi_0 \log_2 p - \Phi_1 \log_2 (1 - p), \qquad f_h(\alpha) = -\Phi_0 \log_2 \Phi_0 - \Phi_1 \log_2 \Phi_1 \]
Note also that the set of points in $(0, 1)$ for which $\Phi_0^n(x)$ does not converge has Hausdorff dimension equal to 1, although it has zero Lebesgue measure: indeed, the strong law of large numbers entails that, for $L$-almost every $x$, $\Phi_0(x) = \Phi_1(x) = \frac{1}{2}$. An immediate consequence is that, for $\alpha_m := -\frac{1}{2} \log_2 (p(1 - p))$, $f_h(\alpha_m) = 1$.
Now consider the grain exponents. We observe that, $L$-a.s., $\alpha_n^k \to \alpha_m$. The function $f_g(\alpha)$ will measure the speed at which $\Pr(|\alpha_n^k - \alpha_m| > \varepsilon)$ tends to 0 for $\varepsilon > 0$ when $n \to \infty$. For fixed $n$ there are exactly $C_n^k$ intervals $I_n^j$ such that $\Phi_0^n(x) = \frac{k}{n}$ for $x \in I_n^j$. This makes it possible to evaluate $N_n^\varepsilon(\alpha)$, which is equal to $C_n^k$ for $\varepsilon$ sufficiently small and $\alpha$ close to $\alpha(\Phi_0 = \frac{k}{n})$. Using Stirling’s formula to estimate $C_n^k$, we can then obtain $f_g(\alpha)$. However, these calculations are somewhat tedious, in particular due to the double limit, and it is much faster to evaluate $f_l(\alpha)$ and use the general results on the relationships between the three spectra. By definition:
\[ \tau(q) = \liminf_{n \to \infty} \frac{\log_2 \sum_{j=0}^{2^n - 1} \mu(I_n^j)^q}{-n} \]
Now, $\mu(I_n^j) = p^k (1 - p)^{n-k}$ exactly $C_n^k$ times, for $k = 0, \dots, n$. As a consequence:
\[ \sum_{j=0}^{2^n - 1} \mu(I_n^j)^q = \sum_{k=0}^{n} C_n^k \, p^{kq} (1 - p)^{(n-k)q} = \left( p^q + (1 - p)^q \right)^n \]
and:
\[ \tau(q) = -\log_2 \left( p^q + (1 - p)^q \right) \]
A simple Legendre transform calculation then shows that $f_l(\alpha) = \tau^*(\alpha)$ is equal to $f_h(\alpha)$. Since $f_h \le f_g \le f_l$ is always true, it follows that the strong multifractal formalism holds in the case of the binomial measure: $f_h = f_g = f_l$.
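The equality between the Legendre spectrum and the parametric Hausdorff spectrum can be checked numerically. The following is a minimal sketch, not the authors' procedure: the function names, the weight `p = 0.3` and the grid of `q` values are illustrative assumptions, and the Legendre transform is approximated by a minimum over a finite grid.

```python
import math

def tau(q, p):
    """tau(q) = -log2(p**q + (1 - p)**q) for the binomial measure."""
    return -math.log2(p ** q + (1.0 - p) ** q)

def spectrum_point(phi0, p):
    """Parametric Hausdorff spectrum: (alpha, f_h) for a proportion phi0 of zeros."""
    phi1 = 1.0 - phi0
    alpha = -phi0 * math.log2(p) - phi1 * math.log2(1.0 - p)
    # Convention 0 * log2(0) = 0 at the two end points of the spectrum.
    f = -sum(x * math.log2(x) for x in (phi0, phi1) if x > 0.0)
    return alpha, f

def legendre(alpha, p, qs):
    """Grid approximation of tau*(alpha) = inf_q (q * alpha - tau(q))."""
    return min(q * alpha - tau(q, p) for q in qs)

p = 0.3  # illustrative weight (assumption, not taken from the text)
qs = [i / 100.0 for i in range(-2000, 2001)]  # hypothetical grid of q values

# Value of the spectrum at the almost-sure exponent alpha_m (phi0 = 1/2).
alpha_m, f_m = spectrum_point(0.5, p)

for phi0 in (0.1, 0.25, 0.5, 0.75, 0.9):
    alpha, f_h = spectrum_point(phi0, p)
    print(f"alpha={alpha:.4f}  f_h={f_h:.4f}  f_l={legendre(alpha, p, qs):.4f}")
```

The two printed columns should agree up to the resolution of the grid; a finer or wider grid would be needed for exponents close to the end points of the spectrum.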
Figure 1.7. Spectrum of the binomial measure
Figure 1.8. Spectrum of the sum of two binomial measures
Let us note, however, that simple operations will break the formalism: for instance, the sum of two Besicovitch measures on $[0, 1]$, or the lumping of two such measures with disjoint supports, will have $f_h \neq f_g \neq f_l$ (see Figures 1.7 and 1.8). There are many generalizations of the Besicovitch measure. We can replace base 2 by base $b > 2$, i.e. partition $[0, 1]$ into $b$ sub-intervals. We then speak of multinomial measures. We may also distribute the measure on $b' < b$ intervals, in which case the
support of the measure will be a Cantor set. Another variation is to consider stochastic constructions by choosing the partitioning and/or p as random variables following certain probability laws. An example of such a stochastic Besicovitch measure, where, at each scale, p is chosen as an iid lognormal random variable, is represented in Figure 1.9. This type of measure can serve as a starting point for the modeling of certain Internet traffic.
Figure 1.9. A random binomial measure
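The construction of a binomial measure, deterministic or random, can be sketched in a few lines. This is an illustration under stated assumptions, not the model used for the figure: the helper `binomial_cascade` and its `draw_p` argument are hypothetical names, and the random variant draws $p$ uniformly in $(0.2, 0.8)$ instead of using a lognormal law, simply so that $p$ is guaranteed to stay in $(0, 1)$.

```python
import random

def binomial_cascade(n, draw_p):
    """Masses of the 2**n dyadic intervals of order n, listed from left to right.

    draw_p() returns the fraction of mass sent to the left child; it is
    called once per scale, so all intervals of a given scale share the same p.
    """
    masses = [1.0]
    for _ in range(n):
        p = draw_p()
        # Each interval splits into a left child (weight p) and a right child (1 - p).
        masses = [m * w for m in masses for w in (p, 1.0 - p)]
    return masses

# Deterministic Besicovitch measure: the same p at every scale.
deterministic = binomial_cascade(10, lambda: 0.3)

# Random variant. ASSUMPTION: uniform law on (0.2, 0.8), chosen for
# illustration only; the text mentions a lognormal law instead.
rng = random.Random(0)
randomised = binomial_cascade(10, lambda: rng.uniform(0.2, 0.8))

print(len(deterministic), sum(deterministic), sum(randomised))
```

Because each split shares the parent mass between the two children, the total mass stays equal to 1 at every scale in both variants.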
Essentially the same analysis as above applies to the self-affine functions defined in Example 1.7. Similar parametric formulae are obtained for the spectra, which also coincide in this case. Let us mention as a final note that such measures and functions have a Hölder function which is everywhere discontinuous and has level lines which are everywhere dense. We have $\alpha_l(x) = \alpha_0$ for all $x$, where $\alpha_0 = \min_t \alpha_p(t)$.
Let us now consider the Weierstrass function:
\[ W(x) = \sum_{n=0}^{\infty} \omega^{-nH} \cos(\omega^n x) \]
As was already mentioned, $\alpha(t) = H$ for any $t$ for $W$. As a consequence:
\[ f_h(H) = 1, \qquad f_h(\alpha) = -\infty \ \text{ if } \alpha \neq H \]
The value of the large deviation spectrum depends on the choice of function $V_W(u)$. Taking $V_W(u) = \beta(W, u)$ leads to $f_g = f_h$. However, if $V_W(u)$ is defined to be the increment of $W$ in $u$, then, for certain values of $\omega$:
\[ f_g(\alpha) = \begin{cases} -\infty & \text{if } \alpha < H \\ H + 1 - \alpha & \text{if } H \le \alpha \le H + 1 \\ -\infty & \text{if } \alpha > H + 1 \end{cases} \]
The heuristic explanation of the fact that $f_g$ is positive for some values of $\alpha$ larger than $H$ is as follows: at each finite resolution $\varepsilon$, the increments of $W$ in intervals of size $\varepsilon$ are at most of the order of $\varepsilon^H$, because the exponent is defined as a lim inf. However, there will also exist intervals for which the increments are smaller, yielding a larger observed grain exponent. The $f_g$ spectrum does in fact measure the speed at which the probability of finding one of these smoother increments tends to 0 when $\varepsilon$ tends to 0. To conclude, we note that, in both cases (oscillations and increments), $f_g$ is concave and coincides with $f_l$.
Similar results hold for fractional Brownian motion. Of course, since fractional Brownian motion is a stochastic process, its spectrum is a priori a random function. However, we can show that, with probability 1, $f_h$ and $f_g$ (defined either using oscillations or increments) are exactly the same as those given above for the Weierstrass function.
1.4.7. Two applications
To conclude this chapter, we briefly mention two applications of multifractal analysis to signal and image processing. Our intention is not to go into the details of these applications (they are developed in Chapters 11 and 12), but to indicate, in a simplified way, how the tools introduced above are put into practice.
1.4.7.1. Image segmentation
The issue of edge detection in images allows us to illustrate in a concrete way the relevance of multifractal spectra, as well as the difference between $f_h$ and $f_g$. We have observed above that edge points are irregular points, but that they cannot be characterized by a universal value of $\alpha_p$, since non-linear transformations of an image may leave the contours unchanged while modifying the exponents. To characterize edges, it is necessary to include higher level information.
This information may be obtained from the following obvious comment: (smooth) edges of an image form a set of lines which is of one dimension. Looking for contours thus means looking for sets of points which are characterized by specific values of $\alpha_p$ (local regularity criterion), and such that their associated dimension is 1 (global criterion). In other words, we will characterize edges as those points possessing an exponent which belongs to $f_h^{-1}(1)$. This provides a geometric characterization of edges: on the one hand, we rely on pointwise exponents, which measure the regularity at infinite resolution; on the other hand, we use $f_h$, which is a dimension spectrum. However, it is also possible to follow a statistical approach: assume we consider a very simple image which contains only a black line, the “edge”, drawn on a white background. If we draw randomly in a uniform way a point in the image at infinite resolution, the probability of hitting the line is zero. However, at any finite resolution, where the image is made of, say, $2^n \times 2^n$ pixels, the probability of hitting the edge is of the order $2^{-n}$, since the edge contains $2^n$ pixels. According to the definition of $f_g$ (recall in particular (1.27)), we see that, on the black line, $f_g(\alpha) = 1$. In this approach, we thus characterize an edge point as a point possessing a singularity $\alpha$ whose probability of occurrence decreases as $2^{-n}$ when the resolution tends to infinity. When the multifractal formalism holds, the geometric and statistical approaches yield the same result. For more details, see Chapter 11 and [LEV 96a].
1.4.7.2. Analysis of TCP traffic
Our second application deals with the modeling and analysis of TCP traffic. Here the situation is a little different from that of the previous application: contrary to typical images, TCP traffic possesses, under certain conditions, a multifractal structure. In the language of the discussion at the end of the introduction to this chapter, we here use fractal methods to study a fractal object. This will entail few changes as far as the analysis of the data is concerned. However, dealing with a multifractal signal brings up the question of the source of this multifractality, and thus of a model capable of explaining this phenomenon. This issue will not be tackled here. See Chapter 12 and [LEV 97, LEV 01, RIE 97, BLV 01]. What is the use of carrying out a multifractal analysis of TCP?
First of all, the range of values taken by the Hölder exponents provides important information on the small-scale behavior of traffic. The smaller $\alpha$ is, the more sporadic traffic will be, which means that variations on short time intervals will be significant. The spectrum also allows us to elucidate what is typical behavior, i.e. the value $\alpha_0$ such that $f_g(\alpha_0) = 1$: with high probability, the variation of traffic between two close time instants $(t_1, t_2)$ will be of the order $|t_2 - t_1|^{\alpha_0}$. While this typical behavior is important for the understanding and the management of the network, it is also useful to know which other variations may occur, and with what probabilities. This is exactly the information provided by $f_g$. Thus, the whole large deviation spectrum is useful in this application. Let us note that, in contrast, the Hausdorff spectrum is probably less adapted here: first because the relevant physical quantities are increments at different time scales, small but finite; there is no notion of regularity at infinite resolution, as is the case with images. Second, the relevant information is statistical in nature rather than geometric. To conclude, let us mention that the large deviation spectrum of certain TCP traces, as estimated by the kernel method, displays a shape reminiscent of that of
the sum of two binomial measures. This provides useful information on the fine structure of these traces. In particular, fg is not concave. This shows the advantage of having procedures available to estimate spectra without making the assumption that a multifractal formalism is satisfied.
1.5. Bibliography
[BLV 01] BARRAL J., LÉVY VÉHEL J., “Multifractal analysis of a class of additive processes with correlated nonstationary increments”, Electronic Journal of Probability, vol. 9, p. 508–543, 2001.
[BON 86] BONY J., “Second microlocalization and propagation of singularities for semilinear hyperbolic equations”, in Hyperbolic equations and related topics (Katata/Kyoto, 1984), Academic Press, Boston, Massachusetts, p. 11–49, 1986.
[BOU 28] BOULIGAND G., “Ensembles impropres et nombre dimensionnel”, Bull. Soc. Math., vol. 52, p. 320–334 and 361–376, 1928 (see also Les définitions modernes de la dimension, Hermann, 1936).
[BOU 00] BOUSCH T., HEURTEAUX Y., “Caloric measure on the domains bounded by Weierstrass-type graphs”, Ann. Acad. Sci. Fenn. Math., vol. 25, p. 501–522, 2000.
[CAN 96] CANUS C., LÉVY VÉHEL J., “Change detection in sequences of images by multifractal analysis”, in ICASSP’96 (Atlanta, Georgia), 1996.
[DAO 02] DAOUDI K., LÉVY VÉHEL J., “Signal representation and segmentation based on multifractal stationarity”, Signal Processing, vol. 82, no. 12, p. 2015–2024, 2002.
[DUB 89] DUBUC B., TRICOT C., ROQUES-CARMES C., ZUCKER S., “Evaluating the fractal dimension of profiles”, Physical Review A, vol. 39, p. 1500–1512, 1989.
[GRE 85] GREBOGI C., MCDONALD S., OTT E., YORKE J., “Exterior dimension of large fractals”, Physics Letters A, vol. 110, p. 1–4, 1985.
[HAR 16] HARDY G., “Weierstrass’s non-differentiable function”, Transactions of the American Mathematical Society, vol. 17, p. 301–325, 1916.
[HAU 19] HAUSDORFF F., “Dimension und äusseres Mass”, Math. Ann., vol. 79, p. 157–179, 1919.
[JAF 96] JAFFARD S., MEYER Y., “Wavelet methods for pointwise regularity and local oscillations of functions”, Mem. Amer. Math. Soc., vol. 123, 1996.
[KAH 63] KAHANE J., SALEM R., Ensembles parfaits et séries trigonométriques, Hermann, 1963.
[KOL 61] KOLMOGOROV A., TIHOMIROV V., “Epsilon-entropy and epsilon-capacity of sets in functional spaces”, American Mathematical Society Translations, vol. 17, p. 277–364, 1961.
[KOL 01] KOLWANKAR K., LÉVY VÉHEL J., “Measuring functions smoothness with local fractional derivatives”, Frac. Calc. Appl. Anal., vol. 4, no. 3, p. 285–301, 2001.
[KOL 02] KOLWANKAR K., LÉVY VÉHEL J., “A time domain characterization of the fine local regularity of functions”, J. Fourier Anal. Appl., vol. 8, no. 4, p. 319–334, 2002.
[LEV 96a] LÉVY VÉHEL J., “Introduction to the multifractal analysis of images”, in FISHER Y. (Ed.), Fractal Image Encoding and Analysis, Springer-Verlag, 1996.
[LEV 96b] LÉVY VÉHEL J., “Numerical computation of the large deviation multifractal spectrum”, in CFIC (Rome, Italy), 1996.
[LEV 97] LÉVY VÉHEL J., RIEDI R., “Fractional Brownian motion and data traffic modeling: The other end of the spectrum”, in LÉVY VÉHEL J., LUTTON E., TRICOT C. (Eds.), Fractals in Engineering, Springer-Verlag, 1997.
[LEV 98a] LÉVY VÉHEL J., GUIHENEUF B., “2-Microlocal analysis and applications in signal processing”, in International Wavelets Conference (Tangier), 1998.
[LEV 98b] LÉVY VÉHEL J., VOJAK R., “Multifractal analysis of Choquet capacities”, Advances in Applied Mathematics, vol. 20, no. 1, p. 1–43, 1998.
[LEV 01] LÉVY VÉHEL J., SIKDAR B., “A multiplicative multifractal model for TCP traffic”, in ISCC’2001 (Tunisia), 2001.
[LEV 02] LÉVY VÉHEL J., “Multifractal processing of signals”, forthcoming.
[LEV 04a] LÉVY VÉHEL J., SEURET S., “The 2-microlocal formalism”, in Fractal Geometry and Applications: A Jubilee of Benoit Mandelbrot, Proc. Sympos. Pure Math., PSPUM, vol. 72, Part 2, p. 153–215, 2004.
[LEV 04b] LÉVY VÉHEL J., “On various multifractal spectra”, in BANDT C., MOSCO U., ZÄHLE M. (Eds.), Fractal Geometry and Stochastics III, Progress in Probability, Birkhäuser Verlag, vol. 57, p. 23–42, 2004.
[MCM 84] MCMULLEN C., “The Hausdorff dimension of general Sierpinski carpets”, Nagoya Mathematical Journal, vol. 96, p. 1–9, 1984.
[RIE 97] RIEDI R., LÉVY VÉHEL J., TCP traffic is multifractal: a numerical study, Technical Report RR-3129, INRIA, 1997.
[ROU 98] ROUEFF F., LÉVY VÉHEL J., “A regularization approach to fractional dimension estimation”, in Fractals’98 (Malta), 1998.
[SEU 02] SEURET S., LÉVY VÉHEL J., “The local Hölder function of a continuous function”, Appl. Comput. Harmon. Anal., vol. 13, no. 3, p. 263–276, 2002.
[TRI 82] TRICOT C., “Two definitions of fractal dimension”, Math. Proc. Camb. Phil. Soc., vol. 91, p. 57–74, 1982.
[TRI 86a] TRICOT C., “Dimensions de graphes”, Comptes rendus de l’Académie des sciences de Paris, vol. 303, p. 609–612, 1986.
[TRI 86b] TRICOT C., “The geometry of the complement of a fractal set”, Physics Letters A, vol. 114, p. 430–434, 1986.
[TRI 87] TRICOT C., “Dimensions aux bords d’un ouvert”, Ann. Sc. Math. Québec, vol. 11, no. 1, p. 205–235, 1987.
[TRI 88] TRICOT C., QUINIOU J., WEHBI D., ROQUES-CARMES C., DUBUC B., “Evaluation de la dimension fractale d’un graphe”, Rev. Phys. Appl., vol. 23, p. 111–124, 1988.
[TRI 99] TRICOT C., Courbes et dimension fractale, Springer-Verlag, 2nd edition, 1999.
[VOJ 94] VOJAK R., LÉVY VÉHEL J., DANECH-PAJOUH M., “Multifractal description of road traffic structure”, in Seventh IFAC/IFORS Symposium on Transportation Systems: Theory and Application of Advanced Technology (Tianjin, China), p. 942–947, 1994.
Chapter 2
Scale Invariance and Wavelets
2.1. Introduction
Processes presenting “power law” spectra (often regrouped under the restrictive, but generic, term of “1/f ” processes) appear in various domains: hydrology [BER 94], finance [MAN 97], telecommunications [LEL 94, PAR 00], turbulence [CAS 96, FRI 95], biology [TEI 00] and many more [WOR 96]. The characteristics of these processes are based upon concepts such as fractality, self-similarity or long-range dependence and, even though these different notions are not equivalent, they all possess a common characteristic: that of replacing the idea of a structure related to a preferred time scale with that of an invariant relationship between different scales.
The study of scale invariant processes presents several difficulties ranging from modeling to analysis and processing, for which few tools were available until recently [BER 94]. The effective possibility of appropriately manipulating these processes has recently been reinforced by the appearance of adequate multiresolution techniques: the tools which are referred to here have been developed for this purpose. These tools are explicitly based on the theoretical as well as algorithmic potentialities offered by wavelet transforms.
Chapter written by Patrick FLANDRIN, Paulo GONÇALVES and Patrice ABRY.
2.2. Models for scale invariance
2.2.1. Intuition
From a qualitative point of view, the idea behind a 1/f spectrum involves situations where, given a signal observed in the time domain, its empirical power spectrum density behaves as $S(f) = C|f|^{-\alpha}$ with $\alpha > 0$. From a practical viewpoint, it is the equivalent form:
\[ \log S(f) = \log C - \alpha \log |f| \]
which is the most significant, since the “1/f” character is translated by a straight line in doubly logarithmic coordinates. Generally, when dealing with physical observations, referring to some 1/f spectral behavior is only meaningful with respect to a frequency analysis band. Therefore, the introduction, on the half-line of positive frequencies, of two (adjustable) frequencies $f_{bf}$ and $f_{hf}$ such that $0 < f_{bf} < f_{hf} < +\infty$ leads to a three-regime classification, depending on which (at least one) of the three domains defined by this partition exhibits the 1/f behavior. Let us consider each of these cases:
1) $f_{bf} \le f \le f_{hf}$: this is the context of a bandpass domain where we simply observe an algebraic decrease of the spectrum density, without a predominant frequency;
2) $f_{hf} \le f \le +\infty$: the 1/f character is dominant in the high frequency limit and highlights the local regularity of the sample paths, their variability and their fractal nature;
3) $0 \le f \le f_{bf}$: the power law of the spectrum intervenes here in the limit of low frequencies, resulting in a divergence at the origin of the spectrum density. If $0 < \alpha < 1$, this divergence corresponds to an algebraic decrease of the correlation function, which is slow enough for the latter not to be summable; there is long-range dependence or long memory.
In fact, these three regimes represent three different properties, hence, they have no reason to exist at the same time.
However, they possess the common denominator of being linked to an idea of scale invariance according to which – within a scale range and up to some renormalization – the properties of the whole are the same as those of the parts (self-similarity). Indeed, a power law spectrum belongs to the class of homogeneous functions. Its form therefore remains invariant under scaling in the sense that, for any $c \in \mathbb{R}^+$:
\[ S(f) = C|f|^{-\alpha} \implies S(cf) = C|cf|^{-\alpha} = c^{-\alpha} S(f) \]
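The straight-line reading of a power law spectrum and its invariance under scaling can be illustrated on synthetic values. This is a minimal sketch under stated assumptions: the constants `C = 2.0` and `alpha = 0.8`, the frequency grid and the helper `loglog_slope` are hypothetical, and an ordinary least-squares fit on an exact power law stands in for what would be, on real data, a fit on an estimated spectrum restricted to the relevant frequency band.

```python
import math

def loglog_slope(freqs, spectrum):
    """Least-squares slope and intercept of log S(f) against log f."""
    xs = [math.log(f) for f in freqs]
    ys = [math.log(s) for s in spectrum]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

C, alpha = 2.0, 0.8  # illustrative constants (assumptions)

def S(f):
    """Exact power law spectrum S(f) = C |f|**(-alpha)."""
    return C * abs(f) ** (-alpha)

freqs = [0.01 * k for k in range(1, 101)]
slope, intercept = loglog_slope(freqs, [S(f) for f in freqs])
print(slope, intercept)  # slope estimates -alpha, intercept estimates log C

# Homogeneity: rescaling the frequency only rescales the amplitude.
c, f0 = 3.0, 0.2
print(S(c * f0), c ** (-alpha) * S(f0))
```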
Given that, through Fourier transformation, a dilation or compression in the frequency domain is translated by a corresponding compression or dilation in the time domain, it is thus legitimate to expect “1/f” processes to be closely coupled with self-similar processes.
2.2.2. Self-similarity
To be more precise [BER 94, SAM 94] (see also Chapter 5), we introduce the following definition.
DEFINITION 2.1.– A process $X = \{X(t), t \in \mathbb{R}\}$ is said to be self-similar of index $H > 0$ if, for any $c \in \mathbb{R}^+$, it satisfies the following equality in a distributional sense:
\[ \{X(t), t \in \mathbb{R}\} \overset{L}{=} \{c^H X(t/c), t \in \mathbb{R}\}, \qquad \forall c > 0. \tag{2.1} \]
According to this definition, a self-similar process does not possess any characteristic scale, insofar as it remains (statistically) identical to itself after any scale change. Although, from a theoretical point of view, self-similarity may extend from the largest to the finest scales, the above definition must, in general, be accompanied by a scale domain (i.e., a range of variation of the factor $c$) over which the invariance has a meaning. For instance, the finite duration of an observation sets the maximum attainable scale, in the same way as the finite resolution of a sensor limits the finest scale. It is noteworthy that, if a (second-order) process is self-similar, it is necessarily non-stationary. Indeed, assuming that, at some arbitrary time instant $t_1 := 1$, the condition $\operatorname{var} X(t_1) \neq 0$ holds, it stems from Definition 2.1 that $\operatorname{var} X(t) = \operatorname{var} X(t \times 1) = t^{2H} \operatorname{var} X(t_1)$ and, as a consequence, the variance of the process depends on time. This behavior applies to all finite moments of $X$:
\[ \mathbb{E}|X(t)|^q = \mathbb{E}|X(1)|^q \, |t|^{qH}. \tag{2.2} \]
Therefore, in a strict sense, an ordinary spectrum cannot be attached to a self-similar process. Nevertheless, there exists an interesting sub-class of the class of self-similar processes, which, in a sense, could be paralleled with that of stationary processes: it is that of processes with stationary increments defined as follows [BER 94, SAM 94].
DEFINITION 2.2.– A process $X = \{X(t), t \in \mathbb{R}\}$ is said to have stationary increments if and only if, for any $\theta \in \mathbb{R}$, the law of its increment process:
\[ X^{(\theta)} := \{X^{(\theta)}(t) := X(t + \theta) - X(t), \ t \in \mathbb{R}\} \]
does not depend on $t$.
Figure 2.1. Path of a self-similar process. When we simultaneously apply to the sample path of a self-similar process a dilation of the time axis by a factor $c$ and a dilation of the amplitude axis by a factor $c^{-H}$, we obtain a new sample path that is (statistically) indistinguishable from the original
In this definition, the parameter $\theta$ plays the role of a time scale according to which we study the process $X$. Indeed, the self-similarity of the latter is translated on its increments by:
\[ \{X^{(\theta)}(t), t \in \mathbb{R}\} \overset{L}{=} \{c^H X^{(\theta/c)}(t/c), t \in \mathbb{R}\}, \qquad \forall c > 0. \tag{2.3} \]
It is evidently possible to extend this definition to higher order increments (“increments of increments”). Coupling self-similarity of index $H$ and the stationary increments property implies that the parameter $H$ remains in the range $0 < H < 1$. Moreover, the covariance function of such a process (centered and zero at the origin) must be of the form:
\[ \mathbb{E}\, X(t) X(s) = \frac{\sigma^2}{2} \left( |t|^{2H} + |s|^{2H} - |t - s|^{2H} \right) \tag{2.4} \]
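with the identification $\sigma^2 := \mathbb{E}|X(1)|^2$. Indeed, if we adopt the convention according to which at time $t = 0$, $X(0) = 0$, it follows from the assumptions made that:
\begin{align*}
\mathbb{E}\, X(t) X(s) &= \frac{1}{2} \left( \mathbb{E}\, X^2(t) + \mathbb{E}\, X^2(s) - \mathbb{E} \left( X(t) - X(s) \right)^2 \right) \\
&= \frac{1}{2} \left( \mathbb{E}\, X^2(t) + \mathbb{E}\, X^2(s) - \mathbb{E} \left( X(t - s) - X(0) \right)^2 \right) \\
&= \frac{\mathbb{E}|X(1)|^2}{2} \left( |t|^{2H} + |s|^{2H} - |t - s|^{2H} \right),
\end{align*}
which explains the structure of relation (2.4). Moreover, the correlation function of the increment processes $X^{(\theta)}$ reads:
\begin{align*}
\mathbb{E}\, X^{(\theta)}(t) X^{(\theta)}(t + \tau) &= \frac{\sigma^2}{2} \left( |\tau + \theta|^{2H} + |\tau - \theta|^{2H} - 2|\tau|^{2H} \right) \\
&= \frac{\sigma^2}{2} |\tau|^{2H} \left( \left| 1 + \frac{\theta}{\tau} \right|^{2H} + \left| 1 - \frac{\theta}{\tau} \right|^{2H} - 2 \right). \tag{2.5}
\end{align*}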
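The fact that (2.5) follows from (2.4) and does not depend on $t$ can be verified numerically by expanding the product of increments by bilinearity. This is a minimal sketch; the function names and the numerical values of $H$, $\theta$, $t$ and $\tau$ are illustrative assumptions.

```python
def cov(t, s, H, sigma2=1.0):
    """Covariance (2.4) of a self-similar process with stationary increments."""
    return 0.5 * sigma2 * (abs(t) ** (2 * H) + abs(s) ** (2 * H)
                           - abs(t - s) ** (2 * H))

def increment_cov(t, tau, theta, H, sigma2=1.0):
    """E[X^(theta)(t) X^(theta)(t + tau)], expanded by bilinearity from (2.4)."""
    return (cov(t + theta, t + tau + theta, H, sigma2)
            - cov(t + theta, t + tau, H, sigma2)
            - cov(t, t + tau + theta, H, sigma2)
            + cov(t, t + tau, H, sigma2))

def increment_cov_closed_form(tau, theta, H, sigma2=1.0):
    """Right-hand side of (2.5); note that t does not appear."""
    return 0.5 * sigma2 * (abs(tau + theta) ** (2 * H)
                           + abs(tau - theta) ** (2 * H)
                           - 2 * abs(tau) ** (2 * H))

H, theta = 0.7, 0.5  # illustrative values (assumptions)
for t in (0.0, 0.5, 3.0, 40.0):
    print(t, increment_cov(t, 1.0, theta, H),
          increment_cov_closed_form(1.0, theta, H))
```

Each printed row should show the same value in the last two columns, whatever the starting time $t$.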
It is now possible to study its asymptotic behavior in both limits of large and small $\tau$. For instance, we show that in the limit $\tau \to +\infty$ (i.e., $\tau \gg \theta$), the autocorrelation function decreases asymptotically as $\tau^{2(H-1)}$:
\[ \mathbb{E}\, X^{(\theta)}(t) X^{(\theta)}(t + \tau) \sim \frac{\sigma^2}{2} \, 2H(2H - 1) \, \theta^2 \, \tau^{2(H-1)}. \tag{2.6} \]
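The passage from (2.5) to (2.6) rests on a second-order expansion, written out below as a sketch; the shorthand $x := \theta/\tau$ is introduced here for convenience and is not part of the original notation.

```latex
\begin{align*}
(1 + x)^{2H} + (1 - x)^{2H} - 2 &= 2H(2H - 1)\, x^{2} + O(x^{4}),
  \qquad x := \theta/\tau \to 0, \\
\mathbb{E}\, X^{(\theta)}(t)\, X^{(\theta)}(t + \tau)
  &= \frac{\sigma^{2}}{2}\, |\tau|^{2H}
     \left[ (1 + x)^{2H} + (1 - x)^{2H} - 2 \right] \\
  &\sim \frac{\sigma^{2}}{2}\, 2H(2H - 1)\, \theta^{2}\, \tau^{2(H - 1)} .
\end{align*}
```

The odd-order terms cancel between the two binomial expansions, which is why the leading term is quadratic in $\theta/\tau$; its coefficient is positive for $H > \frac{1}{2}$, negative for $H < \frac{1}{2}$, and vanishes for $H = \frac{1}{2}$.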
By Fourier duality, this behavior induces an algebraic spectral divergence with exponent $1 - 2H$ at the origin. Self-similar processes with stationary increments are hence closely related to long-range dependent processes. In the other limit, $\tau \to 0$ (i.e., $\tau \ll \theta$), we show that for $H > \frac{1}{2}$:
\[ \mathbb{E}\, X^{(\theta)}(t) X^{(\theta)}(t + \tau) \sim \sigma^2 \theta^{2H} \left( 1 - \theta^{-2H} |\tau|^{2H} \right). \tag{2.7} \]
This behavior characterizes the local regularity of each sample path of the process $X$. The following sections explain the notions associated with each of these limits: long-range dependence on the one hand and local regularity on the other hand.
2.2.3. Long-range dependence
DEFINITION 2.3.– A second order stationary process $X = \{X(t), t \in \mathbb{R}\}$ is said to be “long-range dependent” (or to have “long memory”) if its correlation function $c_X(\tau) := \mathbb{E}\, X(t) X(t + \tau)$ is such that [BER 94, SAM 94]:
\[ c_X(\tau) \sim c_r |\tau|^{-\beta}, \qquad \tau \to +\infty \tag{2.8} \]
with $0 < \beta < 1$. In the same way, the power spectrum density:
\[ \Gamma_X(f) := \int_{-\infty}^{+\infty} c_X(\tau) \, e^{-i 2\pi f \tau} \, d\tau \]
of a long-range dependent process is such that:
\[ \Gamma_X(f) \sim c_f |f|^{-\gamma}, \qquad f \to 0 \tag{2.9} \]
with $0 < \gamma = 1 - \beta < 1$ and $c_f = 2 (2\pi)^{-\gamma} \sin((1 - \gamma)\pi/2) \, \Gamma(\gamma) \, c_r$, where $\Gamma$ denotes the usual Gamma function. Under its form (2.8), long-range dependence is related to the fact that, for large lags, the (algebraic) decrease of the correlation function is so slow that the function is not summable: hence, there is a long memory effect, in the sense that significant statistical relations are maintained between very distant samples. Obviously, this situation is in contrast with that of Markovian processes with short memory, which are characterized by an asymptotic exponential decay of the correlations. By definition, the existence of an exponential decrease involves a characteristic time scale, whereas this is no longer the case for an algebraic decrease: hence, it is a matter of scaling law behavior. By Fourier duality, long-range dependence implies that $\Gamma_X(0) = \infty$, in accordance with the power law divergence expressed by (2.9). Finally, although long-range dependence is defined independently of self-similarity, relation (2.6) demonstrates that a strong bond exists between these two notions, since it indicates that the increment process of a self-similar process with stationary increments presents, if $H > \frac{1}{2}$, long-range dependence.
2.2.4. Local regularity
The main issue of this section, rather than the long-term behavior of the autocorrelation function, is its short-term behavior. Let $X$ be a second order stationary random process, whose autocorrelation function behaves near the origin as:
\[ \mathbb{E}\, X(t) X(t + \tau) \sim \sigma^2 (1 - C|\tau|^{2h}), \qquad \tau \to 0, \quad 0 < h < 1. \tag{2.10} \]
It is easy to prove that this covariance structure at the origin is equivalent to an algebraic behavior of the increments variance in the limit of short increments:
\[ \mathbb{E}|X(t + \tau) - X(t)|^2 \sim C|\tau|^{2h}, \qquad \tau \to 0, \quad 0 < h < 1. \]
This relation provides information on the local regularity of each sample path of the process $X$. For Gaussian processes, for instance, it indicates that these sample paths are continuous of order $h' < h$. When $0 < h < 1$, this means that the trajectories of $X$ are everywhere continuous but nowhere differentiable.
To describe this local regularity more precisely, one can use the notion of Hölder exponent, according to the following definition.
DEFINITION 2.4.– A signal $X(t)$ is of Hölder regularity $h \ge 0$ in $t_0$ if there exists a local polynomial $P_{t_0}(t)$ of degree $n = \lfloor h \rfloor$ and a constant $C$ such that:
\[ |X(t) - P_{t_0}(t)| \le C|t - t_0|^h. \tag{2.11} \]
In the case where $0 \le h < 1$, the regular part of $X(t)$ is reduced to $P_{t_0}(t) = X(t_0)$, thus leading to the characterization of the Hölder regularity of $X(t)$ in $t_0$ by the relation:
\[ |X(t_0 + \theta) - X(t_0)| \le C|\theta|^h. \tag{2.12} \]
The Hölder exponent heuristically supplies a measure of the roughness of the sample path of $X$: the closer it is to 1, the smoother and more regular the path; the closer it is to 0, the rougher and the more variable the path. The asymptotic algebraic behavior of the increments variance thus highlights a Hölder regularity $h' < h$ of the sample paths of the process $X$. This correspondence between the asymptotic algebraic behavior of increments and the local regularity remains valid even if the process $X$ is no longer stationary, but only has stationary increments. The processes whose sample paths possess a uniform and constant local regularity are said to be monofractal. As far as self-similar processes with stationary increments are concerned, it is easy to observe that, on the one hand, starting from (2.12), the increments present an algebraic behavior for all the $\theta$ in general, hence in particular in the limit $\theta \to 0$:
\[ \mathbb{E}|X(t + \theta) - X(t)|^2 = \mathbb{E}|X(\theta) - \underbrace{X(0)}_{=0}|^2 = \sigma^2 |\theta|^{2H} \tag{2.13} \]
whereas, on the other hand, relation (2.13) indicates that the increment process presents an autocovariance as in (2.10). Self-similar processes with stationary increments, as well as their increment processes, thus present uniform local regularities (i.e., on average and everywhere) $h < H$.
2.2.5. Fractional Brownian motion: paradigm of scale invariance
The simplest and most commonly used model of a self-similar process is that of fractional Brownian motion (FBM) [MAN 68], which is characterized by its real exponent $0 < H < 1$, called the Hurst exponent.
DEFINITION 2.5.– FBM $B_H = \{B_H(t), t \in \mathbb{R}; B_H(0) = 0\}$ is the only zero-mean Gaussian process which is self-similar and possesses stationary increments.
The self-similarity and the stationary nature of the increments guarantee that the covariance function of FBM is of the form (2.4). The Gaussian character, in turn, implies that the probability law of FBM is entirely determined by this covariance structure. FBM can be considered as a generalization of ordinary Brownian motion. In the case of ordinary Brownian motion, we know that the increments possess the particularity of being decorrelated (and therefore independent because of Gaussianity). The generalization offered by FBM consists of introducing a possibility of correlation between the increments. In fact, we show that:
\[ \mathbb{E} \left( B_H(t + \theta) - B_H(t) \right) \left( B_H(t) - B_H(t - \theta) \right) = \sigma^2 \left( 2^{2H-1} - 1 \right) |\theta|^{2H} \]
which confirms the decorrelation between the increments when $H = \frac{1}{2}$ (i.e. ordinary Brownian motion) but induces a positive correlation (persistence) or a negative correlation (antipersistence) depending on whether $H > \frac{1}{2}$ or $H < \frac{1}{2}$.
DEFINITION 2.6.– We call fractional Gaussian noise (FGN) the increments process $G_{H;\theta} := \{G_{H;\theta}(t), t \in \mathbb{R}\}$ defined by:
\[ G_{H;\theta}(t) := \frac{1}{\theta} B_H^{(\theta)}(t), \qquad \theta > 0 \tag{2.14} \]
where $B_H$ is FBM. It is, by construction, a stationary process, everywhere continuous but nowhere differentiable, that can be considered as an extension of white Gaussian noise. Hence, we must be very careful when we decide to take the limit of definition (2.14) when $\theta \to 0$. Nevertheless, if we are interested in the behavior of FGN with “small” increments, we observe according to (2.6) and (2.14) that:
\[ c_{G_{H;\theta}}(\tau) := \mathbb{E}\, G_{H;\theta}(t) G_{H;\theta}(t + \tau) \sim \frac{\sigma^2}{2} \, 2H(2H - 1) \, \tau^{2(H-1)}, \qquad \tau \gg \theta. \]
On the one hand, this behavior highlights that FGN presents some long memory and, on the other hand, that the power spectrum density of FGN is proportional to $|f|^{-(2H-1)}$. It is therefore possible to prove on the basis of several arguments (integration/differentiation type) [FLA 92] that the FBM itself possesses an “average spectrum” of the form $\Gamma_{B_H}(f) \propto |f|^{-(2H+1)}$. Along with its role of spectral exponent, the parameter $H$ also controls the Hölder regularity of the sample paths of FBM and FGN, which is $h < H$ in any point. To this regularity (or irregularity), a notion of fractality is naturally associated with the Gaussian processes, since the Hausdorff dimension of the sample paths is equal to $\dim_H \operatorname{graph}(B_H) = 2 - H$ (for a precise definition of the Hausdorff dimension, see Chapter 1).
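Since the law of FBM is entirely determined by the covariance (2.4), a sample path can be drawn on a finite grid by factorizing the covariance matrix. The sketch below uses a plain Cholesky factorization, a general-purpose technique for Gaussian vectors that is swapped in here for illustration and is not a synthesis method described in this chapter; all function names, the grid size `n = 64`, the value `H = 0.7` and the seed are assumptions. Its cubic cost restricts it to short paths.

```python
import math
import random

def fbm_covariance(times, H, sigma2=1.0):
    """Covariance matrix (2.4) of FBM sampled at the given positive times."""
    return [[0.5 * sigma2 * (t ** (2 * H) + s ** (2 * H) - abs(t - s) ** (2 * H))
             for s in times] for t in times]

def cholesky(a):
    """Lower-triangular L with L L^T = a, for a symmetric positive definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(acc) if i == j else acc / low[j][j]
    return low

def sample_fbm(n, H, rng):
    """One FBM path at times 1/n, 2/n, ..., 1 (B_H(0) = 0 is implicit)."""
    times = [(i + 1) / n for i in range(n)]
    low = cholesky(fbm_covariance(times, H))
    white = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Correlate the white Gaussian vector with the Cholesky factor.
    path = [sum(low[i][k] * white[k] for k in range(i + 1)) for i in range(n)]
    return times, path

def fgn(path, theta):
    """FGN as in (2.14): increments of the path at lag theta, divided by theta."""
    values = [0.0] + path  # prepend B_H(0) = 0
    return [(b - a) / theta for a, b in zip(values, values[1:])]

n, H = 64, 0.7  # illustrative values (assumptions)
times, path = sample_fbm(n, H, random.Random(1))
gnoise = fgn(path, 1.0 / n)
print(len(path), len(gnoise))
```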
As a result, FBM presents the advantage (or the disadvantage) of being globally self-similar on the entire frequency axis, the single parameter $H$ controlling, according to the requirements, one or other of the three regimes cited before: self-similarity, long memory and local regularity. In terms of modeling, FBM appears as a particularly interesting starting point (as white Gaussian noise can be in stationary contexts). This simplicity (FBM is the only Gaussian process that is self-similar with stationary increments, and it is entirely determined by the single parameter $H$) is of course not without drawbacks when it comes to applications, i.e. as soon as it becomes necessary to consider real data. Numerous variations on this theme can be considered; they are only outlined here and are studied in detail in the other chapters of this volume. In all cases, it is a matter of replacing the single exponent $H$ by a collection of exponents.

2.2.6. Beyond the paradigm of scale invariance

To begin with, we can consider modifying relation (2.13) by allowing the exponent to depend on time:
$$E\,|X(t+\theta) - X(t)|^2 \sim C(t)\,|\theta|^{2h(t)}, \qquad \theta \to 0.$$
When $0 < h(t) < 1$ is a sufficiently regular deterministic function, we describe the process $X$ as multifractional or, when it is Gaussian, as locally self-similar, i.e., locally around $t$, $X(t)$ resembles an FBM of parameter $H = h(t)$ (for more details, see Chapter 6). The local regularity is no longer a uniform or global quantity along the sample path; on the contrary, it varies in time according to $h(t)$, which makes it possible to model time variations of the roughness. When $h(t)$ is itself a strongly irregular function, possibly a random process, in the sense that, with $t$ fixed, $h(t)$ depends on the observed realization of $X$, the process $X$ is said to be multifractal. The fluctuations of the regularity are then no longer described by $h(t)$ itself, but by a multifractal spectrum $D(h)$ which gives the Hausdorff dimension of the set of points $t$ where $h(t) = h$ (see Chapter 1 and Chapter 3). One of the major consequences of multifractality is the fact that quantities usually called partition functions behave according to power laws in the small scale limit:
$$\frac{1}{n}\sum_{k=1}^{n} \left|X^{(\tau)}(t+k\tau)\right|^q \simeq c_q\,|\tau|^{\zeta(q)}, \qquad |\tau| \to 0. \tag{2.15}$$
For processes with stationary increments, the time averages $(1/n)\sum_{k=1}^{n} |X^{(\tau)}(t+k\tau)|^q$ can be regarded as estimates of the ensemble averages $E\,|X^{(\tau)}(t)|^q$. Relation (2.15) thus recalls equation (2.2), which is a consequence of self-similarity. However, a fundamental difference exists: the exponents $\zeta(q)$ do not possess a priori any
reason to present a linear behavior $qH$. In other words, the description of scaling laws in data cannot be carried out with a single exponent but requires a whole collection of them. Measuring the exponents $\zeta(q)$ offers a way, through a Legendre transform, of estimating the multifractal spectrum. However, a detailed discussion of multifractal processes is beyond the scope of this chapter; see Chapter 1 and Chapter 3.

Multifractal processes provide a rich and natural extension of the self-similar model insofar as a single exponent is replaced by a set of exponents; nevertheless, they remain essentially tied to the existence of power law behaviors. In the analysis of experimental data, such behaviors might not be observed. To account for these situations, the infinitely divisible cascades model exploits an additional degree of freedom: we relax the constraint of a strict power law behavior for the moments, and replace it with a simpler requirement, namely a behavior in which the variables $q$ (order of the moment) and $\tau$ (analysis scale) are separable. The equations below make this explicit:
$$\begin{aligned}
\text{self-similar:}\quad & E\,|X^{(\tau)}(t)|^q = c_q\,|\tau|^{qH} = c_q \exp\big(qH \log \tau\big); && (2.16)\\
\text{multifractal:}\quad & E\,|X^{(\tau)}(t)|^q = c_q\,|\tau|^{\zeta(q)} = c_q \exp\big(\zeta(q) \log \tau\big); && (2.17)\\
\text{inf. divisib. casc.:}\quad & E\,|X^{(\tau)}(t)|^q = c_q \exp\big(H(q)\,n(\tau)\big). && (2.18)
\end{aligned}$$
In this scenario, the function $n(\tau)$ is no longer fixed a priori to be $\log \tau$, just as the function $H(q)$ is no longer a priori linear, of the form $qH$. The concept of an infinitely divisible cascade was initially introduced by Castaing in the context of turbulence [CAS 90, CAS 96]. The complete definition of this notion is beyond the scope of this chapter and can be found in [VEI 00]. It is nonetheless important to indicate that a quantity called the propagator of the cascade plays an important role here: it links the probability densities of the process increments at two different scales $\tau$ and $\tau'$. Infinite divisibility formally translates the absence of any preferred time scale and demands that this propagator consist of an elementary function $G_0$, convolved with itself a number of times that depends only on the scales $\tau$ and $\tau'$, and therefore have the following functional form:
$$G_{\tau,\tau'}(\log \alpha) = \big[G_0(\log \alpha)\big]^{*(n(\tau) - n(\tau'))}.$$
A possible interpretation of this relation is to read the function $G_0$ as the elementary step, i.e. the building block of the cascade, whereas the quantity $n(\tau) - n(\tau')$ measures how many times this elementary step must be carried out to evolve from scale $\tau$ to scale $\tau'$. The derivative of $n$ with respect to $\log \tau$ thus describes, in a sense, the size of the cascade. The term scale invariant infinitely divisible cascade is reserved for situations where the function $n$ possesses the specific form $n(\tau) = \log \tau$; otherwise, we only refer to a scaling law behavior. The infinitely divisible scale invariant cascades correspond to multiscaling or multifractality when the scaling law
exists in the small scale limit. The exponents $\zeta(q)$ associated with the multifractal spectrum are thus connected to the propagator of the cascade by $\zeta(q) = H(q)$. When the functions $H$ and $n$ simultaneously take the forms $H(q) = qH$ and $n(\tau) = \log \tau$, the infinitely divisible cascades reduce to self-similarity, which thus appears as a particular case. The propagator is then a Dirac function, $G_{\tau,\tau'}(\log \alpha) = \delta\big(\log \alpha - H \log(\tau/\tau')\big)$. The fundamental characteristic of the infinitely divisible cascades, namely the separation of the variables $q$ and $\tau$, induces the following relations, which are essential for the analysis [VEI 00]:
$$\log E\,|X^{(\tau)}|^q = H(q)\,n(\tau) + K_q; \tag{2.19}$$
$$\log E\,|X^{(\tau)}|^q = \frac{H(q)}{H(p)} \log E\,|X^{(\tau)}|^p + \kappa_{q,p}. \tag{2.20}$$
These equations indicate that the moments behave as power laws with respect to each other, a property that is exploited in the analysis. Further definitions, interpretations and applications of the infinitely divisible cascades can be found in [VEI 00].

2.3. Wavelet transform

2.3.1. Continuous wavelet transform

The continuous wavelet decomposition of a signal $X(t) \in L^2(\mathbb{R}, dt)$ is a linear transformation from $L^2(\mathbb{R}, dt)$ to $L^2(\mathbb{R} \times \mathbb{R}^{+*}, dt\,da/a^2)$, defined by [DAU 92, MAL 98]:
$$T_X(a,t) := \frac{1}{\sqrt{a}} \int X(u)\, \psi^*\!\left(\frac{u-t}{a}\right) du. \tag{2.21}$$
This is the inner product between the analyzed signal $X$ and a set of analyzing waveforms obtained from a prototype wavelet (or mother wavelet) $\psi$ by dilations with a scale factor $a \in \mathbb{R}^{+*}$ and shifts in time $t \in \mathbb{R}$. For the wavelet transform to be a joint representation in time and frequency of the information contained in $X$, that is, for the coefficients $T_X(a,t)$ to account for the content of $X$ around a given instant and in a given frequency range, the mother wavelet must be a function that is well localized in both time and frequency. In order to invert the wavelet transform, the mother wavelet must also satisfy a closure relation:
$$\iint \psi(u)\, \psi\!\left(u - \frac{t-t'}{a}\right) du\, \frac{da}{a^2} = \delta(t - t'),$$
which induces a condition called admissibility:
$$c_\psi = \int_{-\infty}^{\infty} |\Psi(f)|^2\, \frac{df}{|f|} < \infty.$$
Given this condition, it is possible to reconstruct the signal $X$ by inverting the wavelet transform according to:
$$X(u) = c_\psi^{-1} \iint T_X(a,t)\, \psi\!\left(\frac{u-t}{a}\right) \frac{dt\,da}{a^2}.$$
From the admissibility constraint, it also follows that $\psi$ must satisfy:
$$\int \psi(t)\, dt = 0.$$
Such a waveform $\psi$ is therefore an oscillating function, localized on a short temporal support, hence the name wavelet. This oscillating behavior implies that the wavelet transform does not detect the DC component (average value) of the analyzed signal $X$. For certain mother wavelets, this property can be extended to higher orders:
$$\int t^k\, \psi(t)\, dt = 0, \qquad \forall\, 0 \le k < n_\psi,$$
which means that the wavelet is orthogonal to the polynomial components of $X$ of degree strictly lower than its number of vanishing moments $n_\psi$. In other words, the wavelet coefficients obtained from a mother wavelet with $n_\psi$ vanishing moments are insensitive to the components of the signal that are more regular (smoother) than a polynomial of degree strictly lower than $n_\psi$; on the other hand, they account for the information relative to behaviors that are more irregular than such polynomial trends.

2.3.2. Discrete wavelet transform

One of the fundamental characteristics of the continuous wavelet transform is its redundancy: the information contained in a signal, i.e. in a space of dimension 1, is represented, through the wavelet transform, in a space of dimension 2, the time-scale plane $(t, a) \in \mathbb{R} \times \mathbb{R}^{+*}$; neighboring coefficients thus share part of the same information. To reduce this redundancy, we define the discrete wavelet transform by the set of coefficients:
$$d_X(j,k) := 2^{j/2} \int X(u)\, \psi(2^j u - k)\, du \tag{2.22}$$
defined using a critical discrete¹ sampling of the time-scale plane, usually called the dyadic grid:
$$t \to 2^{-j} k, \qquad a \to 2^{-j}, \qquad (k, j) \in \mathbb{Z} \times \mathbb{Z},$$
1. In [DAU 92] a detailed study can be found of the frames or oblique bases which correspond to the sub-critical sampling of the time-scale plane.
thus ending up with the correspondence: $d_X(j,k) = T_X(t = 2^{-j}k,\, a = 2^{-j})$. In this case, the collection of dilated and shifted versions of the mother wavelet $\{\psi_{j,k}(t),\ j \in \mathbb{Z},\ k \in \mathbb{Z}\}$ may constitute a basis of $L^2(\mathbb{R})$. Here, for simplicity, we restrict ourselves to orthonormal wavelet bases, although a discrete wavelet transform does not necessarily correspond to an orthonormal basis. The rigorous definition of the discrete wavelet transform leading to (orthonormal) bases goes through multiresolution analysis. A multiresolution analysis consists of a collection of nested subspaces of $L^2(\mathbb{R})$:
$$\cdots \subset V_{j-1} \subset V_j \subset V_{j+1} \subset \cdots$$
Each $V_j$, $j \in \mathbb{Z}$, possesses its own orthonormal basis $\{2^{j/2} \phi(2^j \cdot - k),\ k \in \mathbb{Z}\}$ constructed, as for the wavelets, from a prototype scaling function² (or father wavelet) $\phi$ to which dyadic dilations and integer shifts are applied. The nested structure demands that the function $\phi$ satisfy a two-scale relation:
$$\phi(t/2) = \sqrt{2} \sum_n u_n\, \phi(t - n).$$
The projection of a signal $X \in L^2(\mathbb{R})$ on this basis supplies the approximation coefficients at scale $j$:
$$a_X(j,k) := 2^{j/2} \int X(u)\, \phi(2^j u - k)\, du.$$
To complete these approximations, it is necessary to project the signal $X$ onto the complement of $V_j$ in $V_{j+1}$; we therefore define $W_j$ by:
$$V_j \oplus W_j = V_{j+1}, \qquad \bigcap_{j \in \mathbb{Z}} W_j = \{0\}, \qquad V_j := \bigoplus_{j' = -\infty}^{j-1} W_{j'}.$$
For each fixed scale $j$, the wavelet family $\{2^{j/2} \psi(2^j \cdot - k),\ k \in \mathbb{Z}\}$ thus forms an (orthonormal) basis of the corresponding subspace:
$$W_j := \left\{ X : X(t) = \sum_{k=-\infty}^{+\infty} d_X(j,k)\, 2^{j/2}\, \psi(2^j t - k) \right\}.$$
2. To define a multiresolution analysis, this function φ has to satisfy a certain number of constraints which are not detailed here [DAU 92].
Here again, the nested structure imposes a two-scale relation on the wavelet:
$$\psi(t/2) = \sqrt{2} \sum_n v_n\, \phi(t - n).$$
The wavelet coefficients, or detail coefficients, therefore correspond to the projections of $X$ on $W_j$. The signal $X$ can thus be represented as the sum of an approximation at scale $j$ and of the details at all finer scales $j' \ge j$:
$$X(t) = \sum_k a_X(j,k)\, 2^{j/2}\, \phi(2^j t - k) + \sum_{j'=j}^{+\infty} \sum_k d_X(j',k)\, 2^{j'/2}\, \psi(2^{j'} t - k). \tag{2.23}$$
[Figure 2.2: block diagram in the time-scale plane. The signal feeds a high-pass filter + decimation branch (details) and a low-pass filter + decimation branch (approximation).]

Figure 2.2. Fast pyramidal algorithm with filter structure for discrete wavelet decompositions. An approximation $a_X(0,k)$ (at scale 0) of the continuous-time signal $X$ is first calculated (only this stage involves a continuous-time evaluation from $X$). The “signal” represented in the figure is the sequence $a_X(0,k)$. In multiresolution analysis, this approximation $a_X(0,k)$ is decomposed into a series of details $d_X(-1,k)$ and a new, coarser approximation $a_X(-1,k)$. This procedure is then iterated from the sequence $a_X(-1,k)$. The impulse responses of the discrete-time filters depend on the generating sequences $u$ and $v$ which define the scaling function and the wavelet. In the case of orthonormal bases, they are exactly equal to them.
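The filter-bank structure of Figure 2.2 can be sketched in a few lines. The code below is a minimal illustration, not the algorithm of any particular library: it assumes the Haar generating sequences $u = (1/\sqrt{2}, 1/\sqrt{2})$ and $v = (1/\sqrt{2}, -1/\sqrt{2})$, a periodic extension at the boundaries, and an input sequence that already plays the role of the initial approximation; the function names are hypothetical.

```python
import math

INV_SQRT2 = 1.0 / math.sqrt(2.0)
HAAR_U = [INV_SQRT2, INV_SQRT2]    # low-pass generating sequence (assumed: Haar)
HAAR_V = [INV_SQRT2, -INV_SQRT2]   # high-pass generating sequence (assumed: Haar)


def pyramid_step(approx, u, v):
    """One stage of Figure 2.2: filter with u and v, then keep every other output."""
    size = len(approx)
    half = size // 2
    # Periodic extension (an assumption) handles filters longer than two taps.
    coarse = [sum(u[n] * approx[(2 * k + n) % size] for n in range(len(u)))
              for k in range(half)]
    detail = [sum(v[n] * approx[(2 * k + n) % size] for n in range(len(v)))
              for k in range(half)]
    return coarse, detail


def pyramid_decomposition(signal, u, v, levels):
    """Iterate the stage on the successive approximations, from fine to coarse."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = pyramid_step(approx, u, v)
        details.append(detail)
    return approx, details


if __name__ == "__main__":
    final_approx, all_details = pyramid_decomposition(
        [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0], HAAR_U, HAAR_V, levels=3)
    print(final_approx)
    print(all_details)
```

Each stage halves the length of the sequence it processes, so the total number of operations is proportional to $N(1 + 1/2 + 1/4 + \cdots) < 2N$ times the filter length. For an orthonormal basis such as Haar, the energy of the input is preserved by the set of output coefficients.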
Finally, thanks to the nested structure of the spaces of a multiresolution analysis, there exist very fast algorithms with a pyramidal structure, which compute the discrete decomposition coefficients efficiently. From the sequences $u$ and $v$, the generators of the multiresolution analysis, we can prove that the approximation and detail coefficients at octave $j$ can be calculated from the approximation coefficients at octave $j+1$:
$$\begin{aligned}
a_X(j,k) &= \int X(t)\, 2^{j/2}\, \phi(2^j t - k)\, dt \\
&= \int X(t)\, 2^{j/2} \sqrt{2} \sum_n u_n\, \phi\big(2(2^j t - k) - n\big)\, dt \\
&= \sum_n u_n \int X(t)\, 2^{(j+1)/2}\, \phi(2^{j+1} t - 2k - n)\, dt \\
&= \sum_n u_n\, a_X(j+1, 2k+n) \\
&= \sum_n u^\vee_n\, a_X(j+1, 2k-n) \\
&= \big(u^\vee \ast a_X(j+1, \cdot)\big)(2k)
\end{aligned}$$
and, in an identical manner:
$$d_X(j,k) = \big(v^\vee \ast a_X(j+1, \cdot)\big)(2k),$$
where $\ast$ denotes the discrete-time convolution operator, i.e., $(x \ast y)(k) = \sum_n x(n)\, y(k-n)$, and where $u^\vee_n = u_{-n}$ and $v^\vee_n = v_{-n}$. The two previous relations can be rewritten by using the decimation operator $\downarrow_2$ ($y = \downarrow_2 x$ means $y_k = x_{2k}$, i.e., every other sample of $x$ is left out):
$$a_X(j,k) = \big[\downarrow_2 \big(u^\vee \ast a_X(j+1, \cdot)\big)\big](k); \qquad d_X(j,k) = \big[\downarrow_2 \big(v^\vee \ast a_X(j+1, \cdot)\big)\big](k).$$
Thanks to this recursive structure, the computational cost of the discrete wavelet decomposition of a signal uniformly sampled on $N$ points is $O(N)$.

2.4. Wavelet analysis of scale invariant processes

The aim of this section is to study how the fundamental principles (scale changing operator) and essential properties (multiresolution structure, number of vanishing moments, localization) of wavelet decomposition can be exploited in order to characterize and easily measure the scale invariance phenomena that have been previously described.
Let us note that the results mentioned below can be formulated in the same way for continuous wavelet decompositions. However, for the sake of simplicity and conciseness, we will only tackle the case of the discrete random fields of orthogonal wavelet coefficients arising from the decomposition of scale invariant processes.

2.4.1. Self-similarity

PROPOSITION 2.1.– The wavelet coefficients resulting from the decomposition of a self-similar process of index $H$ satisfy the equality:
$$\big(d_X(j,0), \ldots, d_X(j, N_j - 1)\big) \stackrel{\mathcal{L}}{=} 2^{-j(H+\frac{1}{2})} \big(d_X(0,0), \ldots, d_X(0, N_j - 1)\big).$$
This result, initially demonstrated for FBM [FLA 92] and then generalized to the set of self-similar processes [AVE 98], is based on the scale invariance principle stemming from the dilation/compression operator which defines the wavelet analysis. To outline the proof, it suffices to write down the main argument: when $X(2^j u) \stackrel{\mathcal{L}}{=} 2^{jH} X(u)$,
$$\begin{aligned}
d_X(j,k) &= \int X(u)\, \psi(2^j u - k)\, 2^{j/2}\, du \\
&= \int 2^{-j/2}\, X(2^{-j} u)\, \psi(u - k)\, du \\
&\stackrel{\mathcal{L}}{=} 2^{-j(H+\frac{1}{2})} \int X(u)\, \psi(u - k)\, du \\
&= 2^{-j(H+\frac{1}{2})}\, d_X(0,k).
\end{aligned}$$
The principal consequence of self-similarity is the fact that, when they exist, the $q$-th order moments of the wavelet coefficients satisfy the equality:
$$E\,|d_X(j,k)|^q = 2^{-jq(H+\frac{1}{2})}\, E\,|d_X(0,k)|^q.$$

PROPOSITION 2.2.– The wavelet coefficients resulting from the decomposition of a process with stationary increments are stationary at each scale $2^j$.

To understand the origin of this result, let us note that the sampled process of increments $X^{(\theta = 2^{-j})}[2^{-j}k] := X((k+1)2^{-j}) - X(k\,2^{-j})$ can be identified with a wavelet decomposition (2.22) according to:
$$X^{(2^{-j})}[2^{-j}k] = 2^j \int X(u) \big[\delta(2^j u - k - 1) - \delta(2^j u - k)\big]\, du = 2^{j/2}\, d_X(j,k),$$
with $\psi(t) = \delta(t-1) - \delta(t)$ as the analyzing wavelet (an elementary wavelet sometimes referred to as the poor man’s wavelet). In fact, it is the oscillating structure of the wavelets, inherent to admissibility ($\int \psi(t)\, dt = 0$), which guarantees this stationarity in the case of processes with stationary increments. Heuristically, and underlining the main argument, namely that the number of vanishing moments is at least 1, the proof reads (on the coefficients of the discrete decomposition and with $j = 0$ to simplify the writing):
$$\begin{aligned}
d_X(0, k + k_0) &= \int X(u)\, \psi(u - k - k_0)\, du = \int X(u + k_0)\, \psi(u - k)\, du \\
&= \int \big[X(u + k_0) - X(k_0)\big]\, \psi(u - k)\, du \\
&\stackrel{\mathcal{L}}{=} \int \big[X(u) - X(0)\big]\, \psi(u - k)\, du \\
&= \int X(u)\, \psi(u - k)\, du = d_X(0,k).
\end{aligned}$$
This proof highlights the role played, for stationarization, by the fact that $\psi$ has zero mean (i.e. that its number of vanishing moments is at least 1). This result was obtained in the case of FBM, directly from the form of the covariance, in [FLA 92, TEW 92], extended to stable cases independently by different authors [DEL 00, PES 99] and proved in a general context in [CAM 95, MAS 93]. When dealing with processes with stationary increments of order $p$, the simple admissibility condition of the wavelets is no longer sufficient. It is then necessary to choose an analyzing wavelet $\psi$ possessing $n_\psi \ge p$ vanishing moments so that the coefficient series $d_X(j,k)$ obtained are stationary at each scale. The complete proof of this result is given in [AVE 98]. However, a good way to clarify the issue is to argue that the wavelet plays a role similar to that of a differentiation operator, insofar as the number of vanishing moments controls, by time-frequency duality, the behavior of the spectrum magnitude $|\Psi(f)|$ in the vicinity of the zero frequency. Indeed, for a wavelet $\psi$ possessing $n_\psi$ vanishing moments, we have $|\Psi(f)| \sim |f|^{n_\psi}$, $f \to 0$, which to a first approximation can be identified with the differentiation operator of order $n_\psi$.
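The differentiation-like effect of vanishing moments can be illustrated on discrete filters. The sketch below is an illustration under stated assumptions rather than a statement about the wavelet $\psi$ itself: it uses the classical 4-tap Daubechies low-pass coefficients, builds the high-pass filter by the usual alternating-flip rule $g_n = (-1)^n h_{L-1-n}$, and works directly on sampled values. The function names are hypothetical.

```python
import math

SQRT3 = math.sqrt(3.0)
NORM = 4.0 * math.sqrt(2.0)
# Classical 4-tap Daubechies low-pass filter (two vanishing moments).
DAUB4_LOW = [(1 + SQRT3) / NORM, (3 + SQRT3) / NORM,
             (3 - SQRT3) / NORM, (1 - SQRT3) / NORM]
HAAR_LOW = [1.0 / math.sqrt(2.0), 1.0 / math.sqrt(2.0)]


def highpass_from_lowpass(low):
    """Alternating flip: g[n] = (-1)^n * h[L - 1 - n]."""
    length = len(low)
    return [(-1) ** n * low[length - 1 - n] for n in range(length)]


def discrete_moment(filt, order):
    """Discrete moment sum_n n^order * filt[n] of a filter."""
    return sum((n ** order) * c for n, c in enumerate(filt))


def detail_coefficients(samples, high):
    """Filter and decimate by two, keeping only outputs free of boundary effects."""
    n_out = (len(samples) - len(high)) // 2 + 1
    return [sum(high[n] * samples[2 * k + n] for n in range(len(high)))
            for k in range(n_out)]


if __name__ == "__main__":
    trend = [3.0 + 2.0 * m for m in range(16)]   # a purely linear trend
    for name, low in (("Haar", HAAR_LOW), ("Daubechies-4", DAUB4_LOW)):
        high = highpass_from_lowpass(low)
        print(name, [round(d, 12) for d in detail_coefficients(trend, high)])
```

With one vanishing moment (Haar), the details of a linear trend are constant but non-zero: the filter acts like a first difference. With two vanishing moments, the linear trend is annihilated, as a second-order difference would do.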
PROPOSITION 2.3.– The wavelet coefficients resulting from the decomposition of a process $X$ which is zero-mean, self-similar of index $H$, of finite variance and with stationary increments ($H$-ASAS) possess, when they exist, moments of order $q$ satisfying the following scaling law:
$$E\,|d_X(j,k)|^q = E\,|d_X(0,0)|^q\; 2^{-jq(H+\frac{1}{2})}.$$
This last result stems directly from the coupling of the two previous propositions. For processes with finite variance (i.e., whose moment of order 2 exists), such as Gaussian processes and FBM in particular, this relation takes on the following specific form:
$$E\,|d_X(j,k)|^2 = E\,|d_X(0,0)|^2\; 2^{-j(2H+1)}. \tag{2.24}$$
Given that the latter are second order statistics, the particular form (2.4) of the covariance structure of an $H$-ASAS process makes it possible to deduce the asymptotic behavior of the dependence structure of the wavelet coefficients [FLA 92, TEW 92].

PROPOSITION 2.4.– The asymptotic covariance structure of the wavelet coefficients of a process $X$ which is zero-mean, self-similar of index $H$, of finite variance and with stationary increments ($H$-ASAS) takes on the form:
$$E\, d_X(j,k)\, d_X(j',k') \approx \big|2^{-j}k - 2^{-j'}k'\big|^{2(H - n_\psi)}, \qquad \big|2^{-j}k - 2^{-j'}k'\big| \to \infty,$$
which illustrates, on the one hand, that the larger the number of vanishing moments, the shorter the range of the correlation and, on the other hand, that if $n_\psi > H + \frac{1}{2}$, the long-range dependence which exists for the increment process if $H > \frac{1}{2}$ is transformed into a short-range dependence [ABR 95, FLA 92, TEW 92].

The results which have just been presented can be made more precise when we specify the distribution law which underlies the self-similar process with stationary increments. The Gaussian case, illustrated by FBM, has been widely studied [FLA 92, MAS 93]. Its wavelet coefficients are Gaussian at all scales. More recently, interest in the non-Gaussian case has led to developments for self-similar α-stable processes (or α-stable motions) [ABR 00a, DEL 00]. We can deduce from the wavelet decomposition of such processes that the series of their coefficients $d_X(j,k)$, in addition to the above-mentioned properties, is itself α-stable with the same index.

2.4.2. Long-range dependence

As specified in section 2.2.3, stationary processes with “long-range dependence” are characterized by a slow decrease of their correlation function, $c_X(\tau) \sim c_r |\tau|^{-\beta}$, $0 < \beta < 1$. The strong statistical dependence maintained even between distant samples $X(t)$ and $X(t+\tau)$ makes the study and analysis of such processes much more complex, for example by impairing the convergence of algorithms relying on empirical moment estimators. It will be shown hereafter that the wavelet decomposition of a process with long-range dependence makes it possible to circumvent this difficulty since, under certain conditions, the series of coefficients $d_X(j,k)$ exhibit
short-term dependence. The covariance function of the wavelet coefficients possesses the following form:
$$\begin{aligned}
E\, d_X(j,k)\, d_X(j',k') &\sim 2^{\frac{j+j'}{2}} \iint c_r |\tau|^{-\beta}\, \psi(2^j u - k)\, \psi\big(2^{j'}(u - \tau) - k'\big)\, du\, d\tau \\
&= 2^{-\frac{j+j'}{2}}\, c_f \int \frac{\Psi(2^{-j} f)\, \Psi(2^{-j'} f)}{|f|^{\gamma}}\, e^{-i 2\pi f (2^{-j}k - 2^{-j'}k')}\, df,
\end{aligned} \tag{2.25}$$
indicating that its asymptotic behavior, i.e. for large values of the interval $|2^{-j}k - 2^{-j'}k'|$, is governed by the behavior of its Fourier transform at the origin, and hence by the relation:
$$\frac{|\Psi(2^{-j} f)\, \Psi(2^{-j'} f)|}{|f|^{\gamma}} \underset{f \to 0}{\sim} \frac{2^{-(j+j') n_\psi}\, |f|^{2 n_\psi}}{|f|^{\gamma}} = \frac{2^{-(j+j') n_\psi}}{|f|^{\gamma - 2 n_\psi}}.$$
Thus, we can observe the effect of the number of vanishing moments $n_\psi$ of the wavelet, which may compensate the original divergence of the spectral density of the process. By choosing a wavelet such that $n_\psi \ge \gamma/2$, the long-range dependence of the process is no longer preserved in the coefficient sequences of the decomposition. Hence, the larger $n_\psi$ is, the faster the residual correlation decreases:
$$E\, d_X(j,k)\, d_X(j',k') \approx \big|2^{-j}k - 2^{-j'}k'\big|^{\gamma - 2 n_\psi - 1}, \qquad \big|2^{-j}k - 2^{-j'}k'\big| \to \infty.$$
From equation (2.25) we can also prove that the variance of the wavelet coefficients follows a power law behavior as a function of scale:
$$E\,|d_X(j,k)|^2 = 2^{-j(1-\beta)}\, c_f \int \frac{|\Psi(f)|^2}{|f|^{\gamma}}\, df = c_0\, 2^{-j\gamma}. \tag{2.26}$$
This relation will be at the core of the estimation procedure of the parameter γ (see the following section). Finally, it is important to specify that, since it is possible to invert the wavelet decompositions (see equation (2.23)), the non-stationarity of the studied processes does not disappear from the analysis (no more than the long-range dependence does); all the information is preserved but redistributed differently amongst the coefficients. Thus, long-range dependence and non-stationarity are related to the approximation coefficients aX (j, k) of the decomposition, whereas self-similarity is observed through the scales, by an algebraic progression of the moments of order q of the detail coefficients dX (j, k).
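The algebraic progression of the detail variances across scales can be observed on a simple simulation. The sketch below is a hedged illustration under several assumptions: the process is an ordinary Gaussian random walk, taken as a sampled stand-in for Brownian motion (FBM with $H = 1/2$); the wavelet is Haar; the samples themselves are used as the initial approximation; and the octave $m$ is counted from fine to coarse (scale $2^m$ samples), so that $m$ plays the role of $-j$ in (2.22). The function names, seed and sample size are arbitrary choices.

```python
import math
import random


def simulate_random_walk(n_samples, seed):
    """Cumulative sum of independent standard Gaussian steps."""
    rng = random.Random(seed)
    position, walk = 0.0, []
    for _ in range(n_samples):
        position += rng.gauss(0.0, 1.0)
        walk.append(position)
    return walk


def haar_details_by_octave(samples, n_octaves):
    """Haar pyramid; entry m - 1 of the result holds the details at octave m."""
    inv_sqrt2 = 1.0 / math.sqrt(2.0)
    approx = list(samples)
    details = []
    for _ in range(n_octaves):
        pairs = len(approx) // 2
        detail = [(approx[2 * k] - approx[2 * k + 1]) * inv_sqrt2 for k in range(pairs)]
        approx = [(approx[2 * k] + approx[2 * k + 1]) * inv_sqrt2 for k in range(pairs)]
        details.append(detail)
    return details


def log2_variance_slope(details, first_octave, last_octave):
    """Unweighted least-squares slope of log2(mean of d^2) against the octave."""
    octaves = list(range(first_octave, last_octave + 1))
    logs = [math.log2(sum(d * d for d in details[m - 1]) / len(details[m - 1]))
            for m in octaves]
    mean_m = sum(octaves) / len(octaves)
    mean_y = sum(logs) / len(logs)
    num = sum((m - mean_m) * (y - mean_y) for m, y in zip(octaves, logs))
    den = sum((m - mean_m) ** 2 for m in octaves)
    return num / den


if __name__ == "__main__":
    walk = simulate_random_walk(2 ** 15, seed=12345)
    slope = log2_variance_slope(haar_details_by_octave(walk, 8), 3, 8)
    print("estimated slope:", slope, "(2H + 1 = 2 is expected for H = 1/2)")
```

The slope obtained in this way should be close to $2H + 1 = 2$, up to statistical fluctuations and to a small bias at the finest octaves, where the sampled walk departs from its continuous-time model.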
2.4.3. Local regularity

The local regularity properties of process sample paths have been introduced in section 2.2.4. Their “wavelet” counterparts are most often derived for the orthogonal discrete wavelet transform, although they can be extended, usually quite easily, to continuous transforms (see Chapter 3).

THEOREM 2.1.– Let $X$ be a signal with Hölder regularity $h \ge 0$ at $t_0$ and $\psi$ a sufficiently regular wavelet ($n_\psi \ge h$). Then there exists a constant $c > 0$ such that for any $(j, k) \in \mathbb{Z} \times \mathbb{Z}$:
$$|d_X(j,k)| \le c\, 2^{-(\frac{1}{2} + h) j} \big(1 + |2^j t_0 - k|^{h}\big).$$
Conversely, if for any $(j, k) \in \mathbb{Z} \times \mathbb{Z}$:
$$|d_X(j,k)| \le c\, 2^{-(\frac{1}{2} + h) j} \big(1 + |2^j t_0 - k|^{h'}\big)$$
for some $h' < h$, then $X$ has Hölder regularity $h$ at $t_0$.

The proof of the theorem was established independently by Jaffard [JAF 89] and by Holschneider and Tchamitchian [HOL 90]. In the light of this result, we note once again that it is the decrease of the wavelet coefficients through scales which characterizes the local regularity of the sample path of $X$. Furthermore, this result is not surprising, since the Hölder regularity of a function is a particular cause of $1/f$ spectral behavior at high frequencies. The second part of Theorem 2.1 also shows that knowledge of the coefficients located “vertically” above the singular point ($|2^j t_0 - k| = 0$) is not in itself sufficient to determine the local regularity of $X$ at $t_0$. Strictly speaking, it would be necessary to consider the decomposition in its entirety, since an isolated singularity can affect all the coefficients $d_X(j,k)$ inside a cone, called the influence cone. For a wavelet whose temporal support is finite, this cone is also bounded at each scale. From the estimation point of view, the direct implication of Theorem 2.1 is to highlight the practical limits of (discrete) orthogonal wavelet transforms, because it is quite unlikely that the abscissa $t_0$ of the singularity coincides with a line of coefficients of the dyadic grid.

Hence, in practice, a continuous analysis scheme is more often preferred, for which we possess a less precise and incomplete version (direct implication) of Theorem 2.1 (see the following proposition).

PROPOSITION 2.5.– If $X$ is of Hölder regularity $n < h < n + 1$ at $t_0$, then for an analyzing wavelet $\psi$ possessing $n_\psi \ge h$ vanishing moments, we have the following asymptotic behavior:
$$T_X(t_0, a) \sim O\big(a^{h + \frac{1}{2}}\big), \qquad a \to 0^+.$$
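Proof. Consider the continuous wavelet transform (2.21) constructed with $n_\psi > n$:
$$\begin{aligned}
T_X(t_0, a) &= \sqrt{a} \int \psi(u)\, X(t_0 + a u)\, du \\
&= \sqrt{a} \int \psi(u) \left[ X(t_0 + a u) - \sum_{r=0}^{n} c_r\, a^r u^r \right] du,
\end{aligned}$$
where the $c_r$ are the Taylor expansion coefficients of $X$ in the vicinity of $t_0$; the second equality holds because the vanishing moments of $\psi$ cancel the polynomial term. The signal $X$ is of regularity $n < h < n + 1$ at $t_0$ and $\psi$ is a function localized in time. Thus, in the limit of infinitely fine resolutions ($a \to 0^+$):
$$\lim_{a \to 0^+} |T_X(t_0, a)| \le C \sqrt{a} \int |\psi(u)|\, |a u|^{h}\, du = C\, a^{h + \frac{1}{2}} \int |u|^{h}\, |\psi(u)|\, du = C_\psi\, a^{h + \frac{1}{2}}.$$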
It is important to underline that if the wavelet $\psi$ is not of sufficient regularity, it is the term of degree $n_\psi$ in the Taylor polynomial which dominates at fine scales and it is thus the regularity of the wavelet which imposes the decrease of the coefficients through scales. However, one should not be misled by the interpretation of Proposition 2.5. It is only because we focus on the limiting case of infinite resolution that the influence of the singularity seems to be perfectly localized at $t = t_0$. In reality, it is shown in [MAL 92] that, in the case of non-oscillating singularities (see Chapter 3), it is necessary and sufficient to consider the lines of local maxima of the wavelet coefficients situated inside the influence cone, $\{T_X(a,t) : |t - t_0| < c\, a\}$, to be able to characterize the local regularity of the process. In addition, the practical use of this property is made more difficult by the necessarily finite resolution imposed by the sampling of the data, which does not permit scrutiny of the data below a minimum scale, denoted by convention $a = 1$.

Furthermore, the different aspects of the study of the local regularity of a function constitute an important topic in other chapters of this work. This is true of Chapter 3, which tackles the characterization of functional regularity spaces, of Chapter 6 and Chapter 5, which present multifractional processes and their sample path regularity, and finally of Chapter 1, which presents multifractal spectra as statistical and geometric measures of the distribution of pointwise singularities of a process.

Finally, let us note that, as previously indicated in section 2.2.4, the increments of stationary stochastic processes, or of processes with stationary increments, for which the Hölder
exponent is constant throughout the sample paths satisfy the asymptotic relation:
$$E\,|X(t + \tau) - X(t)|^2 \sim C\,|\tau|^{2h}, \qquad |\tau| \to 0.$$
The latter can be rewritten identically on the wavelet coefficients, which can be either continuous or discrete:
$$E\,|T_X(a,t)|^2 \sim a^{2h+1}, \qquad a \to 0; \tag{2.27a}$$
$$E\,|d_X(j,k)|^2 \sim 2^{-j(2h+1)}, \qquad j \to +\infty. \tag{2.27b}$$
These relations should be compared with those obtained in the case of self-similarity (2.24) and long-range dependence (2.26), and will serve as a starting point in the construction of estimators (see the following section).

2.4.4. Beyond second order

In this chapter, the detailed presentation is limited to the wavelet analysis of scaling laws existing at the second statistical order (self-similarity, long-range dependence, constant local regularity). Nevertheless, the study of scaling law models which involve all statistical orders (multifractal processes, infinitely divisible cascades) can be carried out from wavelet analysis in the same way and benefits from the same qualities and advantages. The wavelet analysis of multifractal processes is developed by S. Jaffard and R. Riedi in Chapter 3 and Chapter 4 respectively. The wavelet analysis of infinitely divisible cascades is detailed in [VEI 00].

2.5. Implementation: analysis, detection and estimation

This section is devoted to the implementation of wavelet analysis in the study of scale invariance phenomena, whether for detecting and highlighting them or for estimating the parameters which describe them. The previous sections have established the power law behavior, as a function of scale, of the variance of the wavelet coefficients:
$$E\,|d_X(j,k)|^2 \simeq c\, C\, 2^{j\alpha} \tag{2.28}$$
(see equation (2.24) for self-similarity, equation (2.26) for long-range dependence and equations (2.27) for the uniform regularity of sample paths; these cases are summarized in Table 2.1). In this section, the octave $j$ is counted so that the scale is $a = 2^j$, i.e. $j$ increases toward coarse scales; compared with (2.22), this amounts to changing $j$ into $-j$. Relation (2.28) crystallizes the potential of the “wavelet” tool for the analysis of scaling laws and will hence be at the core of this section.

                                | α        | c                       | C
Self-sim. with stat. incr.      | $2H + 1$ | $\sigma^2 = E\,|X(1)|^2$ | $\int |\Psi(f)|^2 / |f|^{2H+1}\, df$
Long-range dependence           | $\gamma$ | $c_f$                   | $\int |\Psi(f)|^2 / |f|^{\gamma}\, df$
Uniform local regularity        | $2h + 1$ | –                       | –

Table 2.1. Summary of scaling laws existing in the second statistical order

In a real situation, we must begin by validating the relevance of a model before estimating its parameters. In other words, we must first establish the existence of scale invariance phenomena and identify a scale range within which the above-mentioned relation supplies an appropriate description of the data, and only then measure the exponent $\alpha$. As an introduction, we consider a simple situation where we suppose that there is an octave range $j_1 \le j \le j_2$, already identified, for which the fundamental relation is satisfied exactly:
$$j_1 \le j \le j_2, \qquad E\,|d_X(j,k)|^2 = c_f\, C\, 2^{j\alpha}$$
and we concentrate on the estimation of the parameter $\alpha$.

2.5.1. Estimation of the parameters of scale invariance

To estimate the exponent $\alpha$, a simple idea consists of measuring the slope of $\log_2 E\,|d_X(j,k)|^2$ against $\log_2 2^j = j$. In practice, of course, this implies that we have to estimate the quantity $E\,|d_X(j,k)|^2$ from a single observed realization of finite length. Given the properties of the wavelet coefficients (stationarity, weak statistical dependence) put forth in the previous section, we simply propose to estimate the ensemble average by the time average [ABR 95, ABR 98, FLA 92]:
$$\frac{1}{n_j} \sum_k d_X(j,k)^2,$$
where $n_j$ designates the number of wavelet coefficients available at octave $j$.

DEFINITION 2.7.– Let us begin by recalling the main characteristics of linear regression. Let $y_j$ be random variables such that $E\, y_j = \alpha j + b$ and let us define $\sigma_j^2 = \operatorname{var} y_j$. The weighted linear regression estimator of $\alpha$ reads:
$$\hat{\alpha} = \frac{\sum_{j=j_1}^{j_2} y_j\, (S_0\, j - S_1)/a_j}{S_0 S_2 - S_1^2} \equiv \sum_{j=j_1}^{j_2} w_j\, y_j \tag{2.29}$$
with:
$$S_p = \sum_{j=j_1}^{j_2} j^p / a_j, \qquad p = 0, 1, 2,$$
where the $a_j$ are arbitrary quantities, acting as weights associated with the $y_j$. With these definitions, the weights $w_j = (S_0\, j - S_1)/\big(a_j (S_0 S_2 - S_1^2)\big)$ satisfy the usual constraints, i.e. $\sum_{j=j_1}^{j_2} j\, w_j = 1$ and $\sum_{j=j_1}^{j_2} w_j = 0$. We can also easily observe that the estimator is unbiased: $E\,\hat{\alpha} = \alpha$. Moreover, in the case of uncorrelated variables $y_j$, the variance of this estimator is written:
$$\operatorname{var} \hat{\alpha} = \sum_{j=j_1}^{j_2} w_j^2\, \sigma_j^2.$$
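The weighted regression just recalled can be sketched as follows. This is a minimal illustration of the formulas with weights $a_j$ and sums $S_p$, not the code of any published toolbox; the function names are hypothetical, and the variances $\sigma_j^2 \propto 2^j$ used in the example are an arbitrary illustrative assumption.

```python
def regression_sums(octaves, a):
    """Return S_0, S_1, S_2 with S_p = sum_j j^p / a_j."""
    s0 = sum(1.0 / a_j for a_j in a)
    s1 = sum(j / a_j for j, a_j in zip(octaves, a))
    s2 = sum(j * j / a_j for j, a_j in zip(octaves, a))
    return s0, s1, s2


def regression_weights(octaves, a):
    """Weights w_j = (S_0 j - S_1) / (a_j (S_0 S_2 - S_1^2))."""
    s0, s1, s2 = regression_sums(octaves, a)
    det = s0 * s2 - s1 * s1
    return [(s0 * j - s1) / (a_j * det) for j, a_j in zip(octaves, a)]


def estimate_slope(octaves, y, a):
    """Weighted least-squares slope: sum_j w_j y_j."""
    return sum(w * y_j for w, y_j in zip(regression_weights(octaves, a), y))


def slope_variance(octaves, sigma2, a):
    """Variance sum_j w_j^2 sigma_j^2, valid for uncorrelated y_j."""
    return sum(w * w * s for w, s in zip(regression_weights(octaves, a), sigma2))


if __name__ == "__main__":
    octaves = list(range(3, 9))
    sigma2 = [2.0 ** j for j in octaves]          # illustrative variances (assumption)
    y = [1.7 * j + 0.4 for j in octaves]          # noiseless line of slope 1.7
    print(estimate_slope(octaves, y, sigma2))     # recovers the slope of the line
    print(slope_variance(octaves, sigma2, sigma2))
```

Because $\sum_j w_j = 0$ and $\sum_j j\, w_j = 1$, the intercept is cancelled and the slope is recovered exactly on noiseless data, whatever the choice of the $a_j$; only the variance of the estimate depends on that choice.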
The choice of the weights remains to be specified. We know that the variance of $\hat{\alpha}$ is minimal if we take into consideration the covariance structure of $y_j$, $y_{j'}$ in the definition of $a_j$. Once again, in the case of uncorrelated variables $y_j$, this leads us to choose $a_j \equiv \sigma_j^2$. In the case of scale invariance, we use the estimator defined earlier with the variables:
$$y_j = \log_2 \left( \frac{1}{n_j} \sum_k d_X(j,k)^2 \right) - g(j),$$
where the $g(j)$ are correction terms aimed at taking into account the fact that $E(\log(\cdot))$ is not $\log(E(\cdot))$ and at ensuring that $E\, y_j = \alpha j + b$. Hence, this estimator simply consists of a weighted linear regression carried out in the diagram of $y_j$ against $j$, referred to as the log-scale diagram [VEI 99]. In order to implement this estimator, it is necessary to further determine $g(j)$ and $\sigma_j^2$ and to choose $a_j$. To begin with, we assume that the $d_X(j,k)$ are Gaussian random variables, i.e., that they result from the wavelet decomposition of a process which is itself jointly Gaussian. Moreover, if we idealize the weak correlation property of the wavelet coefficients as exact independence, then we can calculate $g(j)$ and $\sigma_j^2$ analytically:
$$g(j) = \frac{\Gamma'(n_j/2)}{\Gamma(n_j/2)\, \log 2} - \log_2(n_j/2) \sim \frac{-1}{n_j \log 2}, \qquad n_j \to \infty; \tag{2.30a}$$
$$\sigma_j^2 = \frac{\zeta(2, n_j/2)}{\log^2 2} \sim \frac{2}{n_j \log^2 2}, \qquad n_j \to \infty, \tag{2.30b}$$
where $\Gamma$ and $\Gamma'$ respectively designate the Gamma function and its derivative, and where $\zeta(2, z) = \sum_{n=0}^{\infty} 1/(z+n)^2$ defines a function called the generalized Riemann zeta function. Let us note that these analytical expressions, which depend on the known $n_j$ alone, can hence be easily evaluated in practice. The numerical simulations presented in depth in [VEI 99] indicate that, for Gaussian processes, this analytical calculation is an excellent approximation of reality, which justifies a posteriori the idealization of exact independence. Thus, for Gaussian processes, we obtain an estimator which is remarkably simple to implement, since the quantities $g(j)$ and $\sigma_j^2$ can be calculated analytically and do not need to be estimated from data, and which gives excellent statistical performance.

From these analytical expressions, we obtain $\mathbb{E}\,\hat{\alpha} = \alpha$, which indicates that the estimator is unbiased, and this also holds for observations of finite duration. With the choice $a_j = \sigma_j^2$, its variance reads:

$$\operatorname{Var} \hat{\alpha} = \sum_{j=j_1}^{j_2} \sigma_j^2 w_j^2 = \frac{S_0}{S_0 S_2 - S_1^2}$$
and it attains the Cramér-Rao lower bound, which is calculated under the same hypotheses (Gaussian process and exact independence) [ABR 95, VEI 99, WOR 96]. In addition, if we add the form $n_j = 2^{-j} n$ (with $n$ the number of coefficients of the initial process) induced by the construction of the multiresolution analysis, we obtain the following expression for the variance:

$$\operatorname{Var} \hat{\alpha} = \sum_{j=j_1}^{j_2} \sigma_j^2 w_j^2 \approx \frac{1}{n} \cdot \frac{1 - 2^{-J}}{2^{1-j_1}\, F}, \qquad (2.31)$$
where $F = F(j_1, J) = \log^2 2 \cdot \big(1 - (J^2/2 + 2)\, 2^{-J} + 2^{-2J}\big)$ and where $J = j_2 - j_1 + 1$ denotes the number of octaves involved in the linear regression. This analytical result shows that the variance of the estimator decreases as $1/n$, in spite of the possible presence of long-range dependence in the analyzed process. It is noteworthy that, in practice, the relation $n_j = 2^{-j} n$ is not exactly satisfied because of boundary effects, which are systematically excluded a priori from the measurements.

In the case of non-Gaussian processes, the implementation of the estimator is more subtle, since we cannot use analytical expressions for $g(j)$ and $\sigma_j^2$. Nevertheless, in the case of finite variance processes, the variables $(1/n_j) \sum_k d_X(j,k)^2$ are asymptotically Gaussian and we can show that correction terms can be introduced [ABR 00b] relative to the Gaussian case:

$$g(j) \sim -\frac{1 + C_4(j)/2}{n_j \log 2}; \qquad \sigma_j^2 \sim \frac{2}{\log^2 2} \cdot \frac{1 + C_4(j)/2}{n_j},$$
where $C_4$ denotes the normalized fourth-order cumulant of the wavelet coefficients at octave $j$:

$$C_4(j) = \frac{\mathbb{E}\,d_X(j,k)^4 - 3\big(\mathbb{E}\,d_X(j,k)^2\big)^2}{\big(\mathbb{E}\,d_X(j,k)^2\big)^2}.$$

The practical use of these relations requires the estimation of $C_4$, which can be difficult, as well as the guarantee that, for each octave, enough points exist for the above form, which results from an asymptotic expansion, to be valid. An approximate yet simple practical choice, regularly implemented, consists of using: $g(j) \equiv 0$; $\sigma_j^2 \equiv 2/(n_j \log^2 2)$; $a_j \equiv \sigma_j^2$. The numerical simulations reported in [ABR 98, VEI 99] show that the performance of these choices is very satisfactory for the analysis of long-range dependence. Indeed, such a choice implies that the weight of $y_j$ in the linear regression is half that of $y_{j-1}$, a point of view that is a priori realistic for the study of long-range dependence (Gaussianization effect at large scales), but less obvious for the study of local regularity. This choice is all the more delicate as it affects both the bias and the variance of the estimator; an alternative choice, $a_j$ constant, can also be considered.

In equation (2.28), from which the study of the scaling law behavior stems, the focus has, until now, been on the exponent $\alpha$ of the power law, since it defines the phenomenon. However, the measurement of the multiplicative parameter $c_f$ can be fruitful for certain applications. This estimation is detailed in [VEI 99]. Finally, the case of self-similar processes with stationary increments of infinite variance (and/or mean value) will not be tackled here. It is developed in particular in [ABR 00a].

2.5.2. Emphasis on scaling laws and determination of the scaling range

The previous section relied on the hypothesis that the quantity $\mathbb{E}\,d_X(j,k)^2$ behaves as a power law. This ideal situation is rarely observed, for two reasons.
On the one hand, real experimental data are likely to be only approximately described by the proposed models of scale invariance. On the other hand, certain models themselves induce only an approximate or asymptotic power law behavior, as is the case for long-range dependence or processes with fractal sample paths; in fact, only the self-similar model induces a strictly satisfied power law:

$$\begin{aligned}
\text{LRD:} &\quad j \to +\infty, & \mathbb{E}\,d_X(j,k)^2 &\sim c_f\, C\, 2^{j\alpha};\\
\text{Fractal:} &\quad j \to -\infty, & \mathbb{E}\,d_X(j,k)^2 &\sim c_f\, C\, 2^{j\alpha};\\
\text{H-ss:} &\quad \forall j, & \mathbb{E}\,d_X(j,k)^2 &= c_f\, C\, 2^{j\alpha}.
\end{aligned}$$
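To make the distinction between an exact and an asymptotic power law concrete, the following sketch (Python, standard library only) compares the regression slope obtained on noiseless values of $\log_2 \mathbb{E}\,d_X(j,k)^2$ for two synthetic models. The short-scale correction factor $(1 + 4 \cdot 2^{-j})$, the constants and the weights $a_j = 2^j$ are hypothetical choices made for illustration only; they do not come from the text.

```python
import math


def weighted_slope(octaves, y, a):
    """Slope of the weighted linear regression of y_j against j (weights 1/a_j)."""
    s0 = sum(1.0 / aj for aj in a)
    s1 = sum(j / aj for j, aj in zip(octaves, a))
    s2 = sum(j * j / aj for j, aj in zip(octaves, a))
    sy = sum(yj / aj for yj, aj in zip(y, a))
    sjy = sum(j * yj / aj for j, yj, aj in zip(octaves, y, a))
    return (s0 * sjy - s1 * sy) / (s0 * s2 - s1 ** 2)


ALPHA, C = 0.5, 3.0      # hypothetical exponent and prefactor
J2 = 10                  # largest octave used in the regression


def log_energy_exact(j):
    """Exact power law at all octaves (self-similar type behavior)."""
    return math.log2(C * 2.0 ** (j * ALPHA))


def log_energy_asymptotic(j):
    """Power law reached only as j grows (hypothetical small-octave correction)."""
    return math.log2(C * 2.0 ** (j * ALPHA) * (1.0 + 4.0 * 2.0 ** (-j)))


def slope_error(log_energy, j1, j2=J2):
    """Difference between the regression slope over [j1, j2] and ALPHA."""
    octaves = list(range(j1, j2 + 1))
    y = [log_energy(j) for j in octaves]
    a = [2.0 ** j for j in octaves]
    return weighted_slope(octaves, y, a) - ALPHA


errors_exact = {j1: slope_error(log_energy_exact, j1) for j1 in range(1, 7)}
errors_asymptotic = {j1: slope_error(log_energy_asymptotic, j1) for j1 in range(1, 7)}

for j1 in range(1, 7):
    print(j1, errors_exact[j1], errors_asymptotic[j1])
```

In this toy setting the slope error vanishes for the exact power law whatever the range, whereas for the asymptotic one it depends on the lower octave $j_1$ and shrinks as the small octaves are excluded.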
Hence, in the implementation of the estimator described earlier, it is necessary to choose an octave range over which the measurement is carried out, i.e., to select the octaves $j_1$ and $j_2$. Making this choice does not necessarily mean extracting theoretical values of $j_1$ and $j_2$ from the data, since these are not always defined by the model; rather, it means optimizing the statistical performance of the estimator. Widening the octave range $[j_1, j_2]$ implies the use of a larger fraction of the available wavelet coefficients, resulting in a reduction of the estimation variance, as indicated by relation (2.31); conversely, it can also mean an increase in the estimation bias if the measurement is carried out over a range where the behavior departs notably from a power law. The choice of the range is thus guided by the optimization of a bias-variance trade-off.

Let us take the example of long-range dependence: according to the model, we wish to choose $j_2 = +\infty$, i.e., in practice, $j_2$ as large as possible, the maximum octave of the wavelet decomposition being fixed by the number of coefficients of the analyzed process and by the importance of boundary effects; as for $j_1$, it is not imposed by the model. Choosing a larger $j_1$ makes it possible to work in a zone where the asymptotic behavior is satisfied and thus means a small estimation bias but a large variance (the variance essentially scales as $2^{j_1}$, see equation (2.31), and thus doubles each time $j_1$ is increased by 1). Qualitatively, we are led to tolerate a little bias (i.e., to widen the measurement range towards the small octaves) in order to reduce the variance and thus minimize the mean square error (MSE = (bias)$^2$ + variance).

In the case of the local regularity measure, the situation is different. The model tends to impose a small $j_1$ (in practice, $j_1 = 1$) without fixing $j_2$.
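The dependence of the variance on the regression range can be explored numerically. The sketch below (Python, standard library only; the function names are ours) evaluates the closed form (2.31) and compares it with the direct computation $S_0/(S_0 S_2 - S_1^2)$, assuming $a_j = \sigma_j^2 = 2/(n_j \log^2 2)$ and $n_j = 2^{-j} n$. The values of $n$, $j_1$ and $j_2$ are arbitrary.

```python
import math

LOG2_SQ = math.log(2.0) ** 2


def variance_direct(j1, j2, n):
    """S0 / (S0*S2 - S1^2) with a_j = sigma_j^2 = 2 / (n_j log^2 2), n_j = n 2^-j."""
    octaves = range(j1, j2 + 1)
    inv_a = [n * 2.0 ** (-j) * LOG2_SQ / 2.0 for j in octaves]   # 1 / a_j
    s0 = sum(inv_a)
    s1 = sum(j * v for j, v in zip(octaves, inv_a))
    s2 = sum(j * j * v for j, v in zip(octaves, inv_a))
    return s0 / (s0 * s2 - s1 ** 2)


def variance_closed_form(j1, j2, n):
    """Right-hand side of (2.31); needs at least two octaves (J >= 2)."""
    J = j2 - j1 + 1
    F = LOG2_SQ * (1.0 - (J * J / 2.0 + 2.0) * 2.0 ** (-J) + 2.0 ** (-2 * J))
    return (1.0 - 2.0 ** (-J)) / (n * 2.0 ** (1 - j1) * F)


n = 2 ** 16
for j1 in (1, 2, 3, 4, 5):
    print(j1, variance_direct(j1, 12, n), variance_closed_form(j1, 12, n))

# Effect of raising j1 by one octave (j2 fixed) and of raising j2 (j1 fixed).
ratio_j1 = variance_closed_form(4, 12, n) / variance_closed_form(3, 12, n)
ratio_j2 = variance_closed_form(3, 12, n) / variance_closed_form(3, 10, n)
```

Under these assumptions the two computations agree, `ratio_j1` is close to 2 and `ratio_j2` is only slightly below 1: the lower octave $j_1$ governs the variance, while extending the range towards coarse octaves changes it little.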
As in the preceding case, increasing $j_2$ may introduce a bias, through a departure from the ideal behavior, but will only slightly reduce the variance: indeed, it is sufficient to study how the variance of $\hat{\alpha}$ depends on $j_2$ to note its reduced influence, in accordance with the fact that fewer and fewer wavelet coefficients exist at the coarsest octaves. In practice, we are led to choose the narrowest possible range, restricted to the lowest octaves.

To move towards more quantitative arguments, it would be necessary to completely specify the models of the processes studied. We will keep to the approach which imposes the fewest possible a priori assumptions on the model and consider that it is sufficient to postulate that scale invariance phenomena are present. We resort to the quantity:

$$G(j_1, j_2) = \sum_{j=j_1}^{j_2} \frac{(y_j - \hat{\alpha} j - \hat{b})^2}{\sigma_j^2} \qquad (2.32)$$
which conveys a usual measure of the mean-square error between data and model. For Gaussian $y_j$, the variable $G(j_1, j_2)$ follows a chi-square law $\chi^2_{J-2}$ with $J - 2$ degrees of freedom. The dependence on $j_1$, $j_2$ of the quantity:

$$Q(j_1, j_2) = 1 - \int_0^{G(j_1, j_2)} \chi^2_{J-2}(u)\, du$$

makes it possible to guide the choice of the analysis range. A value of $Q$ close to 1 indicates the adequacy of the model, as opposed to $Q$ close to 0. An approach which consists of examining breaking points in the behavior of $Q$ with $j_1$, $j_2$ is proposed.

[Figure 2.3: two log-scale diagrams, $y_j$ against octave $j$, for $j$ = 1 to 10; the plots are annotated α = 0.55 and α = 2.57, cf = 4.7, with regression ranges 1 ≤ j ≤ 10 and 4 ≤ j ≤ 10.]

Figure 2.3. Examples of log-scale diagram. Right: second order log-scale diagram for a long-range dependent process, also possessing a highly pronounced correlation structure of short memory type (visible at the small octaves); in practice, it is an ARFIMA(0, d, 2) with d = 0.25 and a second order moving average $\Psi(B) = 1 + 2B + B^2$, implying $(\gamma, c_f) = (0.50, 6.38)$. The vertical error bars at each $j$ represent 95% confidence intervals for $Y_j^{(2)}$. A linear behavior is observed over the octaves $[j_1, j_2] = [4, 10]$, which excludes the small octaves (short range memory) but includes the larger ones. A weighted linear regression enables, in spite of the strong presence of short-term dependencies, a precise estimation of $\gamma$: $\hat{\gamma} = 0.53 \pm 0.07$, $\hat{c}_f = 6.0$ with $4.5 < \hat{c}_f < 7.8$. Left: second order log-scale diagram for a self-similar process (FBM) of parameter H = 0.8. The linear behavior spreads over all scales and allows a precise estimation of H: $\hat{H} = 0.79$

2.5.3. Robustness of the wavelet approach

One of the great difficulties in the analysis of scale invariance phenomena is linked to the fact that their qualitative manifestations are close to those induced by non-stationarities. For a long time, data modeling by scale invariance was rejected because it was considered, sometimes correctly, as an artefact due to non-stationarity. The difficulties are of two types: on the one hand, identifying scale invariance when the data in fact exhibit non-stationarities; on the other hand, failing to detect scale invariance or to correctly estimate its parameters when non-stationary effects are superimposed. Wavelet analysis has made it possible to find solutions to these two problems.

For the first type of problem, it was proposed [VEI 01] to chop the signal under analysis into L non-overlapping segments. For each segment, we carry out an estimation $\hat{\alpha}_l$ of the scale invariance parameter. Then we validate the relevance of using a scale invariance model by testing the similarity of the $\hat{\alpha}_l$ between blocks. Hence, this is not a stationarity test in the most general sense, but more simply a
test aimed at detecting abnormally large fluctuations of the estimates $\hat{\alpha}_l$, which would lead us to reject the presence of scale invariance. In practice, the properties of the wavelet coefficients (weak statistical dependence among coefficients) and the definition and theoretical study of the estimator $\hat{\alpha}$, presented earlier, make it possible to conduct this test as the detection of a change in mean value within independent Gaussian variables of unknown but identical mean values and of possibly different but known variances [VEI 01].

[Figure 2.4: left, two time series plotted against time (0 to 16,000 samples); right, the corresponding log-scale diagrams, $y_j$ against octave $j$ ($j$ = 1 to 11), one of them annotated α = 0.59.]

Figure 2.4. Robustness with respect to superimposed trends. Left: fractional Gaussian noise with H = 0.80 (above) and with superimposed sinusoidal and linear trends (below). Right: log-scale diagrams of the signal corrupted by the trends, as computed with a Daubechies 2 wavelet (i.e., N = 2) (above) and a Daubechies 6 wavelet (i.e., N = 6) (below). We observe that increasing N cancels the effects of the superimposed trends and allows for a reliable estimation of H: $\hat{H} = 0.795$

For the second type of problem, the number of vanishing moments of the wavelet plays a fundamental role. By definition, the wavelet coefficients of a polynomial $p(t)$ of degree $P$ strictly smaller than the number of vanishing moments of the mother wavelet, $P < n_\psi$, are exactly zero. This means that if the observed signal $Z$ is made up of a signal to analyze $X$ on which a polynomial trend $p$ is superimposed, the wavelet analysis of the scale invariance phenomena that are likely to be present in $X$ will be, given its linear nature, insensitive to the presence of $p$ as soon as
$n_\psi$ is sufficiently large. In practice, we do not necessarily know a priori the order of the corrupting polynomial, if any; we can thus simply carry out a series of wavelet analyses with increasing $N = n_\psi$. When the results no longer change with $n_\psi$, this is an indication that $P$ has been exceeded. It is noteworthy that this procedure is made fully practicable by the low computational cost of the discrete wavelet decomposition. Certainly, in practice, the trends superimposed on the data are not polynomial. However, in the case where they possess a sufficiently regular behavior (e.g., quasi-sinusoidal oscillations) or a slightly irregular one (e.g., in $t^{-\beta}$, $\beta > 0$), the preceding argument remains valid: when we increase $n_\psi$, the magnitude of the wavelet coefficients of the trend decreases, whereas that of $X$ remains identical; the effect of the trend thus becomes quasi-negligible [ABR 98].

The superposition of a deterministic trend on the process $X$ can be interpreted as a non-stationarity of the mean; this situation can be made more complex by considering that the variance of the process itself evolves. Hence, we can write the observation as:

$$Z(t) = a(t) + b(t)X(t)$$

where $X(t)$ is a process presenting scale invariance in the form of one of the models referred to earlier (self-similarity, long-range dependence, etc.) and where $a(t)$ and $b(t)$ are sufficiently regular deterministic functions. It has been shown that varying the number of vanishing moments of the mother wavelet makes it possible to overcome the drift effects of $a$ and $b$ and to carry out reliable estimations of the scale invariance parameters associated only with $X$ [ROU 99].

Finally, the analysis of the signal plus noise situation, which is usual in signal processing when we write an observation $Z = Y + X$ (where $X$ is the process with scale invariance to be studied and $Y$ some additive random noise), has been considered in [WOR 96] by maximum likelihood approaches and will not be developed here.

2.6. Conclusion

In this chapter, we have focused on a qualitative description rather than a rigorous formalization of the concepts, models and analyses. The main concern was to offer the reader, who is not specialized in this field but is eager to implement, from real data, certain principles of fractal analysis, some entry points that are as much theoretical as practical. Especially in the first part, emphasis has been put on the relations between the different models used to describe scaling laws, their similarities and differences, so that this notion becomes accessible. Similarly, in the second part, the presentation of wavelet tools enabling the characterization and analysis of scaling laws has been structured around the different practical aspects essential for their implementation (selection of the scale range, estimation of the parameters, robustness and algorithm). All the analysis techniques described here, as well as some extensions and variations, have been implemented in Matlab, with toolboxes that are freely
accessible on the websites http://www.ens-lyon.fr/pabry and http://perso.ens-lyon.fr/paulo.goncalves.

2.7. Bibliography

[ABR 95] ABRY P., GONÇALVES P., FLANDRIN P., “Wavelets, spectrum estimation, and 1/f processes”, in ANTONIADIS A., OPPENHEIM G. (Eds.), Wavelets and Statistics, Springer-Verlag, Lecture Notes in Statistics 103, New York, p. 15–30, 1995.
[ABR 98] ABRY P., VEITCH D., “Wavelet analysis of long-range dependent traffic”, IEEE Trans. on Info. Theory, vol. 44, no. 1, p. 2–15, 1998.
[ABR 00a] ABRY P., PESQUET-POPESCU P., TAQQU M.S., “Wavelet based estimators for self similar α-stable processes”, in International Conference on Signal Processing: Sixteenth World Computer Congress (Beijing, China, 2000), August 2000.
[ABR 00b] ABRY P., TAQQU M.S., FLANDRIN P., VEITCH D., “Wavelets for the analysis, estimation, and synthesis of scaling data”, in PARK K., WILLINGER W. (Eds.), Self-similar Network Traffic and Performance Evaluation, John Wiley & Sons, p. 39–88, 2000.
[AVE 98] AVERKAMP R., HOUDRÉ C., “Some distributional properties of the continuous wavelet transform of random processes”, IEEE Trans. on Info. Theory, vol. 44, no. 3, p. 1111–1124, 1998.
[BER 94] BERAN J., Statistics for Long-memory Processes, Chapman and Hall, New York, 1994.
[CAM 95] CAMBANIS S., HOUDRÉ C., “On the continuous wavelet transform of second-order random processes”, IEEE Trans. on Info. Theory, vol. 41, no. 3, p. 628–642, 1995.
[CAS 90] CASTAING B., GAGNE Y., HOPFINGER E., “Velocity probability density functions of high Reynolds number turbulence”, Physica D, vol. 46, p. 177, 1990.
[CAS 96] CASTAING B., “The temperature of turbulent flows”, Journal de physique II France, vol. 6, p. 105–114, 1996.
[DAU 92] DAUBECHIES I., Ten Lectures on Wavelets, SIAM, 1992.
[DEL 00] DELBEKE L., ABRY P., “Stochastic integral representation and properties of the wavelet coefficients of linear fractional stable motion”, Stochastic Processes and their Applications, vol. 86, p. 177–182, 2000.
[FLA 92] FLANDRIN P., “Wavelet analysis and synthesis of fractional Brownian motion”, IEEE Trans. on Info. Theory, vol. IT-38, no. 2, p. 910–917, 1992.
[FRI 95] FRISCH U., Turbulence: The Legacy of A. Kolmogorov, Cambridge University Press, Cambridge, 1995.
[HOL 90] HOLSCHNEIDER M., TCHAMITCHIAN P., “Régularité locale de la fonction non différentiable de Riemann”, in LEMARIÉ P.G. (Ed.), Les ondelettes en 1989, Springer-Verlag, 1990.
[JAF 89] JAFFARD S., “Exposants de Hölder en des points donnés et coefficients d’ondelettes”, Comptes rendus de l’Académie des sciences de Paris, vol. 308, 1989.
[LEL 94] LELAND W.E., TAQQU M.S., WILLINGER W., WILSON D.V., “On the self-similar nature of Ethernet traffic (extended version)”, IEEE/ACM Trans. on Networking, vol. 2, p. 1–15, 1994.
[MAL 92] MALLAT S.G., HWANG W.L., “Singularity detection and processing with wavelets”, IEEE Trans. on Info. Theory, vol. 38, no. 2, p. 617–643, 1992.
[MAL 98] MALLAT S.G., A Wavelet Tour of Signal Processing, Academic Press, San Diego, California, 1998.
[MAN 68] MANDELBROT B.B., VAN NESS J.W., “Fractional Brownian motions, fractional noises, and applications”, SIAM Review, vol. 10, no. 4, p. 422–437, 1968.
[MAN 97] MANDELBROT B.B., Fractals and Scaling in Finance, Springer, New York, 1997.
[MAS 93] MASRY E., “The wavelet transform of stochastic processes with stationary increments and its application to fractional Brownian motion”, IEEE Trans. on Info. Theory, vol. 39, no. 1, p. 260–264, 1993.
[PAR 00] PARK K., WILLINGER W. (Eds.), Self-similar Network Traffic and Performance Evaluation, John Wiley & Sons (Interscience Division), 2000.
[PES 99] PESQUET-POPESCU B., “Statistical properties of the wavelet decomposition of certain non-Gaussian self-similar processes”, Signal Processing, vol. 75, no. 3, 1999.
[ROU 99] ROUGHAN M., VEITCH D., “Measuring long-range dependence under changing traffic conditions”, in IEEE INFOCOM’99 (Manhattan, New York), IEEE Computer Society Press, Los Alamitos, California, p. 1513–1521, March 1999.
[SAM 94] SAMORODNITSKY G., TAQQU M.S., Stable Non-Gaussian Random Processes: Stochastic Models with Infinite Variance, Chapman and Hall, New York and London, 1994.
[TEI 00] TEICH M., LOWEN S., JOST B., VIBE-RHEYMER K., HENEGHAN C., “Heart rate variability: measures and models”, Nonlinear Biomedical Signal Processing, vol. II, Dynamic Analysis and Modeling (M. Akay, Ed.), Ch. 6, p. 159–213, IEEE Press, 2001.
[TEW 92] TEWFIK A.H., KIM M., “Correlation structure of the discrete wavelet coefficients of fractional Brownian motions”, IEEE Trans. on Info. Theory, vol. IT-38, no. 2, p. 904–909, 1992.
[VEI 99] VEITCH D., ABRY P., “A wavelet based joint estimator of the parameters of long-range dependence”, IEEE Transactions on Information Theory (special issue on “Multiscale statistical signal analysis and its applications”), vol. 45, no. 3, p. 878–897, 1999.
[VEI 00] VEITCH D., ABRY P., FLANDRIN P., CHAINAIS P., “Infinitely divisible cascade analysis of network traffic data”, in Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (Istanbul, Turkey), June 2000.
[VEI 01] VEITCH D., ABRY P., “A statistical test for the constancy of scaling exponents”, IEEE Trans. on Sig. Proc., vol. 49, no. 10, p. 2325–2334, 2001.
[WOR 96] WORNELL G.W., Signal Processing with Fractals – A Wavelet-based Approach, Prentice-Hall, 1996.
Chapter 3
Wavelet Methods for Multifractal Analysis of Functions
Chapter written by Stéphane JAFFARD.

3.1. Introduction

A large number of signals are very irregular. In the most complex situations, the irregularity manifests itself in different forms and may change its form almost instantaneously. The most widely studied example in physics is the speed signal of a turbulent flow. During the 1980s, precise records of the speed of a turbulent flow were made in the ONERA wind tunnel at Modane (see Gagne et al. [GAG 87]). A thin wire, heated at one point, is placed in the flow of turbulent air; the rate at which the temperature decreases is directly proportional to the component of the flow speed orthogonal to the wire at the heated point. We obtain incomplete information as the signal is 1D; however, it is more precise than the best numerical simulations being carried out currently. The study of this signal showed that it is not statistically homogeneous; its regularity varies a lot from one point to another [ARN 95b, FRI 95]. Such signals cannot be modeled by processes such as fractional Brownian motion, for example. The techniques of multifractal analysis were developed in order to model and analyze such behaviors. Originally introduced for fully developed turbulence, these techniques began to be used, within a few years, in various scientific fields: traffic analysis (road and Internet traffic) [ABR 98, LEV 97, TAQ 97, WIL 96, WIL 00], modeling of economic signals [MAN 97], texture analysis [BIS 98], electrocardiograms [AMA 00], etc.

The mathematical theory of multifractal functions has expanded considerably: not only have numerous heuristic arguments used to numerically analyze multifractal signals been studied and justified, under certain assumptions and within a limited range of validity, but, most importantly, these mathematical results have had various consequences for applications; they have led researchers to introduce new tools that make it possible to refine and enrich the techniques of multifractal analysis. Within mathematics, multifractal analysis has acquired an extremely original position, since functions taken from very different domains prove to be multifractal:
– in probability, the sample paths of Lévy processes [JAF 99];
– in analytic number theory, trigonometric series linked to theta functions [JAF 96a];
– in analysis, geometric constructions such as “Peano functions” [JAF 97c, JAF NIC];
– in arithmetic, functions where Diophantine approximation properties play a role [JAF 97b];
– etc.

Multifractal analysis thus provides us with a vocabulary and a set of methods that help establish connections and find analogies between diverse fields of science and mathematics. Almost immediately after their appearance, wavelet analysis techniques were applied to the multifractal analysis of signals by Arneodo and his team at the CRPP in Bordeaux [ARN 95b]. These techniques proved to be extremely powerful for the mathematical analysis of problems, as well as for the construction of robust numerical algorithms.

In this chapter, we explain in detail certain fundamental results related to the wavelet methods of multifractal analysis. We also briefly describe the vast scientific panorama of recent times and conclude with a review of specialized articles. In order to simplify the presentation and the notations, all results will be set out in dimension 1. Most of the results can easily be extended to functions defined on $\mathbb{R}^d$ (see [JAF 04b, JAF 06]). The reader can find a more detailed presentation of the different fields of application of multifractal analysis in [ARN 95a, JAF 01a, MAN 98, LAS 08, WEN 07].

3.2. General points regarding multifractal functions

3.2.1. Important definitions

Multifractal functions help in modeling signals whose regularity varies from one point to another. Thus, the first problem is to mathematically define a function’s regularity at every point. What is “pointwise regularity”? It is a way of quantifying,
with the help of a positive real number $\alpha$, the fact that the graph of a function is more or less rough at a given point $x_0$ (the picture is not merely superficial; the concepts we introduce have, in fact, been used in roughness measurement [DUB 89, TRI 97]). The Hölder regularity generalizes familiar concepts: the “minimum level” of regularity is continuity. A function $f$ is continuous at $x_0$ if $|f(x) - f(x_0)| \to 0$ when $x \to x_0$; continuity will correspond to a regularity index $\alpha = 0$. Similarly, $f$ is differentiable at $x_0$ if there is an affine function $P$ such that $|f(x) - P(x - x_0)| \to 0$ faster than $|x - x_0|$ when $x \to x_0$; differentiability will correspond to a regularity index $\alpha = 1$. The following definition is a direct generalization of these two cases.

DEFINITION 3.1.– Let $\alpha$ be a positive real number and $x_0 \in \mathbb{R}$; a function $f: \mathbb{R} \to \mathbb{R}$ is $C^\alpha(x_0)$ if there exist a constant $C > 0$ and a polynomial $P$ of degree less than $\alpha$ such that, in a neighborhood of $x_0$:

$$|f(x) - P(x - x_0)| \le C|x - x_0|^\alpha \qquad (3.1)$$
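Condition (3.1) can be probed numerically on a simple example. The sketch below (Python, standard library only) takes the cusp $f(x) = |x|^{1/2}$ at $x_0 = 0$, for which $\alpha < 1$ so that $P$ reduces to the constant $f(x_0)$, and evaluates the ratio $|f(x) - f(x_0)| / |x - x_0|^\alpha$ on the points $x_0 \pm 2^{-m}$. The example function, the sampling points and the function names are our own assumptions, and a finite sampling can only suggest, not prove, that the ratio is bounded.

```python
def sup_ratio(f, x0, alpha, max_m=40):
    """Largest value of |f(x) - f(x0)| / |x - x0|**alpha over x = x0 +/- 2**-m."""
    ratios = []
    for m in range(1, max_m + 1):
        h = 2.0 ** (-m)
        for x in (x0 - h, x0 + h):
            ratios.append(abs(f(x) - f(x0)) / abs(x - x0) ** alpha)
    return max(ratios)


def f(x):
    """Cusp at the origin."""
    return abs(x) ** 0.5


for alpha in (0.3, 0.5, 0.7):
    print(alpha, sup_ratio(f, 0.0, alpha))
```

The ratio stays bounded for $\alpha \le 1/2$ and grows without bound as $x \to 0$ for $\alpha > 1/2$, in agreement with the fact that this function is $C^\alpha(0)$ exactly for $\alpha \le 1/2$.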
NOTE 3.1.– The polynomial $P$ is unique (if $Q$ were also admissible, by applying Definition 3.1 with $P$ and $Q$, we would have $|P(x) - Q(x)| \le C|x|^\alpha$ and, since $P - Q$ is of degree at most $[\alpha]$, we would have $P - Q = 0$). The constant term of the polynomial $P(x - x_0)$ is always $f(x_0)$; similarly, the first degree term, if present, is always $(x - x_0) f'(x_0)$ (by the definition of the derivative). Also, if $f$ is $C^{[\alpha]}$ in a neighborhood of $x_0$, the remark just made concerning the uniqueness of $P$ implies that $P$ is the Taylor expansion of $f$ at $x_0$ of order $[\alpha]$. However, equation (3.1) can hold for large values of $\alpha$ without $f$ being even twice differentiable at $x_0$ (in which case the Taylor expansion stops at the first-order term); consider, for example, the “chirp” $x^n \sin(x^{-n})$ at 0 for large $n$ (which does not have a second derivative at 0 since the first derivative is not continuous). We see that $P$ gives a generalization of the notion of Taylor expansion (see also Chapter 4; see [GIN 00, MEY 98] for extensions of these concepts to more general contexts). Finally, note that equation (3.1) implies that $f$ is bounded in the vicinity of $x_0$; therefore, we assume that the functions considered are locally bounded (see [MEY 98], where the exponent factor is introduced, which makes it possible to define a weak notion of Hölder regularity for functions that are not a priori locally bounded, and likewise for distributions).

DEFINITION 3.2.– The Hölder exponent of $f$ at $x_0$ is:

$$h_f(x_0) = \sup\{\alpha : f \text{ is } C^\alpha(x_0)\}$$

The Hölder exponent is a function defined point by point which describes the local variations of the regularity of $f$. Certain functions have a constant Hölder exponent. Thus, the Weierstrass series:

$$W_{b,H}(x) = \sum_j b^{-Hj} \sin(b^j x)$$
has a Hölder exponent that is constant and equal to $H$; similarly, the sample paths of Brownian motion satisfy almost surely $h_B(x) = \frac{1}{2}$ for all $x$. More generally, a fractional Brownian motion of exponent $H$ has at every point a Hölder exponent equal to $H$. Such functions are irregular everywhere. Our objective is to study functions whose Hölder exponent can jump from one point to another. In such a situation, the numerical calculation of the function $h_f(x_0)$ is completely unstable and of little significance. We rather try to extract less precise information: whether or not the function $h_f$ takes a certain given value $H$ and, if it does, what is the size of the set of points where $h_f$ takes this value?

Here we are faced with a new problem: what is the “right” notion of “size” in this context? We will not be able to fully justify the answer to this question because it results from the study of numerous mathematical examples. Let us just keep in mind that the term “size” does not signify “Lebesgue measure” because, in general, there exists a Hölder exponent that is the “most probable” and that appears almost everywhere. The other exponents thus appear on sets of measure zero and the “Lebesgue measure” does not make it possible to differentiate them. Besides, the “right” notion of size cannot be the box dimension because these sets are usually dense. In fact, we expect them to be fractal. A traditional mathematical method to measure the size of such dense sets of zero measure is to calculate their Hausdorff dimension. Let us recall its definition.

DEFINITION 3.3.– Let $A$ be a subset of $\mathbb{R}$. For $\varepsilon > 0$, let us note:

$$M_\varepsilon^d = \inf_R \sum_i \varepsilon_i^d$$

where $R$ denotes a generic covering of $A$ by intervals $]x_i, x_i + \varepsilon_i[$ of length $\varepsilon_i \le \varepsilon$. The infimum is thus taken over all such coverings. For all $d \in [0, 1]$, the $d$-dimensional Hausdorff measure of $A$ is:

$$\operatorname{mes}_d(A) = \lim_{\varepsilon \to 0} M_\varepsilon^d$$

This measure takes the value $+\infty$ or 0 except for, at the most, one value of $d$, and the Hausdorff dimension of $A$ is:

$$\dim(A) = \sup\Big\{d : \lim_{\varepsilon \to 0} M_\varepsilon^d = +\infty\Big\} = \inf\Big\{d : \lim_{\varepsilon \to 0} M_\varepsilon^d = 0\Big\}$$

DEFINITION 3.4.– Let $f$ be a function and $H \ge 0$. If $H$ is a value taken by the function $h_f(x)$, let us denote by $E_H$ the set of points $x$ where $h_f(x) = H$. The singularity spectrum (or Hölder spectrum) of the signal being studied is:

$$f_H(H) = \dim(E_H)$$

(we use the convention $f_H(H) = -\infty$ if $H$ is not a Hölder exponent of $f$).
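The way the quantities of Definition 3.3 single out a critical value of $d$ can be illustrated on the triadic Cantor set, a classical example that is not discussed in the text and is used here as an assumption-laden sketch (Python, standard library only). At construction step `level`, the set is covered by $2^{\text{level}}$ intervals of length $3^{-\text{level}}$; summing $\varepsilon_i^d$ over this particular covering only gives an upper bound on $M_\varepsilon^d$, and the fact that the Hausdorff dimension is exactly $\log 2 / \log 3$ is a known result that the code does not prove.

```python
import math


def cantor_covering_sum(level, d):
    """Sum of eps_i**d over the 2**level intervals of length 3**-level that
    cover the triadic Cantor set at construction step `level`."""
    return 2.0 ** level * 3.0 ** (-level * d)


d_star = math.log(2.0) / math.log(3.0)   # known Hausdorff dimension of the Cantor set

for d in (0.5, d_star, 0.8):
    print(round(d, 4), [cantor_covering_sum(level, d) for level in (10, 20, 40)])
```

For $d$ below $\log 2/\log 3$ the sums grow with the level, for $d$ above they tend to 0, and at the critical value they stay equal to 1, which is the dichotomy expressed by the definition of $\dim(A)$.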
The concept of multifractal function is not precisely defined (just like the concept of a fractal set). For us, a multifractal function is a function whose spectrum of singularities is “non-trivial”, i.e. not reduced to a point. In the examples, $f_H(H)$ takes on positive values on an entire interval $[H_{\min}, H_{\max}]$. Its determination thus requires the study of an infinity of fractal sets $E_H$, hence the term “multifractal”.

3.2.2. Wavelets and pointwise regularity

For many reasons that we shall gradually discover, wavelet methods are a favorite tool for studying multifractal functions. The first reason is that we have a simple criterion that characterizes the value of the Hölder exponent by a decay condition on the wavelet coefficients of a given function. Let us begin by recalling certain points related to wavelet analysis. An orthonormal basis of wavelets of $L^2(\mathbb{R})$ has a particularly simple algorithmic form: we start from a function $\psi$ (the “mother” wavelet) that is regular and well-localized; the technical assumptions are:

$$\forall i = 0, \dots, N, \quad \forall m \in \mathbb{N}, \qquad |\psi^{(i)}(x)| \le \frac{C(i, m)}{(1 + |x|)^m}$$

for a relatively large $N$. We can choose such functions $\psi$ so that, moreover, the translates and dilates of $\psi$:

$$\psi_{j,k}(x) = 2^{j/2}\, \psi(2^j x - k), \qquad j, k \in \mathbb{Z}$$

form an orthonormal basis of $L^2(\mathbb{R})$ (see [MEY 90]) (we will choose $N$ larger than the maximum regularity that we expect to find in the analyzed signal; we can also take $N = +\infty$, in which case the wavelet $\psi$ belongs to the Schwartz class). We verify that the wavelet has a corresponding number of zero moments:

$$\forall i = 0, \dots, N, \qquad \int \psi(x)\, x^i\, dx = 0$$
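The zero-moment condition has a discrete counterpart that is easy to check. As a sketch only (Python, standard library only), we take the standard four-tap Daubechies filter, whose wavelet has two vanishing moments, and verify that the discrete moments of order 0 and 1 of the associated high-pass filter vanish, so that the detail coefficients of a sampled affine trend are zero while those of a quadratic trend are not. The choice of this particular filter, the sign convention $g_k = (-1)^k h_{3-k}$, the use of the samples directly as input and the absence of boundary handling are assumptions made for illustration.

```python
import math

SQRT3 = math.sqrt(3.0)
NORM = 4.0 * math.sqrt(2.0)

# Daubechies low-pass filter with two vanishing moments (four taps).
H = [(1 + SQRT3) / NORM, (3 + SQRT3) / NORM, (3 - SQRT3) / NORM, (1 - SQRT3) / NORM]
# Associated high-pass (wavelet) filter: g_k = (-1)^k h_{3-k}.
G = [(-1) ** k * H[3 - k] for k in range(4)]


def discrete_moment(p):
    """Discrete moment sum_k k^p g_k of the wavelet filter."""
    return sum(k ** p * g for k, g in enumerate(G))


def detail_coefficients(samples):
    """One level of details d_m = sum_k g_k x[2m + k], interior positions only."""
    return [sum(G[k] * samples[2 * m + k] for k in range(4))
            for m in range((len(samples) - 2) // 2)]


linear = [3.0 + 2.0 * t for t in range(32)]        # affine trend
quadratic = [float(t * t) for t in range(32)]      # quadratic trend

max_detail_linear = max(abs(d) for d in detail_coefficients(linear))
max_detail_quadratic = max(abs(d) for d in detail_coefficients(quadratic))

print(discrete_moment(0), discrete_moment(1), discrete_moment(2))
print(max_detail_linear, max_detail_quadratic)
```

With this filter, only polynomials of degree 0 or 1 are cancelled; cancelling higher degrees requires a wavelet with more vanishing moments.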
Thus, every f ∈ L2 () function can be written as: f (x) =
ef (k, j) ψ(2j x − k)
j∈Z k∈Z
where df (k, j) are the wavelet coefficients of f : f (t)ψ(2j t − k) dt ef (k, j) = 2j
We should note that we do not choose an $L^2$ normalization for the wavelet coefficients, but an $L^1$ normalization, which is better adapted to the study of Hölder regularity. Let us first study the characterization by wavelets of uniform regularity, beginning with its definition. A function $f$ belongs to $C^\alpha(\mathbb{R})$ if condition (3.1) holds for all $x_0$, with the possibility of choosing $C$ uniformly, i.e. independently of $x_0$. If $\alpha < 1$, taking into account that $P(x - x_0) = f(x_0)$, this condition can be rewritten as:

$$f \in C^\alpha(\mathbb{R}) \iff \forall x, y \in \mathbb{R}, \quad |f(x) - f(y)| \le C |x - y|^\alpha$$

This condition of uniform regularity is characterized by a condition of uniform decay of the wavelet coefficients of $f$ (see [MEY 90]).

PROPOSITION 3.1.– If $\alpha \in\, ]0, 1[$, we have the following characterization:

$$f \in C^\alpha(\mathbb{R}) \iff \exists C > 0 : \ \forall j, k, \quad |d_f(k, j)| \le C\, 2^{-\alpha j}$$
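As a numerical illustration of the decay condition in Proposition 3.1, the following sketch computes $L^1$-normalized coefficients of $f(x) = x^\alpha$ on $[0, 1]$ and checks that their supremum at scale $j$ behaves like $2^{-\alpha j}$. It rests on assumptions that are not in the text: the Haar wavelet is used in place of a regular wavelet (it has zero integral, which is the only property the direct part of the proof uses when $\alpha < 1$), and the helper name `haar_coefficient` and the value `alpha = 0.4` are chosen only for illustration.

```python
import math

def haar_coefficient(F, j, k):
    """L1-normalized Haar coefficient 2^j * integral of f(t) * psi(2^j t - k) dt,
    computed exactly from an antiderivative F of f."""
    h = 2.0 ** (-j)
    left, mid, right = k * h, (k + 0.5) * h, (k + 1) * h
    # The Haar wavelet equals +1 on [0, 1/2) and -1 on [1/2, 1).
    return (2.0 ** j) * ((F(mid) - F(left)) - (F(right) - F(mid)))

alpha = 0.4                                     # illustrative Hölder exponent in ]0, 1[
F = lambda x: x ** (alpha + 1) / (alpha + 1)    # antiderivative of f(x) = x^alpha

scales = range(2, 12)
# Largest coefficient modulus at each scale j, over the positions k covering [0, 1].
sup_per_scale = [max(abs(haar_coefficient(F, j, k)) for k in range(2 ** j))
                 for j in scales]
# log2 of the ratio between consecutive scales: the empirical decay exponent.
slopes = [math.log2(sup_per_scale[i + 1] / sup_per_scale[i])
          for i in range(len(sup_per_scale) - 1)]
```

In this example every entry of `slopes` is close to `-alpha`, consistent with a bound of the form $|d_f(k, j)| \le C\, 2^{-\alpha j}$.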
Proof. Let us assume that $f \in C^\alpha(\mathbb{R})$; then, for all $j, k$, we have:

$$d_f(k, j) = 2^j \int f(x)\, \psi(2^j x - k)\, dx = 2^j \int \bigl( f(x) - f(k 2^{-j}) \bigr)\, \psi(2^j x - k)\, dx$$

(because the wavelets have zero integral); thus:

$$|d_f(k, j)| \le C\, 2^j \int \frac{|x - k 2^{-j}|^\alpha}{(1 + |2^j x - k|)^2}\, dx \le C\, 2^{-\alpha j}$$

(by the change of variable $t = 2^j x - k$).

Let us now prove the converse. Let us assume that $|d_f(k, j)| \le C\, 2^{-\alpha j}$. Let $j_0$ be defined by $2^{-j_0 - 1} \le |x - x_0| < 2^{-j_0}$ and set:

$$f_j(x) = \sum_k d_f(k, j)\, \psi(2^j x - k)$$

From the localization assumption on $\psi$, we deduce that:

$$|f_j(x)| \le C \sum_k \frac{2^{-\alpha j}}{(1 + |2^j x - k|)^2} \le C\, 2^{-\alpha j}$$
and similarly, using the localization of $\psi'$, we deduce that $|f_j'(x)| \le C\, 2^{(1-\alpha) j}$. We obtain:

$$|f(x) - f(x_0)| \le \sum_{j \le j_0} |f_j(x) - f_j(x_0)| + \sum_{j > j_0} |f_j(x)| + \sum_{j > j_0} |f_j(x_0)|$$

By the mean value theorem, the first term is bounded by:

$$|x - x_0| \sum_{j \le j_0} \sup_{[x, x_0]} |f_j'(t)| \le C |x - x_0| \sum_{j \le j_0} 2^{(1-\alpha) j} \le C |x - x_0|\, 2^{(1-\alpha) j_0}$$

(because $\alpha < 1$). Coming back to the definition of $j_0$, we see that the first term is bounded by $C |x - x_0|^\alpha$. The second and the third terms are bounded by:

$$C \sum_{j > j_0} 2^{-\alpha j} \le C\, 2^{-\alpha j_0} \le C |x - x_0|^\alpha$$

thus, the converse estimate holds.

The reader will easily be able to extend this result to the case $\alpha > 1$, $\alpha \notin \mathbb{N}$; see [MEY 90] for the case $\alpha \in \mathbb{N}$. If a function $f$ belongs to one of the $C^\alpha$ spaces, for $\alpha > 0$, we will say that $f$ is uniformly Hölderian. The following theorem is similar to Proposition 3.1, but gives a result of pointwise regularity.

THEOREM 3.1.– Let $\alpha \in\, ]0, 1[$. If $f$ is $C^\alpha(x_0)$, then we have:

$$|d_f(k, j)| \le C\, 2^{-\alpha j} \bigl( 1 + |2^j x_0 - k|^\alpha \bigr) \tag{3.2}$$

Conversely, if the wavelet coefficients of $f$ verify (3.2) and if $f$ is uniformly Hölderian, then, for $|x - x_0| \le 1$, we obtain:

$$|f(x) - f(x_0)| \le C |x - x_0|^\alpha \log \frac{2}{|x - x_0|} \tag{3.3}$$

($f$ is “nearly” $C^\alpha(x_0)$, up to a small logarithmic correction).

Proof. Let us assume that $f$ is $C^\alpha(x_0)$; then we have:

$$d_f(k, j) = 2^j \int f(x)\, \psi(2^j x - k)\, dx = 2^j \int \bigl( f(x) - f(x_0) \bigr)\, \psi(2^j x - k)\, dx$$

(because the wavelets have zero integral). Thus, $|d_f(k, j)|$ is bounded by:

$$2^j \int \frac{C |x - x_0|^\alpha}{(1 + |2^j x - k|)^2}\, dx \le 2 C\, 2^j \int \frac{|x - k 2^{-j}|^\alpha + |k 2^{-j} - x_0|^\alpha}{(1 + |2^j x - k|)^2}\, dx$$
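(because, for $a, b > 0$, we have $(a + b)^\alpha \le 2 a^\alpha + 2 b^\alpha$). By once again using the change of variable $t = 2^j x - k$, we obtain $|d_f(k, j)| \le C\, 2^{-\alpha j} (1 + |2^j x_0 - k|^\alpha)$.

Let us now prove the converse. Assuming that there exists an $\varepsilon > 0$ such that $f \in C^\varepsilon(\mathbb{R})$, let $j_0$ and $j_1$ be defined by:

$$2^{-j_0 - 1} \le |x - x_0| < 2^{-j_0} \qquad \text{and} \qquad j_1 = \frac{\alpha}{\varepsilon}\, j_0$$

From (3.2), we deduce that, for all $x$, we have:

$$|d_f(k, j)| \le C \bigl( 2^{-\alpha j} + |x_0 - k 2^{-j}|^\alpha \bigr) \le 2 C \bigl( 2^{-\alpha j} + |x - x_0|^\alpha + |x - k 2^{-j}|^\alpha \bigr)$$

and thus:

$$|f_j(x)| \le C \sum_k \frac{2^{-\alpha j} + |x - x_0|^\alpha + 2^{-\alpha j} |2^j x - k|^\alpha}{(1 + |2^j x - k|)^2}$$

We have:

$$\sum_k \frac{1}{(1 + |2^j x - k|)^2} \le C$$

and similarly:

$$\sum_k \frac{1}{(1 + |2^j x - k|)^{2 - \alpha}} \le C$$

because $\alpha < 1$. We get:

$$|f_j(x)| \le C \bigl( 2^{-\alpha j} + |x - x_0|^\alpha \bigr) \le C\, 2^{-\alpha j} \bigl( 1 + 2^{\alpha j} |x - x_0|^\alpha \bigr)$$

Using the localization of $\psi'$, we obtain, in the same manner:

$$|f_j'(x)| \le C\, 2^{(1-\alpha) j} \bigl( 1 + 2^{\alpha j} |x - x_0|^\alpha \bigr)$$

in particular, if $j \le j_0$, we obtain $|f_j'(x)| \le C\, 2^{(1-\alpha) j}$. We can write:

$$|f(x) - f(x_0)| \le \sum_{j \le j_0} |f_j(x) - f_j(x_0)| + \sum_{j > j_0} |f_j(x)| + \sum_{j > j_0} |f_j(x_0)| \tag{3.4}$$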
The first term is bounded by:

$$|x - x_0| \sum_{j \le j_0} \sup_{[x, x_0]} |f_j'(t)| \le C |x - x_0| \sum_{j \le j_0} 2^{(1-\alpha) j} \le C |x - x_0|\, 2^{(1-\alpha) j_0} \le C |x - x_0|^\alpha$$

As far as the second term is concerned, if $j > j_0$, the bound on $|f_j(x)|$ obtained above becomes $|f_j(x)| \le C |x - x_0|^\alpha$ and we thus have:

$$\sum_{j_0 < j < j_1} |f_j(x)| \le C \sum_{j_0 < j < j_1} |x - x_0|^\alpha \le C (j_1 - j_0) |x - x_0|^\alpha \le C |x - x_0|^\alpha \log \frac{2}{|x - x_0|}$$

because $j_1 - j_0 \le (\alpha / \varepsilon)\, j_0$. Moreover, since $f$ belongs to $C^\varepsilon(\mathbb{R})$, we have:

$$\sum_{j \ge j_1} |f_j(x)| \le C \sum_{j \ge j_1} 2^{-\varepsilon j} \le C\, 2^{-\varepsilon j_1} \le C |x - x_0|^\alpha$$

(because of the choice of $j_1$). As far as the third term is concerned, the same bound at $x = x_0$ becomes $|f_j(x_0)| \le C\, 2^{-\alpha j}$ and thus:

$$\sum_{j > j_0} |f_j(x_0)| \le C \sum_{j > j_0} 2^{-\alpha j} \le C |x - x_0|^\alpha$$

which proves the converse part of Theorem 3.1.

Here again, the reader will easily be able to extend this result to exponents $\alpha > 1$ (see [JAF 91]). As far as the second part of the theorem is concerned, we sometimes use the slight variation below.

PROPOSITION 3.2.– Let $\alpha \in\, ]0, 1[$. If the wavelet coefficients of $f$ verify:

$$|d_f(k, j)| \le C\, 2^{-\alpha j} \bigl( 1 + |2^j x_0 - k|^{\alpha'} \bigr) \tag{3.5}$$

for an $\alpha' < \alpha$, then $f$ is $C^\alpha(x_0)$.
The proof of this proposition is very similar to that of the second part of the theorem and therefore we only outline it. We show this time that (3.5) implies that:

$$|f_j(x)| \le C\, 2^{-\alpha j} \bigl( 1 + 2^{\alpha' j} |x - x_0|^{\alpha'} \bigr)$$

and:

$$|f_j'(x)| \le C\, 2^{(1-\alpha) j} \bigl( 1 + 2^{\alpha' j} |x - x_0|^{\alpha'} \bigr)$$

and we finish by using the estimate on $f_j'$ for $j \le j_0$ and on $f_j$ for $j \ge j_0$. This proposition also extends to the case where $\alpha > 1$ (see [JAF 91]).

NOTE 3.2.– We could get the impression that Proposition 3.2, contrary to Theorem 3.1, does not require an assumption of global regularity. This is not so because, if (3.5) is verified and if $|x - x_0| \le 1$, then we can deduce that $|d_f(k, j)| \le C\, 2^{-(\alpha - \alpha') j}$, i.e. that $f \in C^{\alpha - \alpha'}$ uniformly close to $x_0$. We nevertheless refer to [JAF 00c], where pointwise Hölder estimates are obtained from wavelet coefficients in the absence of any assumption of uniform regularity.

From the theorem, we can immediately deduce the following corollary, which characterizes the Hölder exponent by the local decay of wavelet coefficients.

COROLLARY 3.1.– If $f$ is uniformly Hölder, the Hölder exponent of $f$ at every point $x_0$ is given by:

$$h_f(x_0) = \liminf_{j \to \infty}\ \inf_k\ \frac{\log (|d_f(k, j)|)}{\log (2^{-j} + |k 2^{-j} - x_0|)} \tag{3.6}$$

This corollary is used in all mathematical results where the singularity spectrum of a function $f$ is derived from its wavelet coefficients. On the other hand, it can be used to obtain the numerical value of $h_f(x)$ only if the function $h_f$ is constant, or if it varies slowly with respect to the discretization scale of the signal.

3.2.3. Local oscillations

Using the Hölder exponent as a measure of pointwise regularity makes it a powerful signal and image analysis tool (see [DAO 95, LEV 95]). However, characterizing pointwise regularity by the Hölder exponent alone has several disadvantages:
– inability to measure the oscillatory character of the local behavior of $f$ close to $x_0$;
– lack of stability under “traditional” operators, such as differential operators, pseudo-differential operators or the Hilbert transform $\mathcal{H}$ (the convolution operator with $1/x$ in dimension one, which maps a real signal to the associated analytic signal in signal analysis). So, we can construct, for instance, locally bounded functions $f$ with $f \in C^\alpha(x_0)$ and $\mathcal{H} f \notin C^\beta(x_0)$ for any $\beta > 0$ (see [JAF 91]).
The 2-microlocal spaces $C^{s,s'}_{x_0}$ will provide a substitute for the notion of pointwise regularity which does not have these disadvantages; moreover, they are defined even if $f$ is not locally bounded. We have already seen this condition in (3.5).

DEFINITION 3.5.– A distribution $f$ belongs to the 2-microlocal space $C^{s,s'}_{x_0}$ if its wavelet coefficients satisfy, for sufficiently small $|x_0 - k 2^{-j}|$:

$$\exists C > 0, \qquad |d_f(k, j)| \le C\, 2^{-j (s + s')} \bigl( 2^{-j} + |k 2^{-j} - x_0| \bigr)^{-s'} \tag{3.7}$$

This condition does not depend on the chosen wavelet basis, provided it is sufficiently regular (see [JAF 91]).

DEFINITION 3.6.– The 2-microlocal domain of $f$ at $x_0$, denoted by $E(f(x_0))$, is the set of couples $(s, s')$ such that $f \in C^{s,s'}(x_0)$.

By interpolation between conditions (3.7), we find that the 2-microlocal domain is a convex set and, moreover, by using the trivial lower bound $2^{-j} + |k 2^{-j} - x_0| \ge 2^{-j}$, we can see that its boundary is a curve whose slope is everywhere larger than $-1$. Conversely, we can check that these conditions characterize the boundaries of the 2-microlocal domains at one point $x_0$ (see [GUI 98, MEY 98]) (a more difficult and unresolved issue is to determine the compatibility conditions between the different 2-microlocal domains $E(f(x))$ of a function when the point $x$ varies). The 2-microlocal domain provides very accurate information on the behavior of $f$ close to $x_0$; specifically, we can derive from it the Hölder exponent of primitives or of fractional derivatives of $f$ (see [ARN 97]). However, this complete information is superfluous in practice. In fact, we need to retain only a few parameters, and not a convex function at each point (the boundary of the 2-microlocal domain): at least the Hölder exponent, and often a second parameter $\beta$, which measures the oscillating character of $f$ close to $x_0$. Indeed, the same Hölder exponent at a given point can correspond to very different behaviors, such as “cusps” $|x - x_0|^H$ or, on the contrary, highly oscillating functions, such as “chirps”:

$$g_{H,\beta}(x) = |x - x_0|^H \sin \left( \frac{1}{|x - x_0|^\beta} \right) \tag{3.8}$$

for $\beta > 0$. In signal processing, the notion of chirp models functions whose instantaneous frequency increases rapidly at a given time (see [CHA 99]). The exponent $\beta$ measures the speed at which the instantaneous frequency of (3.8) diverges at $x_0$.
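To make the role of the two parameters in (3.8) concrete, here is a small sketch that evaluates the chirp $g_{H,\beta}$ and the instantaneous frequency of its phase $|x - x_0|^{-\beta}$. The function names and the parameter values ($H = 0.5$, $\beta = 1$) are illustrative assumptions, and the instantaneous frequency is taken here, as one common convention, to be the modulus of the derivative of the phase divided by $2\pi$.

```python
import math

def chirp(x, x0=0.0, H=0.5, beta=1.0):
    """g_{H,beta}(x) = |x - x0|^H * sin(|x - x0|^(-beta)), set to 0 at x = x0."""
    r = abs(x - x0)
    return 0.0 if r == 0.0 else r ** H * math.sin(r ** (-beta))

def instantaneous_frequency(x, x0=0.0, beta=1.0):
    """Modulus of the derivative of the phase |x - x0|^(-beta), divided by 2*pi."""
    return beta * abs(x - x0) ** (-beta - 1.0) / (2.0 * math.pi)

H, beta = 0.5, 1.0
points = [2.0 ** (-n) for n in range(1, 12)]   # dyadic points approaching x0 = 0

# The amplitude is controlled by the cusp envelope |x - x0|^H.
envelope_ok = all(abs(chirp(x, 0.0, H, beta)) <= x ** H for x in points)

# Each halving of the distance to x0 multiplies the frequency by 2^(beta + 1).
growth = [math.log2(instantaneous_frequency(points[i + 1], 0.0, beta)
                    / instantaneous_frequency(points[i], 0.0, beta))
          for i in range(len(points) - 1)]
```

The list `growth` is constant and equal to `beta + 1`: the exponent $H$ governs the size of the function near $x_0$, while $\beta$ governs how fast the oscillations accelerate.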
An additional motivation for the analysis of chirps is that this type of behavior causes the first versions of the multifractal formalism to fail, as we will see in section 3.4. We presently have two possible mathematical definitions for the exponent $\beta$ in a general framework. Of course, both give the same value for the functions (3.8). We will define them and discuss their respective advantages for signal analysis.

Let $f$ be locally bounded and let us denote by $f^{(-n)}$ an $n$-times iterated primitive of $f$. As shown by a sequence of integrations by parts, a consequence of the oscillations of (3.8) close to $x_0$ is that $g^{(-n)}_{H,\beta}$ belongs to $C^{H + n(\beta + 1)}(x_0)$ (the increase of the Hölder exponent at $x_0$ is not 1 at each integration, as expected for an arbitrary function, but $\beta + 1$). This observation has led to the following definition given by Meyer [JAF 96b].

DEFINITION 3.7.– Let $H \ge 0$ and $\beta > 0$. A function $f \in L^\infty(\mathbb{R})$ is a chirp of type $(H, \beta)$ at $x_0$ if, for any $n \ge 0$, $f$ can be written as $f = g_n^{(n)}$, with $g_n \in C^{H + n(1 + \beta)}(x_0)$.

The chirp type can be derived from the 2-microlocal domain at $x_0$. Indeed, if $f \in C^H(x_0)$, we also have $f \in C^{H, -H}(x_0)$ and, in general, the condition $f = g_n^{(n)}$ with $g_n \in C^{H + n(1 + \beta)}(x_0)$ implies that $f \in C^{H + n\beta,\, -H - n(\beta + 1)}(x_0)$. The other characterization of chirps, given below (see [JAF 96b]), shows that their definition appropriately reflects the oscillatory phenomenon present in the functions $g_{H,\beta}$.

PROPOSITION 3.3.– A function $f \in L^\infty$ is a chirp of type $(H, \beta)$ at $x_0$ if and only if there exist a function $r(x)$, $C^\infty$ close to $x_0$, and $\eta > 0$ such that, if $0 < x - x_0 < \eta$, then:

$$f(x) = r(x - x_0) + (x - x_0)^H\, g_+ \bigl( (x - x_0)^{-\beta} \bigr)$$

and if $-\eta < x - x_0 < 0$, then:

$$f(x) = r(x - x_0) + |x - x_0|^H\, g_- \bigl( |x - x_0|^{-\beta} \bigr)$$

where the functions $g_+$ and $g_-$ are infinitely oscillating, i.e. they have bounded primitives of any order.

It is very easy to check that the interior of the set of points $(H, \beta)$ such that $f$ is a chirp of type $(H, \beta)$ at $x_0$ is always of the form $H < h_f(x_0)$, $\beta < \beta_f(x_0)$ [JAF 00a]. The positive number $\beta_f(x_0)$ is called the chirp exponent at $x_0$.
A highly oscillating local behavior such as (3.8) is remarkable and it was believed for a long time that it could be observed only at isolated points. This is why Meyer's result, proving that the Riemann series $\sum_n n^{-2} \sin(\pi n^2 x)$ has a dense set of chirps of type $(\tfrac{3}{2}, 1)$, was unexpected (see [JAF 96b, JAF 01a]). Since then, we know how to construct functions with chirps almost everywhere (see [JAF 00a]).

Definition 3.7 is not adapted to the analysis of signals. Indeed, we will see that it is not stable under the addition of an arbitrary regular (but not $C^\infty$) function. We will therefore introduce another definition of the local oscillation exponent which does not have this disadvantage. Let us consider the following example. Let $B(x)$ be a Brownian motion and:

$$C(x) = x^{1/3} \sin \left( \frac{1}{x} \right) + B(x) \tag{3.9}$$

The Hölder exponent of $B(x)$ being $\tfrac{1}{2}$ everywhere, the largest singularity at 0 is the chirp $x^{1/3} \sin(\tfrac{1}{x})$ and we effectively observe this oscillating behavior in the graph of $C(x)$ expanded around 0. However, after integration, the random term becomes dominant (in fact, an integration by parts shows that the primitive of the first term is $O(|x|^{7/3})$, whereas a primitive of Brownian motion has Hölder exponent $\tfrac{3}{2}$ everywhere); the oscillating behavior then disappears on the primitive. The oscillations of the graph of $C(x)$, which do not exist in the primitive, are not taken into account in Definition 3.7: we see that $C(x)$ is not a chirp at 0 (it is actually a chirp of exponents $(\tfrac{1}{3}, 0)$). Nevertheless, this strongly oscillating behavior should be taken into account in the chirp exponent: $C(x)$ “should” have exponents $(\tfrac{1}{3}, 1)$.

Let us now show how to define an oscillating exponent which takes the value $\beta$ for (3.8) and which is not changed by the addition of a “regular noise” (it will take the value 1 for $C(x)$ and no longer 0). It is clear, from the previous example, that the oscillating exponent should not be determined by taking into account primitives of $f$. An “infinitesimal” fractional integration should be used so that the order of importance of the terms in (3.9) is not disturbed. Let $h_t(x_0)$ be the Hölder exponent of a fractional integral of order $t$ of the function $f$ at $x_0$. To be more precise, if $f$ is a locally bounded function, we denote by $h_t(x_0)$ the Hölder exponent at $x_0$ of:

$$I^t(f) = (\mathrm{Id} - \Delta)^{-t/2} (\varphi f) \tag{3.10}$$
where $\varphi$ is a $C^\infty$ function with compact support such that $\varphi(x_0) = 1$ and $(\mathrm{Id} - \Delta)^{-t/2}$ is the convolution operator which, in the Fourier domain, is none other than multiplication by $(1 + |\xi|^2)^{-t/2}$ (see Chapter 7 for more details). In the example of the function $C(x)$, we find, at $x_0 = 0$, $h_t(x_0) = \tfrac{1}{3} + 2t$ after a fractional integration of sufficiently small order $t$, i.e. such that $\tfrac{1}{3} + 2t < \tfrac{1}{2} + t$. In this example, the increase of the Hölder exponent at $x_0$ after a fractional integration of small order $t$ is $2t$; we can therefore recover the oscillating exponent $\beta$ in this way. In general, the function $t \mapsto h_t(x_0)$ is concave, so that its right derivative exists at 0 (with the possible value $+\infty$). The following definition is derived from it.

DEFINITION 3.8.– Let $f : \mathbb{R}^d \to \mathbb{R}$ be a locally bounded function. The oscillating exponent of $f$ at $x_0$ is:

$$\beta = \left. \frac{\partial}{\partial t}\, h_t(x_0) \right|_{t = 0} - 1 \tag{3.11}$$

This exponent belongs to $[0, +\infty]$. Note that if $h_t(x_0) = +\infty$, then $\beta$ is not defined. The following proposition, taken from [AUB 99], shows that this definition appropriately reflects the oscillating phenomenon present in (3.8).

PROPOSITION 3.4.– If $f$ is uniformly Hölderian, then $\forall H < h(x_0)$ and $\forall \beta < \beta(x_0)$, $f$ can be written as:

$$f(x) = |x - x_0|^H\, g \left( \frac{1}{|x - x_0|^\beta} \right) + r(x)$$

with $r(x) \in C^\alpha(x_0)$ for an $\alpha > h(x_0)$, and $g$ infinitely oscillating.

3.2.4. Complements

We discuss here the known results relating to the construction of functions having a prescribed Hölder exponent (and possibly a prescribed oscillating exponent). This problem was first encountered in the context of speech simulation. A speech signal possesses a Hölder exponent which varies drastically (particularly in the case of consonants) and this led to the idea of efficiently storing such a signal by keeping only the information contained in the Hölder exponent. The following theorem characterizes functions which are Hölder exponents (see [AYA 08]).

THEOREM 3.2.– A positive function $h(x)$ is the Hölder exponent of a bounded function $f$ if and only if $h$ can be written as the lim inf of a sequence of continuous functions.

When $h(x)$ has a minimal Hölder regularity, a natural construction is provided by the multifractional Brownian motion (see [BEN 97, PEL 96]). Contrary to the previous result, a couple of functions $(h(x), \beta(x))$ must verify very specific conditions to be a couple (Hölder exponent, chirp exponent): $\beta(x)$ must vanish on a dense set [GUI 98].
We also have a constructive result: if this condition is satisfied, we can prescribe this couple almost everywhere (see [JAF 05]), but unfortunately we do not know how to characterize the couples of functions of the form (Hölder exponent, chirp exponent).
3.3. Random multifractal processes

We present two examples of random multifractal processes. The first is that of Lévy processes. Their importance derives, on the one hand, from the central place these processes have in probability theory and, on the other hand, from their increasing importance in physics or financial modeling, particularly in situations where Gaussian models are inadequate (see, for instance, [MAN 97, SCH 95] and Chapter 5 and Chapter 6). Our second example involves random wavelet series, that is, processes whose wavelet coefficients are independent and, at a given scale, have the same law (these laws can be chosen arbitrarily at each scale). In addition to the intrinsic interest of this model, it also makes it possible to introduce new concepts and it enriches the possible variants of the multifractal formalism.

3.3.1. Lévy processes

A Lévy process $X_t$ ($t \ge 0$) with values in $\mathbb{R}$ can be defined as a process with stationary independent increments: $X_{t+s} - X_t$ is independent of $(X_v)_{0 \le v \le t}$ and has the same law as $X_s$. The characteristic function of a Lévy process is written as $E(e^{i \lambda X_t}) = e^{-t \phi(\lambda)}$, where:

$$\phi(\lambda) = i a \lambda + C \lambda^2 + \int \bigl( 1 - e^{i \lambda x} + i \lambda x\, \mathbf{1}_{|x| < 1} \bigr)\, \pi(dx) \tag{3.12}$$

where $\pi(dx)$ is the Lévy measure of $X_t$, i.e. a positive Radon measure on $\mathbb{R} - \{0\}$ verifying:

$$\int \inf(1, |x|^2)\, \pi(dx) < \infty \tag{3.13}$$

Let:

$$C_j = \int_{2^{-j} \le |x| \le 2 \cdot 2^{-j}} \pi(dx)$$

We can measure the size of $\pi$ close to the origin with the help of the “inferior” exponent of Blumenthal and Getoor, defined by:

$$\alpha = \inf \left\{ \gamma \ge 0 : \int_{|x| \le 1} |x|^\gamma\, \pi(dx) < \infty \right\} = \sup \left( 0,\ \limsup_{j \to \infty} \frac{\log C_j}{j \log 2} \right)$$

This exponent satisfies $0 \le \alpha \le 2$ and, if $X_t$ is a stable Lévy process, coincides with the stability index.
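The second expression for the exponent can be checked on an example. The sketch below assumes the stable-like Lévy measure $\pi(dx) = |x|^{-1-\alpha}\, dx$ (an illustrative choice, not taken from the text), for which $C_j$ has the closed form $\tfrac{2}{\alpha}\, 2^{\alpha j} (1 - 2^{-\alpha})$, and evaluates $\log C_j / (j \log 2)$ at a few scales; the function names are hypothetical.

```python
import math

def shell_mass(j, alpha):
    """C_j = pi({2^-j <= |x| <= 2 * 2^-j}) for the assumed measure |x|^(-1-alpha) dx."""
    a, b = 2.0 ** (-j), 2.0 ** (1 - j)
    # Closed-form integral of x^(-1-alpha) over [a, b]; the factor 2 accounts for x < 0.
    return 2.0 * (a ** (-alpha) - b ** (-alpha)) / alpha

def exponent_estimate(j, alpha):
    """sup(0, log C_j / (j log 2)) at a single scale j."""
    return max(0.0, math.log(shell_mass(j, alpha)) / (j * math.log(2.0)))

alpha = 1.5                                  # illustrative stability index in (0, 2)
scales = (10, 50, 200, 600)
estimates = [exponent_estimate(j, alpha) for j in scales]
errors = [abs(e - alpha) for e in estimates]
```

The error decreases like $1/j$, so the estimates approach the stability index `alpha` as the scale grows, in agreement with the last sentence above.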
In general, a Lévy process has a dense set of discontinuities (it jumps everywhere!) and can be written as a superposition of particularly simple processes: compensated compound Poisson processes, whose sample paths are piecewise linear with jumps. The sample paths of a compound Poisson process with Lévy measure $\pi(dx)$ (which is finite) are generated in the following manner: $X(t)$ remains at zero for $0 \le t < t_1$; $t_1$ is the time of the first jump, whose law is exponential with intensity $C = \int \pi(dx)$ (i.e. the law of the first jump time has density $C e^{-Ct}$). At $t_1$, the process jumps; the amplitude of the jump is a random variable independent of the value taken by $t_1$ and the probability law of the jump is $\pi(dx)/C$. Then, we start again: if $t_2$ is the second jump time, $t_2 - t_1$ is independent of $t_1$ and of the jump value at $t_1$; $t_2 - t_1$ has the same law as $t_1$ and, finally, the jump at $t_2$ has the same law as the jump at $t_1$ and is independent of the previous choices, etc. Thus, we generate a process $X(t)$ with independent stationary increments which is piecewise constant; this is a compound Poisson process. If $D = \int x\, \pi(dx)$, the expectation of $Y(t) = X(t) - Dt$ is zero for all $t$; this is the compensated compound Poisson process with measure $\pi(dx)$.

If a Lévy measure $\pi(dx)$ verifies (3.13), then each of the measures $\pi_0(dx) = \mathbf{1}_{|x| > 1}\, \pi(dx)$ and $\pi_j(dx) = \mathbf{1}_{2^{-j} \le |x| < 2 \cdot 2^{-j}}\, \pi(dx)$ is bounded and is the Lévy measure of a compensated compound Poisson process $X_j(t)$. The Lévy process (without Brownian component) associated with $\pi$ is then:

$$X(t) = \sum_{j=0}^{+\infty} X_j(t)$$
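The generation procedure for a compound Poisson process described above can be sketched as follows. This is a minimal illustration under stated assumptions: the Lévy measure is taken to be $C$ times the uniform law on $[0.5, 1]$ with $C = 5$ (so that $D = \int x\, \pi(dx) = 3.75$), and the function names, the time horizon and the seed are all hypothetical choices.

```python
import random

def compound_poisson_jumps(intensity, draw_jump, horizon, rng):
    """Jump times and jump sizes on [0, horizon]: waiting times are exponential with
    the given intensity, and each jump size is drawn independently by draw_jump."""
    times, sizes = [], []
    t = rng.expovariate(intensity)          # time of the first jump
    while t <= horizon:
        times.append(t)
        sizes.append(draw_jump(rng))        # amplitude, independent of the jump times
        t += rng.expovariate(intensity)     # independent waiting time to the next jump
    return times, sizes

def path_value(t, times, sizes, drift=0.0):
    """X(t) - drift * t, where X is piecewise constant and right-continuous."""
    return sum(s for u, s in zip(times, sizes) if u <= t) - drift * t

rng = random.Random(0)
C = 5.0                                     # total mass of the assumed Lévy measure
draw = lambda r: r.uniform(0.5, 1.0)        # jump law pi(dx) / C (assumed uniform)
D = C * 0.75                                # integral of x pi(dx) for this choice
horizon = 10.0
times, sizes = compound_poisson_jumps(C, draw, horizon, rng)

X_end = path_value(horizon, times, sizes)            # compound Poisson process X
Y_end = path_value(horizon, times, sizes, drift=D)   # compensated process Y = X - D t
```

Between two consecutive jump times, `path_value` with `drift=0.0` is constant, while the compensated path decreases linearly with slope `-D`, which is the piecewise linear behavior mentioned in the text.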
If $X(t)$ has a Brownian component, we can add to the process that we have just constructed a Brownian motion starting from 0. The following theorem confirms that the sample paths of Lévy processes are multifractal.

THEOREM 3.3.– Let $X_t$ be a Lévy process without Brownian component ($C = 0$ in (3.12)) which verifies:

$$\alpha > 0 \qquad \text{and} \qquad \sum_j 2^{-j} \sqrt{C_j \log(1 + C_j)} < \infty \tag{3.14}$$

With probability 1, the singularity spectrum of $X_t$ is:

$$d_\alpha(H) = \begin{cases} \alpha H & \text{if } H \in [0, 1/\alpha] \\ -\infty & \text{otherwise} \end{cases}$$

NOTE 3.3.– We notice that:
– conditions (3.14) are verified as soon as $0 < \alpha < 2$, and in particular for all stable Lévy processes;
– Theorem 3.3 can seem to go against what we expect: we normally think of multifractal functions as functions whose “behavior” can vary greatly from point to point, which seems to contradict the stationary increments hypothesis of Lévy processes. The paradox disappears if we remember that this hypothesis concerns the law of the Lévy process and thus does not prevent a particular trajectory from presenting such “changes of behavior”. Thus, the sets $E_H$ are, for a given trajectory, sets of remarkable points, but the law of $E_H$ remains invariant under translation;
– the assertion of Theorem 3.3 is stronger than simply stating that, for each $H$, $f_H(H)$ has a given value with probability 1, which would not be enough to determine the singularity spectrum of almost every trajectory;
– the Hölder exponent of a Lévy process without Brownian component is $1/\alpha$ almost everywhere (see [PRU 81]), which, of course, agrees with Theorem 3.3 (case $H = 1/\alpha$);
– a lot of work has been devoted to the fractal nature of the set of values taken by a Lévy process (see [MAN 95] for “Lévy flights” and [BER 96] and its references).

Let us now outline the proof of the theorem. The Lévy process $X_t$ is written as the sum of a compound Poisson process with Lévy measure $\mathbf{1}_{|x| > 1}\, \pi(dx)$ (we can “forget” this first process, whose addition does not modify the spectrum) and the series $\sum_{j \ge 0} X_t^j$, where the $X_t^j$ are independent compensated compound Poisson processes with Lévy measure $\pi_j(dx) = \mathbf{1}_{2^{-j} \le |x| < 2 \cdot 2^{-j}}\, \pi(dx)$ (the amplitude of the jumps of $X^j$ is therefore of the order of magnitude of $2^{-j}$). The following remark enables us to bound the regularity of $X_t$. Let $f$ be a function having a dense set of discontinuities and having, at each discontinuity, a right limit and a left limit. Let $r_n$ be a sequence of discontinuities of $f$ converging towards $x$. Then, the Hölder exponent of $f$ verifies:

$$h_f(x) \le \liminf_{n \to +\infty} \frac{\log |f(r_n^+) - f(r_n^-)|}{\log |r_n - x|} \tag{3.15}$$

The proof follows from the simple remark that one of the two numbers $|f(x) - f(r_n^+)|$ or $|f(x) - f(r_n^-)|$ must be larger than $|f(r_n^+) - f(r_n^-)|/2$, by the triangle inequality. Lévy processes are of this type and thus we can apply this remark to them. We denote by $A_\delta^j$ the union of the intervals of diameter $2^{-\delta j}$ centered at the jumps of $X_t^j$, and $E_\delta = \limsup A_\delta^j$; if $t \in E_\delta$, there is a sequence $j_n$ of integers and instants $t_{j_n}$ such that:

$$|t_{j_n} - t| \le 2^{-\delta j_n} \qquad \text{and} \qquad |X_{t_{j_n}^+} - X_{t_{j_n}^-}| \ge 2^{-j_n}$$

By applying (3.15), we then obtain:

$$h_X(t) \le 1/\delta \tag{3.16}$$
We check that the accumulation of jumps close to $t$ is the only cause of irregularity in a Lévy process and that we actually have equality in (3.16) (see [JAF 99]). The proof of the theorem therefore consists of showing that the dimension of $E_\delta$ is $\alpha/\delta$ if $\alpha \le \delta$. The upper bound on the dimension of $E_\delta$ is immediate: the average number of jumps of $X_t^j$ on $[0, 1]$ is $C_j$ and thus, almost surely, all the $X_t^j$ have fewer than $2 C_j$ jumps from a certain rank onwards. Hence, $A_\delta^j$ is covered by $2 C_j$ intervals of diameter $2^{-\delta j}$. The upper bound on the dimension follows. The lower bound on the dimension is obtained using a classical technique, by constructing particular measures which “load” the sets $E_\delta$ as uniformly as possible (see [JAF 99]). We also refer to [JAF 99] for the case where $X_t$ has a Brownian component.

3.3.2. Burgers’ equation and Brownian motion

Multifractal analysis was introduced in the context of turbulence; however, the mathematical equations governing the evolution of the velocity of a turbulent flow (the Navier-Stokes equations in the limit where the viscosity tends towards 0) are at present very poorly understood, and a mathematical result regarding the multifractal nature of their solutions is not in sight. At most, we can anticipate precise results only for simpler non-linear evolution equations, hoping that they retain some physical characteristics of the fluid evolution. The simplest equation proposed for this purpose is Burgers’ equation:

$$\frac{\partial u}{\partial t} + \frac{\partial}{\partial x} \left( \frac{u^2}{2} \right) = \nu\, \frac{\partial^2 u}{\partial x^2}, \qquad x \in \mathbb{R}, \ t \in \mathbb{R}_+ \tag{3.17}$$
which we shall consider in the limit of small viscosities $\nu \to 0$. One reason to study this equation is that we have explicit formulae giving the solution $u(x, t)$ at any time $t > 0$ in terms of the initial condition $u_0(x) = u(x, 0)$; indeed, if $U$ is a primitive of $u$, then $U$ verifies:

$$\frac{\partial U}{\partial t} + \frac{1}{2} \left( \frac{\partial U}{\partial x} \right)^2 = \nu\, \frac{\partial^2 U}{\partial x^2}$$

Then, we carry out the Cole-Hopf transformation, which consists of setting $\varphi = e^{-U/2\nu}$; $\varphi$ then satisfies the linear heat equation, which we solve explicitly, hence the expression of $U$. Passing to the limit in this expression when $\nu \to 0$, we obtain by a standard technique (Laplace's method) the following result (see [EVA 98] for the details of the calculations).

Let us suppose, to simplify, that the initial condition $u_0$ is zero on $]-\infty, 0[$ and that $u_0(s) + s \ge 0$ for $s$ large enough. To fix ideas, let us look at the solution at time $t = 1$. First, we consider, for each $x \ge 0$, the function of the variable $s \ge 0$:

$$F_x(s) = \int_0^s \bigl( u_0(r) + r - x \bigr)\, dr$$

and we denote by $a(x, 1)$ the largest point where its minimum is attained:

$$a(x, 1) = \max \{ s \ge 0 : F_x(s) \le F_x(s') \ \text{for all } s' \} \tag{3.18}$$

The limit solution when $\nu \to 0$ at time $t = 1$ is then given by:

$$u(x, 1) = x - a(x, 1) \tag{3.19}$$
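Formulae (3.18) and (3.19) translate directly into a numerical recipe: accumulate $F_x(s)$ on a grid, keep the largest minimizer, and subtract it from $x$. The sketch below is only an illustration under assumptions that are not in the text: a uniform grid with the midpoint rule, a cut-off `s_max` assumed large enough to contain the minimizer, and a constant initial condition $u_0 = 0.5$ on $[0, +\infty[$, for which $F_x(s) = (0.5 - x) s + s^2/2$ is minimal at $s = x - 0.5$.

```python
def inviscid_burgers_at_time_one(u0, x, s_max=10.0, n=20000):
    """Approximate a(x, 1) from (3.18) and u(x, 1) = x - a(x, 1) from (3.19)
    by minimizing F_x(s) over the grid points of [0, s_max]."""
    ds = s_max / n
    F, best_F, best_s = 0.0, 0.0, 0.0       # F_x(0) = 0 is the starting candidate
    for i in range(1, n + 1):
        s_mid = (i - 0.5) * ds              # midpoint rule on [(i - 1) ds, i ds]
        F += (u0(s_mid) + s_mid - x) * ds
        if F <= best_F:                     # '<=' keeps the largest minimizer
            best_F, best_s = F, i * ds
    return best_s, x - best_s

constant_u0 = lambda r: 0.5                 # assumed initial velocity on [0, +inf[
a_const, u_const = inviscid_burgers_at_time_one(constant_u0, x=2.0)

zero_u0 = lambda r: 0.0                     # fluid initially at rest
a_zero, u_zero = inviscid_burgers_at_time_one(zero_u0, x=2.0)
```

With the constant initial condition the recipe returns $a(2, 1) \approx 1.5$ and $u(2, 1) \approx 0.5$: the velocity is simply transported. With zero initial data it returns $u(2, 1) \approx 0$.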
Formula (3.18) shows that the random process $a(x, 1)$ obtained when the initial condition is a Brownian motion on $\mathbb{R}_+$ (and zero on $\mathbb{R}_-$) is a subordinator. We call subordinator an increasing Lévy process $(\sigma_x, x \ge 0)$. A classical example, which plays an important role later, is that of the first passage times of a Brownian motion with drift. More specifically, let us consider a standard real Brownian motion $(B_s, s \ge 0)$, let us set $X_s = B_s + s$ and let us introduce, for $x \ge 0$:

$$\tau_x = \inf \{ s \ge 0,\ X_s > x \}$$

Because $\tau_x$ is a stopping time and since $X_{\tau_x} = x$, the strong Markov property applied to the Brownian motion implies that the process $X'_s = X_{s + \tau_x} - x$ is also a Brownian motion with drift, which is independent of the portion of the trajectory before $\tau_x$, $(X_r, 0 \le r \le \tau_x)$. For any $z \in [0, x]$, the first passage time $\tau_z$ clearly depends only on the trajectory before $\tau_x$ and hence $(X'_s, s \ge 0)$ is independent of $(\tau_z, 0 \le z \le x)$. The identity:

$$\tau_{x+y} - \tau_x = \inf \{ s \ge 0,\ X_{s + \tau_x} > x + y \} = \inf \{ s \ge 0,\ X'_s > y \}$$

then shows the independence and the homogeneity of the increments of $\tau$, which is hence a subordinator. Similar arguments apply to the increments of the function (3.18) for the inviscid Burgers’ equation (3.17) with Brownian initial condition, i.e. when $u(x, 0) = B_x$ for $x \ge 0$. Indeed, we verify that $(a(x, 1) - a(0, 1), x \ge 0)$ and $(\tau_x, x \ge 0)$ have the same law (see [BER 98]).

The isolated example of Burgers’ equation with Brownian initial data lets us hope that more general results are true, and in particular that large classes of non-linear partial differential equations generically develop multifractal solutions. However, at present, there are no proven results of this type (the reader can nevertheless consult [VER 94] concerning Burgers’ equation in several space dimensions).
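The first passage times $\tau_x$ of a Brownian motion with drift can be approximated by a simple random walk simulation. The sketch below is an illustration only: the time step `dt`, the horizon, the seed and the function name are assumptions, and a discretized path can only detect a crossing at a grid time, so the values slightly overestimate the true passage times.

```python
import random

def first_passage_times(levels, dt=1e-3, horizon=50.0, seed=0):
    """Approximate tau_x = inf{s >= 0 : B_s + s > x} for each level x, along one
    discretized path of the Brownian motion with unit drift X_s = B_s + s."""
    rng = random.Random(seed)
    levels = sorted(levels)
    sd = dt ** 0.5                           # standard deviation of a Brownian increment
    taus, X, s, idx = [], 0.0, 0.0, 0
    while idx < len(levels) and s < horizon:
        X += rng.gauss(0.0, sd) + dt         # Brownian increment plus drift
        s += dt
        while idx < len(levels) and X > levels[idx]:
            taus.append(s)                   # first grid time at which the level is exceeded
            idx += 1
    return taus

levels = [1.0, 2.0, 3.0]
taus = first_passage_times(levels)
increments = [taus[0]] + [taus[i + 1] - taus[i] for i in range(len(taus) - 1)]
```

Along a single path, `taus` is non-decreasing in the level, as expected for a subordinator. Repeating the simulation with different seeds would give samples of the increments $\tau_{x+y} - \tau_x$, whose law should depend only on $y$.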
3.3.3. Random wavelet series

Since we are interested in the local properties of functions, it is equivalent, and also easier, to work with periodic wavelets, which are obtained by periodization of a usual wavelet basis (see [MEY 90]) and are defined on the torus $\mathbb{T} = \mathbb{R}/\mathbb{Z}$. The periodic wavelets:

$$\psi_{j,k}(x) = 2^{j/2} \sum_{l \in \mathbb{Z}} \psi \bigl( 2^j (x - l) - k \bigr), \qquad j \in \mathbb{N}, \ 0 \le k < 2^j \tag{3.20}$$

form an orthonormal basis of $L^2(\mathbb{T})$ (after adding the constant function equal to 1, see [MEY 90]; we use the same notation as for the wavelets on $\mathbb{R}$, which will not lead to any confusion). We also assume that $\psi$ has enough regularity and vanishing moments. Any periodic function $f$ is hence written as follows:

$$f(x) = \sum_{j,k} d_f(k, j)\, 2^{-j/2}\, \psi_{j,k}(x) \tag{3.21}$$

where the wavelet coefficients of $f$ are given by:

$$d_f(k, j) = \int_0^1 2^{j/2}\, \psi_{j,k}(t)\, f(t)\, dt$$

We assume that all the coefficients are independent and have the same law at each scale. Let $\rho_j$ be the common probability law of the $2^j$ random variables $X_{j,k} = -(\log_2 |d_f(k, j)|)/j$ (the signs of the wavelet coefficients have no influence on Hölder regularity, which is why we make no assumptions about them). Thus, the measure $\rho_j$ verifies:

$$P \bigl( |d_f(k, j)| \ge 2^{-a j} \bigr) = \rho_j \bigl( (-\infty, a] \bigr)$$

We will make the following assumption on $\rho_j$:

$$\exists \varepsilon > 0 : \quad \mathrm{supp}(\rho_j) \subset [\varepsilon, +\infty]$$

It means that the sample paths of the process are uniformly Hölder. We need to define the logarithmic density of the coefficients, i.e.:

$$\bar\rho(\alpha) = \lim_{\varepsilon \to 0}\ \limsup_{j \to +\infty} \frac{\log_2 \bigl( 2^j \rho_j([\alpha - \varepsilon, \alpha + \varepsilon]) \bigr)}{j}$$

The reader should note that this density is, in fact, a large deviation spectrum, but calculated from the wavelet coefficients (see Chapter 4, where general results concerning large deviation spectra are established). Then, we set:

$$\tilde\rho(\alpha) = \begin{cases} \bar\rho(\alpha) & \text{if } \bar\rho(\alpha) \ge 0 \\ 0 & \text{otherwise} \end{cases}$$
and:

$$H_{\max} = \left( \sup_{\alpha > 0} \frac{\tilde\rho(\alpha)}{\alpha} \right)^{-1}$$

THEOREM 3.4.– Let $f$ be a random wavelet series verifying the uniform regularity assumption. The singularity spectrum of almost every trajectory of $f$ is supported by $[\varepsilon, H_{\max}]$ and, on this interval:

$$f_H(H) = H \sup_{\alpha \in [0, H]} \frac{\tilde\rho(\alpha)}{\alpha} \tag{3.22}$$
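Formula (3.22) and the definition of $H_{\max}$ are easy to evaluate on a grid once a density $\tilde\rho$ is given. The sketch below uses a hypothetical piecewise linear density (equal to $2(\alpha - 0.25)$ on $[0.25, 0.75]$, to 1 on $]0.75, 1]$ and to 0 elsewhere); this density, the grid and the function names are illustrative assumptions, and the supremum is taken over grid points $\alpha$ in $]0, H]$ to avoid dividing by zero.

```python
def spectrum_from_density(rho_tilde, alphas, H):
    """H * sup over grid points alpha in (0, H] of rho_tilde(alpha) / alpha, as in (3.22)."""
    ratios = [rho_tilde(a) / a for a in alphas if 0.0 < a <= H]
    return H * max(ratios) if ratios else float("-inf")

def h_max(rho_tilde, alphas):
    """Inverse of the sup over grid points alpha > 0 of rho_tilde(alpha) / alpha."""
    return 1.0 / max(rho_tilde(a) / a for a in alphas if a > 0.0)

def rho_tilde(a):
    """Hypothetical logarithmic density of the wavelet coefficients."""
    if 0.25 <= a <= 0.75:
        return 2.0 * (a - 0.25)
    if 0.75 < a <= 1.0:
        return 1.0
    return 0.0

alphas = [i / 100.0 for i in range(1, 101)]       # grid on (0, 1]
H_max = h_max(rho_tilde, alphas)
spectrum_at_half = spectrum_from_density(rho_tilde, alphas, 0.5)
spectrum_at_H_max = spectrum_from_density(rho_tilde, alphas, H_max)
```

For this density the ratio $\tilde\rho(\alpha)/\alpha$ is maximal at $\alpha = 0.75$, so `H_max` is 0.75, the spectrum equals 1 there, and on $[0.25, 0.75]$ the formula returns $2H - 0.5$, which coincides with $\tilde\rho(H)$ in this particular case.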
The function $\tilde\rho(\alpha)$ being essentially arbitrary, we notice that the singularity spectrum of a random wavelet series is not necessarily concave.

3.4. Multifractal formalisms

Even if the singularity spectrum of numerous mathematical functions can be determined directly from Definition 3.4, for multifractal signals it is not practical to determine their regularity at each point, because the Hölder exponent can be discontinuous everywhere; it is even less practical to calculate the infinite number of corresponding Hausdorff dimensions! Frisch and Parisi introduced a formula which allows us to deduce the singularity spectrum of a signal from quantities that are easily measurable. It is the first example of what we now call multifractal formalisms. Several variants have been proposed since then; we shall describe some of them and compare their respective performances. Additional results are found in Chapter 4. The formula proposed by Frisch and Parisi is based on the knowledge of the Besov spaces to which the function belongs. This is why we begin with a few reminders regarding these spaces and their characterization by wavelets.

3.4.1. Besov spaces and lacunarity

One of the reasons for the success of wavelet decompositions in applications is that they often provide representations of signals that are very lacunary (few coefficients are numerically non-negligible). This lacunarity is often quantified by determining to which Besov spaces the considered function belongs. For the characterization of Besov spaces by wavelet coefficients, see [MEY 90]:

$$\forall s \in \mathbb{R}, \ p > 0, \qquad f \in B^{s,p}(\mathbb{R}) \iff \left( \sum_{j,k} \left| d_f(k, j)\, 2^{(s - \frac{1}{p}) j} \right|^p \right)^{1/p} < +\infty \tag{3.23}$$
When $p \geq 1$, the Besov spaces are very close to the Sobolev spaces; indeed, if $L^{p,s}$ is the space of functions of $L^p$ whose fractional derivatives of order $s$ still belong to $L^p$, we have the following embeddings:

$$\forall \varepsilon > 0, \quad \forall p \geq 1, \quad L^{p,s+\varepsilon} \hookrightarrow B^{s,p} \hookrightarrow L^{p,s-\varepsilon}$$
However, a decisive advantage over Sobolev spaces is that the Besov spaces are defined for any $p > 0$. It is precisely these spaces, for $p$ close to 0, that enable us to measure the lacunarity of the wavelet representation of $f$. We illustrate this point with an example. Let us consider the function:

$$H(x) = \begin{cases} 1 & \text{if } |x| \leq 1 \\ 0 & \text{otherwise} \end{cases}$$
and let us assume that the chosen wavelet has compact support, included in the interval $[-A, A]$. Because $\psi$ has zero integral, for each $j$ there are fewer than $4A$ non-zero wavelet coefficients, so that the wavelet decomposition of $H$ is very lacunary. Because $H(x)$ is bounded, we have $|d_H(k, j)| \leq C$ for any $j, k$. By using (3.23), $H(x)$ belongs to $B^{s,p}(\mathbb{R})$ as soon as $s < 1/p$. Let us show that such an assertion is, in turn, a way to quantify the fact that a wavelet decomposition is very lacunary. Let us suppose that $f$ is a bounded function satisfying:

$$\forall p > 0, \quad \forall s < \frac{1}{p}, \quad f \in B^{s,p}(\mathbb{R})$$
We will verify that for any $D > 0$ and any $\varepsilon > 0$, at each scale $j$, there are fewer than $C(\varepsilon, D) \, 2^{\varepsilon j}$ coefficients of size larger than $2^{-Dj}$. Indeed, if this were not the case, by taking $p = \varepsilon/(2D)$, each of these coefficients would contribute at least $2^{-Dpj} = 2^{-\varepsilon j/2}$, and we would obtain $\sum_k |d_f(k, j)|^p \to +\infty$ when $j \to +\infty$, which is a contradiction.

Here is another illustration of the relation between the lacunarity of the wavelet decomposition and Besov regularity. We assume that $f$ belongs to $\bigcap_{p > 0} B^{1/p, p}$. Coming back to (3.23), we observe that this condition means exactly that the sequence $d_f(k, j)$ belongs to $l^p$ for any $p > 0$. Let $d_n$ denote the decreasing rearrangement of the sequence of moduli $|d_f(k, j)|$; the sequence $d_n$ also belongs to $l^p$ for any $p > 0$. Thus:

$$\forall p \quad \exists C_p \quad \text{such that} \quad \sum_{n=1}^{\infty} d_n^p \leq C_p$$
Because the sequence $d_n$ is decreasing:

$$\forall N, \quad N d_N^p \leq \sum_{n=1}^{N} d_n^p \leq \sum_{n=1}^{\infty} d_n^p \leq C_p$$

and thus $d_N \leq (C_p)^{1/p} N^{-1/p}$. Since we can take $p$ arbitrarily close to 0, we observe that the decreasing rearrangement of the sequence $|d_f(k, j)|$ has fast decay, which is, once again, a way to express lacunarity (the converse is immediate: if the sequence $d_n$ has fast decay, it belongs to all $l^p$ and thus this is also the case for the sequence $d_f(k, j)$).

The Besov spaces with $p < 1$ are not locally convex, which partly explains why they are difficult to use. Before the introduction of wavelets, these spaces were characterized either by the order of approximation of $f$ by rational fractions whose numerator and denominator have a fixed degree, or by the order of approximation by splines “with free knots” (which means that we are free to choose the points where the piecewise polynomials are connected) (see [DEV 98, JAF 01a]). However, these characterizations are difficult to handle and hence have no real numerical applications.

Characterization (3.23) shows that the knowledge of the Besov spaces to which $f$ belongs is directly linked to the asymptotic behavior (when $j \to +\infty$) of the moments of the distribution of the wavelet coefficients of $f$; see (3.26). Generally, more information is available; indeed, these moments are deduced from the histogram of the coefficients at each scale $j$. It is therefore natural to ask which information regarding the pointwise regularity of $f$ can be deduced from the knowledge of these histograms. We present a study of this problem below. We note that cascade-type models for the evolution of the distribution function of the wavelet coefficients through the scales have been proposed to model the velocity of turbulent flows [ARN 98].

To start with, let us point out a limitation of multifractal analysis: functions having the same histograms of wavelet coefficients at each scale can have completely different singularity spectra [JAF 97a].
In multifractal analysis, it is not only the histogram of the coefficients which is important, but also their positions. This is why no formula deducing the singularity spectrum from the knowledge of the histograms can be valid in general. However, we can hope that some formulae are “more valid than others”. Indeed, we have observed that if the coefficient values are independent random variables, the spectrum takes an almost sure form; we shall see that the formula that yields this spectrum differs from the formulae proposed until now. Another approach consists of refining the information on the function by considering function spaces that take the positions of the large wavelet coefficients into account (see [JAF 05]).
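The lacunarity of the wavelet decomposition of an indicator function, discussed above, can be checked on a small example. The following is only a sketch under stated assumptions: it uses the Haar wavelet (support $[0, 1]$, zero integral), the normalization $d(k, j) = 2^j \int f(x) \psi(2^j x - k) \, dx$, and the indicator of $[-1/3, 1/3]$ rather than of $[-1, 1]$, because with dyadic endpoints every Haar coefficient at scales $j \geq 0$ would vanish. The function names are hypothetical.

```python
from fractions import Fraction
import math

def overlap(a, b, c, d):
    """Length of the intersection of [a, b] and [c, d] (0 if they are disjoint)."""
    return max(Fraction(0), min(b, d) - max(a, c))

def haar_coefficient(lo, hi, j, k):
    """d(k, j) = 2^j * integral of 1_[lo, hi](x) * psi(2^j x - k) dx for the Haar
    wavelet psi = +1 on [0, 1/2), -1 on [1/2, 1). Exact rational arithmetic, j >= 0."""
    step = Fraction(1, 2 ** j)                      # length of the dyadic interval
    left, mid, right = k * step, k * step + step / 2, (k + 1) * step
    return (overlap(lo, hi, left, mid) - overlap(lo, hi, mid, right)) / step

def nonzero_coefficients(lo, hi, j):
    """Non-zero Haar coefficients of the indicator of [lo, hi] at scale j."""
    k_min = math.floor(lo * 2 ** j) - 1             # wavelets further away do not meet [lo, hi]
    k_max = math.ceil(hi * 2 ** j) + 1
    coeffs = {k: haar_coefficient(lo, hi, j, k) for k in range(k_min, k_max + 1)}
    return {k: d for k, d in coeffs.items() if d != 0}

a = Fraction(1, 3)
counts = [len(nonzero_coefficients(-a, a, j)) for j in range(11)]
largest = max(abs(d) for j in range(11)
              for d in nonzero_coefficients(-a, a, j).values())
print("non-zero coefficients per scale:", counts)
print("largest modulus:", largest)
```

Only the wavelets whose support straddles a discontinuity contribute, so the number of non-zero coefficients stays bounded while the number of coefficients meeting the support of the indicator grows like $2^j$; their moduli remain bounded but do not decay, which is consistent with membership in $B^{s,p}$ only for $s < 1/p$.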
3.4.2. Construction of formalisms

The construction of a multifractal formalism can be based on two types of considerations:
– counting arguments: we consider the increments (or wavelet coefficients) having a certain size; we estimate their number and deduce their contribution to some “calculable” quantities;
– more mathematical arguments: we prove that a bound of the spectrum, in terms of the “calculable” quantities, is generally true and that this bound is “generically” an equality. The term “generically” is to be understood in the sense of “Baire categories” if the initial information is of functional type, or as “almost sure” if the initial information is probabilistic.

We begin by describing the first approach; we do not exactly reproduce the initial argument of Frisch and Parisi, but rather its “translation” in terms of wavelets, as found in [ARN 95b, JAF 97a]. This approach admits two variants, according to the information that we have on the function. Indeed, we can start from:
– the partition function $\tau(q)$, defined from the knowledge of the Besov spaces to which $f$ belongs:

$$\tau(q) = \sup\{ s : f \in B^{s/q, q} \} = 1 + \liminf_{j \to +\infty} \frac{\log \big( \sum_k |d_f(k, j)|^q \big)}{\log 2^{-j}}$$

– histograms of wavelet coefficients. Generally, for each $j$, let $N_j(\alpha) = \#\{ k : |d_f(k, j)| \geq 2^{-\alpha j} \}$. Thus, we have $E(N_j(\alpha)) = 2^j \rho_j([0, \alpha])$. If:

$$\rho(\alpha, \varepsilon) = \limsup_{j \to \infty} \frac{\log \big( N_j(\alpha + \varepsilon) - N_j(\alpha - \varepsilon) \big)}{\log(2^j)} \tag{3.24}$$

then, we define:

$$\rho(\alpha) = \inf_{\varepsilon > 0} \rho(\alpha, \varepsilon) \tag{3.25}$$

(there are of the order of $2^{\rho(\alpha) j}$ coefficients of size $\sim 2^{-\alpha j}$).
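As an illustration of the counting function $N_j(\alpha)$ and of the finite-scale quantity appearing in (3.24), here is a minimal sketch on synthetic data. The coefficient model is an assumption made only for the example: at scale $j$, about $2^{\eta j}$ coefficients have size $2^{-\alpha_0 j}$ and all the others have size $2^{-j}$; the function names are hypothetical.

```python
import math

def synthetic_scale(j, alpha_big=0.4, eta=0.5, alpha_small=1.0):
    """Moduli of 2^j toy coefficients: about 2^(eta*j) of size 2^(-alpha_big*j),
    the remaining ones of size 2^(-alpha_small*j)."""
    n_big = round(2 ** (eta * j))
    return ([2.0 ** (-alpha_big * j)] * n_big
            + [2.0 ** (-alpha_small * j)] * (2 ** j - n_big))

def N(coeffs, j, alpha):
    """N_j(alpha) = #{k : |d(k, j)| >= 2^(-alpha*j)}."""
    threshold = 2.0 ** (-alpha * j)
    return sum(1 for d in coeffs if abs(d) >= threshold)

def rho_estimate(coeffs, j, alpha, eps):
    """Finite-scale version of (3.24): log(N_j(alpha+eps) - N_j(alpha-eps)) / log(2^j)."""
    count = N(coeffs, j, alpha + eps) - N(coeffs, j, alpha - eps)
    return math.log2(count) / j if count > 0 else float("-inf")

for j in (8, 12, 16):
    coeffs = synthetic_scale(j)
    print(j, rho_estimate(coeffs, j, alpha=0.4, eps=0.05),
          rho_estimate(coeffs, j, alpha=0.7, eps=0.05))
```

For this toy model the estimate at $\alpha = 0.4$ recovers the exponent $\eta$ of the number of large coefficients, while exponents that are not present give $-\infty$. With real data the choice of $\varepsilon$ relative to $j$ is delicate, and this sketch does not address it.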
It is important to note that the information provided by $\rho(\alpha)$ is richer than that provided by $\tau(q)$; indeed, $\tau(q)$ can be deduced from the histograms by:

$$\tau(q) = \liminf_{j \to +\infty} \frac{-1}{j} \log_2 \Big( 2^{-j} \int 2^{-\alpha q j} \, dN_j(\alpha) \Big) \tag{3.26}$$

because, by definition of $N_j$, we have $\sum_k |d_f(k, j)|^q = \int 2^{-\alpha q j} \, dN_j(\alpha)$. It is easy to deduce that:

$$\tau(q) = \inf_{\alpha \geq 0} \big( \alpha q - \rho(\alpha) + 1 \big) \tag{3.27}$$
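Relation (3.27) can be evaluated directly on a grid. The sketch below, with hypothetical names and toy data, computes $\tau(q)$ from two different functions $\rho$: a concave “tent” on $[0, 1]$, and a function that keeps only the three vertices of the tent (and equals $-\infty$ elsewhere). Both have the same concave envelope, and the computation suggests that they give the same $\tau(q)$.

```python
NEG_INF = float("-inf")

def tau_from_rho(rho, q):
    """Relation (3.27): tau(q) = inf over alpha >= 0 of (alpha*q - rho(alpha) + 1).
    `rho` maps grid points alpha to rho(alpha); points where rho = -inf are skipped."""
    return min(a * q - r + 1 for a, r in rho.items() if r != NEG_INF)

alphas = [k / 100 for k in range(101)]                    # grid on [0, 1]
rho_tent = {a: 1 - 2 * abs(a - 0.5) for a in alphas}      # concave toy spectrum
rho_sparse = {a: (rho_tent[a] if a in (0.0, 0.5, 1.0) else NEG_INF)
              for a in alphas}                            # same concave envelope

for q in (0, 0.5, 1, 2, 3, 5):
    print(q, tau_from_rho(rho_tent, q), tau_from_rho(rho_sparse, q))
```

Since the infimum of a piecewise linear function is reached at a vertex, removing the non-vertex points changes nothing here; this is a small instance of the loss of information when passing from $\rho(\alpha)$ to $\tau(q)$.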
On the other hand, we cannot recover $\rho(\alpha)$ from $\tau(q)$; indeed, it is clear from (3.27) that two functions $\rho(\alpha)$ and $\rho'(\alpha)$ having the same concave envelope lead to the same function $\tau(q)$. From $\tau(q)$, we can thus only obtain the concave envelope of $\rho(\alpha)$, by carrying out a Legendre transformation once again.

We will now describe the heuristic arguments underlying the construction of these multifractal formalisms. We divide them into four steps, highlighting the implicit assumptions made in each of them.

STEP 1. The first assumption, common to both approaches, is that the Hölder exponent at each point $x_0$ is given by the order of magnitude of the wavelet coefficients of $f$ in a cone $|k 2^{-j} - x_0| \leq C 2^{-j}$. With respect to (3.6), if they decay as $2^{-Hj}$, we then have $h_f(x_0) = H$. This assumption is verified if $f$ does not have “cusp” type singularities ([MEY 98]), i.e., if the oscillation exponent is zero everywhere. If we start from $\rho(H)$, we have, by assumption, $2^{\rho(H) j}$ wavelet coefficients of size $2^{-Hj}$. By using the supports of the corresponding wavelets to cover $E_H$, we expect to obtain $f_H(H) = \rho(H)$. Thus, we obtain a first form of multifractal formalism, the so-called “large deviation” formalism, which simply states that:

$$f_H(H) = \rho(H)$$

Let us briefly justify this name, as well as the name of large deviation spectrum that is sometimes given to the function $\rho$. The theory of large deviations deals with the calculation of probabilities that are so small that they can only be estimated correctly on logarithmic scales. The basic example is as follows: if $X_1, \ldots, X_n$ are independent centered Gaussian variables of unit variance, and if $\tilde S_n = \frac{1}{n} \sum_{i=1}^{n} X_i$, then $\frac{1}{n} \log P(|\tilde S_n| \geq \delta) \sim -\delta^2/2$.
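The Gaussian example can be checked numerically without simulation. This is a minimal sketch, with a hypothetical function name: since the mean of $n$ independent standard Gaussian variables is Gaussian with variance $1/n$, the tail probability is exactly $\mathrm{erfc}(\delta \sqrt{n/2})$, and the logarithmic rate can be compared with $-\delta^2/2$.

```python
import math

def log_tail_rate(n, delta):
    """(1/n) * log P(|S_n| >= delta), where S_n is the mean of n i.i.d. standard
    Gaussian variables. S_n is N(0, 1/n), so P(|S_n| >= delta) = erfc(delta*sqrt(n/2))."""
    return math.log(math.erfc(delta * math.sqrt(n / 2))) / n

delta = 1.0
for n in (10, 100, 1000):
    print(n, log_tail_rate(n, delta), -delta ** 2 / 2)
```

The probabilities themselves span hundreds of orders of magnitude as $n$ grows, while the normalized logarithm slowly approaches $-\delta^2/2$; the convergence is only on the logarithmic scale, which is the point of the analogy.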
The analogy with (3.24) and (3.25) is striking since, in the common distribution of the wavelet coefficients, the parts of very small probability, which we measure with the help of a logarithmic scale, are those that provide the relevant information $\rho(\alpha)$ for calculating the spectrum (see Chapter 4 for a more detailed study). The effective calculation of the function $\rho(\alpha)$ is numerically delicate because its definition involves a double limit, which generally results in so-called “finite size” problems. In theory, we must go “completely” to the limit in $j$ in (3.24) before taking the limit in $\varepsilon$ in (3.25). In practice, the two limits must be taken “together”. The problem is then to know how large $j$ must be taken as a function of $\varepsilon$, which creates significant numerical stability problems. In any case, a numerically reliable calculation of $\rho(\alpha)$ requires us to know the signal on a large number of scales, i.e., with an excellent precision. This is why we often prefer to work from averages, such as $\sum_k |d_f(k, j)|^q$, i.e., finally, from the partition function, whose definition involves only one limit. From now on, this is the point of view that we shall adopt (however, let us note that the direct method introduced by Chhabra and Jensen in [CHH 89] is a method for calculating $\rho(\alpha)$ without going through a double limit or through a Legendre transformation; a mathematical discussion of this method, adapted to the wavelet framework, can be found in [JAF 04a]).

STEP 2. We estimate, for each $H$, the contribution of the Hölder singularities of exponent $H$ to:

$$\sum_k |d_f(k, j)|^q \tag{3.28}$$

Each singularity of this type brings a contribution of $C 2^{-Hqj}$ and there must be $\sim 2^{f_H(H) j}$ intervals of length $2^{-j}$ to cover these singularities; the total contribution of the Hölder singularities of exponent $H$ to (3.28) is thus as follows:

$$2^{f_H(H) j} \, 2^{-Hqj} = 2^{-(Hq - f_H(H)) j} \tag{3.29}$$
This is a critical step of the reasoning; it contains an inversion of limits that implicitly assumes that all Hölder singularities have coefficients $\sim 2^{-Hj}$ simultaneously from a certain scale $J$ onwards, and that the Hausdorff dimension can be estimated as if it were a box dimension. It is notable that the multifractal formalism leads to the correct singularity spectrum in several situations where these two assumptions are not verified.

STEP 3. The third step is an argument of the “Laplace method” type. When $j \to +\infty$, we note that, among the terms (3.29), the one that brings the main contribution to (3.28) is that for which the exponent $H$ realizes the minimum of $Hq - f_H(H)$, from which comes the heuristic formula:

$$\tau(q) - 1 = \inf_H \big( Hq - f_H(H) \big)$$

STEP 4. If $f_H(H)$ is concave, then $-f_H(H)$ and $-\tau(q) + 1$ are conjugate convex functions and each of them can be deduced from the other by a Legendre transformation. Thus, if we define the Legendre spectrum by:

$$f_L(H) = \inf_q \big( Hq - \tau(q) + 1 \big)$$

we deduce a first formulation of the multifractal formalism:

$$f_H(H) = f_L(H) = \inf_q \big( Hq - \tau(q) + 1 \big) \tag{3.30}$$
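The Legendre spectrum in (3.30) is straightforward to evaluate numerically once $\tau(q)$ is known on a grid. The following sketch is illustrative only: the quadratic $\tau(q) = c_1 q - c_2 q^2/2$ is an assumed toy model (used only on a range of $q$ where it is increasing), the grid is restricted to $q > 0$ as for a partition function defined through Besov spaces, and the function names are hypothetical. For this $\tau$ and $H \leq c_1$, a direct computation gives $f_L(H) = 1 - (H - c_1)^2 / (2 c_2)$.

```python
def tau(q, c1=0.6, c2=0.05):
    """Toy quadratic partition function tau(q) = c1*q - c2*q^2/2 (an assumption)."""
    return c1 * q - 0.5 * c2 * q * q

def legendre_spectrum(H, q_grid):
    """f_L(H) = inf_q (H*q - tau(q) + 1), the infimum being taken over q_grid."""
    return min(H * q - tau(q) + 1 for q in q_grid)

q_grid = [i / 1000 for i in range(1, 10001)]        # q in (0, 10]
for H in (0.40, 0.50, 0.55):
    closed_form = 1 - (H - 0.6) ** 2 / (2 * 0.05)
    print(H, legendre_spectrum(H, q_grid), closed_form)
```

With positive $q$ only, the infimum recovers the increasing part of the spectrum ($H \leq c_1$ here); the values of $H$ beyond the maximum would require negative $q$, which this sketch does not cover.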
The concavity assumption on $f_H(H)$ need not be verified. We can then:
– stop at step 3 and only assert that $\tau(q)$ is the Legendre transform of $f_H(H)$. However, this weaker form of multifractal formalism is of little interest because the quantity that we would like to calculate is $f_H(H)$ and the quantity that we know is generally $\tau(q)$;
– assert that (3.30) provides, in fact, the concave envelope of the spectrum. This information is, however, particularly ambiguous when the function calculated in this manner contains straight line segments (this is often the case, see [JAF 97b, JAF 99]): do they correspond to actual values of the spectrum or only to its envelope in a region where it is not concave?

3.5. Bounds of the spectrum

We will obtain bounds of $f_H(H)$ that are valid in general. Once again, we have two points of view, depending on whether the information that we have concerns the function spaces to which $f$ belongs or the histograms of the wavelet coefficients. We can make an observation before any calculation: since $\rho(\alpha)$ contains more information than $\tau(q)$, we expect that the bounds obtained from $\rho(\alpha)$ will be better; we shall see that this is indeed the case.

3.5.1. Bounds according to the Besov domain

We start by bounding the singularity spectrum of functions belonging to a fixed Besov space.

PROPOSITION 3.5.– Let $s > 1/p$, $\alpha \in [s - \frac{1}{p}, s]$ and $d = 1 - p(s - \alpha)$. Then, for any function $f \in B^{s,p}$, the set of points $x_0$ where $f \notin C^{s - \frac{1-d}{p}}(x_0)$ has a $d$-dimensional Hausdorff measure equal to zero.

Proof. Let $s > 0$ and $p > 0$. Let us write:

$$e_f(k, j) = d_f(k, j) \, 2^{j(s - 1/p)}$$

With respect to (3.23), the condition $f \in B^{s,p}$ can then be rewritten:

$$\sum_{j,k} |e_f(k, j)|^p < \infty \tag{3.31}$$

Let $I_{j,k}$ denote the interval centered at $k 2^{-j}$ and of length $|e_f(k, j)|^{p/d}$. We can now rewrite (3.31):

$$\sum_{j,k} \big( \mathrm{diam}(I_{j,k}) \big)^d < \infty$$
For all $J$, the intervals $(I_{j,k})_{j \geq J}$ form a covering of the set of points belonging to infinitely many intervals $I_{j,k}$, so that the $d$-dimensional Hausdorff measure of this set is zero. If a point $x$ belongs to only a finite number of intervals $I_{j,k}$, there exists $J$ ($= J(x)$) such that:

$$\forall j \geq J, \; \forall k, \quad \Big| \frac{k}{2^j} - x \Big| \geq |e_f(k, j)|^{p/d}$$

and thus:

$$|d_f(k, j)| \leq \Big| \frac{k}{2^j} - x \Big|^{d/p} 2^{-(s - \frac{1}{p}) j} = 2^{-(s - \frac{1}{p} + \frac{d}{p}) j} \, |2^j x - k|^{d/p}$$

Since we have $s - 1/p > 0$, Proposition 3.2 implies that $f \in C^{s - \frac{1}{p} + \frac{d}{p}}(x)$, from which we obtain Proposition 3.5, since:

$$s - \frac{1}{p} + \frac{d}{p} = \alpha.$$
NOTE 3.4.– The condition $s > 1/p$ is necessary because there are functions of $B^{1/p, p}$ that are nowhere locally bounded (see [JAF 00c]).

We will now deduce from Proposition 3.5 a bound of the spectrum $f_H(H)$. Let us assume that $\tau(q)$ is known. Let $q > 0$ be fixed. By definition of $\tau(q)$, for any $\varepsilon > 0$, $f$ belongs to $B^{(\tau(q) - \varepsilon)/q, q}$. We can then apply Proposition 3.5 for all $q$ such that:

$$\exists \varepsilon > 0 : \quad \frac{\tau(q) - \varepsilon}{q} > \frac{1}{q}$$

which is equivalent to $\tau(q) > 1$. If $f$ is uniformly Hölder, $\tau(q)$ is increasing and continuous, and verifies:

$$\lim_{q \to 0} \tau(q) = 0 \quad \text{and} \quad \tau(q) \to +\infty \quad \text{when} \quad q \to +\infty$$

There is a unique value $q_c$ such that $\tau(q_c) = 1$ and, for any $q > q_c$, we thus have:

$$f_H(H) \leq 1 - q \Big( \frac{\tau(q) - \varepsilon}{q} - H \Big)$$

thus:

$$f_H(H) \leq qH - \tau(q) + 1$$

Since this result is true for any $q > q_c$, we have shown the following proposition.
PROPOSITION 3.6.– If $f$ is uniformly Hölder, the singularity spectrum of $f$ verifies the bound:

$$f_H(H) \leq \inf_{q > q_c} \big( qH - \tau(q) + 1 \big) \tag{3.32}$$
The quasi-sure results that we now describe show that bound (3.32) is optimal. Proposition 3.6 suggests that, in formula (3.30), the domain of $q$ on which the Legendre transform must be calculated is the interval $[q_c, +\infty)$. Let us assume that $\tau(q)$ is an admissible partition function, i.e., that it is the partition function of a uniformly Hölder function $f$ (this is the case if $s(q) = q \tau(1/q)$ is concave and verifies $0 \leq s'(q) \leq 1$ and $s(0) > 0$; see [JAF 00b]). To say that $f$ has $\tau(q)$ as its partition function implies, by definition, that $f$ belongs to the space $V$ ($= V_{\tau(q)}$) defined by:

$$V = \bigcap_{\varepsilon > 0, \, q > 0} B^{(\tau(q) - \varepsilon)/q, \, q} \tag{3.33}$$

The space $V$ is a Baire space, i.e. it has the following property: any countable intersection of dense open subsets is dense. In a Baire space, a property which holds at least on a countable intersection of dense open subsets is said to be quasi-sure. Hence, it is natural to ask whether the multifractal formalism holds quasi-surely in $V$. The following theorem, taken from [JAF 00b], answers this question.

THEOREM 3.5.– Let $\tau(q)$ be admissible and $V$ the space defined by (3.33). The domain of definition of the singularity spectrum of quasi-all functions of $V$ is the interval $[s(0), 1/q_c]$ and, on this interval, we have:

$$f_H(H) = \inf_{q \geq q_c} \big( Hq - \tau(q) + 1 \big) \tag{3.34}$$

Formula (3.34) states that the singularity spectrum of quasi-all functions is made up of two parts: if $H < \tau'(q_c)$, the infimum in (3.34) is reached for $q > q_c$ and the spectrum can be calculated using the “usual” Legendre transformation of $\tau(q)$:

$$f_H(H) = \inf_{q > 0} \big( Hq - \tau(q) + 1 \big)$$

If $\tau'(q_c) \leq H \leq 1/q_c$, the infimum in (3.34) is reached for $q = q_c$ and the spectrum is the straight line segment $f_H(H) = H q_c$. The study of quasi-sure regularity properties goes back to Banach (see [BAN 31]). Buczolich and Nagy have shown in [BUC 01] that quasi-all monotone continuous functions are multifractal with spectrum $f_H(H) = H$ for $H \in [0, 1]$.
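The two-part structure described after formula (3.34) can be observed on a concrete partition function. The sketch below is an illustration under assumptions: it takes $s(x) = a + x/(1+x)$ with $a = 0.5$, which is concave with $0 \leq s'(x) \leq 1$ and $s(0) > 0$, so that $\tau(q) = q \, s(1/q) = aq + q/(q+1)$, $q_c = 1$ and $\tau'(q_c) = 0.75$. The closed form used for comparison, $2u - u^2$ with $u = \sqrt{H - a}$ on the Legendre part, was obtained by hand for this particular $\tau$; all names are hypothetical.

```python
import math

A = 0.5
Q_C = 1.0                     # tau(Q_C) = 1 when A = 0.5

def tau(q):
    """tau(q) = q * s(1/q) with s(x) = A + x / (1 + x), i.e. A*q + q / (q + 1)."""
    return A * q + q / (q + 1)

def spectrum(H, q_max=50.0, step=1e-3):
    """Right-hand side of (3.34): inf over q >= Q_C of (H*q - tau(q) + 1), on a grid."""
    n = int((q_max - Q_C) / step)
    return min(H * (Q_C + i * step) - tau(Q_C + i * step) + 1 for i in range(n + 1))

def closed_form(H):
    """Hand computation for this tau: Legendre part for H < tau'(Q_C) = 0.75,
    then the straight segment H * Q_C."""
    if H < 0.75:
        u = math.sqrt(H - A)
        return 2 * u - u * u
    return H * Q_C

for H in (0.55, 0.6, 0.7, 0.75, 0.9, 1.0):
    print(H, spectrum(H), closed_form(H))
```

For $H \geq 0.75$ the infimum sits at the endpoint $q = q_c$ and the computed values follow the line $H q_c$; below that threshold the minimizer moves into the interior of $[q_c, +\infty)$ and the usual Legendre transform is recovered.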
3.5.2. Bounds deduced from histograms

The following proposition provides the optimal bound of the Hölder spectrum which can be deduced in general from the histograms of wavelet coefficients [AUB 00].

PROPOSITION 3.7.– If we have $f \in C^{\varepsilon}(\mathbb{R})$ for some $\varepsilon > 0$, then:

$$f_H(H) \leq H \sup_{\alpha \in [0, H]} \frac{\rho(\alpha)}{\alpha} \tag{3.35}$$
This becomes an equality for random wavelet series. Indeed, they verify $\tilde\rho(\alpha) = \rho(\alpha)$, which shows that this bound is optimal. We can easily verify that it implies (3.32). However, (3.35) clearly provides a better bound if $\rho(\alpha)$ is not concave. Once again, we see that the histogram of the coefficients contains strictly more “useful” information than the partition function. We note that, although (3.35) is more precise than (3.32), the fact remains that (3.32) is optimal when the only information available is the partition function (as shown by the quasi-sure results). The optimal bounds (3.35) and (3.32) suggest variants of the multifractal formalism. We say that the almost sure multifractal formalism is verified if (3.35) is saturated, i.e. if:

$$f_H(H) = H \sup_{\alpha \in [0, H]} \frac{\rho(\alpha)}{\alpha} \tag{3.36}$$

and that the quasi-sure multifractal formalism is verified if (3.32) is saturated, i.e. if:

$$f_H(H) = \inf_{q \geq q_c} \big( qH - \tau(q) + 1 \big) \tag{3.37}$$
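The gap between bounds (3.35) and (3.32) for a non-concave $\rho$ can be seen on a toy histogram. In this sketch (hypothetical names, assumed data), $\rho$ is finite at only two exponents, $\rho(0.3) = 0.2$ and $\rho(0.9) = 1$, and equals $-\infty$ elsewhere; $\tau(q)$ is obtained from (3.27), $q_c$ by bisection, and the infimum in (3.32) on a grid, so the Legendre bound is only accurate up to the grid step.

```python
RHO = {0.3: 0.2, 0.9: 1.0}    # rho(alpha) at the two exponents present; -inf elsewhere

def tau(q):
    """(3.27): tau(q) = inf over alpha of (alpha*q - rho(alpha) + 1)."""
    return min(a * q - r + 1 for a, r in RHO.items())

def critical_q(lo=0.0, hi=100.0):
    """Solve tau(q_c) = 1 by bisection (tau is increasing for this example)."""
    for _ in range(200):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if tau(mid) < 1 else (lo, mid)
    return hi

def legendre_bound(H, step=1e-3, q_max=20.0):
    """Bound (3.32): inf over q > q_c of (q*H - tau(q) + 1), evaluated on a grid."""
    q_c = critical_q()
    n = int((q_max - q_c) / step)
    return min((q_c + i * step) * H - tau(q_c + i * step) + 1 for i in range(1, n + 1))

def histogram_bound(H):
    """Bound (3.35): H * sup over alpha in (0, H] of rho(alpha) / alpha."""
    ratios = [r / a for a, r in RHO.items() if a <= H]
    return H * max(ratios) if ratios else float("-inf")

for H in (0.3, 0.45, 0.6, 0.75, 0.9):
    print(H, histogram_bound(H), legendre_bound(H))
```

The two bounds agree at the two exponents carried by $\rho$ and differ in between, where the Legendre bound follows the concave envelope of $\rho$ while the histogram bound follows the line through the origin.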
3.6. The grand-canonical multifractal formalism

The aim of the grand-canonical multifractal formalism is to calculate the spectrum of oscillating singularities $d(H, \beta)$ which, by definition, gives the Hausdorff dimension of the set of points where the Hölder exponent is $H$ and the oscillation exponent is $\beta$. This formalism is based on new function spaces. To define them, we use the following, more geometric, notation: $\lambda$ and $\lambda'$ denote the dyadic intervals $\lambda_{j,k} = k 2^{-j} + [0, 2^{-j}]$ and $\lambda'_{j',k'} = k' 2^{-j'} + [0, 2^{-j'}]$ respectively, $C_\lambda$ denotes the coefficient $d_f(k, j)$ and $\psi_\lambda$ the wavelet $\psi(2^j x - k)$.

DEFINITION 3.9.– Let $p > 0$ and $s, s' \in \mathbb{R}$. A function $f$ belongs to $O_p^{s,s'}(\mathbb{R})$ if its wavelet coefficients satisfy:

$$\sup_{j \in \mathbb{Z}} 2^{sj} \Big( \sum_k \sup_{\lambda' \subset \lambda} \big| C_{\lambda'} 2^{s' j'} \big|^p \Big)^{1/p} < \infty \tag{3.38}$$
This definition is independent of the wavelet basis chosen [JAF 98]. We now derive a grand-canonical multifractal formalism, which will enable us to obtain $d(H, \beta)$ from the knowledge of the wavelet coefficients of $f$. This formalism is based on a reasoning similar to that of section 3.4. It is motivated by two concerns: obtaining more complete information on the Hölder singularities of the signal, and explicitly taking chirp-type behaviors into account, which eliminates one of the causes of failure of the multifractal formalism. Indeed, we have seen, in step 1 of section 3.4.2, that the previous multifractal formalisms assume that the signal does not contain any chirps. Let:

$$\zeta(p, s') = \liminf_{j \to +\infty} \frac{\log \Big( \sum_k \sup_{\lambda' \subset \lambda} |C_{\lambda'}|^p \, 2^{s' j'} \Big)}{\log 2^{-j}} = \sup \big\{ s : f \in O_p^{s/p, \, s'/p} \big\}$$

If $f$ has a chirp with exponents $(H, \beta)$ at $x_0$, its wavelet coefficients are of the order of magnitude of $|k' 2^{-j'} - x_0|^H$ close to the curve $2^{-j'} \sim |k' 2^{-j'} - x_0|^{1+\beta}$. For each pair $(H, \beta)$, we shall estimate the contribution of the chirps of exponents $(H, \beta)$ to the quantity:

$$\sum_{\lambda \in \Lambda_j} \sup_{\lambda' \subset \lambda} |C_{\lambda'}|^p \, 2^{s' j'}. \tag{3.39}$$

Let us consider an interval $\lambda$ of side $2^{-j}$ that contains a chirp with exponents $(H, \beta)$. The wavelet coefficients $C_{\lambda'}$ for $\lambda' \subset \lambda$ are negligible as long as we have $2^{-j'} \geq (2^{-j})^{1+\beta}$, i.e., as long as we have $j' \leq j(1 + \beta)$. When $j' \sim j(1 + \beta)$, we have, for certain values of $k'$:

$$|C_{\lambda'}| \sim (2^{-j})^H \sim 2^{-j' \frac{H}{1+\beta}}$$

and hence:

$$\sup_{\lambda' \subset \lambda} |C_{\lambda'}|^p \, 2^{s' j'} \sim 2^{-j' \left( \frac{Hp}{1+\beta} - s' \right)} \sim 2^{-j (Hp - (1+\beta) s')}$$

(as long as we have $s' \leq pH/(1 + \beta)$). The contribution of the chirps of exponents $(H, \beta)$ to (3.39) is thus:

$$2^{d(H, \beta) j} \, 2^{-j (Hp - (1+\beta) s')} = 2^{-j (Hp - (1+\beta) s' - d(H, \beta))}$$

When $j \to +\infty$, the main contribution is provided by the pair $(H, \beta)$ for which the infimum of $Hp - (1 + \beta) s' - d(H, \beta)$ is attained, from which we obtain the heuristic formula:

$$\zeta(p, s') = \inf_{H, \beta} \big( Hp - (1 + \beta) s' - d(H, \beta) \big)$$
If $d(H, \beta)$ is a concave function, then:

$$d(H, \beta) = \inf_{s', p} \big( Hp - (1 + \beta) s' - \zeta(p, s') \big) \tag{3.40}$$

If we are interested in the Hölder spectrum $f_H(H)$, we can deduce it from the following argument: for a fixed $H$, the value of $\beta$ which brings the largest contribution to (3.40) will provide the correct dimension $f_H(H)$, hence the formula:

$$f_H(H) = \sup_\beta d(H, \beta) = \sup_\beta \inf_{s', p} \big( Hp - (1 + \beta) s' - \zeta(p, s') \big) \tag{3.41}$$
Of course, this formula is not at all equivalent to the standard multifractal formalisms, as we can easily verify in the example of lacunary wavelet series.

3.7. Bibliography

[ABR 98] ABRY P., VEITCH D., “Wavelet analysis of long-range-dependent traffic”, IEEE Trans. Inf. Theory, vol. 44, no. 1, p. 2–15, 1998.
[AMA 00] AMANN A., MAYR G., STROHMENGER H.U., “N(α) histogram analysis of the ventricular fibrillation ECG-signal as predictor of countershock success”, Chaos, Solitons, and Fractals, vol. 11, p. 1205–1212, 2000.
[ARN 95a] ARNEODO A., ARGOUL F., BACRY E., ELEZGARAY J., MUZY J.F., Ondelettes, multifractales et turbulence : de l’ADN aux croissances cristallines, Diderot, 1995.
[ARN 95b] ARNEODO A., BACRY E., MUZY J.F., “The thermodynamics of fractals revisited with wavelets”, Physica A, vol. 213, p. 232–275, 1995.
[ARN 97] ARNEODO A., BACRY E., JAFFARD S., MUZY J.F., “Oscillating singularities on Cantor sets: A grand canonical multifractal formalism”, Journal of Statistical Physics, vol. 87, p. 179–209, 1997.
[ARN 98] ARNEODO A., BACRY E., MUZY J.F., “Random cascades on wavelet dyadic trees”, Journal of Mathematical Physics, vol. 39, no. 8, p. 4142–4164, 1998.
[AUB 99] AUBRY J.M., “Representation of the singularities of a function”, Appl. Comput. Harmon. Anal., vol. 6, no. 2, p. 282–286, 1999.
[AUB 00] AUBRY J.M., JAFFARD S., “Random wavelet series”, Comm. Math. Phys., vol. 227, p. 483–514, 2002.
[AYA 08] AYACHE A., JAFFARD S., Hölder Exponents of Arbitrary Functions (preprint), 2008.
[BAN 31] BANACH S., “Über die Baire’sche Kategorie gewisser Funktionenmengen”, Studia Math., vol. 3, p. 174–179, 1931.
[BEN 97] BENASSI A., JAFFARD S., ROUX D., “Elliptic Gaussian random processes”, Revista Mathematica Iberoamericana, vol. 13, no. 1, p. 19–90, 1997.
[BER 96] BERTOIN J., An Introduction to Lévy Processes, Cambridge University Press, 1996.
[BER 98] BERTOIN J., “The inviscid Burgers equation with Brownian initial velocity”, Communications in Mathematical Physics, vol. 193, no. 2, p. 397–406, 1998.
[BIS 98] BISWAS M.K., GHOSE T., GUHA S., BISWAS P.K., “Fractal dimension estimation for texture images: A parallel approach”, Pattern Recognition Letters, vol. 19, no. 3–4, p. 309–313, 1998.
[BUC 01] BUCZOLICH Z., NAGY J., “Hölder spectrum of typical monotone continuous functions”, Real Anal. Exch., vol. 26, no. 1, p. 133–156, 2000–2001.
[CHA 99] CHASSANDE-MOTTIN E., FLANDRIN P., “On the time-frequency detection of chirps”, Appl. Comput. Harmon. Anal., vol. 6, no. 2, p. 252–281, 1999.
[CHH 89] CHHABRA A.B., JENSEN R.V., “Direct determination of the f(α) singularity spectrum”, Physical Review Letters, vol. 62, p. 1327–1330, 1989.
[DAO 95] DAOUDI K., LÉVY VÉHEL J., “Speech signal modeling based on local regularity analysis”, in IASTED/IEEE, International Conference on Signal and Image Processing (SIP’95, Las Vegas, New Mexico), 1995.
[DEV 98] DEVORE R., “Nonlinear approximation”, Acta Numerica, p. 1–99, 1998.
[DUB 89] DUBUC B., ZUCKER S.W., TRICOT C., QUINIOU J.F., WEHBI D., “Evaluating the fractal dimension of surfaces”, Proceedings of the Royal Society of London A, vol. 425, p. 113–127, 1989.
[EVA 98] EVANS L.C., Partial Differential Equations, American Mathematical Society, Graduate Studies in Mathematics, 1998.
[FRI 95] FRISCH U., Turbulence, Cambridge University Press, 1995.
[GAG 87] GAGNE Y., Etude expérimentale de l’intermittence et des singularités dans le plan complexe en turbulence développée, PhD Thesis, Grenoble University, 1987.
[GIN 00] GINCHEV I., ROCCA M., “On Peano and Riemann derivatives”, Rend. Circ. Mat. Pal. Ser. 2, vol. 49, p. 463–480, 2000.
[GUI 98] GUIHENEUF B., JAFFARD S., LÉVY VÉHEL J., “Two results concerning chirps and 2-microlocal exponents prescription”, Appl. Comp. Harm. Anal., vol. 5, no. 4, p. 487–492, 1998.
[JAF 91] JAFFARD S., “Pointwise smoothness, two-microlocalization and wavelet coefficients”, Publicacions Mathematiques, vol. 35, p. 155–168, 1991.
[JAF 96a] JAFFARD S., “The spectrum of singularities of Riemann’s function”, Revista Mathematica Iberoamericana, vol. 12, no. 2, p. 441–460, 1996.
[JAF 96b] JAFFARD S., MEYER Y., “Wavelet methods for pointwise regularity and local oscillations of functions”, Memoirs of the AMS, vol. 123, no. 587, 1996.
[JAF 97a] JAFFARD S., “Multifractal formalism for functions”, SIAM Journal of Mathematical Analysis, vol. 28, no. 4, p. 944–998, 1997.
[JAF 97b] JAFFARD S., “Old friends revisited. The multifractal nature of some classical functions”, J. Four. Anal. App., vol. 3, no. 1, p. 1–22, 1997.
[JAF 97c] JAFFARD S., MANDELBROT B., “Peano-Polya motion, when time is intrinsic or binomial (uniform or multifractal)”, The Mathematical Intelligencer, vol. 19, no. 4, p. 21–26, 1997.
[JAF 98] JAFFARD S., “Oscillation spaces: Properties and applications to fractal and multifractal functions”, Journal of Mathematical Physics, vol. 39, no. 8, p. 4129–4141, 1998.
[JAF 99] JAFFARD S., “The multifractal nature of Lévy processes”, Probability Theory and Related Fields, vol. 114, no. 2, p. 207–227, 1999.
[JAF 00a] JAFFARD S., “On lacunary wavelet series”, Ann. Appl. Proba., vol. 10, no. 1, p. 313–329, 2000.
[JAF 00b] JAFFARD S., “On the Frisch-Parisi conjecture”, J. Math. Pures Appl., vol. 76, no. 6, p. 525–552, 2000.
[JAF 00c] JAFFARD S., MEYER Y., “On the pointwise regularity of functions in critical Besov spaces”, J. Funct. Anal., vol. 175, p. 415–434, 2000.
[JAF 01a] JAFFARD S., “Functions with prescribed Hölder and chirps exponents”, Revista Mathematica Iberoamericana, 2001.
[JAF 01b] JAFFARD S., MEYER Y., RYAN R., Wavelets: Tools for Science and Technology, SIAM, 2001.
[JAF 04a] JAFFARD S., “Beyond Besov spaces part 1: Distributions of wavelet coefficients”, J. Four. Anal. Appl., vol. 10, no. 3, p. 221–246, 2004.
[JAF 04b] JAFFARD S., “Wavelet techniques in multifractal analysis”, in LAPIDUS M., VAN FRANKENHUIJSEN M. (Eds.), Fractal Geometry and Applications: A Jubilee of Benoît Mandelbrot, Proceedings of Symposia in Pure Mathematics, AMS, vol. 72, Part 2, p. 91–152, 2004.
[JAF 05] JAFFARD S., “Beyond Besov spaces part 2: Oscillation spaces”, J. Constr. Approx., vol. 21, no. 1, p. 29–61, 2005.
[JAF 06] JAFFARD S., LASHERMES B., ABRY P., “Wavelet leaders in multifractal analysis”, in QIAN T. et al. (Eds.), Wavelet Analysis and Applications, Birkhäuser, vol. 72, Part 2, p. 219–264, 2006.
[JAF NIC] JAFFARD S., NICOLAY S., “Pointwise smoothness of space-filling functions”, to appear in Appl. Comp. Harmon. Anal.
[LAS 08] LASHERMES S., ROUX S., ABRY P., JAFFARD S., “Comprehensive multifractal analysis of turbulent velocity using wavelet leaders”, Eur. Phys. J. B., vol. 61, no. 2, p. 201–215, 2008.
[LEV 95] LÉVY VÉHEL J., “Fractal approaches in signal processing”, Fractals: Symposium in Honor of Benoît Mandelbrot (Curaçao), vol. 3, no. 4, p. 755–775, 1995.
[LEV 97] LÉVY VÉHEL J., RIEDI R., “Fractional Brownian motion and data traffic modeling: the other end of the spectrum”, in LÉVY VÉHEL J., LUTTON E., TRICOT C. (Eds.), Fractals in Engineering, Springer-Verlag, 1997.
[MAN 95] MANDELBROT B., Les Objets Fractals, Flammarion, 1995.
[MAN 97] MANDELBROT B., Fractals and Scalings in Finance, Springer, 1997.
[MAN 98] MANDELBROT B., Multifractals and 1/f Noise, Springer, 1998.
[MEY 90] MEYER Y., Ondelettes et opérateurs, Hermann, 1990.
[MEY 98] MEYER Y., “Wavelets, vibrations, and scalings”, in CRM Series AMS, University of Montreal Press, vol. 9, 1998.
[PEL 96] PELTIER R., LÉVY VÉHEL J., “Multifractional Brownian Motion: Definitions and Preliminary Results”, Technical Report, 1996.
[PRU 81] PRUITT W., “The growth of random walks and Lévy processes”, Annals of Probability, vol. 9, no. 6, p. 948–956, 1981.
[SCH 95] SCHLESINGER, ZASLAVSKY, FRISCH (Eds.), Lévy Flights and Related Topics (Proceedings of the Nice Workshop, 1994), Springer-Verlag, 1995.
[TAQ 97] TAQQU M., TEVEROVSKY V., WILLINGER W., “Is network traffic self-similar or multifractal?”, Fractals, vol. 5, no. 1, p. 63–73, 1997.
[TRI 97] TRICOT C., “Function norms and fractal dimension”, SIAM J. Math. Anal., vol. 28, p. 189–212, 1997.
[VER 94] VERGASOLA M., DUBRULLE B., FRISCH U., NOULLEZ A., “Burgers’ equation, devil’s staircase and the mass distribution for large-scale structures”, Astronomy and Astrophysics, vol. 289, p. 325–356, 1994.
[WEN 07] WENDT H., ABRY P., JAFFARD S., “Bootstrap for empirical multifractal analysis”, IEEE Signal Processing Magazine, vol. 24, no. 4, p. 38–48, 2007.
[WIL 96] WILLINGER W., TAQQU M., ERRAMILLI A., “A bibliographical guide to self-similar traffic and performance modeling for modern high-speed networks”, in KELLY F.P. et al. (Eds.), Stochastic Networks: Theory and Applications (Selected Papers of the Royal Statistical Society Research Workshop, August 1995), Royal Statistical Society Lecture Notes Series, vol. 4, Clarendon Press, Oxford, p. 339–366, 1996.
[WIL 00] WILLINGER W., RIEDI R., TAQQU M., “Long-range dependence and data network traffic”, in DOUKHAN O., TAQQU M. (Eds.), Long-range Dependence: Theory and Applications, Birkhäuser, 2000.
Chapter 4
Multifractal Scaling: General Theory and Approach by Wavelets
4.1. Introduction and summary

Fractal processes have been successfully applied in various fields such as the theory of fully developed turbulence [MAN 74, FRI 85, BAC 93], stock market modeling [EVE 95, MAN 97, MAN 99], and more recently in the study of network data traffic [LEL 94, NOR 94]. In networking, models using fractional Brownian motion (FBM) have helped advance the field through their ability to capture fractal features such as statistical self-similarity and long-range dependence (LRD). It has been recognized, however, that multifractal features need to be accounted for as well, so as to gain a better understanding of network traffic, but also of stock exchange data [RIE 97a, RIE 99, RIE 00, FEL 98, MAN 97]. In short, there is a call for more versatile models which can, for example, incorporate LRD and multifractal properties independently of each other.

Roughly speaking, a fractal entity is characterized by the inherent, ubiquitous occurrence of irregularities, which govern its shape and complexity. The most prominent example is certainly FBM $B_H(t)$ [MAN 68]. Its paths are almost surely continuous but not differentiable. Indeed, the oscillation of FBM in any interval of size $\delta$ is of the order $\delta^H$, where $H \in (0, 1)$ is the self-similarity parameter:

$$B_H(at) \overset{fd}{=} a^H B_H(t). \tag{4.1}$$

Chapter written by Rudolf RIEDI.
140
Scaling, Fractals and Wavelets
Real-world signals, on the other hand, often possess an erratically changing oscillation exponent, which limits the appropriateness of FBM as a model. Due to the various exponents present in such signals, they have been termed multifractals.

This chapter’s main objective is to present the framework for describing and detecting such a multifractal scaling structure. In doing so we survey local and global multifractal analysis and relate them via the multifractal formalism in a stochastic setting. Thereby, the importance of higher order statistics will become evident. It might be especially appealing to the reader to see wavelets put to novel use. We focus mainly on the analytical computation of the so-called multifractal spectra, and on their mutual relations, dwelling extensively on variations of binomial cascades. Statistical properties of estimators of multifractal quantities, as well as modeling issues, are addressed elsewhere (see [GON 98, ABR 00, GON 99, MAN 97, RIE 99, RIB 06]). The remainder of this introduction provides a summary of the contents of the paper, roughly following its structure.

4.2. Singularity exponents

For simplicity we consider processes $Y$ over a probability space $(\Omega, \mathcal{F}, P_\Omega)$ and defined on a compact interval, which we assume without loss of generality to be $[0, 1]$. Generalization to higher dimensions is straightforward, and the extension to processes defined on $\mathbb{R}$ is simple and will be indicated.

4.2.1. Hölder continuity

The erratic behavior or, more precisely, the degree of local Hölder regularity of a continuous process $Y(t)$ at a fixed given time $t$ can be characterized to a first approximation by comparison with an algebraic function: $Y$ is said to be in $C^h_t$ if there is a polynomial $P_t$ such that $|Y(s) - P_t(s)| \le C|s - t|^h$ for $s$ sufficiently close to $t$. If $P_t$ is a constant, i.e. $P_t(s) = Y(t)$ for all $s$, then $Y$ is in $C^h_t$ for all $h < \underline{h}(t)$ and not in $C^h_t$ for all $h > \underline{h}(t)$, where

$$\underline{h}(t) := \liminf_{\varepsilon \to 0} \frac{1}{\log_2(2\varepsilon)} \log_2 \sup_{|s-t|<\varepsilon} |Y(s) - Y(t)|. \tag{4.2}$$

On the other hand, it is easy to prove the following:

LEMMA 4.1.– If $\underline{h}(t) \notin \mathbb{N}$ then $P_t$ is a constant, and $\underline{h}(t) = \sup\{h : Y \in C^h_t\}$.

As the example $Y(s) = s^2 + s^{2.4}$ with $t = 0$ shows, the conclusion does not necessarily hold when $\underline{h}(t) \in \mathbb{N}$. Here, $|Y(s) - Y(0)| \sim s^2$ for $s \sim 0$, thus $\underline{h}(0) = 2$, while $P_0(s) = s^2$, $Y(s) - P_0(s) = s^{2.4}$, and thus $\sup\{h : Y \in C^h_0\} = 2.4$.
Proof. Assume there is $h > \underline{h}(t)$ and $P_t(s)$ such that $Y \in C^h_t$. We will argue that $\underline{h}(t)$ must be an integer in this case. Note first that $P_t$ is not constant, by definition of $\underline{h}(t)$, and we may write $P_t(s) = Y(t) + (s - t)^m \cdot Q(s)$ for some integer $m \ge 1$ and some polynomial $Q$ without zero at $t$. Assume first that $m < \underline{h}(t)$ and choose $h'$ such that $m < h' < \underline{h}(t)$. Writing $Y(s) - P_t(s) = (Y(s) - Y(t)) - (P_t(s) - Y(t))$, the first term is smaller than $|s - t|^{h'}$ and the second term, decaying as $C|s - t|^m$, governs. Whence $h = m < \underline{h}(t)$, against the assumption. Assuming $m > \underline{h}(t)$, choose $h'$ such that $m > h' > \underline{h}(t)$ and a sequence $s_n$ such that $|Y(s_n) - Y(t)| \ge |s_n - t|^{h'}$, whence $|Y(s_n) - P_t(s_n)| \ge (1/2)|s_n - t|^{h'}$ for large $n$ and $h \le h'$. Letting $h' \to \underline{h}(t)$ we again obtain a contradiction. We conclude that $\underline{h}(t)$ equals $m$.

For reasons of symmetry we define

$$\overline{h}(t) := \limsup_{\varepsilon \to 0} \frac{1}{\log_2(2\varepsilon)} \log_2 \sup_{|s-t|<\varepsilon} |Y(s) - Y(t)|. \tag{4.3}$$

If $\underline{h}(t)$ and $\overline{h}(t)$ coincide we denote the common value by $h(t)$.

We note first that the continuous limit in (4.2) may be replaced by a discrete limit. To this end we introduce $k_n(t) := \lfloor t 2^n \rfloor$, an integer defined uniquely by

$$t \in I^n_{k_n} := [k_n 2^{-n}, (k_n + 1) 2^{-n}[. \tag{4.4}$$

As $n$ increases the intervals $I^n_{k_n}$ form a nested decreasing sequence (compare Figure 4.1). Provided $n$ is chosen such that $2^{-n+1} \le \varepsilon < 2^{-n+2}$ we have

$$[(k_n - 1)2^{-n}, (k_n + 2)2^{-n}[ \;\subset\; [t - \varepsilon, t + \varepsilon[ \;\subset\; [(k_{n-2} - 1)2^{-n+2}, (k_{n-2} + 2)2^{-n+2}[$$

from which it follows immediately that

$$\underline{h}(t) = \liminf_{n \to \infty} h^n_{k_n}, \qquad \overline{h}(t) = \limsup_{n \to \infty} h^n_{k_n},$$

where

$$h^n_{k_n} := -\frac{1}{n} \log_2 \sup\left\{ |Y(s) - Y(t)| : s \in [(k_n - 1)2^{-n}, (k_n + 2)2^{-n}[ \right\}. \tag{4.5}$$

It is essential to note that the countable set of numbers $h^n_{k_n}$ contains all the scaling information of interest to us. Being defined pathwise, they are random variables.
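As an illustration of the coarse exponents in (4.5), the following is a minimal sketch, assuming the path is available as uniformly spaced samples on $[0, 1)$. The function name `coarse_holder_exponents`, the choice of the left endpoint of each dyadic interval as the reference time $t$, and the clipping of the window at the boundary of $[0, 1)$ are assumptions made for the sketch, not part of the text.

```python
import numpy as np

def coarse_holder_exponents(y, n):
    """Coarse exponents h^n_k in the spirit of (4.5) for a sampled path.

    y : samples Y(i / len(y)), i = 0, ..., len(y) - 1; len(y) must be a
        multiple of 2**n.
    n : dyadic level; one exponent is returned per interval I^n_k.
    Assumption of this sketch: the reference time t is the left endpoint
    k 2^-n of I^n_k, and the window is clipped to [0, 1).
    """
    y = np.asarray(y, dtype=float)
    if len(y) % 2**n != 0:
        raise ValueError("len(y) must be a multiple of 2**n")
    m = len(y) // 2**n                    # samples per dyadic interval
    h = np.full(2**n, np.nan)             # NaN where the oscillation vanishes
    for k in range(2**n):
        lo = max((k - 1) * m, 0)          # window [(k-1)2^-n, (k+2)2^-n[
        hi = min((k + 2) * m, len(y))
        osc = np.max(np.abs(y[lo:hi] - y[k * m]))
        if osc > 0:
            h[k] = -np.log2(osc) / n
    return h
```

For a smooth path such as $Y(t) = t$ the values approach 1 as $n$ grows, while for a cusp $|t - 1/2|^{0.3}$ the exponent of the interval starting at $t = 1/2$ approaches 0.3.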
4.2.2. Scaling of wavelet coefficients

A convenient tool for scaling analysis is found in the wavelet transform, both in its discrete and continuous forms. The discrete transform, for example, allows us to represent a 1D process $Y(t)$ in terms of shifted and dilated versions of a prototype bandpass wavelet function $\psi(t)$, and shifted versions of a low-pass scaling function $\phi(t)$ [DAU 92, VET 95]. While such representations exist also in the framework of continuous wavelet transforms, we use the latter mainly as a “microscope” in this chapter. In the vocabulary of Hilbert spaces, the discrete wavelet and scaling functions

$$\psi_{j,k}(t) := 2^{j/2} \psi(2^j t - k), \qquad \phi_{j,k}(t) := 2^{j/2} \phi(2^j t - k), \qquad j, k \text{ integer} \tag{4.6}$$

form an orthonormal basis and we have the representations [DAU 92, VET 95]

$$Y(t) = \sum_k D_{J_0,k}\, \phi_{J_0,k}(t) + \sum_{j=J_0}^{\infty} \sum_k C_{j,k}\, \psi_{j,k}(t), \tag{4.7}$$

with

$$C_{j,k} := \int Y(t)\, \psi^*_{j,k}(t)\, dt, \qquad D_{j,k} := \int Y(t)\, \phi^*_{j,k}(t)\, dt. \tag{4.8}$$

The wavelet coefficient $C_{j,k}$ measures the signal content around time $2^{-j}k$ and frequency $2^j f_0$, provided that the wavelet $\psi(t)$ is centered at time zero and frequency $f_0$. The scaling coefficient $D_{j,k}$ measures the local mean around time $2^{-j}k$. In the wavelet transform, $j$ indexes the scale of analysis: $J_0$ can be chosen freely and indicates the coarsest scale or lowest resolution available in the representation.

The simplest example of an orthonormal wavelet basis is given by the Haar scaling and wavelet functions (see Figure 4.1a). Here, $\phi$ is the indicator function of the unit interval, while $\psi = \phi(2\,\cdot) - \phi(2\,\cdot - 1)$. For a process supported on the unit interval a convenient choice is thus $J_0 = 0$. The supports of the fine-scale scaling functions nest inside the supports of those at coarser scales; this can be neatly represented by the binary tree structure of Figure 4.1b. Row (scale) $j$ of this scaling coefficient tree contains an approximation to $Y(t)$ of resolution $2^{-j}$. Row $j$ of the complementary wavelet coefficient tree (not shown) contains the details in scale $j + 1$ of the scaling coefficient tree that are suppressed in scale $j$. In fact, for the Haar wavelet we have

$$\begin{aligned} D_{j,k} &= 2^{-1/2} (D_{j+1,2k} + D_{j+1,2k+1}), \\ C_{j,k} &= 2^{-1/2} (D_{j+1,2k} - D_{j+1,2k+1}). \end{aligned} \tag{4.9}$$
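The recursion (4.9) can be run from the finest available scale down to scale 0. The sketch below assumes that the finest-scale scaling coefficients $D_{J,k}$ are given as an array of length $2^J$; the function name `haar_pyramid` and the list-of-arrays layout are illustrative choices, not taken from the text.

```python
import numpy as np

def haar_pyramid(d_fine):
    """Apply the Haar recursion (4.9) from the finest scale J down to scale 0.

    d_fine : scaling coefficients D_{J,k}, k = 0, ..., 2**J - 1.
    Returns (D, C) where D[j][k] = D_{j,k} for j = 0..J and
    C[j][k] = C_{j,k} for j = 0..J-1.
    """
    d = np.asarray(d_fine, dtype=float)
    J = len(d).bit_length() - 1
    if 2**J != len(d):
        raise ValueError("length of d_fine must be a power of two")
    D = [None] * (J + 1)
    C = [None] * J
    D[J] = d
    for j in range(J - 1, -1, -1):
        even, odd = D[j + 1][0::2], D[j + 1][1::2]   # D_{j+1,2k}, D_{j+1,2k+1}
        D[j] = (even + odd) / np.sqrt(2.0)           # local mean
        C[j] = (even - odd) / np.sqrt(2.0)           # detail suppressed at scale j
    return D, C
```

Since each step is an orthonormal change of basis, the energy of the finest-scale coefficients equals that of $D_{0,0}$ plus all wavelet coefficients, which gives a convenient sanity check.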
Figure 4.1. (a) The Haar scaling and wavelet functions $\phi_{j,k}(t)$ and $\psi_{j,k}(t)$. (b) Binary tree of scaling coefficients from coarse to fine scales

Wavelet decompositions contain considerable information on the singularity behavior of a process $Y$. Indeed, adapting the argument of [JAF 95, p. 291] and correcting for the wavelet normalization used here ($L^2$, as opposed to $L^1$ in [JAF 95]), it is easily shown that $|Y(s) - Y(t)| = O(|s - t|^h)$ implies that

$$2^{n/2} |C_{n,k_n}| = O\left(2^{-nh}\right). \tag{4.10}$$

This holds for any $h > 0$ and any compactly supported wavelet. As a matter of fact, only $\int \psi = 0$ is needed to obtain this result since the Taylor polynomial of $Y$ is implicitly assumed to be constant. If we are interested in an analysis only, we may thus consider wavelets $\psi$ such as derivatives of the Gaussian $\exp(-x^2)$ which do not necessarily form a basis. To distinguish them from the orthogonal wavelets we will call them “analyzing wavelets”. In order to invert (4.10), however, we need representation (4.7) as well as some knowledge concerning the decay of the maximum of the wavelet coefficients in the vicinity of $t$ and sufficient wavelet regularity. For a precise statement, see [JAF 95] and [DAU 92, Theorem 9.2].

All this suggests that replacing $h^n_{k_n}$ of (4.5) by the left hand side of (4.10) would produce an alternative description of the local behavior of $Y$. Consequently, we set

$$\underline{w}(t) := \liminf_{n \to \infty} w^n_{k_n}, \qquad \overline{w}(t) := \limsup_{n \to \infty} w^n_{k_n}, \tag{4.11}$$

where

$$w^n_{k_n} := -\frac{1}{n} \log_2 \left| 2^{n/2} C_{n,k_n} \right|. \tag{4.12}$$

If $\underline{w}(t)$ and $\overline{w}(t)$ coincide we denote the common value by $w(t)$.

Using wavelets has the advantage of yielding an analysis which is largely unaffected by polynomial trends in $Y$ due to vanishing moments $\int t^m \psi(t)\, dt = 0$ which are typically built into wavelets [DAU 92]. In this context recall Lemma 4.1. It has the disadvantage of complicating the analysis since the maxima of wavelet coefficients have to be considered for a reliable estimation of true Hölder continuity
[JAF 95, DAU 92, JAF 97, BAC 93]. In any case, the decay of wavelet coefficients is interesting in itself as it relates to LRD (compare [ABR 95]) and regularity spaces such as Besov spaces [RIE 99].

4.2.3. Other scaling exponents

Traditional multifractal analysis of a singular measure $\mu$ on the line constitutes a study of the singularity structure of its primitive $\mathcal{M}$ given by

$$\mathcal{M}(t) = \int_0^t \mu(ds) = \mu([0, t]). \tag{4.13}$$

Since $\mathcal{M}$ is an almost surely increasing process, the coarse exponents $h^n_{k_n}$ (see (4.5)) simplify to $h^n_{k_n} = -\frac{1}{n} \log_2 |\mathcal{M}((k_n + 2)2^{-n}) - \mathcal{M}((k_n - 1)2^{-n})|$, and we are motivated (some would say “seduced”) to study an even simpler notion of a coarse exponent:

$$\underline{\alpha}(t) := \liminf_{n \to \infty} \alpha^n_{k_n}, \qquad \overline{\alpha}(t) := \limsup_{n \to \infty} \alpha^n_{k_n}, \tag{4.14}$$

where

$$\alpha^n_{k_n} := -\frac{1}{n} \log_2 \left| \mathcal{M}\left((k_n + 1)2^{-n}\right) - \mathcal{M}(k_n 2^{-n}) \right| = -\frac{1}{n} \log_2 \mu\left(I^n_{k_n}\right). \tag{4.15}$$
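The coarse exponents of (4.15) are straightforward to evaluate once the masses of fine dyadic intervals are known. The following is a minimal sketch under the assumption that the measure is given through its masses on the $2^J$ dyadic intervals of some finest level $J \ge n$; the function name `coarse_alpha` and this input format are hypothetical.

```python
import numpy as np

def coarse_alpha(masses, n):
    """Coarse exponents alpha^n_k = -(1/n) log2 mu(I^n_k), see (4.15).

    masses : masses of the 2**J finest dyadic intervals (J >= n), in order.
    n      : dyadic level at which the exponents are computed.
    Intervals of zero mass yield +inf.
    """
    masses = np.asarray(masses, dtype=float)
    # Aggregate the fine masses into the 2**n intervals I^n_k.
    mu_n = masses.reshape(2**n, -1).sum(axis=1)
    with np.errstate(divide="ignore"):
        return -np.log2(mu_n) / n
```

For the Lebesgue measure on $[0, 1]$ every interval $I^n_k$ has mass $2^{-n}$, so all exponents equal 1.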
If $\underline{\alpha}(t)$ and $\overline{\alpha}(t)$ coincide we denote the common value by $\alpha(t)$. This exponent $\alpha(t)$ has attracted considerable attention in the multifractal community, largely owing to its simplicity.

In [LV 98] various examples of more general exponents were introduced, all of which are so-called Choquet capacities, a notion which is not needed to develop the multifractal formalism. As an interesting alternative, [PEY 98] considers an arbitrary function $\xi(I)$ from the space of all intervals to $\mathbb{R}^+$ (instead of only the $I^n_{k_n}$) and develops a multifractal formalism similar to ours. There, it is suggested that we consider the oscillations of $Y$ around the mean, i.e.

$$\xi(I) := \int_I \left| Y(t) - \frac{\int_I Y(s)\, ds}{|I|} \right| dt. \tag{4.16}$$

Proceeding as with $h^n_{k_n}$, we are led to the singularity exponent $-(1/n) \log_2(\xi(I^n_{k_n}))$ which is of particular interest since it can be used to define oscillation spaces, such as Sobolev spaces and Besov spaces. Another useful choice consists of interpolating $Y$ in the interval $I$ by the linear function $a_I + b_I t$ and considering

$$\xi(I) := \left( \int_I \left( Y(t) - (a_I + b_I t) \right)^2 dt \right)^{1/2}. \tag{4.17}$$
This exponent measures the variability of $Y$ and is related to the dimension of the paths of $Y$. Deducting constant, resp. linear terms in formulae (4.16) and (4.17) reminds us of the use of wavelets with one, resp. two vanishing moments.

4.3. Multifractal analysis

Multifractal analysis has been discovered and developed in [MAN 74, FRI 85, KAH 76, GRA 83, HEN 83, HAL 86, CUT 86, CAW 92, BRO 92, BAC 93, MAN 90b, HOL 92, FAL 94, OLS 94, ARB 96, JAF 97, PES 97, RIE 95a, MAN 02, BAC 03, BAR 04, BAR 02, CHA 05, JAF 99], to give only a short list of some relevant work done in this area. The main insight consisted of the fact that local scaling exponents on fractals as measured by $h(t)$, $\alpha(t)$ or $w(t)$ are not uniform or continuous as a function of $t$, in general. In other words, $h(t)$, $\alpha(t)$ and $w(t)$ typically change in an erratic way as a function of $t$, thus imprinting a rich structure on the object of interest. This structure can be captured either in geometric terms, making use of the concept of dimensions, or in statistical terms based on sample moments. A useful connection between these two descriptions emerges from the multifractal formalism.

As we will see, as far as the multifractal formalism is concerned there is no restriction in choosing a singularity exponent which seems fit for describing the scaling behavior of interest. To express this fact we consider in this section the arbitrary scaling exponents

$$\underline{s}(t) := \liminf_{n \to \infty} s^n_{k_n} \qquad \text{and} \qquad \overline{s}(t) := \limsup_{n \to \infty} s^n_{k_n}, \tag{4.18}$$

where $s^n_k$ ($k = 0, \ldots, 2^n - 1$, $n \in \mathbb{N}$) is any sequence of random variables. To keep a connection with what was said before, think of $s^n_k$ as representing a coarse scaling exponent of $Y$ over the dyadic interval $I^n_k$.

4.3.1. Dimension based spectra

A geometric description of the erratic behavior of a multifractal’s scaling exponents can be achieved using a quantification of the prevalence of particular exponents in terms of fractal dimensions as follows. We consider sets which are defined pathwise in terms of the limiting behavior of $s^n_{k_n}$ as $n \to \infty$:

$$E_a := \{t : \underline{s}(t) = a\}, \qquad \overline{E}_a := \{t : \overline{s}(t) = a\}, \qquad K_a := \{t : s(t) = a\}. \tag{4.19}$$

These sets $K_a$ are typically “fractal”, meaning loosely that they have a complicated geometric structure and more precisely that their dimensions are non-integer. A compact description of the singularity structure of $Y$ is therefore in terms of the following so-called Hausdorff spectrum

$$d(a) := \dim(K_a), \tag{4.20}$$

where $\dim(E)$ denotes the Hausdorff dimension of the set $E$ [TRI 82].
The sets $E_a$ ($a \in \mathbb{R}$), and also the sets $\overline{E}_a$, form a multifractal decomposition of the support of $Y$. We will loosely address $Y$ as a multifractal if this decomposition is rich, i.e. if the sets $E_a$ ($a \in \mathbb{R}$) are highly interwoven or even dense in the support of $Y$. However, the study of singular measures (deterministic and random) has often been restricted to the simpler sets $K_a$ and their spectrum $d(a)$ [KAH 76, CAW 92, FAL 94, ARB 96, OLS 94, RIE 98, RIE 95a, RIE 95b, BAR 97]. With the theory developed here (Lemma 4.2) it becomes clear that most of these results extend to provide formulae for $\dim(E_a)$ and $\dim(\overline{E}_a)$ as well. This aspect of multifractal analysis has been of much interest to the mathematical community.

4.3.2. Grain based spectra

An alternative description of the prevalence of singularity exponents, statistical in nature due to the counting involved, is

$$f(a) := \lim_{\varepsilon \downarrow 0} \limsup_{n \to \infty} \frac{1}{n} \log_2 N^n(a, \varepsilon), \tag{4.21}$$

where¹

$$N^n(a, \varepsilon) := \#\{k = 0, \ldots, 2^n - 1 : a - \varepsilon \le s^n_k < a + \varepsilon\}. \tag{4.22}$$

1. More generally, using $c$-ary intervals in Euclidean space $\mathbb{R}^d$, $k_n$ will range from 0 to $c^{nd} - 1$. Logarithms will have to be taken to the base $c$ since we seek the asymptotics of $N^n(a, \varepsilon)$ in terms of a power law of resolution at stage $n$, i.e. $N^n(a, \varepsilon) \simeq c^{n f(a)}$. The maximum value of $f(a)$ will be $d$.

This notion has grown out of the difficulty faced by any real-world application, namely that the calculation of actual Hausdorff dimensions is often hard, if not impossible. Using a mesh of given grain size as in (4.22) instead of arbitrary coverings as in $\dim(K_a)$ generally leads to simpler notions. However, $f$ should not be regarded as an auxiliary vehicle but recognized for its own merit, which will become apparent in the remainder of this section.

Our first remark on $f(a)$ concerns the fact that the counting used in its definition, i.e. $N^n(a, \varepsilon)$, may be used to estimate box dimensions. Based on this fact it was shown in [RIE 98] that

$$\dim(K_a) \le f(a). \tag{4.23}$$

Here, we state a slightly improved version:

LEMMA 4.2.–

$$\dim(E_a) \le f(a), \qquad \dim(\overline{E}_a) \le f(a) \tag{4.24}$$

and

$$\dim(K_a) \le \underline{f}(a) := \lim_{\varepsilon \downarrow 0} \liminf_{n \to \infty} \frac{1}{n} \log_2 N^n(a, \varepsilon). \tag{4.25}$$
It follows immediately that $\dim(K_a) \le \dim(E_a) \le f(a)$, but $\dim(E_a)$ is not necessarily smaller than $\underline{f}(a)$.

4.3.3. Partition function and Legendre spectrum

The second comment regarding the grain spectrum $f(a)$ concerns its interpretation as a large deviation principle (LDP). We may consider $N^n(a, \varepsilon)/2^n$ to be the probability of finding (for a fixed realization of $Y$) a number $k_n \in \kappa_n := \{0, \ldots, 2^n - 1\}$ such that $s^n_{k_n} \in [a - \varepsilon, a + \varepsilon]$. Typically, there will be one value of $s(t)$ that appears most frequently, denoted $\hat{a}$, and $f(a)$ will reach its maximum 1 at $a = \hat{a}$. However, by definition, for $a \ne \hat{a}$ the chance to observe coarse exponents $s^n_{k_n}$ which lie in $[a - \varepsilon, a + \varepsilon]$ will decrease exponentially fast with a rate given by $f(a)$. Appealing to the theory of LDPs we consider the random variable $A_n = -n s^n_K \ln(2)$, where $K$ is randomly picked from $\kappa_n = \{0, \ldots, 2^n - 1\}$ with uniform distribution $U_n$ (recall that we study one fixed realization or path of $Y$), and define its “logarithmic moment generating function” or partition function

$$\tau(q) := \liminf_{n \to \infty} -\frac{1}{n} \log_2 S^n(q), \tag{4.26}$$

where

$$S^n(q) := \sum_{k=0}^{2^n - 1} \exp\left(-q n s^n_k \ln(2)\right) = \sum_{k=0}^{2^n - 1} 2^{-nq s^n_k} = 2^n E_n\left[2^{-nq s^n_K}\right]. \tag{4.27}$$

Here, $E_n$ stands for expectation with respect to $U_n$. The Gärtner-Ellis theorem [ELL 84] then applies and yields the following result (see [RIE 95a] for a slightly stronger version):

THEOREM 4.1.– If the limit

$$\tau(q) = \lim_{n \to \infty} -\frac{1}{n} \log_2 S^n(q) \tag{4.28}$$

exists and is finite for all $q \in \mathbb{R}$, and if $\tau(q)$ is a differentiable function of $q$, then the double limit

$$f(a) = \lim_{\varepsilon \downarrow 0} \lim_{n \to \infty} \frac{1}{n} \log_2 N^n(a, \varepsilon) \tag{4.29}$$

exists, in particular $f(a) = \underline{f}(a)$, and

$$f(a) = \tau^*(a) := \inf_{q \in \mathbb{R}} (qa - \tau(q)) \tag{4.30}$$

for all $a$.

Proof. Applying [ELL 84, Theorem II] to our situation immediately gives

$$\limsup_{n \to \infty} \frac{1}{n} \log_2 N^n(a, \varepsilon) \le \limsup_{n \to \infty} \frac{1}{n} \log_2 \#\{k : |s^n_k - a| \le \varepsilon\} \le \sup_{|a' - a| \le \varepsilon} \tau^*(a')$$

and

$$\liminf_{n \to \infty} \frac{1}{n} \log_2 N^n(a, \varepsilon) \ge \liminf_{n \to \infty} \frac{1}{n} \log_2 \#\{k : |s^n_k - a| < \varepsilon\} \ge \sup_{|a' - a| < \varepsilon} \tau^*(a').$$

By continuity of $\tau^*(a)$ these two bounds coincide and (4.29) is established. Now, letting $\varepsilon \to 0$ shows that $f(a) = \tau^*(a)$.

Sometimes, the differentiability assumptions of this theorem are too restrictive. Before dwelling more on the relation between $\tau$ and $f$ in section 4.4 let us note a simple fact, which also provides a simple reason why the Legendre transform appears in this context.

LEMMA 4.3.– We always have

$$f(a) \le \tau^*(a). \tag{4.31}$$
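Proof. Fix $q \in \mathbb{R}$ and consider $a$ with $f(a) > -\infty$. Let $\gamma < f(a)$ and $\varepsilon > 0$. Then, we take $n$ arbitrarily large such that $N^n(a, \varepsilon) \ge 2^{n\gamma}$. For such $n$ we bound $S^n(q)$ by noting

$$\sum_{k=0}^{2^n - 1} 2^{-nq s^n_k} \ge \sum_{a - \varepsilon \le s^n_k < a + \varepsilon} 2^{-nq s^n_k} \ge N^n(a, \varepsilon)\, 2^{-n(qa + |q|\varepsilon)} \ge 2^{-n(qa - \gamma + |q|\varepsilon)} \tag{4.32}$$

and hence $\tau(q) \le qa - \gamma + |q|\varepsilon$. Letting $\varepsilon \to 0$ and $\gamma \to f(a)$, we find $\tau(q) \le qa - f(a)$. Since this is trivial if $f(a) = -\infty$ we find

$$\tau(q) \le qa - f(a) \qquad \text{and} \qquad f(a) \le qa - \tau(q) \qquad \text{for all } a \text{ and } q. \tag{4.33}$$

From this it follows trivially that $\tau(q) \le f^*(q)$ and $f(a) \le \tau^*(a)$.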
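The quantities entering Lemma 4.3 can be evaluated at a fixed finite level $n$. The sketch below assumes that the $2^n$ coarse exponents $s^n_k$ of one path are stored in an array; the function names `grain_count`, `tau_n` and `legendre` are hypothetical, and the finite-$n$ values only approximate the limits (4.21), (4.26) and (4.30). The log-sum is shifted by its largest exponent purely for numerical stability.

```python
import numpy as np

def grain_count(s, a, eps):
    """N^n(a, eps) of (4.22): number of k with a - eps <= s^n_k < a + eps."""
    s = np.asarray(s, dtype=float)
    return int(np.count_nonzero((s >= a - eps) & (s < a + eps)))

def tau_n(s, n, q):
    """Finite-n partition exponent -(1/n) log2 S^n(q), with S^n(q) as in (4.27)."""
    x = -n * q * np.asarray(s, dtype=float)   # exponent of 2 in each summand
    x_max = x.max()
    log2_S = x_max + np.log2(np.sum(np.exp2(x - x_max)))
    return -log2_S / n

def legendre(q_grid, tau_values, a):
    """Discrete version of tau*(a) = inf_q (q a - tau(q)) over a grid of q."""
    q_grid = np.asarray(q_grid, dtype=float)
    tau_values = np.asarray(tau_values, dtype=float)
    return float(np.min(q_grid * a - tau_values))
```

At finite $n$, the chain (4.32) gives $\frac{1}{n}\log_2 N^n(a, \varepsilon) \le qa - \tau_n(q) + |q|\varepsilon$ for every $q$, which is a convenient consistency check on an implementation.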
With the special choice $s^n_k = \alpha^n_k$ for the distribution function $\mathcal{M}$ of a measure $\mu$, $S^n(q)$ becomes

$$S^n_\alpha(q) = \sum_{k=0}^{2^n - 1} \left| \mathcal{M}\left((k + 1)2^{-n}\right) - \mathcal{M}(k 2^{-n}) \right|^q = \sum_{k=0}^{2^n - 1} \left(\mu(I^n_k)\right)^q. \tag{4.34}$$

This is the original form in which $\tau(q)$ has been introduced in multifractal analysis [HAL 86, HEN 83, FRI 85, MAN 74]. Note that there is a close connection to the thermodynamic formalism [TEL 88].

4.3.4. Deterministic envelopes

An analytical approach is often useful in order to gain an intuition on the various spectra of a typical path of $Y$, or at least some estimate of it. To establish such an approach, we consider the position, i.e. $t$ or $k_n$, as well as the path $Y$ to be random simultaneously. We then apply the LDP to the larger probability space. More precisely, the exponents $s^n_K$ are now random variables over $(\Omega \times \kappa_n, P_\Omega \times U_n)$. The “deterministic partition function” corresponding to this setting reads as

$$T(q) := \liminf_{n \to \infty} -\frac{1}{n} \log_2 E_\Omega[S^n(q)]. \tag{4.35}$$

NOTE 4.1 (Ergodic processes).– So far, we have assumed in the definitions of $\tau(q)$ and $T(q)$ that $Y$ is defined on a compact interval. Without loss of generality, this interval was assumed to be $[0, 1]$. In order to allow for processes defined on $\mathbb{R}$ we modify $S^n(q)$ to

$$S^n(q) := \lim_{N \to \infty} \frac{1}{N} \sum_{k=0}^{N 2^n - 1} 2^{-nq s^n_k}$$

and $N^n(a, \varepsilon)$ similarly. For ergodic processes this becomes $S^n(q) = 2^n E_\Omega[2^{-nq s^n_k}]$ almost surely. Thus, $E_\Omega[S^n(q)] = S^n(q)$ a.s. and

$$T(q) \stackrel{a.s.}{=} \tau(q, \omega). \tag{4.36}$$
We refer to (4.74) for an account of the extent to which marginal distributions may be reflected in multifractal spectra in general. For processes on $[0, 1]$ we cannot expect to have (4.36) in all generality. Nevertheless, we will point out scenarios where (4.36) holds. Notably, $T(q)$ always serves as a deterministic envelope of $\tau(q, \omega)$:

LEMMA 4.4.– With probability one²

$$\tau(q, \omega) \ge T(q) \qquad \text{for all } q \text{ with } T(q) < \infty. \tag{4.37}$$

2. For clarity, we make the randomness of $\tau$ explicit here.
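Proof. Consider any $q$ with finite $T(q)$ and let $\varepsilon > 0$. Let $n_0$ be such that $E_\Omega[S^n(q)] \le 2^{-n(T(q) - \varepsilon)}$ for all $n \ge n_0$. Then,

$$E\left[\limsup_{n \to \infty} 2^{n(T(q) - 2\varepsilon)} S^n(q, \omega)\right] \le E\left[\sum_{n \ge n_0} 2^{n(T(q) - 2\varepsilon)} S^n(q, \omega)\right] \le \sum_{n \ge n_0} 2^{-n\varepsilon} < \infty.$$

Thus, almost surely $\limsup_{n \to \infty} 2^{n(T(q) - 2\varepsilon)} S^n(q, \omega) < \infty$, and $\tau(q) \ge T(q) - 2\varepsilon$. Consequently, this estimate holds with probability one simultaneously for all $\varepsilon = 1/m$ ($m \in \mathbb{N}$) and some countable, dense set of $q$ values with $T(q) < \infty$. Since $\tau(q)$ and $T(q)$ are always concave due to Corollary 4.2 below (see section 4.4), they are continuous on open sets and the claim follows.

Along the same lines we may define the corresponding deterministic grain spectrum. By analogy, we will replace probability over $\kappa_n = \{0, \ldots, 2^n - 1\}$ in (4.21), i.e. $N^n(a, \varepsilon)$, by probability over $\Omega \times \kappa_n$, i.e.

$$\sum_{k=0}^{2^n - 1} P_\Omega[a - \varepsilon \le s^n_k < a + \varepsilon] = 2^n E_{\Omega \times \kappa_n}\left[\mathbf{1}_{[a - \varepsilon, a + \varepsilon)}(s^n_K)\right] = E_\Omega[N^n(a, \varepsilon)] \tag{4.38}$$

and define

$$F(a) := \lim_{\varepsilon \downarrow 0} \limsup_{n \to \infty} \frac{1}{n} \log_2 E_\Omega[N^n(a, \varepsilon)]. \tag{4.39}$$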
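A finite-level, finite-sample stand-in for (4.39) replaces the expectation $E_\Omega[N^n(a, \varepsilon)]$ by an average of counts over independent realizations. The sketch below assumes the coarse exponents of $R$ realizations are stacked in an array of shape $(R, 2^n)$; the function name `pooled_grain_rate` and this layout are hypothetical, and the result is only an empirical proxy, not the limit $F(a)$ itself.

```python
import numpy as np

def pooled_grain_rate(exponents, n, a, eps):
    """(1/n) log2 of the realization-averaged count of exponents in [a-eps, a+eps).

    exponents : array of shape (R, 2**n), row r holding s^n_k of realization r.
    The counts N^n(a, eps) are averaged across realizations before the
    logarithm is taken, so the result can be negative.
    """
    s = np.asarray(exponents, dtype=float)
    counts = np.count_nonzero((s >= a - eps) & (s < a + eps), axis=1)
    mean_count = counts.mean()
    if mean_count == 0:
        return -np.inf
    return float(np.log2(mean_count) / n)
```

Because the counts are averaged first, an exponent seen only once in eight realizations yields a mean count of $1/8$ and hence a negative value, whereas averaging per-realization spectra would discard such rare events.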
Replacing $N^n(a, \varepsilon)$ with (4.38) in the proof of Theorem 4.1 and taking expectations in (4.32) we find properties analogous to the pathwise spectra $\tau$ and $f$:

THEOREM 4.2.– For all $a$

$$F(a) \le T^*(a). \tag{4.40}$$

Furthermore, under conditions on $T(q)$ analogous to $\tau(q)$ in Theorem 4.1

$$F(a) = T^*(a) = \underline{F}(a) := \lim_{\varepsilon \downarrow 0} \liminf_{n \to \infty} \frac{1}{n} \log_2 E_\Omega[N^n(a, \varepsilon)]. \tag{4.41}$$
It follows from Lemma 4.4 that with probability one $\tau^*(a, \omega) \le T^*(a)$ for all $a$. Similarly, the deterministic grain spectrum $F(a)$ is an upper bound to its pathwise defined random counterpart $f(a, \omega)$, however, only pointwise. On the other hand, we have here almost sure equality under certain conditions.

NOTE 4.2 (Negative dimensions).– Being defined through counting, $f(a)$ is always non-negative, or $-\infty$. The envelopes $T^*$ and $F$, being defined through expectations of counts and sums, may assume negative values. Consequently, the negative values of $T^*$ and $F$ are not very useful in the estimation of $f$; however, they do contain further information and can be “observed”.

Negative $F(a)$ and $T^*(a)$ have been termed negative dimensions [MAN 90b]. They correspond to probabilities of observing a coarse Hölder exponent $a$ which decay faster than the number $2^n = \#\kappa_n$ of “samples” $s^n_k$ available in one realization grows. Oversampling the process, i.e. analyzing several independent realizations, will increase the number of samples, so that more “rare” $s^n_k$ may be observed. In loose terms, in $\exp(-n \ln(2) F(a))$ independent traces we have a fair chance of seeing at least one $s^n_k$ of size $a$. Thereby, it is essential not to average the spectra $f(a)$ of the various realizations but the numbers $N^n(a, \varepsilon)$. This way, negative “dimensions” $F(a)$ become visible.

4.4. Multifractal formalism

In the previous section, various multifractal spectra were introduced along with some simple relations between them. These can be summarized as follows:

COROLLARY 4.1 (Multifractal formalism).– For every $a$

$$\dim(K_a) \le \dim(E_a) \le f(a) \le \tau^*(a) \stackrel{a.s.}{\le} T^*(a) \tag{4.42}$$

where the first relations hold pathwise and the last one (the two terms on both sides of the last inequality) with probability one. Similarly

$$\dim(K_a) \le \underline{f}(a) \le f(a) \stackrel{a.s.}{\le} F(a) \le T^*(a). \tag{4.43}$$
The spectra on the left end have stronger implications on the local scaling structure while the ones on the right end are easier to estimate or calculate.

This set of inequalities could fairly be called the “multifractal formalism”. However, in the mathematical community a slightly different terminology is already established which goes “the multifractal formalism holds” and means that for a particular process (or one of its paths, according to context) $\dim(K_a)$ can be calculated using some adequate partition function (such as $\tau(q)$) and taking its Legendre transform. Consequently, when “the multifractal formalism holds” for a path or process, then we often find that equality holds between several or all spectra appearing in (4.42), depending on the context of the formalism that had been established.

This property (“the multifractal formalism holds”) is a very strong one and suggests the presence of one single underlying multiplicative structure in $Y$. This intuition is supported by the fact that the multifractal formalism is known to “hold” up to now only for objects with strong rescaling properties where multiplication is involved such as self-similar measures, products of processes and infinitely divisible cascades (see [CAW 92, FAL 94, ARB 96, RIE 95a, PES 97, HOL 92], respectively [MAN 02, BAC 03, BAR 04, BAR 02, CHA 05] as well as references therein). A notable exception of processes without injected multiplicative structure are Lévy processes, the multifractal properties of which are well understood due to [JAF 99].

Though we pointed out some conditions for equality between $f$, $\tau^*$ and $T^*$ we must note that in general we may have strict inequality in some or all parts of (4.42). Such cases have been presented in [RIE 95a, RIE 98]. There is, however, one equality which holds under mild conditions and connects the two spectra in the center of (4.42).

THEOREM 4.3.– Consider a realization or path of $Y$. If the sequence $s^n_k$ is bounded, then

$$\tau(q) = f^*(q), \qquad \text{for all } q \in \mathbb{R}. \tag{4.44}$$
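Proof. Note that $\tau(q) \le f^*(q)$ from Lemma 4.3. Now, to estimate $\tau(q)$ from below, choose $\bar{a}$ larger than $|s^n_k|$ for all $n$ and $k$ and group the terms in $S^n(q)$ conveniently, i.e.

$$S^n(q) \le \sum_{i=-\lceil \bar{a}/\varepsilon \rceil}^{\lceil \bar{a}/\varepsilon \rceil} \; \sum_{(i-1)\varepsilon \le s^n_k < (i+1)\varepsilon} 2^{-nq s^n_k} \le \sum_{i=-\lceil \bar{a}/\varepsilon \rceil}^{\lceil \bar{a}/\varepsilon \rceil} N^n(i\varepsilon, \varepsilon)\, 2^{-n(qi\varepsilon - |q|\varepsilon)}. \tag{4.45}$$

Next, we need uniform estimates on $N^n(a, \varepsilon)$ for various $a$. Fix $q \in \mathbb{R}$ and let $\eta > 0$. Then, for every $a \in [-\bar{a}, \bar{a}]$ there is $\varepsilon_0(a)$ and $n_0(a)$ such that $N^n(a, \varepsilon) \le 2^{n(f(a) + \eta)}$ for all $\varepsilon < \varepsilon_0(a)$ and all $n > n_0(a)$. We would like to have $\varepsilon_0$ and $n_0$ independent from $a$ for our uniform estimate. To this end note that $N^n(a', \varepsilon') \le N^n(a, \varepsilon)$ for all $a' \in [a - \varepsilon/2, a + \varepsilon/2]$ and all $\varepsilon' < \varepsilon/2$. By compactness we may choose a finite set of $a_j$ ($j = 1, \ldots, m$) such that the collection $[a_j - \varepsilon_0(a_j)/2, a_j + \varepsilon_0(a_j)/2]$ covers $[-\bar{a}, \bar{a}]$. Set $\varepsilon_1 = (1/2) \min_{j=1,\ldots,m} \varepsilon_0(a_j)$ and $n_1 = \max_{j=1,\ldots,m} n_0(a_j)$. Then, for all $\varepsilon < \varepsilon_1$ and $n > n_1$, and for all $a \in [-\bar{a}, \bar{a}]$ we have $N^n(a, \varepsilon) \le 2^{n(f(a) + \eta)}$ and, thus,

$$S^n(q) \le \sum_{i=-\lceil \bar{a}/\varepsilon \rceil}^{\lceil \bar{a}/\varepsilon \rceil} 2^{-n(qi\varepsilon - f(i\varepsilon) - \eta - |q|\varepsilon)} \le \left(2\lceil \bar{a}/\varepsilon \rceil + 1\right) \cdot 2^{-n(f^*(q) - \eta - |q|\varepsilon)}. \tag{4.46}$$

Letting $n \to \infty$ we find $\tau(q) \ge f^*(q) - \eta - |q|\varepsilon$ for all $\varepsilon < \varepsilon_1$. Now we let $\varepsilon \to 0$ and finally $\eta \to 0$ to find the desired inequality.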
Due to the properties of Legendre transforms³ it follows:

COROLLARY 4.2 (Properties of the partition function).– If the sequence $s^n_k$ is bounded, then the partition function $\tau(q)$ is concave and monotonous. Consequently, $\tau(q)$ is continuous on $\mathbb{R}$, and differentiable in all but a countable number of exceptional points.

In order to efficiently invert Theorem 4.3 we need:

LEMMA 4.5 (Upper semi-continuity of $f$ and $F$).– Let $a_m$ converge to $a_*$. Then

$$f(a_*) \ge \limsup_{m \to \infty} f(a_m) \tag{4.47}$$

and analogously for $F$.

Proof. For all $\varepsilon > 0$ we can find $m_0$ such that $a_* - \varepsilon < a_m - \varepsilon/2 < a_m + \varepsilon/2 < a_* + \varepsilon$ for all $m > m_0$. Then, $N^n(a_*, \varepsilon) \ge N^n(a_m, \varepsilon/2)$ and $E[N^n(a_*, \varepsilon)] \ge E[N^n(a_m, \varepsilon/2)]$. We find

$$\limsup_{n \to \infty} \frac{1}{n} \log_2 N^n(a_*, \varepsilon) \ge \limsup_{n \to \infty} \frac{1}{n} \log_2 N^n(a_m, \varepsilon/2) \ge f(a_m)$$

for any $m > m_0(\varepsilon)$ and similarly for $F$. Now, let first $m \to \infty$ and then $\varepsilon \to 0$.

COROLLARY 4.3 (Central multifractal formalism).– We always have

$$f(a) \le f^{**}(a) = \tau^*(a). \tag{4.48}$$

Furthermore, denoting by $\tau'(q\pm)$ the right- (resp. left-)sided limits of derivatives we have

$$f(a) = \tau^*(a) = q\tau'(q\pm) - \tau(q\pm) \qquad \text{at } a = \tau'(q\pm). \tag{4.49}$$
Proof. The graph of $f^{**}$ is the concave hull of the graph of $f$ which implies (4.48). It is an easy task to derive (4.49) under assumptions suitable to make the tools of calculus available such as continuous second derivatives. To prove it in general let us first assume that $\tau$ is differentiable at a fixed $q$. In particular, $\tau(q')$ is then finite for $q'$ close to $q$. Since $\tau(q) = f^*(q)$ there is a sequence $a_m$ such that $\tau(q) = \lim_m q a_m - f(a_m)$. Since $\tau(q') \le q'a - f(a)$ for all $q'$ and $a$ by (4.33), and since $\tau$ is differentiable at $q$, this sequence $a_m$ must converge to $a_* := \tau'(q)$. From the definition of $a_m$ we conclude that $f(a_m)$ converges to $q a_* - \tau(q)$. Applying Lemma 4.5 we find that $f(a_*) \ge q a_* - \tau(q)$. Recalling (4.33) implies the desired equality.

Now, for an arbitrary $q$ the concave shape of $\tau$ implies that there is a sequence of numbers $q_m$ larger than $q$ in which $\tau$ is differentiable and which converges down to $q$. Consequently, $\tau'(q+) = \lim_m \tau'(q_m)$. Formula (4.49) being established at all $q_m$, Lemma 4.5 applies with $a_m = \tau'(q_m)$ and $a_* = \tau'(q+)$ to yield $f(\tau'(q+)) \ge q\tau'(q+) - \tau(q+)$. Again, (4.33) furnishes the opposite inequality. A similar argument applies to $\tau'(q-)$.

3. For a tutorial on the Legendre transform see [RIE 99, App. A].

COROLLARY 4.4.– If $T(q)$ is finite for an open interval of $q$-values then $|s^n_k|$ is bounded for almost all paths, and

$$T(q) = F^*(q) \qquad \text{for all } q. \tag{4.50}$$

Moreover,

$$F(a) = T^*(a) = qT'(q\pm) - T(q\pm) \qquad \text{at } a = T'(q\pm). \tag{4.51}$$
Proof. Assume for a moment that $s^n_k$ is unbounded from above with positive probability. Then, grouping (4.45) requires an additional term collecting the $s^n_k > a$. In fact, for any number $a$ we can find $n$ arbitrarily large such that $s^n_k > a$ for some $k$. This implies that for any negative $q$ we have $S^n(q) \ge 2^{-nqa}$ and $\tau(q) \le qa$. Letting $a \to \infty$ shows that $\tau(q) = -\infty$. By Lemma 4.4 we must have $T(q) = -\infty$, a contradiction. Similarly, we show that $s^n_k$ is bounded from below. The remaining claims can be established analogously to those for $\tau(q)$ by taking expectations in (4.45).

NOTE 4.3 (Estimation and unbounded moments).– In order to apply Corollary 4.4 in a real-world situation, but also for the purpose of estimating $\tau(q)$, it is of great importance to possess a method to estimate the range of $q$-values for which the moments of a stationary process (such as the increments or the wavelet coefficients of $Y$) are finite. Such a procedure is proposed in [GON 05] (see also [RIE 04]).

4.5. Binomial multifractals

The binomial measure has a long-standing tradition in serving as the paradigm of multifractal scaling [MAN 74, KAH 76, MAN 90a, CAW 92, HOL 92, BEN 87, RIE 95a, RIE 97b]. We present it here with an eye on possible generalizations of use in modeling.

4.5.1. Construction

To be consistent in notation we denote the binomial measure by $\mu_b$ and its distribution function by $\mathcal{M}_b(t) := \mu_b(]-\infty, t[)$. Note that $\mu_b$ is a measure or (probability) distribution, i.e. not a function in the usual sense, while $\mathcal{M}_b$ is a right-continuous and increasing function by definition.

In order to define $\mu_b$ we again use the notation (4.4): for any fixed $t$ there is a unique sequence $k_1, k_2, \ldots$ such that the dyadic intervals $I^n_{k_n} = [k_n 2^{-n}, (k_n + 1)2^{-n}[$ contain $t$ for all integer $n$. So, the $I^n_{k_n}$ form a decreasing sequence of half open intervals which shrink down to $\{t\}$. Moreover, $I^{n+1}_{2k_n}$ is the left subinterval of $I^n_{k_n}$ and $I^{n+1}_{2k_n+1}$ the right subinterval (see Figure 4.1). Note that the first $n$ elements of such a sequence, i.e. $(k_1, k_2, \ldots, k_n)$, are identical for all points $t \in I^n_{k_n}$. We call this a nested sequence and it is uniquely defined by the value of $k_n$. We set

$$\mu_b(I^n_{k_n}) = \mathcal{M}_b((k_n + 1)2^{-n}) - \mathcal{M}_b(k_n 2^{-n}) = M^n_{k_n} \cdot M^{n-1}_{k_{n-1}} \cdots M^1_{k_1} \cdot M^0_0. \tag{4.52}$$
In other words, the mass lying in $I^n_{k_n}$ is redistributed among its two dyadic subintervals $I^{n+1}_{2k_n}$ and $I^{n+1}_{2k_n+1}$ in the proportions $M^{n+1}_{2k_n}$ and $M^{n+1}_{2k_n+1}$. For consistency we require $M^{n+1}_{2k_n} + M^{n+1}_{2k_n+1} = 1$.

Having defined the mass of dyadic intervals we obtain the mass of any interval $]-\infty, t[$ by writing it as a disjoint union of dyadic intervals $J^n$ and noting $M_b(t) = \mu_b(]-\infty, t[) = \sum_n \mu_b(J^n)$. Therefore, integrals (expectations) with respect to $\mu_b$ can be calculated as
$$\int g(t)\,\mu_b(dt) = \lim_{n\to\infty} \sum_{k=0}^{2^n-1} g(k2^{-n})\,\mu_b(I^n_k) \tag{4.53}$$
$$= \int g(t)\,dM_b(t) = \lim_{n\to\infty} \sum_{k=0}^{2^n-1} g(k2^{-n})\left(M_b((k+1)2^{-n}) - M_b(k2^{-n})\right). \tag{4.54}$$
Alternatively, the measure $\mu_b$ can be defined using its distribution function $M_b$. Indeed, as a distribution function, $M_b$ is monotone and continuous from the right. Since (4.52) defines $M_b$ in all dyadic points it can be obtained in any other point as the right-sided limit. Note that $M_b$ is continuous at a given point $t$ unless $M^n_{k_n(t)} = 1$ for all $n$ large.

To generate randomness in $M_b$, we choose the various $M^n_k$ to be random variables. The above properties then hold pathwise. We will make the following assumptions on the multiplier distributions $M^n_k$:

i) Conservation of mass. Almost surely for all $n$ and $k$, $M^n_k$ is positive and
$$M^{n+1}_{2k_n} + M^{n+1}_{2k_n+1} = 1. \tag{4.55}$$
Figure 4.2. Iterative construction of the binomial cascade (the panels show the products $M^0_0$; $M^1_0 M^0_0$, $M^1_1 M^0_0$; $M^2_0 M^1_0 M^0_0$, $M^2_1 M^1_0 M^0_0$, $M^2_2 M^1_1 M^0_0$, $M^2_3 M^1_1 M^0_0$ assigned to the dyadic subintervals of $[0, 1]$)
As we have seen, this guarantees that $M_b$ is well defined.

ii) Nested independence. All multipliers of a nested sequence are mutually independent. Analogously to (4.52) we have for any nested sequence
$$E_\Omega[M^n_{k_n} \cdots M^0_0] = E_\Omega[M^n_{k_n}] \cdots E_\Omega[M^0_0] \tag{4.56}$$
and similarly for other moments. This will allow for simple calculations in what follows.

iii) Identical distributions. For all $n$ and $k$
$$M^n_k \overset{fd}{=} \begin{cases} M_0 & \text{if } k \text{ is even,} \\ M_1 & \text{if } k \text{ is odd.} \end{cases} \tag{4.57}$$
A more general version of iii) was given in [RIE 99] to allow for more flexibility in model matching. The theory of cascades or, more properly, $T$-martingales (see footnote 4) [KAH 76, BEN 87, HOL 92, BAR 97], provides a wealth of possible generalizations. Most importantly, it allows us to soften the almost sure conservation condition i) to

i’) Conservation in the mean
$$E_\Omega[M_0 + M_1] = 1. \tag{4.58}$$
In this case, Mb is well defined since (4.52) forms a martingale due to the nested independence (4.56). The main advantage of such an approach is that we can use unbounded multipliers M0 and M1 such as log-normal random variables. Then, the marginals of the increment process, i.e. μb (Ikn ) are exactly log-normal on all scales. For general binomials, always assuming ii) it can be argued that the marginals μb (Ikn ) are at least asymptotically log-normal by applying a central limit theorem to the logarithm of (4.52).
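The construction (4.52) under assumptions i) to iii) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the choice $M^0_0 = 1$ and the use of symmetric β multipliers with $p = 1.66$ are assumptions made here, not prescriptions of the text.

```python
import random

def binomial_cascade(depth, draw_m0, rng):
    """Masses mu_b(I^n_k), k = 0..2^depth - 1, of one cascade realization.

    draw_m0(rng) draws the left multiplier M^{n+1}_{2k}; the right one is
    1 - M^{n+1}_{2k}, so mass is conserved almost surely (assumption i)).
    """
    masses = [1.0]  # assumption of this sketch: M^0_0 = 1 (total mass one)
    for _ in range(depth):
        refined = []
        for mass in masses:
            m_left = draw_m0(rng)
            refined.append(mass * m_left)          # left child I^{n+1}_{2k}
            refined.append(mass * (1.0 - m_left))  # right child I^{n+1}_{2k+1}
        masses = refined
    return masses

def distribution_function(masses):
    """Cumulative sums, i.e. M_b(k 2^-n) for k = 0..2^n."""
    values = [0.0]
    for mass in masses:
        values.append(values[-1] + mass)
    return values

rng = random.Random(0)
# Symmetric Beta(p, p) multipliers, so that M_0 and M_1 = 1 - M_0 share one law.
mu = binomial_cascade(10, lambda r: r.betavariate(1.66, 1.66), rng)
Mb = distribution_function(mu)
```

Each multiplier is drawn independently, which gives nested independence ii), and the same law is used on every level, which gives iii). Replacing the right child by an independent draw with mean 1/2 would correspond to conservation in the mean i’) instead of i).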
4. For any fixed t sequence (4.52) forms a martingale due to the nested independence (4.56).
4.5.2. Wavelet decomposition

The scaling coefficients of $\mu_b$ using the Haar wavelet are simply
$$D_{n,k}(\mu_b) = \int \varphi^*_{n,k}(t)\,\mu_b(dt) = 2^{n/2}\int_{k2^{-n}}^{(k+1)2^{-n}} \mu_b(dt) = 2^{n/2}\,\mu_b(I^n_k) \tag{4.59}$$
from (4.8) and (4.53). With (4.9) and (4.52) we derive the explicit expression for the Haar wavelet coefficients:
$$2^{-n/2}\,C_{n,k_n}(\mu_b) = \mu_b(I^{n+1}_{2k_n}) - \mu_b(I^{n+1}_{2k_n+1}) = \left(M^{n+1}_{2k_n} - M^{n+1}_{2k_n+1}\right)\prod_{i=0}^{n} M^i_{k_i}. \tag{4.60}$$
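A small numerical check of (4.60) may help to fix the indexing. The sketch below stores the multipliers $M^n_k$ in a tree and compares the Haar coefficient computed from the two child masses with the closed form. The helper names, the uniform law on $[0.2, 0.8]$ and the choice $M^0_0 = 1$ are assumptions of this illustration only.

```python
import random

def multiplier_tree(depth, rng):
    """tree[n][k] = M^n_k for n = 0..depth, with M^0_0 = 1 and mass conservation."""
    tree = [[1.0]]
    for n in range(depth):
        level = []
        for _ in range(2 ** n):
            m_left = rng.uniform(0.2, 0.8)
            level.extend([m_left, 1.0 - m_left])
        tree.append(level)
    return tree

def interval_mass(tree, n, k):
    """mu_b(I^n_k): product of the multipliers along the nested sequence, as in (4.52)."""
    mass = 1.0
    for level in range(n, -1, -1):
        mass *= tree[level][k]
        k //= 2  # index of the parent interval one level up
    return mass

def haar_coefficient(tree, n, k):
    """C_{n,k}(mu_b) for the Haar wavelet, from the masses of the two children."""
    diff = interval_mass(tree, n + 1, 2 * k) - interval_mass(tree, n + 1, 2 * k + 1)
    return 2 ** (n / 2) * diff

def haar_coefficient_closed_form(tree, n, k):
    """Right-hand side of (4.60), multiplied by 2^{n/2}."""
    diff = tree[n + 1][2 * k] - tree[n + 1][2 * k + 1]
    return 2 ** (n / 2) * diff * interval_mass(tree, n, k)
```

Both functions should agree up to rounding for every admissible pair $(n, k)$, since the mass of each child is the mass of the parent times the corresponding multiplier.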
Similar scaling properties hold when using arbitrary, compactly supported wavelets, provided the distributions of the multipliers are scale independent. This comes about from (4.52) and (4.53), which give the following rule for substituting $t' = 2^n t - k_n$:
$$2^{-n/2}\,C_{n,k_n}(\mu_b) = \int_{I^n_{k_n}} \psi(2^n t - k_n)\,\mu_b(dt) = M^n_{k_n} \cdots M^1_{k_1} \cdot \int_0^1 \psi(t')\,\mu_b^{(n,k_n)}(dt'). \tag{4.61}$$
Here $\mu_b^{(n,k_n)}$ is a binomial measure constructed with the same method as $\mu_b$ itself, however, with multipliers taken from the subtree which has its root at the node $k_n$ of level $n$ of the original tree. More precisely, for any nested sequence $i_1, \ldots, i_m$
$$\mu_b^{(n,k_n)}(I^m_{i_m}) = M^{n+1}_{2k_n+i_1} \cdot M^{n+2}_{4k_n+i_2} \cdots M^{n+m}_{2^m k_n+i_m}.$$
From nested independence (4.56) we infer that this measure $\mu_b^{(n,k_n)}$ is independent of $M^i_{k_i}$ ($i = 1, \ldots, n$). Furthermore, the identical distributions of the multipliers iii) imply that for arbitrary, compactly supported wavelets
$$\int_0^1 \psi(t)\,\mu_b^{(n,k_n)}(dt) \overset{d}{=} C_{0,0}(\mu_b) = \int_0^1 \psi(t)\,\mu_b(dt) \tag{4.62}$$
where $\overset{d}{=}$ denotes equality in distribution. In particular, for the Haar wavelet we have
$$\int_0^1 \psi_{\mathrm{Haar}}(t)\,\mu_b^{(n,k_n)}(dt) = M^{n+1}_{2k_n} - M^{n+1}_{2k_n+1} \overset{d}{=} M_0 - M_1 = C^{\mathrm{Haar}}_{0,0}(\mu_b) \tag{4.63}$$
(the deterministic analog has also been observed in [BAC 93]). Finally, note that if $\psi$ is supported on $[0, 1]$, then $\psi(2^n(\cdot) - k)$ is supported on $I^n_k$. So, the tree of wavelet coefficients $C_{n,k}$ of $\mu_b$ possesses a structure similar to the tree of increments of $M_b$ (compare (4.52)).

With a little more effort we calculate the wavelet coefficients of $M_b$ itself, provided $\psi$ is admissible and supported on $[0, 1]$. Indeed,
$$M_b(t) - M_b(k_n 2^{-n}) = \mu_b([k_n 2^{-n}, t]) = M^n_{k_n} \cdots M^1_{k_1}\, M_b^{(n,k_n)}(2^n t - k_n), \tag{4.64}$$
where $M_b^{(n,k_n)}(t') := \mu_b^{(n,k_n)}([0, t'])$. Using $\int \psi = 0$ and substituting $t' = 2^n t - k_n$ this yields
$$2^{-n/2}\,C_{n,k_n}(M_b) = \int_{I^n_{k_n}} \psi(2^n t - k_n)\left(M_b(t) - M_b(k_n 2^{-n})\right)dt = 2^{-n} \cdot M^n_{k_n} \cdots M^1_{k_1} \cdot \int_0^1 \psi(t')\,M_b^{(n,k_n)}(t')\,dt'. \tag{4.65}$$
Again, we have
$$\int_0^1 \psi(t)\,M_b^{(n,k_n)}(t)\,dt \overset{d}{=} C_{0,0}(M_b) = \int_0^1 \psi(t)\,M_b(t)\,dt. \tag{4.66}$$
LEMMA 4.6.– Let $\psi$ be a wavelet supported on $[0, 1]$. Let $M_b$ be a binomial with i)-iii). Then, $C_{n,k_n}(\mu_b)$ is given by (4.61), and if $\psi$ is admissible then $C_{n,k_n}(M_b)$ is given by (4.65). Furthermore, (4.62) and (4.66) hold.

It is obvious that the dyadic structure present in both the construction of the binomial measure and the wavelet transform is responsible for the simplicity of the calculation above. It is, however, standard by now to extend the procedure to more general multinomial cascades such as $M_c$, introduced in section 4.5.5 (see [ARB 96, RIE 95a]).

4.5.3. Multifractal analysis of the binomial measure

In the light of Lemma 4.6 it becomes clear that the singularity exponent $\alpha(t)$ is most easily accessible for $M_b$ while $w(t)$ is readily available for both $M_b$ and $\mu_b$. On the other hand, as increments appear in $\alpha(t)$ they are not well defined for $\mu_b$. Thus, it is natural to calculate the spectra of both $M_b$ and $\mu_b$ with appropriate singularity exponents, i.e. $f_{\alpha,M_b}$, $f_{w,M_b}$ and $f_{w,\mu_b}$.
Now, Lemma 4.6 indicates that the singularity structures of $\mu_b$ and $M_b$ are closely related. Indeed, $\mu_b$ is the distributional derivative of $M_b$ in the sense of (4.52) and (4.54). Since taking a derivative “should” simply reduce the scaling exponent by one, we would expect that their spectra are identical up to a shift in $a$ by $-1$. Indeed, this is true for increasing processes, such as $M_b$, as we will elaborate in section 4.6.2. However, it has to be pointed out that this rule cannot be correct for oscillating processes. This is effectively demonstrated by the example $t^a \sin(t^{-b})$ with $b > 0$. Though this example has the exponent $a$ at zero, its derivative behaves like $t^{a-b-1}$ there. This is caused by the strong oscillations, also called chirp, at zero. In order to deal with such situations the 2-microlocalization theory has to be employed [JAF 91].

Let us first dwell on the well known multifractal analysis of $M_b$ based on $\alpha^n_k$. Recall that $M_b((k_n+1)2^{-n}) - M_b(k_n 2^{-n})$ is given by (4.52), and use the nested independence (4.56) and identical distributions (4.57) to obtain
$$E[S^n_{\alpha,M_b}(q)] = \sum_{k_n=0}^{2^n-1} E\left[(M^n_{k_n})^q \cdots (M^1_{k_1})^q (M^0_0)^q\right] = E\left[(M^0_0)^q\right] \cdot \sum_{i=0}^{n} \binom{n}{i} E[M_0^q]^i\, E[M_1^q]^{n-i} = E\left[(M^0_0)^q\right] \cdot \left(E[M_0^q] + E[M_1^q]\right)^n. \tag{4.67}$$
From this, it follows immediately that
$$T_{\alpha,M_b}(q) = -\log_2 E\left[(M_0)^q + (M_1)^q\right]. \tag{4.68}$$
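For deterministic multipliers the expectation in (4.67) is superfluous and the identity can be checked exactly. The following sketch assumes a left multiplier $m_0 = 0.3$, $M^0_0 = 1$ and the partition sum $S^n(q) = \sum_k \mu_b(I^n_k)^q$; all names are hypothetical and chosen for this illustration.

```python
import math

def cascade_masses(depth, m0):
    """Masses of a deterministic binomial cascade: left multiplier m0, right 1 - m0."""
    masses = [1.0]
    for _ in range(depth):
        masses = [mass * w for mass in masses for w in (m0, 1.0 - m0)]
    return masses

def partition_sum(masses, q):
    """S^n(q) = sum over k of mu_b(I^n_k)^q."""
    return sum(mass ** q for mass in masses)

def envelope(q, m0):
    """T(q) from (4.68) when M_0 = m0 and M_1 = 1 - m0 are deterministic."""
    return -math.log2(m0 ** q + (1.0 - m0) ** q)

depth, m0 = 12, 0.3
masses = cascade_masses(depth, m0)
for q in (-2.0, 0.0, 1.0, 2.0, 3.5):
    empirical = -math.log2(partition_sum(masses, q)) / depth
    print(q, empirical, envelope(q, m0))
```

The two printed columns should coincide up to rounding, because here $S^n(q) = (m_0^q + (1-m_0)^q)^n$ holds for every $n$. With random multipliers the same comparison only holds for the expectation, so an average over many realizations would be needed.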
Note that this value may be $-\infty$ for some $q$.

THEOREM 4.4.– Assume that i’), ii) and iii) hold. Assume furthermore that $M_0$ and $M_1$ have at least some finite moment of negative order. Then, with probability one
$$\dim(K_a) = f(a) = \tau^*(a) = T^*_{\alpha,M_b}(a) \tag{4.69}$$
for all $a$ such that $T^*_{\alpha,M_b}(a) > 0$. Thereby, all the spectra are related to the singularity exponents $\alpha^n_k$ or $h^n_k$ of $M_b$.

NOTE 4.4 (Wavelet analysis).– In what follows we will show that we obtain the same spectra for $M_b$ replacing $\alpha^n_k$ with $w^n_k$ for certain analyzing wavelets. We will also mention the changes which become necessary when studying distribution functions of measures with fractal support (see section 4.5.5).
Proof. Inspecting [BAR 97] we find that $\dim(K_a) = T^*(a)$ for $\alpha^n_k$ under the given assumptions. Earlier results, such as [FAL 94, ARB 96], used more restrictive assumptions but are somewhat easier to read. Though weaker than [BAR 97] they are sufficient in some situations.

4.5.4. Examples

Example 1 (β binomial).– Consider multipliers $M_0$ and $M_1$ that follow a β distribution, which has the density $c_p\, t^{p-1}(1-t)^{p-1}$ for $t \in [0, 1]$ and 0 elsewhere. Thus, $p > 0$ is a parameter and $c_p$ is a normalization constant. Note that the conservation of mass i) imposes a symmetric distribution since $M_0$ and $M_1$ are set to be equally distributed. The β distribution has finite moments of order $q > -p$ which can be expressed explicitly using the Γ-function. We obtain
$$\text{β-Binomial:}\quad T_\alpha(q) = -1 - \log_2 \frac{\Gamma(p+q)\,\Gamma(2p)}{\Gamma(2p+q)\,\Gamma(p)} \quad (q > -p), \tag{4.70}$$
and $T(q) = -\infty$ for $q \le -p$. For a typical shape of these spectra, see Figure 4.3.
[Figure 4.3: left panel, $T(q)$ against $q$ with the marked points $(0, T(0)) = (0, -1)$ and $(1, T(1)) = (1, 0)$; right panel, $T^*(a)$ against $a$.]

Figure 4.3. The spectrum of a binomial measure with β distributed multipliers with $p = 1.66$. Trivially, $T(0) = -1$, where the maximum of $T^*$ is 1. In addition, every positive increment process has $T(1) = 0$, where $T^*$ touches the bisector. Finally, the LRD parameter is $H_{\mathrm{var}} = (T(2) + 1)/2 = 0.85$ (see (4.90) below)
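The values quoted in the caption can be reproduced from (4.70) with the log-Γ function of the Python standard library. The function name below is an assumption of this sketch.

```python
import math

def beta_binomial_T(q, p):
    """T(q) of (4.70) for Beta(p, p) multipliers; -inf when q <= -p."""
    if q <= -p:
        return -math.inf
    log_moment = (math.lgamma(p + q) + math.lgamma(2 * p)
                  - math.lgamma(2 * p + q) - math.lgamma(p))
    return -1.0 - log_moment / math.log(2.0)

p = 1.66
print(beta_binomial_T(0.0, p))            # T(0)
print(beta_binomial_T(1.0, p))            # T(1)
print((beta_binomial_T(2.0, p) + 1) / 2)  # H_var as in the caption of Figure 4.3
```

Working with `math.lgamma` rather than `math.gamma` avoids overflow for large arguments; the result is converted from natural logarithms to base 2 by dividing by $\ln 2$.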
An application of the β binomial for the modeling of data traffic on the Internet can be found in [RIE 99].

Example 2 (Uniform binomial).– As a special case of the β binomial we obtain uniform distributions for the multipliers when setting $p = 1$. Formula (4.70) simplifies to $T_\alpha(q) = -1 + \log_2(1 + q)$ for $q > -1$. Applying the formula for the Legendre transform (4.51) yields the explicit expression
$$\text{uniform binomial:}\quad T^*_\alpha(a) = 1 - a + \log_2(e) + \log_2 \frac{a}{\log_2(e)} \tag{4.71}$$
for $a > 0$ and $T^*_\alpha(a) = -\infty$ for $a \le 0$.
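As a plausibility check of (4.71), the Legendre transform $T^*(a) = \inf_q (qa - T(q))$ can be approximated by a plain grid search and compared with the closed form. The grid bounds, the step count and the function names are assumptions of this sketch, and the grid search is a crude stand-in for a proper minimizer.

```python
import math

def T_uniform(q):
    """T(q) = -1 + log2(1 + q) for uniform multipliers, valid for q > -1."""
    return -1.0 + math.log2(1.0 + q)

def legendre_numeric(T, a, q_lo, q_hi, steps=100000):
    """Approximate T*(a) = inf_q (q a - T(q)) by a grid search on [q_lo, q_hi]."""
    best = math.inf
    for i in range(steps + 1):
        q = q_lo + (q_hi - q_lo) * i / steps
        best = min(best, q * a - T(q))
    return best

def T_star_uniform(a):
    """Closed form (4.71), valid for a > 0."""
    log2e = math.log2(math.e)
    return 1.0 - a + log2e + math.log2(a / log2e)

for a in (0.5, 1.0, math.log2(math.e), 2.5):
    print(a, legendre_numeric(T_uniform, a, -0.999, 50.0), T_star_uniform(a))
```

The minimizing $q$ is $\log_2(e)/a - 1$, so the grid has to cover this value; for very small $a$ the upper bound of the grid would need to be enlarged.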
Example 3 (Log-normal binomial).– Another very interesting case is log-normal distributions for the multipliers $M_0$ and $M_1$. Note that we have to replace i) with i’) in this case since log-normal variables can be arbitrarily large, i.e. larger than 1. Recall that the log-normal binomial enjoys the advantage of having exactly log-normal marginals $\mu_b(I^n_k)$ since the product of independent log-normal variables is again a log-normal variable. Having mass conservation only in the mean, however, may cause problems in simulations since the sample mean of the process $\mu_b(I^n_k)$ ($k = 0, \ldots, 2^n - 1$) is not $M^0_0$ as in case i), but depends on $n$. Indeed, the negative (virtual) $a$ appearing in the log-normal binomial spectrum reflects the possibility that the sample average may increase locally (see [MAN 90a]).

The calculation of its spectrum starts by observing that the exponential $M = e^G$ of a $N(m, \sigma^2)$ variable $G$, i.e. a Gaussian with mean $m$ and variance $\sigma^2$, has the $q$-th moment $E[M^q] = E[\exp(qG)] = \exp(qm + q^2\sigma^2/2)$. Assuming that $M_0$ and $M_1$ are equally distributed as $M$ their mean must be 1/2. Hence $m + \sigma^2/2 = -\ln(2)$, and
$$\text{log-normal binomial:}\quad T_\alpha(q) = (q - 1)\left(1 - \frac{\sigma^2}{2\ln(2)}\,q\right) \tag{4.72}$$
for all $q \in \mathbb{R}$ such that $E[(M_b(1))^q]$ is finite. Note that the parabola in (4.72) has two zeros: 1 and $q_{crit} = 2\ln(2)/\sigma^2$. It follows from [KAH 76] that $E[(M_b(1))^q] < \infty$ exactly for $q < q_{crit}$. Since $T(q)$ is differentiable for $q < q_{crit}$ we may obtain its Legendre transform implicitly from (4.51) for $a = T'(q)$ with $q < q_{crit}$, i.e., for all $a > a_{crit} = T'(q_{crit}) = \sigma^2/(2\ln(2)) - 1$. Eliminating $q$ from (4.51) yields the explicit form
$$T^*_\alpha(a) = 1 - \frac{\ln(2)}{2\sigma^2}\left(a - 1 - \frac{\sigma^2}{2\ln(2)}\right)^2 \quad (a \ge a_{crit}). \tag{4.73}$$
For $a \le a_{crit}$ the Legendre transform yields $T^*(a) = a \cdot q_{crit}$. Thus, at $a_{crit}$ the spectrum $T^*$ crosses over from parabola (4.73) to its tangent through the origin with slope $q_{crit}$ (the other tangent through the origin is the bisector).
It should be remembered that only the positive part of this spectrum can be estimated from one realization of Mb . The negative part corresponds to events so rare that they can only be observed in a large array of realizations (see Note 4.2).
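The log-normal framework also allows us to calculate $F(a)$ explicitly, demonstrating which rescaling properties of the marginal distributions of the increment processes of $M_b$ are captured in the multifractal spectra. Indeed, if all $\ln(M^n_k)$ are $N(m, \sigma^2)$ then $-\ln(2) \cdot \alpha^n_k$ is $N(m, \sigma^2/n)$. The mean value theorem of integration gives
$$P_\Omega[|\alpha^n_k - a| < \varepsilon] = \frac{1}{\sqrt{2\pi\sigma^2/n}} \int_{\ln(2)(-a-\varepsilon)}^{\ln(2)(-a+\varepsilon)} \exp\left(-\frac{(x - m)^2}{2\sigma^2/n}\right) dx = \frac{1}{\sqrt{2\pi\sigma^2/n}}\,\ln(2) \cdot 2\varepsilon \cdot \exp\left(-\frac{(-\ln(2)\,x_{a,n} - m)^2}{2\sigma^2/n}\right)$$
with $x_{a,n} \in [a - \varepsilon, a + \varepsilon]$ for all $n$. Keeping only the exponential term in $n$ and substituting $m = -\sigma^2/2 - \ln(2)$ we find
$$\frac{1}{n} \log_2\left(2^n P_\Omega[|\alpha^n_k - a| < \varepsilon]\right) \simeq 1 - \frac{\ln(2)}{2\sigma^2}\left(x_{a,n} - 1 - \frac{\sigma^2}{2\ln(2)}\right)^2. \tag{4.74}$$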
Comparing with (4.73) we see that $T^*(a) = F(a)$, as stated in Theorem 4.2. The above computation shows impressively how well adapted a multiplicative iteration with log-normal multipliers is to multifractal analysis (or vice versa): $F$ extracts, basically, the exponent of the Gaussian kernel.

Since the multifractal formalism holds for $M_b$ these features can be measured or estimated using the re-normalized histogram, i.e. the grain based multifractal spectrum $f(a)$. This is a property which could be labeled with the term ergodicity. Note, however, that classical ergodic theory deals with observations along an orbit of increasing length, while $f(a)$ concerns a sequence of orbits.

4.5.5. Beyond dyadic structure

We elaborate here generalizations of the binomial cascade.

Statistically self-similar measures: a natural generalization of the random binomial, denoted here by $M_c$, is obtained by splitting intervals $J^n_k$ iteratively into $c$ subintervals $J^{n+1}_{ck}, \ldots, J^{n+1}_{ck+c-1}$ with length $|J^{n+1}_{ck+i}| = L^{n+1}_{ck+i}\,|J^n_k|$ and mass $\mu_c(J^{n+1}_{ck+i}) = M^{n+1}_{ck+i}\,\mu_c(J^n_k)$. In the most simple case, we will require mass conservation, i.e. $M^{n+1}_{ck} + \cdots + M^{n+1}_{ck+c-1} = 1$, but also $L^{n+1}_{ck} + \cdots + L^{n+1}_{ck+c-1} = 1$ which guarantees that $\mu_c$ lives everywhere. Assuming the analogous properties of ii) and iii) to hold for both the length- as well as the mass-multipliers we find that $T_{M_c}(q)$ is the unique solution of
$$E\left[(M_0)^q (L_0)^{-T(q)} + \cdots + (M_{c-1})^q (L_{c-1})^{-T(q)}\right] = 1. \tag{4.75}$$
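Equation (4.75) rarely has a closed-form solution, but for deterministic multipliers it is easily solved numerically. The sketch below uses bisection; the function name, the bracket $[-50, 50]$ and the restriction to deterministic multipliers are assumptions made for illustration.

```python
import math

def solve_T(q, masses, lengths, lo=-50.0, hi=50.0, tol=1e-12):
    """Solve sum_i M_i^q L_i^(-T) = 1 for T by bisection (deterministic multipliers)."""
    def g(T):
        return sum(m ** q * l ** (-T) for m, l in zip(masses, lengths)) - 1.0
    # g is increasing in T because every length multiplier lies in (0, 1).
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# Binomial special case: lengths 1/2, 1/2 reproduce T(q) = -log2(m0^q + m1^q).
print(solve_T(2.0, (0.3, 0.7), (0.5, 0.5)), -math.log2(0.3 ** 2 + 0.7 ** 2))
# A trinomial example with unequal lengths.
print(solve_T(2.0, (0.2, 0.5, 0.3), (0.25, 0.5, 0.25)))
```

With mass and length conservation, the solver should return $T(1) = 0$ and $T(0) = -1$ for any admissible choice of multipliers, which gives a quick sanity check.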
This formula of $T(q)$ can be derived rigorously by taking expectations where appropriate in the proof of [RIE 95a, Prop 14]. Doing so shows, moreover, that $T(q)$ assumes a limit in these examples.

Multifractal formalism: it is notable that the multifractal formalism “holds” for the class of statistically self-similar measures described above in Theorem 4.4 (see [ARB 96]).

However, if $L^{n+1}_{ck} + \cdots + L^{n+1}_{ck+c-1} = \lambda < 1$, e.g. choosing $L^n_k = 1/c'$ almost surely with $c' > c$, then the measure $\mu_c$ lives on a set of fractal dimension and its distribution function $M_c(t) = \mu_c([0, t))$ is constant almost everywhere. In this case, equality in the multifractal formalism will fail: indeed, unless the scaling exponents $s^n_k$ are modified to account for boundary effects caused by the fractal support, the partition function will be unbounded for negative $q$, e.g. $\tau_\alpha(q) = -\infty$ for $q < 0$ (see [RIE 95a]). As a consequence, $T_\alpha(q) = -\infty$ and (4.75) is no longer valid for $q < 0$. Interestingly, the fine spectrum $\dim(K_a)$ is still known, however, due to [ARB 96].
Stationary increments: however, an entirely different and novel way of introducing randomness in the geometry of multiplicative cascades which leads to perfectly stationary increments has been given recently in [MAN 02] and in [BAR 02, BAR 03, BAR 04, MUZ 02, BAC 03, CHA 02, CHA 05, RIE 07b, RIE 07a]. The description of these models is, unfortunately, beyond the scope of this work.

Binomial in the wavelet domain: in concluding this section we should mention that, with regard to (4.61), we may choose to directly model the wavelet coefficients of a process in a multiplicative fashion in order to obtain a desired multifractal structure. Some early steps in this direction have been taken in [ARN 98].

4.6. Wavelet based analysis

4.6.1. The binomial revisited with wavelets

The deterministic envelope is the simplest wavelet-based spectrum of $\mu_b$ to calculate. Taking into account the normalization factors in (4.12) when using Lemma 4.6, the calculation of (4.67) carries over to give
$$E_\Omega[S^n_{w,\mu_b}(q)] = 2^{nq}\, E[|C_{0,0}|^q] \cdot \left(E_\Omega[M_0^q] + E_\Omega[M_1^q]\right)^n,$$
and similarly for $M_b$. Provided $E[|C_{0,0}|^q]$ is finite this immediately gives
$$T_{w,\mu_b}(q) + q = T_{w,M_b}(q) = T_{\alpha,M_b}(q), \tag{4.76}$$
$$T^*_{w,\mu_b}(a - 1) = T^*_{w,M_b}(a) = T^*_{\alpha,M_b}(a). \tag{4.77}$$
Imposing additional assumptions on the distributions of the multipliers we may also control $w^n_k(\mu_b)$ themselves and not only their moments. To this end, we should be able to guarantee that the wavelet coefficients do not decay too fast (compare (4.10)), i.e. the random factor (4.62) which appears in (4.61) does not become too small. Indeed, it is sufficient to assume that there is some $\varepsilon > 0$ such that $|C_{0,0}(\mu_b)| \ge \varepsilon$ almost surely. Then for all $t$, $(1/n) \log\left(\left|\int \psi(t)\,\mu_b^{(n,k_n)}(dt)\right|\right) \to 0$, and with (4.61)
$$\underline{w}_{\mu_b}(t) = \liminf_{n\to\infty} -\frac{1}{n} \log_2\left(2^{n/2}\,|C_{n,k_n}|\right) = \underline{\alpha}_{M_b}(t) - 1, \tag{4.78}$$
and similarly $\overline{w}_{\mu_b}(t) = \overline{\alpha}_{M_b}(t) - 1$. Observe that this is precisely the relation we expect between the scaling exponents of a process and its (distributional) derivative – at least in nice cases – and that it is in agreement with (4.77). In summary (first observed for deterministic binomials in [BAC 93]):

COROLLARY 4.5.– Assume that $\mu_b$ is a random binomial measure satisfying i)-iii). Assume that the random variables $\left|\int \psi(t)\,\mu_b^{(n,k)}(dt)\right|$ resp. $\left|\int \psi(t)\,M_b^{(n,k)}(t)\,dt\right|$ are uniformly bounded away from 0. Then, the multifractal formalism “holds” for the wavelet based spectra of $\mu_b$, resp. $M_b$, i.e.
$$\dim(E_a^{w,\mu_b}) \overset{a.s.}{=} f_{w,\mu_b}(a) \overset{a.s.}{=} \tau^*_{w,\mu_b}(a) \overset{a.s.}{=} T^*_{w,\mu_b}(a), \tag{4.79}$$
respectively
$$\dim(E_a^{w,M_b}) \overset{a.s.}{=} f_{w,M_b}(a) \overset{a.s.}{=} \tau^*_{w,M_b}(a) \overset{a.s.}{=} T^*_{w,M_b}(a). \tag{4.80}$$
Requiring that $\left|\int \psi(t)\,\mu_b^{(n,k)}(dt)\right|$ resp. $\left|\int \psi(t)\,M_b^{(n,k)}(t)\,dt\right|$ should be bounded away from zero in order to ensure (4.78), though satisfied in some simple cases, seems unrealistically restrictive to be of practical use. A few comments are in order here.

First, this condition can be weakened to allow arbitrarily small values of these integrals, as long as all their negative moments exist. This can be shown by an argument using the Borel-Cantelli lemma.

Second, the condition may simplify in two ways. For iid multipliers we know that these integrals are equal in distribution to $C_{0,0}$, thus only $n = k = 0$ has to be checked. Further, for the Haar wavelet and symmetric multipliers, it becomes simply the condition that $M_0$ be uniformly bounded away from zero (see (4.60)), or at least that $E[|M_0 - 1/2|^q] < \infty$ for all negative $q$.

Third, if we drop iii) and allow the multiplier distributions to depend on scale (see [RIE 99]), then $\left|\int \psi(t)\,\mu_b^{(n,k)}(dt)\right|$ resp. $\left|\int \psi(t)\,M_b^{(n,k)}(t)\,dt\right|$ has to be bounded away from zero only for large $n$. In applications such as network traffic modeling we find that on fine scales $M^{n+1}_{2k} - M^{n+1}_{2k+1}$ is best modeled by discrete distributions on $[0, 1]$ with large variance, i.e. without mass around 1/2.
Fourth, another way out is to avoid small wavelet coefficients entirely in a multifractal analysis. More precisely, we would follow [BAC 93, JAF 97] and replace $C_{n,k_n}$ in the definition of $w^n_{k_n}$ (4.12) by the maximum over certain wavelet coefficients “close” to $t$. Of course, the multifractal formalism of section 4.4 still holds. [JAF 97] gives conditions under which the spectrum $\tau^*_{w,\mu_b}(a)$ based on this modified $w^n_k$ agrees with the “Hölder” spectrum $\dim(E_a)$ based on $h^n_k(M_b)$.

4.6.2. Multifractal properties of the derivative

Corollary 4.5 establishes for the binomial what intuition suggests in general, i.e. that the multifractal spectra of processes and their derivative should be related in a simple fashion – at least for certain classes of processes. As we will show, increasing processes have this property, at least for the wavelet based multifractal spectra.

However, the order of Hölder regularity in the sense of the spaces $C^h_t$ (see Lemma 4.1) might decrease under differentiation by an amount different from 1. This is particularly true in the presence of highly oscillatory behavior such as “chirps”, as the example $t^a \sin(1/t^2)$ demonstrates. In order to assess the proper space $C^h_t$ a 2-microlocalization has to be employed. For good surveys see [JAF 95, JAF 91].

In order to establish a general result on derivatives we place ourselves in the framework whereby we care less for a representation of a process in terms of wavelet coefficients and are interested purely in an analysis of oscillatory behavior. Typical examples of analyzing mother wavelets $\psi$ are the derivatives of the Gaussian kernel $\exp(-t^2/2)$ which were used to produce Figure 4.4.

The idea is to use integration by parts. For a continuous measure $\mu$ on $[0, 1]$ with distribution function $M(t) = \mu([0, t))$ and a continuously differentiable function $g$ this reads as
$$\int g(t)\,\mu(dt) = \lim_{n\to\infty} \sum_{k=0}^{2^n-1} g(k2^{-n})\left(M((k+1)2^{-n}) - M(k2^{-n})\right)$$
$$= \lim_{n\to\infty} \left[\sum_{k=0}^{2^n-1} M(k2^{-n})\left(g((k-1)2^{-n}) - g(k2^{-n})\right) + M(1)\,g(1 - 2^{-n}) - M(0)\,g(-2^{-n})\right]$$
$$= M(1)g(1) - M(0)g(0) - \int M(t)\,g'(t)\,dt \tag{4.81}$$
where we alluded to (4.53) and regrouped terms. As a matter of fact, $M(0) = 0$ and $M(1) = 1$. A similar calculation can be performed for a more general, not necessarily increasing process $Y$, provided it has a derivative $Y'$, by replacing $\mu(dt)$ with $Y'(t)dt$.
Figure 4.4. Demonstration of the multifractal behavior of a binomial measure $\mu_b$ (left) and its distribution function $M_b$ (right). On the top a numerical simulation, i.e. (4.52) on the left and $M_b(k2^{-n})$ on the right for $n = 20$. In the middle the moduli of a continuous wavelet transform [DAU 92] where the second Gaussian derivative was taken as the analyzing wavelet $\psi(t)$ for $\mu_b$, resp. the third derivative $\psi'$ for $M_b$. The dark lines indicate the “lines of maxima” [JAF 97, BAC 93], i.e. the locations where the modulus of $\int \psi(2^j t - s)\,\mu_b(dt)$ has a local maximum as a function of $s$ with $j$ fixed. On the bottom a multifractal analysis in three steps. First, a plot of $\log S^n_w(q)$ against $n$ tests for linear behavior for various $q$. Second, the partition function $\tau(q)$ is computed as the slopes of a least square linear fit of $\log S^n$. Finally, the Legendre transform $\tau^*(a)$ of $\tau(q)$ is calculated following (4.49). Indicated with dashes in the plots of $\tau(q)$ and $\tau^*(a)$ of $\mu_b$ are the corresponding functions for $M_b$, providing empirical evidence for (4.76), (4.77), and (4.83)
Now, setting $g(t) = 2^{n/2}\psi(2^n t - k)$ for a smooth analyzing wavelet $\psi$ we have $g'(t) = 2^{3n/2}\psi'(2^n t - k)$ and obtain
$$C_{n,k}(\psi, \mu) = 2^{n/2}\psi(2^n - k) - 2^n \cdot C_{n,k}(\psi', M). \tag{4.82}$$
Estimating $2^n - k_n = 2^n - \lfloor t2^n \rfloor \ge (1 - t)2^n$ and assuming exponential decay of $\psi(t)$ at infinity allows us to conclude
$$\underline{w}(t)_{\psi,\mu} = -1 + \underline{w}(t)_{\psi',M}, \tag{4.83}$$
and similarly for $\overline{w}(t)$.

COROLLARY 4.6.–
$$f_{\psi,\mu}(a) = f_{\psi',M}(a + 1), \qquad \tau^*_{\psi,\mu}(a) = \tau^*_{\psi',M}(a + 1). \tag{4.84}$$
This is impressively demonstrated in Figure 4.4. We should note that $\psi'$ has one more vanishing moment than $\psi$ which is easily seen by integrating by parts. Thus, it is natural to analyze the integral of a process, here the distribution function $M$ of the measure $\mu$, using $\psi'$ since the degree of the Taylor polynomials typically grows by 1 under integration.

NOTE 4.5 (Visibility of singularities and regularity of the wavelet).– It is notable that the Haar wavelet yields the full spectra of the binomial $M_b$ (and also of its distributional derivative $\mu_b$). This fact is in some discord with the folklore saying that a wavelet cannot detect degrees of regularity larger than its own. In other words, a signal will rarely be more regular than the basis elements it is composed of. To resolve the apparent paradox, recall the peculiar property of multiplicative measures which is to have constant Taylor polynomials. So, it will reveal its scaling structure to any analyzing wavelet with $\int \psi = 0$. No higher regularity, i.e. $\int t^k \psi(t)\,dt = 0$, is required. The correct reading of the literature is indeed, that wavelets are only guaranteed to detect singularities smaller than their own regularity.

4.7. Self-similarity and LRD

The statistical self-similarity as expressed in (4.1) makes FBM, or rather its increment process, a paradigm of long range dependence (LRD). To be more explicit let $\delta$ denote a fixed lag and define fractional Gaussian noise (FGN) as
$$G(k) := B_H((k + 1)\delta) - B_H(k\delta). \tag{4.85}$$
Possessing the LRD property means that the auto-correlation $r_G(k) := E_\Omega[G(n + k)G(n)]$ decays so slowly that $\sum_k r_G(k) = \infty$. The presence of such strong dependence bears an important consequence on the aggregated processes
$$G^{(m)}(k) := \frac{1}{m} \sum_{i=km}^{(k+1)m-1} G(i). \tag{4.86}$$
They have a much higher variance, and variability, than would be the case for a short range dependent process. Indeed, if $X$ is a process with iid values $X(k)$, then $X^{(m)}(k)$ has variance $(1/m^2)\,\mathrm{var}(X_0 + \cdots + X_{m-1}) = (1/m)\,\mathrm{var}(X)$. For $G$ we find, due to (4.1) and $B_H(0) = 0$,
$$\mathrm{var}(G^{(m)}(0)) = \mathrm{var}\left(\frac{1}{m} B_H(m\delta)\right) = \mathrm{var}\left(\frac{m^H}{m} B_H(\delta)\right) = m^{2H-2}\,\mathrm{var}(B_H(\delta)). \tag{4.87}$$
Indeed, for $H > 1/2$ this expression decays much slower than $1/m$. As is shown in [COX 84], $\mathrm{var}(X^{(m)}) \simeq m^{2H-2}$ is equivalent to $r_X(k) \simeq k^{2H-2}$ and so, $G(k)$ is indeed LRD for $H > 1/2$.

Let us demonstrate with FGN how to relate LRD with multifractal analysis based only on the fact that it is a zero-mean process, not (4.1). To this end let
$\delta = 2^{-n}$ denote the finest resolution we will consider, and let 1 be the largest. For $m = 2^i$ ($0 \le i \le n$) the process $mG^{(m)}(k)$ becomes simply $B_H((k + 1)m\delta) - B_H(km\delta) = B_H((k + 1)2^{i-n}) - B_H(k2^{i-n})$. However, the second moment of this expression – which is also the variance – is exactly what determines $T_\alpha(2)$. More precisely, using stationarity of $G$ and substituting $m = 2^i$, we obtain
$$2^{-(n-i)T_\alpha(2)} \simeq E_\Omega\left[S^{n-i}_\alpha(2)\right] = \sum_{k=0}^{2^{n-i}-1} E_\Omega\left[|mG^{(m)}(k)|^2\right] = 2^{n-i}\, 2^{2i}\, \mathrm{var}\left(G^{(2^i)}\right). \tag{4.88}$$
This should be compared with the definition of the LRD parameter $H$ using
$$\mathrm{var}(G^{(m)}) \simeq m^{2H-2} \quad\text{or}\quad \mathrm{var}(G^{(2^i)}) \simeq 2^{i(2H-2)}. \tag{4.89}$$
At this point a conceptual difficulty arises. Multifractal analysis is formulated in the limit of small scales ($i \to -\infty$) while LRD is a property for large scales ($i \to \infty$). Thus, the two exponents $H$ and $T_\alpha(2)$ can in theory only be related when assuming that the scaling they represent is actually exact at all scales, and not only asymptotically. In any real world application, however, we will determine both $H$ and $T_\alpha(2)$ by finding a scaling region $\underline{i} \le i \le \overline{i}$ in which (4.88) and (4.89) hold up to satisfactory precision. Comparing the two scaling laws in $i$ yields $T_\alpha(2) + 1 - 2 = 2H - 2$, or
$$H = \frac{T_\alpha(2) + 1}{2}. \tag{4.90}$$
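The scaling law (4.87) can be checked without simulation by summing the autocovariance of FGN. The sketch below relies on the standard expression $r_G(k) = \tfrac{1}{2}(|k+1|^{2H} - 2|k|^{2H} + |k-1|^{2H})$ for unit-variance FGN, which is not stated in the text above and is an assumption of this illustration, as are the function names.

```python
def fgn_autocovariance(k, H):
    """Autocovariance of unit-variance fractional Gaussian noise at lag k."""
    k = abs(k)
    return 0.5 * ((k + 1) ** (2 * H) - 2 * k ** (2 * H) + abs(k - 1) ** (2 * H))

def aggregated_variance(m, H):
    """var(G^(m)(0)) = (1/m^2) * sum over i, j < m of r_G(i - j), cf. (4.86)."""
    total = sum(fgn_autocovariance(i - j, H) for i in range(m) for j in range(m))
    return total / m ** 2

H = 0.8
for m in (1, 2, 8, 32):
    print(m, aggregated_variance(m, H), m ** (2 * H - 2))
```

The two printed columns should agree up to rounding, in line with (4.87). For $H = 1/2$ the autocovariance vanishes at all non-zero lags and the aggregated variance reduces to the iid value $1/m$.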
This formula expresses most pointedly how multifractal analysis goes beyond second order statistics: with $T(q)$ we capture the scaling of all moments. The relation (4.90), here derived for zero-mean processes, can be put on more solid grounds using wavelet estimators of the LRD parameter [ABR 95] which are more robust than the estimators through variance. The same formula (4.90) also reappears for certain multifractals (see (4.100)).

In this context it is worthwhile pointing forward to (4.96), from which we conclude that $T_{B_H}(q) = qH - 1$ if $q > -1$. The fact to note here is that FBM requires indeed only one parameter to capture its scaling while multifractal scaling, in principle, is described by an array of parameters $T(q)$.

4.8. Multifractal processes

The most prominent examples where we find coinciding, strictly concave multifractal spectra are the distribution functions of cascade measures [MAN 74, KAH 76, CAW 92, FAL 94, ARB 96, OLS 94, HOL 92, RIE 95a, RIE 97b, PES 97] for which $\dim(K_a)$ and $T^*(a)$ are equal and have the form of a ∩ (see Figure 4.3 and also 4.5(e)). These cascades are constructed through a multiplicative iteration scheme such as the binomial cascade, which is presented in detail earlier in this chapter with special emphasis on its wavelet decomposition. Having positive increments, this class of processes is, however, sometimes too restrictive. FBM, as noted, has the disadvantage of a poor multifractal structure and does not contribute to a larger pool of stochastic processes with multifractal characteristics.

It is also notable that the first “natural”, truly multifractal stochastic process to be identified was the Lévy motion [JAF 99]. This example is particularly appealing since scaling is not injected into the model by an iterative construction (this is what we mean by the term natural). However, its spectrum is degenerate, though it shows a non-trivial range of scaling exponents $h(t)$, in the sense that it is linear.

4.8.1. Construction and simulation

With the formalism presented here, the stage is set for constructing and studying new classes of truly multifractal processes. The idea, to speak in Mandelbrot’s own words, is inevitable after the fact. The ingredients are simple: a multifractal “time warp”, i.e. an increasing function or process $M(t)$ for which the multifractal formalism is known to hold, and a function or process $V$ with strong monofractal scaling properties such as fractional Brownian motion (FBM), a Weierstrass process or self-similar martingales such as Lévy motion. We then form the compound process
$$\mathcal{V}(t) := V(M(t)). \tag{4.91}$$
To fix the ideas, let us recall the method of midpoint displacement which can be used to define a simple Brownian motion $B_{1/2}$ which we will also call the Wiener motion (WM) for a clear distinction from FBM. This method constructs $B_{1/2}$ iteratively at dyadic points. Having constructed $B_{1/2}(k2^{-n})$ and $B_{1/2}((k + 1)2^{-n})$ we define $B_{1/2}((2k + 1)2^{-n-1})$ as $(B_{1/2}(k2^{-n}) + B_{1/2}((k + 1)2^{-n}))/2 + X_{k,n}$. The offsets $X_{k,n}$ are independent zero-mean Gaussian variables with variance such as to satisfy (4.1) with $H = 1/2$, hence the name of the method. One way to obtain Wiener motion in multifractal time WM(MF) is then to keep the offset variables $X_{k,n}$ as they are but to apply them at the time instances $t_{k,n}$ defined by $t_{k,n} = M^{-1}(k2^{-n})$, i.e. $M(t_{k,n}) = k2^{-n}$:
$$B_{1/2}(t_{2k+1,n+1}) := \frac{B_{1/2}(t_{k,n}) + B_{1/2}(t_{k+1,n})}{2} + X_{k,n}. \tag{4.92}$$
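The unwarped midpoint displacement can be sketched as follows. The offset variance $2^{-(n+2)}$ used below is the conditional variance of the midpoint of a Wiener path on an interval of length $2^{-n}$; this value, like the function name, is an assumption of the sketch and is not spelled out in the text.

```python
import math
import random

def wiener_midpoint(levels, rng):
    """Sample B_{1/2} at the dyadic points k 2^-levels of [0, 1] by midpoint displacement."""
    path = [0.0, rng.gauss(0.0, 1.0)]  # B(0) = 0 and B(1) ~ N(0, 1)
    for n in range(levels):
        # Standard deviation of the offsets X_{k,n} on intervals of length 2^-n.
        offset_std = math.sqrt(2.0 ** -(n + 2))
        refined = []
        for left, right in zip(path[:-1], path[1:]):
            midpoint = 0.5 * (left + right) + rng.gauss(0.0, offset_std)
            refined.extend([left, midpoint])
        refined.append(path[-1])
        path = refined
    return path

path = wiener_midpoint(10, random.Random(3))
quadratic_variation = sum((b - a) ** 2 for a, b in zip(path[:-1], path[1:]))
print(len(path), quadratic_variation)
```

The sum of squared increments should be close to 1, the length of the time interval, which is a simple sanity check for the chosen offset variance. The warped version (4.92) uses the same offsets but interprets the samples as values at the non-uniform times $t_{k,n}$.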
This amounts to a randomly located random displacement, the location being determined by $M$. Indeed, (4.91) is nothing but a time warp.

An alternative construction of “warped Wiener motion” WM(MF) which yields equally spaced sampling, as opposed to the samples $B_{1/2}(t_{k,n})$ provided by (4.92), is desirable. To this end, note first that the increments of WM(MF) become independent Gaussians once the path of $M(t)$ is realized. To be more precise, fix $n$ and let
$$G(k) := \mathcal{B}((k + 1)2^{-n}) - \mathcal{B}(k2^{-n}) = B_{1/2}(M((k + 1)2^{-n})) - B_{1/2}(M(k2^{-n})). \tag{4.93}$$
For a sample path of $G$ we start by producing first the random variables $M(k2^{-n})$. Once this is done, the $G(k)$ are simply independent zero-mean Gaussian variables with variance $|M((k + 1)2^{-n}) - M(k2^{-n})|$. This procedure has been used in Figure 4.5.

4.8.2. Global analysis

To calculate the multifractal envelope $T(q)$ we need only to know that $V$ is an $H$-sssi process, i.e. that the increment $V(t + u) - V(t)$ is equal in distribution to $u^H V(1)$ (see (4.1)). Assuming independence between $V$ and $M$, a simple calculation reads as
$$E_\Omega \sum_{k=0}^{2^n-1} \left|\mathcal{V}((k + 1)2^{-n}) - \mathcal{V}(k2^{-n})\right|^q$$
$$= \sum_{k=0}^{2^n-1} E\left[E\left[\left|V(M((k + 1)2^{-n})) - V(M(k2^{-n}))\right|^q \,\Big|\, M(k2^{-n}),\, M((k + 1)2^{-n})\right]\right]$$
$$= \sum_{k=0}^{2^n-1} E\left[\left|M((k + 1)2^{-n}) - M(k2^{-n})\right|^{qH}\right] E\left[|V(1)|^q\right]. \tag{4.94}$$
With little more effort the increments $|\mathcal{V}((k+1)2^{-n}) - \mathcal{V}(k2^{-n})|$ can be replaced by suprema, i.e. by $2^{-n h_k^n}$, or even certain wavelet coefficients under appropriate assumptions (see [RIE 88]). It follows that

$$\text{Warped H-sssi:}\quad T_{\mathcal{V}}(q) = \begin{cases} T_M(qH) & \text{if } \mathbb{E}_\Omega\big[\,|\sup_{0 \le t \le 1} V(t)|^q\,\big] < \infty \\ -\infty & \text{otherwise.} \end{cases} \tag{4.95}$$

Simple H-sssi process: when choosing the deterministic warp time $M(t) = t$ we have $T_M(q) = q - 1$ since $S^n_M(q) = 2^n \cdot 2^{-nq}$ for all $n$. Also, $\mathcal{V} = V$. We obtain $T_M(qH) = qH - 1$ which has to be inserted into (4.95) to obtain

$$\text{Simple H-sssi:}\quad T_V(q) = \begin{cases} qH - 1 & \text{if } \mathbb{E}_\Omega\big[\,|\sup_{0 \le t \le 1} V(t)|^q\,\big] < \infty \\ -\infty & \text{otherwise.} \end{cases} \tag{4.96}$$

4.8.3. Local analysis of warped FBM

Let us now turn to the special case where $V$ is FBM. Then, we use the term FB(MF) to abbreviate fractional Brownian motion in multifractal time:
$B(t) = B_H(M(t))$. First, to obtain an idea of what to expect from the spectra of $B$, let us note that the moments appearing in (4.95) are finite for all $q > -1$ (see [RIE 88, lem 7.4] for a detailed discussion). Applying the Legendre transform easily yields

$$T^*_B(a) = \inf_q \big(qa - T_M(qH)\big) = T^*_M(a/H). \tag{4.97}$$
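Relation (4.97) can be checked numerically on an example. The sketch below assumes a binomial cascade for $M$, for which $T_M(q) = -\log_2(p^q + (1-p)^q)$; this closed form, the weight $p = 0.3$, the value $H = 0.7$ and all function names are illustrative assumptions. The grid for $B$ is restricted to $q > -1$, in line with the moment condition of (4.95), and the range of $a$ is chosen so that the infimum is attained inside both grids.

```python
import numpy as np

def T_binomial(q, p=0.3):
    """Assumed example envelope: T_M(q) = -log2(p**q + (1 - p)**q)."""
    return -np.log2(p ** q + (1.0 - p) ** q)

def legendre(T, a, qs):
    """Numerical Legendre transform T*(a) = inf_q (q a - T(q)) over the grid qs."""
    return np.min(np.outer(a, qs) - T(qs), axis=1)

H = 0.7
a = np.linspace(0.5, 0.88, 39)

# Left-hand side of (4.97): T_B(q) = T_M(qH), q restricted to q > -1.
qs_B = np.linspace(-0.99, 20.0, 2100)
spec_B = legendre(lambda q: T_binomial(q * H), a, qs_B)

# Right-hand side of (4.97): T*_M evaluated at a / H.
qs_M = np.linspace(-20.0, 20.0, 4001)
spec_M = legendre(T_binomial, a / H, qs_M)

max_gap = np.max(np.abs(spec_B - spec_M))
```

The two spectra agree up to grid error, and their common maximum is close to 1, reached near $a = H \cdot T_M'(0)$.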
Figure 4.5. Left: simulation of Brownian motion in binomial time: (a) sampling of $M_b((k+1)2^{-n}) - M_b(k2^{-n})$ ($k = 0, \ldots, 2^n - 1$), indicating distortion of dyadic time intervals; (b) $M_b(k2^{-n})$: the time warp; (c) Brownian motion warped with (b): $B(k2^{-n}) = B_{1/2}(M_b(k2^{-n}))$, plotted against time. Right: estimation of $\dim E_a^B$ using $\tau^*_{w,B}$: (d) empirical correlation of the Haar wavelet coefficients, plotted against time lag; (e) dot-dashed: $T^*_{M_b}$ (from theory), dashed: $T^*_B(a) = T^*_{M_b}(a/H)$, solid: the estimator $\tau^*_{w,B}$ obtained from (c), plotted against $a$. (Reproduced from [GON 99])
Second, towards the local analysis we recall the uniform and strict Hölder continuity of the paths of FBM⁵ which reads roughly as

$$\sup_{|u| \le \delta} |B(t+u) - B(t)| = \sup_{|u| \le \delta} |B_H(M(t+u)) - B_H(M(t))| \simeq \sup_{|u| \le \delta} |M(t+u) - M(t)|^H.$$
5. For a precise statement see Adler [ADL 81] or [RIE 88, Theorem 7.4].
This is the key to concluding that $B_H$ simply squeezes the Hölder regularity exponents by a factor $H$. Thus, $h_B(t) = H \cdot h_M(t)$, etc. and

$$K^M_{a/H} = K^B_a,$$

and, consequently, analogous to (4.97), $d_B(a) = d_M(a/H)$. Figure 4.5(d)-(e) displays an estimation of $d_B(a)$ using wavelets which agrees very closely with the form $d_M(a/H)$ predicted by theory (for statistics on this estimator see [GON 99, GON 98]). In conclusion:

COROLLARY 4.7 (Fractional Brownian motion in multifractal time).– Let $B_H$ denote FBM of Hurst parameter $H$. Let $M(t)$ be of almost surely continuous paths and independent of $B_H$. Then, the multifractal warp formalism

$$\dim(K^B_a) = f_B(a) = \tau^*_B(a) = T^*_B(a) = T^*_M(a/H) \tag{4.98}$$

holds for $B(t) = B_H(M(t))$ for any $a$ such that the multifractal formalism holds for $M$ at $a/H$, i.e., for which $\dim(K^M_{a/H}) = T^*_M(a/H)$.

This means that the local, or fine, multifractal structure of $B$ captured in $\dim(K^B_a)$ on the left can be estimated through grain based, simpler and numerically more robust spectra on the right side, such as $\tau^*_B(a)$ (compare Figure 4.5(e)).

“Warp formula” (4.98) is appealing since it allows us to separate the LRD parameter of FBM and the multifractal spectrum of the time change $M$. Indeed, provided that $M$ is almost surely increasing, we have $T_M(1) = 0$ since $S^n_M(1) = M(1)$ for all $n$. Thus, $T_B(1/H) = 0$ reveals the value of $H$. Alternatively, the tangent at $T^*_B$ through the origin has slope $1/H$. Once $H$ is known, $T^*_M$ follows easily from $T^*_B$.

Simple FBM: when choosing the deterministic warp time $M(t) = t$ we have $B = B_H$ and $T_M(q) = q - 1$ since $S^n_M(q) = 2^n \cdot 2^{-nq}$ for all $n$. We conclude that

$$T_{B_H}(q) = qH - 1 \tag{4.99}$$

for all $q > -1$. This confirms (4.90) for FGN. With (4.98) it shows that all spectra of FBM consist of the one point $(H, 1)$ only, making the monofractal character of this process most explicit.
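The remark that $T_B(1/H) = 0$ reveals $H$ can be turned into a small root-finding sketch. It assumes, purely for illustration, the binomial envelope $T_M(q) = -\log_2(p^q + (1-p)^q)$ and an exactly known $T_B$; in practice $T_B$ would be replaced by an estimate. The names `T_M`, `T_B` and `recover_hurst` are hypothetical.

```python
import math

def T_M(q, p=0.3):
    """Assumed example: envelope of a binomial cascade, with T_M(1) = 0."""
    return -math.log2(p ** q + (1.0 - p) ** q)

def T_B(q, H, p=0.3):
    """Envelope of the warped FBM, T_B(q) = T_M(qH), as in (4.95)."""
    return T_M(q * H, p)

def recover_hurst(T, lo=1.0, hi=50.0, iters=200):
    """Bisection for the zero q0 of an increasing envelope T on [lo, hi]; returns 1 / q0."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if T(mid) < 0.0:
            lo = mid   # the zero lies to the right of mid
        else:
            hi = mid   # the zero lies to the left of mid
    return 1.0 / (0.5 * (lo + hi))

H_hat = recover_hurst(lambda q: T_B(q, 0.7))
```

The bracket $[1, 50]$ covers $0.02 < H < 1$; once `H_hat` is available, $T_M$ follows from $T_M(q) = T_B(q/H)$.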
4.8.4. LRD and estimation of warped FBM

Let $G(k) := B((k+1)2^{-n}) - B(k2^{-n})$ be FGN in multifractal time (see (4.93) for the case $H = 1/2$). Calculating auto-correlations explicitly shows that $G$ is second order stationary under mild conditions with

$$H_G = \frac{T_M(2H) + 1}{2}. \tag{4.100}$$
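Formula (4.100) is easy to evaluate on an example. The sketch below again assumes a binomial cascade for $M$, with $T_M(q) = -\log_2(p^q + (1-p)^q)$; this choice and the function names are illustrative assumptions, not part of the original text.

```python
import math

def T_M(q, p=0.3):
    """Assumed example: T_M(0) = -1, T_M(1) = 0, concave in q."""
    return -math.log2(p ** q + (1.0 - p) ** q)

def hurst_of_increments(H, p=0.3):
    """H_G = (T_M(2H) + 1) / 2, formula (4.100)."""
    return 0.5 * (T_M(2.0 * H, p) + 1.0)

for H in (0.3, 0.5, 0.8):
    # H_G always lies between H and 1/2 for this cascade.
    print(H, hurst_of_increments(H))
```

With $p = 1/2$ the cascade reduces to $M(t) = t$, and the function returns $H_G = H$, as expected for unwarped FGN.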
Let us discuss some special cases. For example, for a continuous, increasing warp time $M$, we always have $T_M(0) = -1$ and $T_M(1) = 0$. Exploiting the concave shape of $T_M$ we find that $H < H_G < 1/2$ for $0 < H < 1/2$, and $1/2 < H_G < H$ for $1/2 < H < 1$. Thus, multifractal warping cannot create LRD and it seems to weaken the dependence as measured through second order statistics.

Especially in the case of $H = 1/2$ (“white noise in multifractal time”) $G(k)$ becomes uncorrelated. This follows from (4.100). Notably, this is a different statement from the observation that the $G(k)$ are independent when conditioned on $M$ (see section 4.8.1). As a particular consequence, wavelet coefficients will decorrelate fast for the entire process $G$, not only when conditioning on $M$ (see Figure 4.5(d)). This is favorable for estimation purposes as it reduces the error variance.

Of greater importance, however, is the warning that the vanishing correlations should not lead us to assume the independence of the $G(k)$. After all, $G$ becomes Gaussian only when we condition on knowing $M$. A strong, higher order dependence in $G$ is hidden in the dependence of the increments of $M$ which determine the variance of $G(k)$ as in (4.93). Indeed, Figure 4.5(c) shows clear phases of monotonicity of $B$ indicating positive dependence in its increments $G$, despite vanishing correlations. Mandelbrot calls this the “blind spot of spectral analysis”.

4.9. Bibliography

[ABR 95] ABRY P., GONÇALVES P., FLANDRIN P., “Wavelets, spectrum analysis and 1/f processes”, in ANTONIADIS A., OPPENHEIM G. (Eds.), Lecture Notes in Statistics: Wavelets and Statistics, vol. 103, p. 15–29, 1995.
[ABR 00] ABRY P., FLANDRIN P., TAQQU M., VEITCH D., “Wavelets for the analysis, estimation and synthesis of scaling data”, Self-similar Network Traffic and Performance Evaluation, John Wiley & Sons, 2000.
[ADL 81] ADLER R., The Geometry of Random Fields, John Wiley & Sons, New York, 1981.
[ARB 96] ARBEITER M., PATZSCHKE N., “Self-similar random multifractals”, Math. Nachr., vol. 181, p. 5–42, 1996.
[ARN 98] ARNEODO A., BACRY E., MUZY J., “Random cascades on wavelet dyadic trees”, Journal of Mathematical Physics, vol. 39, no. 8, p. 4142–4164, 1998.
[BAC 93] BACRY E., MUZY J., ARNEODO A., “Singularity spectrum of fractal signals from wavelet analysis: exact results”, J. Stat. Phys., vol. 70, p. 635–674, 1993.
[BAC 03] BACRY E., MUZY J., “Log-infinitely divisible multifractal processes”, Comm. in Math. Phys., vol. 236, p. 449–475, 2003.
[BAR 97] BARRAL J., Continuity, moments of negative order, and multifractal analysis of Mandelbrot’s multiplicative cascades, PhD thesis no. 4704, Paris-Sud University, 1997.
[BAR 02] BARRAL J., MANDELBROT B., “Multiplicative products of cylindrical pulses”, Probability Theory and Related Fields, vol. 124, p. 409–430, 2002.
[BAR 03] BARRAL J., “Poissonian products of random weights: Uniform convergence and related measures”, Rev. Mat. Iberoamericano, vol. 19, p. 1–44, 2003.
[BAR 04] BARRAL J., MANDELBROT B., “Random multiplicative multifractal measures, Part II”, Proc. Symp. Pures Math., AMS, Providence, RI, vol. 72, no. 2, p. 17–52, 2004.
[BEN 87] BEN NASR F., “Mandelbrot random measures associated with substitution”, C. R. Acad. Sc. Paris, vol. 304, no. 10, p. 255–258, 1987.
[BRO 92] BROWN G., MICHON G., PEYRIERE J., “On the multifractal analysis of measures”, J. Stat. Phys., vol. 66, p. 775–790, 1992.
[CAW 92] CAWLEY R., MAULDIN R.D., “Multifractal decompositions of Moran fractals”, Advances Math., vol. 92, p. 196–236, 1992.
[CHA 02] CHAINAIS P., RIEDI R., ABRY P., “Compound Poisson cascades”, Proc. Colloque “Autosimilarité et Applications” Clermont-Ferrand, France, May 2002, 2002.
[CHA 05] CHAINAIS P., RIEDI R., ABRY P., “On non-scale invariant infinitely divisible cascades”, IEEE Trans. Information Theory, vol. 51, no. 3, p. 1063–1083, 2005.
[COX 84] COX D., “Long-range dependence: a review”, Statistics: An Appraisal, p. 55–74, 1984.
[CUT 86] CUTLER C., “The Hausdorff dimension distribution of finite measures in Euclidean space”, Can. J. Math., vol. 38, p. 1459–1484, 1986.
[DAU 92] DAUBECHIES I., Ten Lectures on Wavelets, SIAM, New York, 1992.
[ELL 84] ELLIS R., “Large deviations for a general class of random vectors”, Ann. Prob., vol. 12, p. 1–12, 1984.
[EVE 95] EVERTSZ C.J.G., “Fractal geometry of financial time series”, Fractals, vol. 3, p. 609–616, 1995.
[FAL 94] FALCONER K.J., “The multifractal spectrum of statistically self-similar measures”, J. Theor. Prob., vol. 7, p. 681–702, 1994.
[FEL 98] FELDMANN A., GILBERT A.C., WILLINGER W., “Data networks as cascades: Investigating the multifractal nature of Internet WAN traffic”, Proc. ACM/Sigcomm 98, vol. 28, p. 42–55, 1998.
[FRI 85] FRISCH U., PARISI G., “Fully developed turbulence and intermittency”, Proc. Int. Summer School on Turbulence and Predictability in Geophysical Fluid Dynamics and Climate Dynamics, p. 84–88, 1985.
[GON 98] GONÇALVES P., RIEDI R., BARANIUK R., “Simple statistical analysis of wavelet-based multifractal spectrum estimation”, Proc. 32nd Asilomar Conf. on Signals, Systems and Computers, Pacific Grove, CA, Nov. 1998.
[GON 99] GONÇALVES P., RIEDI R., “Wavelet analysis of fractional Brownian motion in multifractal time”, Proceedings of the 17th Colloquium GRETSI, Vannes, France, September 1999.
[GON 05] GONÇALVES P., RIEDI R., “Diverging moments and parameter estimation”, J. Amer. Stat. Assoc., vol. 100, no. 472, p. 1382–1393, December 2005.
[GRA 83] GRASSBERGER P., PROCACCIA I., “Characterization of strange attractors”, Phys. Rev. Lett., vol. 50, p. 346–349, 1983.
[HAL 86] HALSEY T., JENSEN M., KADANOFF L., PROCACCIA I., SHRAIMAN B., “Fractal measures and their singularities: the characterization of strange sets”, Phys. Rev. A, vol. 33, p. 1141–1151, 1986.
[HEN 83] HENTSCHEL H., PROCACCIA I., “The infinite number of generalized dimensions of fractals and strange attractors”, Physica D, vol. 8, p. 435–444, 1983.
[HOL 92] HOLLEY R., WAYMIRE E., “Multifractal dimensions and scaling exponents for strongly bounded random cascades”, Ann. Appl. Prob., vol. 2, p. 819–845, 1992.
[JAF 91] JAFFARD S., “Pointwise smoothness, two-microlocalization and wavelet coefficients”, Publicacions Mathematiques, vol. 35, p. 155–168, 1991.
[JAF 95] JAFFARD S., “Local behavior of Riemann’s function”, Contemporary Mathematics, vol. 189, p. 287–307, 1995.
[JAF 97] JAFFARD S., “Multifractal formalism for functions, Part 1: Results valid for all functions”, SIAM J. of Math. Anal., vol. 28, p. 944–970, 1997.
[JAF 99] JAFFARD S., “The multifractal nature of Lévy processes”, Prob. Th. Rel. Fields, vol. 114, p. 207–227, 1999.
[KAH 76] KAHANE J.-P., PEYRIÈRE J., “Sur Certaines Martingales de Benoit Mandelbrot”, Adv. Math., vol. 22, p. 131–145, 1976.
[LV 98] LÉVY VÉHEL J., VOJAK R., “Multifractal analysis of Choquet capacities: preliminary results”, Adv. Appl. Math., vol. 20, p. 1–34, 1998.
[LEL 94] LELAND W., TAQQU M., WILLINGER W., WILSON D., “On the self-similar nature of Ethernet traffic (extended version)”, IEEE/ACM Trans. Networking, p. 1–15, 1994.
[MAN 68] MANDELBROT B.B., NESS J.W.V., “Fractional Brownian motion, fractional noises and applications”, SIAM Reviews, vol. 10, p. 422–437, 1968.
[MAN 74] MANDELBROT B.B., “Intermittent turbulence in self similar cascades: divergence of high moments and dimension of the carrier”, J. Fluid. Mech., vol. 62, p. 331, 1974.
[MAN 90a] MANDELBROT B.B., “Limit lognormal multifractal measures”, Physica A, vol. 163, p. 306–315, 1990.
[MAN 90b] MANDELBROT B.B., “Negative fractal dimensions and multifractals”, Physica A, vol. 163, p. 306–315, 1990.
[MAN 97] MANDELBROT B.B., Fractals and Scaling in Finance, Springer, New York, 1997.
[MAN 99] MANDELBROT B.B., “A multifractal walk down Wall Street”, Scientific American, vol. 280, no. 2, p. 70–73, February 1999.
[MAN 02] MANNERSALO P., NORROS I., RIEDI R., “Multifractal products of stochastic processes: construction and some basic properties”, Advances in Applied Probability, vol. 34, no. 4, p. 888–903, December 2002.
[MUZ 02] MUZY J., BACRY E., “Multifractal stationary random measures and multifractal random walks with log-infinitely divisible scaling laws”, Phys. Rev. E, vol. 66, 2002.
[NOR 94] NORROS I., “A storage model with self-similar input”, Queueing Systems, vol. 16, p. 387–396, 1994.
[OLS 94] OLSEN L., “Random geometrically graph directed self-similar multifractals”, Pitman Research Notes Math. Ser., vol. 307, 1994.
[PES 97] PESIN Y., WEISS H., “A multifractal analysis of equilibrium measures for conformal expanding maps and Moran-like geometric constructions”, J. Stat. Phys., vol. 86, p. 233–275, 1997.
[PEY 98] PEYRIÈRE J., An Introduction to Fractal Measures and Dimensions, Paris, 11th Edition, k 159, 1998, ISBN 2-87800-143-5.
[RIB 06] RIBEIRO V., RIEDI R., CROUSE M.S., BARANIUK R.G., “Multiscale queuing analysis of long-range-dependent network traffic”, IEEE Trans. Networking, vol. 14, no. 5, p. 1005–1018, October 2006.
[RIE 88] RIEDI R.H., “Multifractal processes”, in DOUKHAN P., OPPENHEIM G., TAQQU M.S. (Eds.), Long Range Dependence: Theory and Applications, p. 625–715, Birkhäuser 2002, ISBN: 0817641688.
[RIE 95a] RIEDI R.H., “An improved multifractal formalism and self-similar measures”, J. Math. Anal. Appl., vol. 189, p. 462–490, 1995.
[RIE 95b] RIEDI R.H., MANDELBROT B.B., “Multifractal formalism for infinite multinomial measures”, Adv. Appl. Math., vol. 16, p. 132–150, 1995.
[RIE 97a] RIEDI R.H., LÉVY VÉHEL J., “Multifractal properties of TCP traffic: a numerical study”, Technical Report No 3129, INRIA Rocquencourt, France, February, 1997, see also: LÉVY VÉHEL J., RIEDI R.H., “Fractional Brownian motion and data traffic modeling”, in Fractals in Engineering, p. 185–202, Springer, 1997.
[RIE 97b] RIEDI R.H., SCHEURING I., “Conditional and relative multifractal spectra”, Fractals. An Interdisciplinary Journal, vol. 5, no. 1, p. 153–168, 1997.
[RIE 98] RIEDI R.H., MANDELBROT B.B., “Exceptions to the multifractal formalism for discontinuous measures”, Math. Proc. Cambr. Phil. Soc., vol. 123, p. 133–157, 1998.
[RIE 99] RIEDI R.H., CROUSE M.S., RIBEIRO V., BARANIUK R.G., “A multifractal wavelet model with application to TCP network traffic”, IEEE Trans. Info. Theory, Special issue on multiscale statistical signal analysis and its applications, vol. 45, p. 992–1018, April 1999.
[RIE 00] RIEDI R.H., WILLINGER W., “Toward an improved understanding of network traffic dynamics”, in PARK K., WILLINGER W. (Eds.), Self-similar Network Traffic and Performance Evaluation, p. 507–530, Wiley, 2000.
[RIE 04] RIEDI R.H., GONÇALVES P., Diverging moments, characteristic regularity and wavelets, Rice University, Dept. of Statistics, Technical Report, vol. TR2004-04, August 2004.
[RIE 07a] RIEDI R.H., GERSHMAN D., “Infinitely divisible shot-noise: modeling fluctuations in networking and finance”, Proceedings ICNF 07, Tokyo, Japan, September 2007.
[RIE 07b] RIEDI R.H., GERSHMAN D., Infinitely divisible shot-noise, Report, Dept. of Statistics, Rice University, TR2007-07, August 2007.
[TEL 88] TEL T., “Fractals, multifractals and thermodynamics”, Z. Naturforsch. A, vol. 43, p. 1154–1174, 1988.
[TRI 82] TRICOT C., “Two definitions of fractal dimension”, Math. Proc. Cambr. Phil. Soc., vol. 91, p. 57–74, 1982.
[VET 95] VETTERLI M., KOVAČEVIĆ J., Wavelets and Subband Coding, Prentice-Hall, Englewood Cliffs, NJ, 1995.
Chapter 5
Self-similar Processes
Chapter written by Albert BENASSI and Jacques ISTAS.

5.1. Introduction

5.1.1. Motivations

Invariance properties constitute the basis of major laws in physics. For example, conservation of energy results from the invariance of these laws under time translations. Mandelbrot was the first to relate scale invariance to complex objects, and the outcome was coined “fractals”. Using the concept of scale invariance, different notions of fractal dimension can be discussed. A particular class of complex objects presenting scale invariance is that of random media, on which we mainly focus here.

Let us begin with the example of percolation (see, for example, [GRI 89]). On a regular network, some connections are randomly and abruptly removed. The resulting network itself is random and it contains “cracks”, “bottlenecks”, “culs-de-sac”, etc. However, a regularity of statistical nature is often seen. For example, let us think of a network of spins on $\mathbb{Z}^2$ at critical temperature (see [GUY 94]): an “island” of + signs will be found within a “lake” of − signs, which itself is an island, etc. At each scale, we statistically see “the same thing”. Over this mathematical medium, physicists imagine the circulation of a fluid (or particles) and are hence interested, for example, in the position $X_n$ of a particle after $n$ time steps or in statistical characteristics such as its average position $\mathbb{E}X_n$. By symmetry, this average position is often zero. Therefore, we will study the corresponding second moment $\mathbb{E}X_n^2$. Let us consider a case where, for large $n$:

$$\mathbb{E}X_n^2 \sim \sigma^2 n^{2H}$$

with $\sigma^2$ a variance parameter. When $H = 1/2$, the random walk $X_n$ is of Brownian nature. It is said to be anomalous, and overdiffusive (respectively underdiffusive), when $H > 1/2$ (respectively $H < 1/2$).

Let us now consider the case when the dilated random walk $(X(\lambda n)/\lambda^H)$, with $n \ge 0$, is statistically indistinguishable from the initial walk $(X_n)$, $n \ge 0$:

$$\left( \frac{X(\lambda n)}{\lambda^H},\ n \ge 0 \right) \stackrel{\mathcal{L}}{=} \big( X_n,\ n \ge 0 \big) \tag{5.1}$$

The walk is then said to be self-similar when the equality in law (5.1) is valid for all $\lambda > 0$. A comprehensive survey of random walks on fractal media, oriented toward physicists, can be found in [HAV 87]. A more mathematically grounded framework can be found in [BAR 95].

Apart from the framework of random walks, the reasons why a physical quantity possesses a power-law invariance are generally very tricky to discover. Indications of a physical nature can be found in [HER 90], particularly in Duxburg’s contribution, which elaborates a scale renormalization theory for crack dynamics. In [DUB 97], a number of contributions in various fields such as financial markets, avalanches, metallurgy, etc. provide illustrations of scale invariance. In particular, [DUR 97] proposes an analysis of invariance phenomena in avalanches, which is both experimental and theoretical.

One major source of inspiration for the definition and study of the property of scale invariance is hydrodynamic turbulence, and more precisely Kolmogorov’s work (see, for example, [FRI 97]) which, building on Richardson’s work on energy cascades, established the famous $-5/3$ law in 1941, based on the modeling of energy transfers in turbulent flows. This theory provides a powerful means to define stochastic self-similar processes and to study their properties. The reader is referred to [FRI 97] and references therein.
Scale invariance, or self-similarity, sometimes leads to a correlation property referred to as long-range dependence. Generally, for processes satisfying (5.1), the correlation of the increments decreases as a power law, a slow decline that indicates long-term persistence. Mandelbrot and van Ness [MAN 68] popularized fractional Brownian motion, historically introduced in [KOL 40], precisely to model long-range correlation. These processes have since enjoyed extraordinary success, and numerous extensions provide a large number of generally identifiable models. Let us briefly describe the article [WIL 98], in which the authors give a “physical” theory of fractional Brownian motion. A typical machine is either active or inactive; the durations of activity and inactivity are independent. An infinity of machines is then considered. At time $tT$, we consider all the active machines or, more exactly, the fluctuations around the average number of active machines at time $tT$. Then, we renormalize in $T$ to obtain the law of this phenomenon. The set of all durations of active machines at a given time is distributed in a rather similar way to that of the balls of the non-renormalized mass distribution model which we will study later. The reader is also directed to Chapter 12.

The goal of this chapter is to present a set of stochastic processes which have partial self-similarity and stationarity properties. Unfortunately, we cannot claim to present an exhaustive study; moreover, the available space prohibits it. We had to make choices. We preferred to challenge the reader with questions whose answers appeared surprising to us. The intention is to show that the concept of scale invariance remains, to a great extent, misunderstood. We thus hope, with what has been said above, centered on physics, and with what we will present, to have opened paths which other researchers will perhaps follow.

Trees will be used here as the guiding thread to scale invariance. It seemed to us that such a simple geometric structure, with such great flexibility, is well-adapted to the study of self-similarity. In order to become familiar with trees and with invariance under dilation and translation, we begin our presentation with a study of purely geometric scaling, where trees and spaces are mixed. From this, we present random or non-random fractals with scale invariance. This leads us to the model of mass distribution, which provides a convenient means to generate a large number of processes, stochastic or not, with scale and translation invariance.
It is remarkable that, through a suitable wavelet decomposition, all the self-similar stochastic processes with stationary increments relate to a “layer-type” model, except perhaps for Takenaka’s process. These models of scale invariance enable us to question the difference between two concepts of equal importance: long-range correlation and sample path regularity. From examples, we show that these two concepts are independent.

This chapter consists of four sections. The first is mostly an introduction. The second clarifies the Gaussian case, a quasi-solved problem. The third section turns to some non-Gaussian cases, mostly that of stable processes. The last section studies correlation and regularity from defective examples. Generally, certain technical difficulties, sometimes even major ones, are overlooked. Consequently, the results may give an impression of a lack of rigor; for complete statements, returning to the original works is necessary.
5.1.2. Scalings

5.1.2.1. Trees

To understand the geometric aspects of scaling, we will mainly use trees. Let us start by studying the interval $[0, 1[$. Let $q \ge 2$ be an integer and $A_q = \{0, 1, \ldots, q-1\}$. Let us denote by $T_q$ (respectively $\tilde{T}_q$) the set of unilateral (respectively bilateral) sequences $(a_1, a_2, \ldots, a_n, \ldots)$ (respectively $(\ldots, a_{-n}, \ldots, a_{-1}, a_0, a_1, \ldots, a_n, \ldots)$) where the $a_n$ take values in $A_q$. We will denote by $\mathbf{a}$ a sequence $(a_1, a_2, \ldots, a_n, \ldots)$ and by $\mathbf{a}_n$ the finite sub-sequence $(a_1, a_2, \ldots, a_n)$. We can consider the set $T_q$ from various points of view:
– $T_q$ is the set of real numbers of $[0, 1[$ written in base $q$:

$$x = \sum_{k=1}^{+\infty} \frac{a_k(x)}{q^k}$$

Let us recall that this decomposition is unique except for a countable set of real numbers;
– $T_q$ is the $q$-ary tree. Each father has $q$ sons. This tree is provided with a root or ancestor, denoted by $\emptyset$, that has no antecedent and is associated with the empty sequence. The vertices of $T_q$ are the finite sequences $\mathbf{a}_n$, with $n \ge 1$, and the edges are the pairs $(\mathbf{a}_{n-1}, \mathbf{a}_n)$, with $n \ge 2$;
– $T_q$ is the set of $q$-adic cells $\{\Delta^n_k,\ n \ge 0,\ 0 \le k \le q^n - 1\}$, with $\Delta^n_k = [k/q^n, (k+1)/q^n[$. The lexicographic order makes it possible to associate each finite sequence $\mathbf{a}_n$ with exactly one of the $q^n$ cells $\Delta^n_k$. The (lexicographic) order number of this cell will be denoted by $k_n(\mathbf{a}_n)$.
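The three points of view above (digits, branch of the tree, $q$-adic cell) can be made concrete with a short sketch. The helper names `digits` and `cell_index` are hypothetical; exact rational arithmetic is used to avoid rounding issues at cell boundaries.

```python
import math
from fractions import Fraction

def digits(x, q, n):
    """First n base-q digits (a_1, ..., a_n) of x in [0, 1)."""
    x = Fraction(x)
    out = []
    for _ in range(n):
        x *= q
        a = math.floor(x)   # next digit, an element of {0, ..., q - 1}
        out.append(a)
        x -= a
    return out

def cell_index(a, q):
    """Lexicographic order number k_n(a_n) of the cell coded by the finite sequence a."""
    k = 0
    for digit in a:
        k = k * q + digit
    return k

x = Fraction(5, 8)
a2 = digits(x, 2, 2)      # branch of depth 2 in the binary tree
k = cell_index(a2, 2)     # x lies in the cell [k / 4, (k + 1) / 4[
```

For $x = 5/8$ and $q = 2$, the branch of depth 2 is $(1, 0)$ and the cell is $[0.5, 0.75[$, the cell named in the caption of Figure 5.1.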
Figure 5.1. Coding from cell [0.5, 0.75]

Figure 5.2. Coding from cell [0.0, 0.5]
5.1.2.2. Coding of R

This section may seem complex, but is actually not difficult: it amounts to extending the previous coding to $\mathbb{R}$. Let $\mathbf{a}$ be an infinite branch of the tree $T_q$. For $n \ge 0$, let us dilate $\mathbb{R}$ by $q^n$, then let us perform a translation by $k_n(\mathbf{a}_n)$ so as to bring the cell $\Delta^n_{k_n(\mathbf{a}_n)}$, multiplied by $q^n$, into coincidence with $[0, 1[$. The tree $T_q(\mathbf{a}_n)$, of root $\mathbf{a}_n$, allows us to code all the $q$-adic cells included in $[-2^n, 2^n]$, with their position in $\mathbb{R}$:

$$\big\{ \Delta^m_k,\ -m \le n < +\infty,\ -k_n(\mathbf{a}_n) \le k \le q^{m+n} - k_n(\mathbf{a}_n) \big\} \tag{5.2}$$

When $n \to +\infty$, the tree $T_q(\mathbf{a}_n)$ extends itself and tends towards the complete $q$-adic tree $\tilde{T}_q$, which is provided with a particular bilateral path $\tilde{\mathbf{a}}$ and a root $\tilde{\emptyset}$; we denote it by $(\tilde{T}_q, \tilde{\mathbf{a}}, \tilde{\emptyset})$. This triplet allows us to code all $q$-adic cells of $\mathbb{R}$. Another possibility is to base the analysis on the decomposition of arbitrary real numbers in base $q$, as previously done for the interval $[0, 1[$. This coding can be extended to $\mathbb{R}^d$. The reader is referred to [BEN 00] for more details.

5.1.2.3. Renormalizing Cantor set

Let $E$ be a subset of $[0, 1]$. With the set $E$, we can associate a sub-tree $T_{E,q}$ of $T_q$ in the following way: to any $x \in E$, we associate the branch $\mathbf{a}(x)$.
Let us now assume that $E$ is the triadic Cantor set. The natural choice is $q = 3$. Up to countable sets, $E$ is the set of real numbers of $[0, 1]$ that admit no 1 in their decomposition in base 3. Thus, the set $E$ is simply defined by a condition on the branches of $T_{E,3}$. As previously, by means of dilation and translation, we define, from the Cantor set $E$, a set $\tilde{E}$ on $\mathbb{R}$: in other words, $\tilde{E}$ is the set of real numbers that admit no 1 in their decomposition in base 3. This set $\tilde{E}$ is, of course, invariant under dilation by a factor $3^p$, with $p \in \mathbb{Z}$. However, it is not invariant under other factors, as can be easily verified, for example with the factor 2.
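The invariance under the factor 3, and its failure under the factor 2, can be verified on numbers with a finite triadic expansion. The sketch below is an illustration under stated assumptions: it only handles non-negative numbers of the form $x = m/3^n$, represented by the integer $m$, and the function names are hypothetical. A terminal digit 1 is allowed because $\ldots 1 = \ldots 0222\ldots$ in base 3.

```python
def base3_digits(m):
    """Base-3 digits of the non-negative integer m, most significant first."""
    out = []
    while m > 0:
        m, r = divmod(m, 3)
        out.append(r)
    return out[::-1]

def admits_expansion_without_1(m):
    """True if x = m / 3**n (any n) admits a base-3 expansion containing no digit 1.

    The exponent n is irrelevant: multiplying x by 3 only shifts the digits.
    """
    d = base3_digits(m)
    while d and d[-1] == 0:   # trailing zeros carry no information
        d.pop()
    if d and d[-1] == 1:      # ...1 can be rewritten as ...0222...
        d.pop()
    return 1 not in d

m = 2                                     # x = 2/9 = 0.02 in base 3
in_set = admits_expansion_without_1(m)
times_3 = admits_expansion_without_1(3 * m)   # 2/3 = 0.2
times_2 = admits_expansion_without_1(2 * m)   # 4/9 = 0.11
```

Here `in_set` and `times_3` are true while `times_2` is false: $2/9$ and $2/3$ belong to the set, but $4/9$ does not.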
5.1.2.4. Random renormalized Cantor set

We build a uniform law on the set of binary sub-trees of the ternary tree $T_3$. Following the intuition behind the construction of the traditional Cantor set, let us define the random compact $K(T)$ by:

$$K(T) \stackrel{\mathrm{def}}{=} \bigcap_{n \ge 0} \bigcup_{b \in T_n} \Delta(b) \tag{5.3}$$

where $\Delta(b)$ is the single triadic cell associated with the branch $b \in T_n$. Then, we denote by $K(T, \mathbf{a})$ the set $K(T)$ renormalized along the branch $\mathbf{a}$, as we did previously. We then verify that the law of $K(T, \mathbf{a})$ is equal to the law of $3^p K(T, \mathbf{a})$, with $p \in \mathbb{Z}$. We will say that $K(T, \mathbf{a})$ is semi-self-similar, the prefix semi indicating that the renormalization factors for which the law of $K(T, \mathbf{a})$ remains invariant form a strict subset of $\mathbb{R}$, namely the multiplicative subgroup of the powers of 3. The law of $K(T, \mathbf{a})$ is stable under translation by integers. Combining translations by integers and multiplications by powers of 3, we find that, for any triadic number $d$ (that is, any number with a finite expansion in base 3), the law of $K(T, \mathbf{a})$ is equal to the law of $K(T, \mathbf{a}) + d$: there is an invariance under translation. This will be referred to as a quasi-stationarity property, the term stationarity being used only when invariance is achieved for all translation parameters.

5.1.3. Distributions of scale invariant masses

Inspired by the preceding construction, we now propose that of a stationary scale invariant phenomenon. More precisely, we aim at building a random measure $M(dx)$ verifying the following properties of stationarity and semi-self-similarity associated with a multiplicative subgroup $G$ of $\mathbb{R}^+$:

$$M(dx - y) \stackrel{\mathcal{L}}{=} M(dx), \quad \forall y \in \mathbb{R}^d \qquad \text{(stationarity)} \tag{5.4a}$$

$$M(\lambda\, dx) \stackrel{\mathcal{L}}{=} \lambda^H M(dx), \quad \forall \lambda \in G \qquad \text{($H$-semi-self-similarity)} \tag{5.4b}$$
In sections 5.1.3.1 and 5.1.3.2 two brief examples are presented.

5.1.3.1. Distribution of masses associated with Poisson measures

Let $P_n$ denote an infinite sequence of independent Poisson measures on $\mathbb{R}^d$, with intensity chosen as the Lebesgue measure and identical parameter. Let $(x^n_i, i \in I_n)$ be a realization of $P_n$ indexed by the set $I_n$. Let us denote by $B$ the ball of $\mathbb{R}^d$ with center 0 and radius 1. Let us define the measure $M_0$ by its density $m(x)$:

$$m(x) \stackrel{\mathrm{def}}{=} \sum_{n \ge 0} 2^{-nH} \sum_{i \in I_n} \mathbf{1}_B(2^n x - x^n_i). \tag{5.5}$$
Hence, to the point $x^n_i / 2^n$, we allotted a mass proportional to $2^{-n(H+d)}$, since the proportionality coefficient is equal to the volume of $B$. The contribution of these masses at scale $n$ is proportional to $2^{-nH}$ per volume unit.

5.1.3.2. Complete coding

If we define a Cantor set on $[0, 1[$ only, the resulting set is not stable under dilation by a factor 3. We must define this set on $\mathbb{R}$ to make it stable under dilation by any power of 3. In the same way, the measure $M_0$ defined above cannot be stable under dilation by a factor 2. However, the approach used for Cantor sets can be adopted. We outline it only briefly. For any $n \ge 0$, we define the measure $M_n$ by $M_n(dx) = 2^{-nH}\big(M_0(x + 2^{-n}) - M_0(x)\big)(dx)$. As distributions, this sequence $M_n$ converges weakly towards $M$. Thus, we verify that $M$ is semi-self-similar for the multiplicative subgroup of powers of 2. It is also stationary. Let us note that our construction seems to ascribe a specific role to the number 2. This is not the case and we can replace 2 by any $b > 0$ in (5.5). We then obtain the semi-self-similar measure for the multiplicative subgroup of powers of $b$.

5.1.4. Weierstrass functions

With Weierstrass functions, we have a deterministic mass distribution model with properties analogous to (5.4). If $b > 1$ and $0 < H < 1$, for $x \in \mathbb{R}$, the Weierstrass functions $W_{b,H}$ are defined as (see [WEI 72]):

$$W_{b,H}(x) \stackrel{\mathrm{def}}{=} \sum_{n \in \mathbb{Z}} b^{-nH} \sin(b^n x). \tag{5.6}$$

We can easily verify the semi-self-similarity property:

$$W_{b,H}(bx) = b^H W_{b,H}(x)$$

We should note that the preceding constructions, intended to extend Cantor sets and renormalized sums of Poisson measures to $\mathbb{R}$, have their counterpart for Weierstrass functions: writing

$$W^0_{b,H}(x) = \sum_{n \ge 0} b^{-nH} \sin(b^n x)$$

we notice that $\lim_{p \to +\infty} b^{pH} W^0_{b,H}(b^{-p} x) = W_{b,H}(x)$.
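The semi-self-similarity of $W_{b,H}$ can be checked numerically on a truncated version of the series (5.6). In the sketch below the truncation level `N`, the parameter values and the function name are illustrative assumptions; the relation only holds up to the two boundary terms of the truncated sum, which are of order $b^{-NH}$ and $b^{-N(1-H)}$.

```python
import math

def weierstrass(x, b=2.0, H=0.5, N=40):
    """Truncation of W_{b,H}(x): sum over n = -N..N of b**(-n*H) * sin(b**n * x)."""
    return sum(b ** (-n * H) * math.sin(b ** n * x) for n in range(-N, N + 1))

x = 0.7
lhs = weierstrass(2.0 * x)            # W(bx) with b = 2
rhs = 2.0 ** 0.5 * weierstrass(x)     # b**H * W(x) with H = 0.5
gap = abs(lhs - rhs)
```

Shifting the summation index by one turns the sum for $W(bx)$ into $b^H$ times the sum for $W(x)$, so `gap` only measures the truncation error.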
The question that naturally arises is: are there probabilistic models which are self-similar and stationary? The traditional answer is positive, provided the stationarity condition is replaced by a condition of stationarity of the increments. Therefore, we proceed with the introduction of self-similar stochastic processes whose increments are stationary.
5.1.5. Renormalization of sums of random variables

In this section, we present the results of Lamperti’s article [LAM 62] on self-similar processes obtained as renormalization limits of other stochastic processes. Let us first recall Lamperti’s definition of a “semi-stable”¹ stochastic process.

DEFINITION 5.1.– A stochastic process $X(x)$, with $x \in \mathbb{R}$, is called semi-stable if, for any $a > 0$, there is a renormalization function $b(a) > 0$ such that:

$$\big( X(ax),\ x \in \mathbb{R} \big) \stackrel{\mathcal{L}}{=} \big( b(a) X(x),\ x \in \mathbb{R} \big)$$

When the function $b(a)$ is of the form $a^H$, the process $X$ is called self-similar. Lamperti’s fundamental result shows that the possible choices for the renormalization function $b(a)$ are actually limited.

THEOREM 5.1 ([LAM 62, Theorem 1, p. 63]).– Any stochastic semi-stable process is self-similar.

We note that this result is not in contradiction with the existence of locally self-similar² processes (see [BEN 98]).

Let $X$ and $Y$ be two real stochastic processes indexed by $\mathbb{R}^d$. We call hypothesis R the following: there exists a function $f : \mathbb{R}^+ \to \mathbb{R}^+$ such that:

$$\lim_{a \to +\infty} \left( \frac{X(ax)}{f(a)},\ x \in \mathbb{R}^d \right) \stackrel{\mathcal{L}}{=} \big( Y(x),\ x \in \mathbb{R}^d \big)$$

Moreover, let us recall that a function $L$ is a slowly varying function if, for any $y > 0$, we obtain $\lim_{x \to +\infty} L(xy)/L(x) = 1$.

THEOREM 5.2 ([LAM 62, Theorem 2, p. 64]).– Let $X$ and $Y$ be two stochastic processes such that there is a function $f$ for which the hypothesis R is satisfied. Then, $f$ necessarily has the following structure, with $H > 0$:

$$f(a) = a^H L(a)$$

where $L$ is a slowly varying function.

1. Not to be confused with the definition of a stable process.
2. These processes are presented in Chapter 6.
As illustrations of this theorem, we offer the two traditional examples:

– Brownian motion:
\[ X(x) = \sum_{k=1}^{[x]} \xi_k, \qquad f(n) = \sqrt{n} \]
where the $\xi_k$ are independent Bernoulli random variables on $\{-1, 1\}$. Brownian motion can be defined as the limit of $\frac{X(nx)}{f(n)}$ when $n \to +\infty$;

– Lévy’s symmetric α-stable motion:
\[ X(x) = \sum_{k=1}^{[x]} \xi_k, \qquad f(n) = n^{1/\alpha} \]
where the $\xi_k$ are independent, identically distributed, stable, symmetric random variables (see Chapter 14 in [BRE 68]). Lévy’s symmetric α-stable motion can be defined as the limit of $\frac{X(nx)}{f(n)}$ when $n \to +\infty$.

Thus, stochastic self-similar processes appear as natural limits of renormalization procedures. Theorem 5.8 provides another example, which is neither Gaussian nor stable.

5.1.6. A common structure for a stochastic (semi-)self-similar process

We now wish to propose a unified structure for the known (semi-)self-similar processes. The basic ingredients are as follows:
– a self-similarity parameter $H > 0$;
– a “vaguelette” type basis (see [MEY 90]);
– a sequence of independent and identically distributed random variables.

Let $E^{d} = \{0, 1\}^{d}$ be the set of binary sequences of length $d$ and $E'^{d} = E^{d} - \{(1, 1, \ldots, 1)\}$. Let us denote by $\Lambda^{d}$ the set $\{(n, k),\ n \in \mathbb{Z},\ k \in \mathbb{Z}^{d}\}$. Let us write $\psi_\lambda(x) = 2^{nd/2}\, \psi^{u}(2^{n}x - k)$, with $x \in \mathbb{R}^{d}$ and $\lambda = (n, k, u)$, where $n$ is the scale parameter, $k$ the localization parameter and $u$ the orientation parameter: if $u = 0$, $\psi^{u}$ is the mother wavelet; if $u = 1$, $\psi^{u}$ is the father wavelet. This dilation and translation structure in fact relies on an underlying structure of binary trees. The function $\psi$ is assumed to decrease rapidly at infinity, and to vanish and be Lipschitz at zero. Let $(\xi_\lambda)$, with $\lambda = (n, k, u)$, be a sequence of random variables such that, for any $n$, $n'$, $k$, $k'$:
\[ \xi_{n',k',u} \stackrel{\mathcal{L}}{=} \xi_{n,k,u} \]
Let us then define the process $X$ by:
\[ X(x) \stackrel{\mathrm{def}}{=} \sum_{\lambda} 2^{-nH}\, \psi_\lambda(x)\, \xi_\lambda \tag{5.7} \]
The process is semi-self-similar for the multiplicative group of powers of 2. Again, the number 2 does not play a crucial role. Later on, we will see that fractional Brownian motions have this structure. We observe that Weierstrass functions fit into this framework by taking $\psi(x) = \sin(x)\, 1_{[0,2\pi]}(x)$. In fact:
\[ W_{b,H}(x) = \sum_{n \in \mathbb{Z},\, k \in \mathbb{Z}} b^{-nH}\, \psi(b^{n}x - 2k\pi) \]
The question of parameter identifiability for a semi-self-similar process is natural. We will be dealing with it in the next section.

5.1.7. Identifying Weierstrass functions

5.1.7.1. Pseudo-correlation

Subject to existence, let us define the pseudo-correlation function of a deterministic function $f$ by (see [BAS 62]):
\[ \gamma_{f(\cdot)}(\tau) \stackrel{\mathrm{def}}{=} \lim_{T\to+\infty} \frac{1}{2T} \int_{-T}^{T} f(x)\, f(x+\tau)\, dx \]
A function is said to be pseudo-random if $\lim_{\tau\to+\infty} \gamma_{f(\cdot)}(\tau) = 0$, and pseudo-stationary when $\gamma_{f(\cdot - r)} = \gamma_{f(\cdot)}$ for any $r$. It can be shown that Weierstrass functions are pseudo-random and pseudo-stationary.

The example of Weierstrass functions shows that a semi-self-similar phenomenon is not solely determined by the self-similarity parameter $H$. Is it possible to identify the parameters that generate these phenomena? We will see later that generally it is possible. To conclude this introduction, we focus on Weierstrass functions, whose identification requires general tools, although their demonstration is elementary. To this end, let us introduce the quadratic variations of a function $f$:
\[ V_N(f) = \frac{1}{N} \sum_{k=0}^{N-1} \left( f\!\left(\frac{k+1}{N}\right) - f\!\left(\frac{k}{N}\right) \right)^{2} \]
Let us define:
\[ R_N = \frac{1}{2} \log_2 \frac{V_{N/2}}{V_N} \]
By using the pseudo-random character of Weierstrass functions, we can show that $R_N$ measures $H$, when $N \to +\infty$.
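The identification of $H$ through quadratic variations can be sketched numerically. The code below assumes the reading $R_N = \frac{1}{2}\log_2 (V_{N/2}/V_N)$ and applies it to a truncated Weierstrass series; the function names, the truncation bounds, the choice $b = 2$ and the grid size are assumptions made for this illustration only.

```python
import numpy as np


def weierstrass(x, b=2.0, H=0.5, n_min=-20, n_max=30):
    """Truncated series sum_n b^{-nH} sin(b^n x), evaluated on an array x."""
    n = np.arange(n_min, n_max + 1)[:, None]          # column of indices
    return np.sum(b ** (-n * H) * np.sin(b ** n * x[None, :]), axis=0)


def quadratic_variation(f, N):
    """V_N(f) = (1/N) sum_{k=0}^{N-1} (f((k+1)/N) - f(k/N))^2."""
    x = np.arange(N + 1) / N
    increments = np.diff(f(x))
    return np.mean(increments ** 2)


def estimate_H(f, N):
    """R_N = (1/2) log2( V_{N/2} / V_N ), with N assumed even."""
    return 0.5 * np.log2(quadratic_variation(f, N // 2) / quadratic_variation(f, N))


if __name__ == "__main__":
    H_true = 0.5
    print(estimate_H(lambda x: weierstrass(x, H=H_true), N=4096))
```

With $b = 2$ the two step sizes $1/N$ and $2/N$ differ by exactly one period of the log-periodic modulation of the increments, which is why doubling the step is a convenient choice here. The estimate is only approximate because the averages are taken over $[0, 1]$ rather than over the whole line.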
5.2. The Gaussian case

As always, the Gaussian case is the best understood. The structure of stochastic, Gaussian, self-similar processes, also possessing stationary increments, is well-known. In his article [DOB 79], Dobrushin presents contemporary results, including his own works, in a definitive style and in the framework of generalized stochastic processes. Here, we give a “stochastic processes” version.

5.2.1. Self-similar Gaussian processes with r-stationary increments

To describe Dobrushin’s results, we need some notations.

5.2.1.1. Notations

Let $\mathbb{R}^{d}$, $d \ge 1$, be the usual Euclidean space, with $x = (x_1, \ldots, x_d)$ and $|x|^{2} = \sum_{k=1}^{d} x_k^{2}$. Let $xy = \sum_{k=1}^{d} x_k y_k$ denote the scalar product of vectors $x$ and $y$.

Let $k = (k_1, \ldots, k_d) \in \mathbb{N}^{d}$ be a multi-index of length $|k| = k_1 + \ldots + k_d$. We define $D^{k} = \left(\frac{\partial}{\partial x_1}\right)^{k_1} \circ \ldots \circ \left(\frac{\partial}{\partial x_d}\right)^{k_d}$.

Let $f$ be a function from $\mathbb{R}^{d}$ to $\mathbb{R}$. Let $TF(f)$ denote its Fourier transform and $TF^{-1}(f)$ its inverse Fourier transform. When $f$ is a distribution, the same notations are used for the (inverse) Fourier transform. Let us recall that, for $k \in \mathbb{N}^{d}$:
\[ TF(D^{k} f)(\xi) = i^{|k|}\, \xi^{k}\, TF(f)(\xi) \]
Let us denote by $\mathcal{S}(\mathbb{R}^{d})$ (or $\mathcal{S}$ where there is no ambiguity) the Schwartz space of $C^{\infty}$ functions which decrease rapidly together with all their derivatives; $\mathcal{S}'(\mathbb{R}^{d})$ (or $\mathcal{S}'$) then denotes the space of tempered distributions. Let $T \in \mathcal{S}'$. The translation operator $\tau_k$ is defined as $\langle \tau_k T, \varphi \rangle \stackrel{\mathrm{def}}{=} \langle T, \tau_{-k}\varphi \rangle$ for $\varphi \in \mathcal{S}$, with $\tau_{-k}\varphi(x) = \varphi(x + k)$.

Let $f$ be a function from $\mathbb{R}^{d}$ to $\mathbb{R}$ and $n$ an integer. Let $f^{\otimes n}$ denote the function on $(\mathbb{R}^{d})^{n}$ defined as $f^{\otimes n}(x_1, \ldots, x_n) \stackrel{\mathrm{def}}{=} (f(x_1), \ldots, f(x_n))$. Finally, for $k \in \mathbb{R}^{d}$, let us introduce the translation operator $\tau_k f^{\otimes n} = (\tau_k f(x_1), \ldots, \tau_k f(x_n))$.

5.2.1.2. Definitions

In this section, the same notations are used for distributions and functions.

DEFINITION 5.2.– Let $(X(x))$, with $x \in \mathbb{R}^{d}$, be a (possibly) generalized stochastic process:
– $X$ is said to be stationary if, for any integer $n$ and any $h \in \mathbb{R}^{d}$:
\[ \tau_h X^{\otimes n} \stackrel{\mathcal{L}}{=} X^{\otimes n} \]
– let $r$ be a non-zero integer. $X$ is said to possess stationary $r$-increments if $D^{k}X$ is stationary for any $k$ such that $|k| = r$;
– let $r$ be a non-zero integer and $H > 0$. $X$ is said to be $(r, H)$ self-similar if there is a polynomial $P$ of degree $r$ such that $P(D)X$ is self-similar with parameter $H$.

NOTE 5.1.– $X$ is said to have stationary increments if it is 1-stationary. $X$ is said to be self-similar with parameter $H$ if it is $(1, H)$ self-similar.

5.2.1.3. Characterization

THEOREM 5.3 ([DOB 79, Theorem 3.2, p. 9]).– Let $X$ be a Gaussian $(r, H)$ self-similar process, with $H < r$, $r$-stationary increments and a polynomial $P$. Then, there is a real-valued function $S$ on the unit sphere $\Sigma_d$ such that, for all functions $\varphi$ of $\mathcal{S}$:
\[ \langle X, P(D)\varphi \rangle = \int_{\mathbb{R}^{d}} \frac{P(i\xi)\, TF(\varphi)(\xi)}{|\xi|^{\frac{d}{2}+H}\, S\!\left(\frac{\xi}{|\xi|}\right)}\, TF(W)(d\xi) \]
Let us now introduce the pseudo-differential3 operator $L$, with symbol $\rho(\xi) = |\xi|^{\frac{d}{2}+H}\, S\!\left(\frac{\xi}{|\xi|}\right)$:
\[ Lf(x) = \int_{\mathbb{R}^{d}} TF(f)(\xi)\, \rho(\xi)\, e^{ix\xi}\, d\xi \tag{5.8} \]
Then, the following weak stochastic differential equation can be deduced (see [BEN 97]):
\[ LX = W^{\circ} \]
where $W^{\circ}$ is a Gaussian white noise. Now, let us give some examples.

5.2.2. Elliptic processes

Let $L$ be the pseudo-differential operator defined in (5.8). $L$ is called elliptic if two constants $0 < a \le A < +\infty$ exist such that $a \le S \le A$ on the sphere $\Sigma_d$. By analogy, the corresponding process $X$ will be called elliptic [BEN 97]. Generally, $(r, H)$ self-similar Gaussian processes with $r$-stationary increments, with $0 < H < 1$, admit the following representation.

3. See [MEY 90] for traditional results on operators.
THEOREM 5.4 ([BEN 97]).– Let $X$ be a Gaussian $(r, H)$ self-similar process, with $r$-stationary increments, with $0 < H < 1$:
– $X$ admits the following harmonic representation:
\[ X(x) = \int_{\mathbb{R}^{d}} \frac{e^{ix\xi} - \sum_{k=0}^{r-1} \frac{(ix\xi)^{k}}{k!}}{|\xi|^{\frac{d}{2}+H}\, S\!\left(\frac{\xi}{|\xi|}\right)}\, TF(W)(d\xi) \]
– $X$ is the unique solution of the following stochastic elliptic differential equation:
\[ LX = W^{\circ}, \qquad Q(D)X(0) = 0 \ \text{ for any } Q \text{ such that } d^{\circ}Q < r \]
As a particular case, we can mention the harmonic representation [MAN 68] of fractional Brownian motion of parameter $H$, obtained for $r = 1$ and $S \equiv 1$:
\[ B_H(t) = \int_{\mathbb{R}} \frac{e^{it\xi} - 1}{|\xi|^{\frac{1}{2}+H}}\, TF(W)(d\xi) \]
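The harmonic representation of fractional Brownian motion makes both the stationarity of the increments and the self-similarity visible in one line. The following worked computation is a sketch assuming the isometry $E\left|\int g(\xi)\, TF(W)(d\xi)\right|^{2} = \int |g(\xi)|^{2}\, d\xi$ (a normalization convention that the text does not spell out) and introduces the constant $C_H$ for convenience.

```latex
\begin{align*}
E\,|B_H(t) - B_H(s)|^{2}
  &= \int_{\mathbb{R}} \frac{|e^{it\xi} - e^{is\xi}|^{2}}{|\xi|^{1+2H}}\, d\xi
   = \int_{\mathbb{R}} \frac{|e^{i(t-s)\xi} - 1|^{2}}{|\xi|^{1+2H}}\, d\xi \\
  &= |t-s|^{2H} \int_{\mathbb{R}} \frac{|e^{iu} - 1|^{2}}{|u|^{1+2H}}\, du
   \qquad (u = (t-s)\,\xi) \\
  &= C_H\, |t-s|^{2H},
   \qquad C_H = \int_{\mathbb{R}} \frac{|e^{iu} - 1|^{2}}{|u|^{1+2H}}\, du < +\infty
   \quad \text{for } 0 < H < 1 .
\end{align*}
```

The variance of an increment depends only on $t - s$, and rescaling time by $a$ multiplies it by $a^{2H}$. Since the process is Gaussian and centered, these two facts determine its law.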
5.2.3. Hyperbolic processes

DEFINITION 5.3.– The operator $L$ defined in (5.8) is called hyperbolic if its symbol is of the form $\rho(\xi) = \prod_{i=1}^{d} |\xi_i|^{H_i + \frac{1}{2}}$.

THEOREM 5.5 (FRACTIONAL BROWNIAN SHEET [LEG 99]).– The fractional Brownian sheet, defined as:
\[ X(x) = \int_{\mathbb{R}^{d}} \prod_{i=1}^{d} \frac{e^{ix_i\xi_i} - 1}{|\xi_i|^{H_i + \frac{1}{2}}}\, TF(W)(d\xi) \]
satisfies the following equality:
\[ \{X(\lambda_1 x_1, \ldots, \lambda_d x_d)\}_{x \in \mathbb{R}^{d}} \stackrel{\mathcal{L}}{=} \left\{ \prod_{i=1}^{d} \lambda_i^{H_i}\, X(x_1, \ldots, x_d) \right\}_{x \in \mathbb{R}^{d}} \]
COROLLARY 5.1.– When $\lambda_1 = \cdots = \lambda_d$ and $H = H_1 + \cdots + H_d$, the hyperbolic process $X$ obtained is self-similar with parameter $H$, with $H$ between 0 and $d$.

In contrast with the elliptic case, $H > 1$ is hence allowed, though the fractional Brownian sheet is not differentiable.
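The scaling equality of Theorem 5.5 and its corollary follow from a change of variable in the harmonic representation. The computation below is a sketch that assumes the usual scaling of the Fourier transform of white noise, $TF(W)(d(\eta/\lambda)) \stackrel{\mathcal{L}}{=} \prod_i \lambda_i^{-1/2}\, TF(W)(d\eta)$, where $\eta/\lambda$ stands for $(\eta_1/\lambda_1, \ldots, \eta_d/\lambda_d)$.

```latex
\begin{align*}
X(\lambda_1 x_1, \ldots, \lambda_d x_d)
  &= \int_{\mathbb{R}^{d}} \prod_{i=1}^{d}
     \frac{e^{i \lambda_i x_i \xi_i} - 1}{|\xi_i|^{H_i + \frac12}}\, TF(W)(d\xi) \\
  &\stackrel{\mathcal{L}}{=} \int_{\mathbb{R}^{d}} \prod_{i=1}^{d}
     \frac{e^{i x_i \eta_i} - 1}{|\eta_i|^{H_i + \frac12}}\;
     \lambda_i^{H_i + \frac12}\, \lambda_i^{-\frac12}\, TF(W)(d\eta)
     \qquad (\eta_i = \lambda_i \xi_i) \\
  &= \prod_{i=1}^{d} \lambda_i^{H_i}\; X(x_1, \ldots, x_d).
\end{align*}
% With \lambda_1 = \cdots = \lambda_d = \lambda the factor is \lambda^{H_1 + \cdots + H_d} = \lambda^{H}.
```

Since each $H_i$ lies in $(0, 1)$, the sum $H$ ranges over $(0, d)$, which is why values $H > 1$ are possible here.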
5.2.4. Parabolic processes

Let $A$ be a pseudo-differential operator of dimension $n - 1$, i.e., its symbol is a function from $\mathbb{R}^{n-1}$ to $\mathbb{R}$. Let $L$ be the pseudo-differential operator of dimension $n$, whose symbol is a function from $\mathbb{R} \times \mathbb{R}^{n-1}$ to $\mathbb{R}$, defined by $L = \partial_t - A$. Let us consider the stochastic differential equation $LX = W^{\circ}$. By analogy with the classification of operators, $X$ is said to be parabolic. The most prominent example is the Ornstein-Uhlenbeck process:
\[ OU(t, x) = \int_{0}^{t} \int_{\mathbb{R}^{d}} e^{-(t-s)A}\, W(ds, dy) \]
The operator $\partial_t$ is renormalized with a factor $\frac{1}{2}$; the operator $A$ can be renormalized with an arbitrary factor. Generally, a parabolic process is not self-similar.

5.2.5. Wavelet decomposition

In this section, we expand Gaussian self-similar processes on a wavelet basis, which hence also constitutes a basis for the self-reproducing Hilbert space of the process4.

5.2.5.1. Gaussian elliptic processes

Let $\psi^{u}$, with $u \in E^{d}$, be a Lemarié-Meyer generating system [MEY 90]. Let $\psi_\lambda$, with $\lambda \in \Lambda^{d}$, be the generated orthonormal basis of wavelets. Let us assume that $X$ satisfies the hypotheses and notations of Theorem 5.4. Let us define $\varphi^{u}$, with $u \in E^{d}$, by the harmonic representation:
\[ \varphi^{u}(x) = \int_{\mathbb{R}^{d}} \frac{e^{ix\xi} - \sum_{k=0}^{r-1} \frac{(ix\xi)^{k}}{k!}}{|\xi|^{\frac{d}{2}+H}\, S\!\left(\frac{\xi}{|\xi|}\right)}\, TF(\psi^{u})(\xi)\, d\xi \]
Let us then define the associated family of wavelets $\varphi_\lambda$, with $\lambda \in \Lambda^{d}$.

THEOREM 5.6 ([BEN 97]).– There is a sequence of normalized Gaussian random variables $\eta_\lambda$ such that:
\[ X(x) = \sum_{\lambda \in \Lambda^{d}} 2^{-j(r-1+H)}\, \eta_\lambda\, \varphi_\lambda(x) \]
If $X$ is self-similar in the usual sense, then this decomposition is a renormalized distribution of mass, as defined in section 5.1.3.

4. See [NEV 68] for self-reproducing Hilbert spaces.
5.2.5.2. Gaussian hyperbolic process

THEOREM 5.7.– With the same notations as those of Theorem 5.6, we obtain the decomposition:
\[ X(x) = \sum_{(\lambda_1, \ldots, \lambda_d) \in (\Lambda^{1})^{d}} 2^{-(n_1 H_1 + \cdots + n_d H_d)}\, \varphi_{\lambda_1}(x_1) \times \cdots \times \varphi_{\lambda_d}(x_d)\, \eta_{\lambda_1, \ldots, \lambda_d} \]
where the sequence $\eta_{\lambda_1, \ldots, \lambda_d}$ consists of normalized Gaussian random variables.

Hyperbolic processes enable us to model multiscale random structures with preferred directions.

5.2.6. Renormalization of sums of correlated random variables

Let $B_H$ denote fractional Brownian motion of parameter $H$. Let us consider the increments of size $h > 0$: $X_k = h^{-H}\left(B_H((k+1)h) - B_H(kh)\right)$. The following properties are well-known:
– $X_0$ is a normalized Gaussian random variable;
– the sequence $(X_k)$ is stationary;
– a law of large numbers can be written as:
\[ \lim_{n\to+\infty} \frac{1}{n+1} \sum_{k=0}^{n} X_k^{2} = 1 \quad \text{(a.s.)} \tag{5.9} \]
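The law of large numbers (5.9) can be observed on simulated increments. The sketch below assumes the standard covariance of the normalized increments, $\gamma(k) = \frac{1}{2}\left(|k+1|^{2H} - 2|k|^{2H} + |k-1|^{2H}\right)$, and simulates them through a Cholesky factorization; the function names, sample sizes and seed are hypothetical choices, and the Cholesky approach is only one of several possible simulation methods.

```python
import numpy as np


def increment_covariance(n, H):
    """Covariance matrix of (X_0, ..., X_{n-1}), the normalized fBm increments."""
    k = np.arange(n)
    gamma = 0.5 * (np.abs(k + 1) ** (2 * H)
                   - 2 * np.abs(k) ** (2 * H)
                   + np.abs(k - 1) ** (2 * H))
    # Stationarity: the (i, j) entry only depends on |i - j|.
    return gamma[np.abs(k[:, None] - k[None, :])]


def simulate_increments(n, H, n_paths, seed=0):
    """Draw n_paths independent copies of (X_0, ..., X_{n-1})."""
    rng = np.random.default_rng(seed)
    chol = np.linalg.cholesky(increment_covariance(n, H))
    return rng.standard_normal((n_paths, n)) @ chol.T


def mean_squares(n=512, H=0.7, n_paths=20, seed=0):
    """Empirical averages (1/n) sum_k X_k^2, one per simulated path."""
    X = simulate_increments(n, H, n_paths, seed)
    return (X ** 2).mean(axis=1)


if __name__ == "__main__":
    print(mean_squares())
```

Each printed value should be close to 1. The spread around 1 is governed by the convergence speed discussed in the text that follows.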
Difficulties only start when trying to estimate the convergence speed in (5.9). Before presenting the results on these questions, let us give an outline of the Rosenblatt process and law. The Rosenblatt law is defined by its characteristic function, which can be found in [TAQ 75, p. 299]. We can define a stochastic process $(Z_D(t))_{t>0}$, called the Rosenblatt process, whose law at every time is a Rosenblatt law of parameter $D$. The characteristic function of $(Z_D(t_1), \ldots, Z_D(t_k))$, with $k \ge 1$, can be found in [TAQ 75].

5.2.7. Convergence towards fractional Brownian motion

5.2.7.1. Quadratic variations

Many statistical estimation procedures are based on quadratic variations. It is hence useful to recall the quadratic variations of fractional Brownian motion.

THEOREM 5.8.– Let:
\[ X_{k,N} = N^{H} \left( B_H\!\left(\frac{k+1}{N}\right) - B_H\!\left(\frac{k}{N}\right) \right) \]
be the renormalized increments of a fractional Brownian motion of order $H$. The following results are proven in [GUY 89, TAQ 75]:
– when $0 < H < \frac{3}{4}$:
\[ \sqrt{N}\, \frac{1}{N} \sum_{k=0}^{[Nt]} \left( X_{k,N}^{2} - 1 \right) \]
converges in law, when $N \to +\infty$, towards $\sigma_H B_H(t)$, where $B_H$ is fractional Brownian motion of order $H$;
– for $\frac{3}{4} < H < 1$:
\[ N^{2(1-H)}\, \frac{1}{N} \sum_{k=0}^{[Nt]} \left( X_{k,N}^{2} - 1 \right) \]
converges in law, when $N \to +\infty$, towards a Rosenblatt process $Z_{(1-H)}(t)$.

Theorem 5.8 admits a generalization to functions $G$ of $L^{2}(\mathbb{R}, \mu)$, where $\mu$ is the Gaussian density $(2\pi)^{-1/2} \exp\!\left(-\frac{x^{2}}{2}\right)$. It is known (see, for example, [NEV 68]) that Hermite’s polynomials $H_k$, with $k \ge 0$, form an orthonormal basis for $L^{2}(\mathbb{R}, \mu)$. Let us expand $G$ on this basis: $G(x) = \sum_{k \ge 0} g_k H_k(x)$. If we have $g_0 = g_1 = 0$ and $g_2 \ne 0$, Theorem 5.8 remains valid, up to changes in the variances of the limit processes (see [GUY 89, TAQ 75]).

Theorem 5.8 thus provides examples of non-Gaussian self-similar processes with stationary increments. Other examples are discussed later.

5.2.7.2. Acceleration of convergence

Instead of the standard increments $X_{k,N}$, let us now consider the second order increments:
\[ Y_{k,N} = N^{H} \sigma \left( B_H\!\left(\frac{k+1}{N}\right) - 2 B_H\!\left(\frac{k}{N}\right) + B_H\!\left(\frac{k-1}{N}\right) \right) \]
where $\sigma$ is chosen such that $Y_{k,N}$ is of variance 1. Then, the following result can be obtained (cf. [IST 97]).

THEOREM 5.9.– The quantity:
\[ \sqrt{N}\, \frac{1}{N} \sum_{k=0}^{[Nt]} \left( Y_{k,N}^{2} - 1 \right) \]
converges in law, when $N \to +\infty$, towards a fractional Brownian motion $B_H$ of order $H$. The frontier $H = \frac{3}{4}$ disappears, thanks to the introduction of the generalized variations $Y_{k,N}$.
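Second order increments also give a simple way to estimate $H$ from a sampled path, since $E\left(B_H(t+2\delta) - 2B_H(t+\delta) + B_H(t)\right)^{2} = (4 - 2^{2H})\,\delta^{2H}$, so that doubling the step multiplies the mean square by $2^{2H}$. The sketch below is an illustration under assumptions, not the estimator of [IST 97]: the ratio estimator, the Cholesky-based simulation, the function names and the value $H = 0.8$ (chosen beyond the frontier $\frac{3}{4}$) are all choices made here.

```python
import numpy as np


def simulate_fbm(n, H, seed=0):
    """Fractional Brownian motion sampled at k/n, k = 0..n (Cholesky method)."""
    k = np.arange(n)
    gamma = 0.5 * (np.abs(k + 1) ** (2 * H)
                   - 2 * np.abs(k) ** (2 * H)
                   + np.abs(k - 1) ** (2 * H))
    cov = gamma[np.abs(k[:, None] - k[None, :])]      # unit-step increment covariance
    rng = np.random.default_rng(seed)
    increments = np.linalg.cholesky(cov) @ rng.standard_normal(n)
    # Self-similarity: increments at step 1/n are n^{-H} times unit-step increments.
    return np.concatenate(([0.0], np.cumsum(increments))) * n ** (-H)


def second_order_variation(path, lag):
    """Mean square of path[k + 2 lag] - 2 path[k + lag] + path[k]."""
    d = path[2 * lag:] - 2 * path[lag:-lag] + path[:-2 * lag]
    return np.mean(d ** 2)


def estimate_H(path):
    """Half the log2 ratio of second order variations at lags 2 and 1."""
    return 0.5 * np.log2(second_order_variation(path, 2)
                         / second_order_variation(path, 1))


if __name__ == "__main__":
    estimates = [estimate_H(simulate_fbm(1024, 0.8, seed=s)) for s in range(8)]
    print(np.mean(estimates))
```

Averaging over a few independent paths, as in the usage lines, reduces the fluctuation of the estimate.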
5.2.7.3. Self-similarity and regularity of trajectories

It is generally admitted in the literature, following the works by Mandelbrot, that self-similarity necessarily goes with sample path irregularity, for example, of Hölderian type. With a simple example, we now show that such an association does not hold in general. To construct such a process, let us start with an infinitely differentiable function $\varphi$, with compact support and such that the support of $\varphi$ does not meet some neighborhood of 0. Let us then define a stochastic process as:
\[ X(t) = \int_{\mathbb{R}} \frac{\varphi(t\xi)}{|\xi|^{\frac{1}{2}+H}}\, TF(W)(d\xi) \tag{5.10} \]
for which the following original result can be obtained.

THEOREM 5.10.– $X$, as defined in (5.10), is a zero-mean Gaussian process that possesses the following properties:
– $X$ is self-similar with parameter $H \in \mathbb{R}$;
– the trajectories of $X$, for $t \ne 0$, are infinitely differentiable.

Let us mention that $X$ does not have stationary increments. It is precisely this loss of increment stationarity that allows the regularity of trajectories.

5.3. Non-Gaussian case

5.3.1. Introduction

In this section, we study processes whose laws are either subordinated to the law of the Brownian measure (cf. Dobrushin [DOB 79]), or symmetric α-stable laws (hereafter SαS). Samorodnitsky and Taqqu’s book [SAM 94] is one of the most prominent reference tools for stable processes and is largely used here. Two classes of processes are studied: processes represented by moving averages and those defined by a harmonizable representation. These two classes are not equivalent (cf. Theorem 5.11 below). Censov’s process and a variant, the Takenaka processes, are also analyzed. Let us mention that Takenaka processes do not belong to either of the two aforementioned classes. However, all these processes have a point in common: they are elliptic, i.e., they are the solutions of a stochastic elliptic equation of non-integer degree.
Ellipticity makes it possible to expand such processes on an appropriate wavelet basis, as in [BEN 97]. In the Gaussian case, this decomposition is of Karhunen-Loève type. The question that remains open is therefore: are the SαS self-similar
processes, with stationary increments, elliptic? Finally, ellipticity enables us to construct a wide variety of self-similar processes with stationary increments subordinated to the Brownian measure.

5.3.2. Symmetric α-stable processes

5.3.2.1. Stochastic measure

Let $d$ be a non-zero integer and $\mu$ a measure on $\mathbb{R}^{d} \times E$. A stochastic measure $M_\alpha$ on $\mathbb{R}^{d}$ is SαS with control measure $\mu$ if, for any $p < \alpha$, there is a constant $c_{p,\alpha}$ such that, for any test function $\varphi$, we obtain (see [SAM 94, section 10.3.3]):
\[ E|M_\alpha(\varphi)|^{p} = c_{p,\alpha} \left( \int_{\mathbb{R}^{d}} |\varphi(x)|^{\alpha}\, \mu(dx) \right)^{\frac{p}{\alpha}} \tag{5.11} \]
With the previous notations, the measure $M_\alpha$ above possesses the following properties:
– stationarity. For any $x \in \mathbb{R}^{d}$: $\tau_x(M_\alpha) \stackrel{\mathcal{L}}{=} M_\alpha$;
– unitarity. Let $U_x f(y) = e^{ixy} f(y)$. For any $x \in \mathbb{R}^{d}$: $U_x(M_\alpha) \stackrel{\mathcal{L}}{=} M_\alpha$;
– homogeneity. For any $\lambda > 0$: $r_\lambda M \stackrel{\mathcal{L}}{=} \lambda^{\frac{d}{\alpha}} M$;
– symmetry: $-M_\alpha \stackrel{\mathcal{L}}{=} M_\alpha$.

NOTE 5.2.– Taken as distributions, let $TF(M_\alpha)$ be the Fourier transform of $M_\alpha$. If $M_\alpha$ is stationary, $TF(M_\alpha)$ is unitary. If we have $\alpha > 1$ and if $M_\alpha$ is $d/\alpha$-homogeneous, then $TF(M_\alpha)$ is $d/\alpha'$-homogeneous, with $\frac{1}{\alpha} + \frac{1}{\alpha'} = 1$.

5.3.2.2. Ellipticity

Let $Q$ and $S$ be two functions from the unit sphere $\Sigma_{d-1}$ of $\mathbb{R}^{d}$ to $\mathbb{R}^{+}$. Let $M_\alpha$ be a stochastic measure and $\mu$ be the Lebesgue measure on $\mathbb{R}^{d}$. Let us consider, when they exist, the following stochastic processes:
\[
\begin{aligned}
X(x) &= \int_{\mathbb{R}^{d}} \left( |x-y|^{H-\frac{d}{\alpha}}\, Q\!\left(\frac{x-y}{|x-y|}\right) - |y|^{H-\frac{d}{\alpha}}\, Q(y) \right) M_\alpha(dy) \\
Z(x) &= \int_{\mathbb{R}^{d}} \frac{e^{ix\xi} - 1}{|\xi|^{\frac{d}{\alpha}+H}\, S\!\left(\frac{\xi}{|\xi|}\right)}\, M_\alpha(d\xi)
\end{aligned}
\tag{5.12}
\]
THEOREM 5.11 ([SAM 94], p. 358).– Processes $X$ and $Z$ are defined whenever $0 < H < 1$ and $0 < \alpha < 2$. They then have stationary increments and are self-similar of order $H$. There exists no pair $(Q, S)$ such that $X$ and the real part of $Z$ have proportional laws.
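NOTE 5.3.– When $\alpha = 2$, the processes $X$ and $Z$ can have proportional laws for suitably chosen pairs $(Q, S)$, particularly for $Q = S \equiv 1$.

Let us define the operator $\tau_{d,\alpha}$ by:
\[ TF\!\left( |x|^{H-\frac{d}{\alpha}}\, Q\!\left(\frac{x}{|x|}\right) \right)(\xi) = \frac{1}{|\xi|^{\frac{d}{\alpha}+H}\, \tau_{d,\alpha} S\!\left(\frac{\xi}{|\xi|}\right)} \]
where the Fourier transform is taken in the sense of distributions. Then, the following Plancherel formula holds.

THEOREM 5.12 ([BEN 99]).– Let $0 < \alpha < 1$. There then exist two constants $c$ and $c'$ such that the processes defined in (5.12) admit the following harmonizable representation:
\[
\begin{aligned}
X(x) &= c \int_{\mathbb{R}^{d}} \frac{e^{ix\xi} - 1}{|\xi|^{\frac{d}{\alpha}+H}\, \tau_{d,\alpha} S\!\left(\frac{\xi}{|\xi|}\right)}\, TF(M_\alpha)(d\xi) \\
Z(x) &= c' \int_{\mathbb{R}^{d}} \left( |x-y|^{H-\frac{d}{\alpha}}\, \tau_{d,\alpha}^{-1} S(x-y) - |y|^{H-\frac{d}{\alpha}}\, \tau_{d,\alpha}^{-1} S(y) \right) TF(M_\alpha)(dy)
\end{aligned}
\tag{5.13}
\]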
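Let $\Theta$ be a symmetric function from $\Sigma_{d-1}$ to $\mathbb{R}^{+}$. Let us consider the symbol:
\[ \rho_{\beta,\Theta}(\xi) = |\xi|^{\frac{d}{\beta}+H}\, \Theta\!\left(\frac{\xi}{|\xi|}\right) \]
and the corresponding pseudo-differential operator:
\[ L_{\beta,\Theta} f(x) = \int_{\mathbb{R}^{d}} \rho_{\beta,\Theta}(\xi)\, TF(f)(\xi)\, e^{ix\xi}\, d\xi \tag{5.14} \]
THEOREM 5.13 ([BEN 99]).– Processes $X$ and $Z$ of Theorem 5.11 are the unique solutions of the following systems:
\[ L_{\alpha',\Theta} X = M_\alpha \ \text{(a.s.)}, \qquad X(0) = 0 \]
with $\Theta = \tau_{d,\alpha} S$ and:
\[ L_{\alpha,S} Z = TF(M_\alpha) \ \text{(a.s.)}, \qquad Z(0) = 0 \]
NOTE 5.4.– Generally, the sample paths of $X$ and $Z$ are not continuous, hence the definition of $X(0)$ and $Z(0)$ in average.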
5.3.3. Censov and Takenaka processes

Let us now consider two examples of self-similar processes with stationary increments, which are neither Gaussian nor stable. Let us consider the affine space $E^{d}$ of dimension $d$. Let $V_t$, $t \in \mathbb{R}^{d}$, denote the set of hyperplanes separating the origin 0 from $t$. Let $N_\alpha$ stand for a SαS measure, with control measure $d\sigma(s)\, d\rho$, where $d\sigma(s)$ is the surface measure of the unit sphere $\Sigma_{d-1}$. The Censov process is defined by:
\[ C(t) = N_\alpha(V_t) \]
Let us now consider the set $B_t$ of spheres of $E^{d}$ that separate the points 0 and $t$. Each of these spheres is determined by its center $x$ and radius $r$. Let $M_\alpha^{\beta}$ be a SαS measure, with control measure $\mu(dx, dr) = r^{\beta-d-1}\, dx\, dr$. The Takenaka process with exponent $\beta$ is defined as:
\[ T^{\beta}(t) = M_\alpha^{\beta}(B_t) \]
THEOREM 5.14 ([SAM 94]).– When $\alpha > 1$ and $\beta < 1$, processes $C$ and $T^{\beta}$ are well-defined. They are self-similar processes (of order $H = \frac{1}{\alpha}$ for $C(t)$ and $H = \frac{\beta}{\alpha}$ for $T^{\beta}(t)$), with stationary increments.

THEOREM 5.15 ([SAM 94]).– For $\alpha < 2$, when projected in any arbitrary direction, processes $C$ and $T^{\beta}$ have non-proportional laws.

5.3.4. Wavelet decomposition

Let $M$ be a stochastic measure verifying (5.11) and possessing the stationarity, unitarity, homogeneity and symmetry properties. Let $\psi_\lambda$ be a family of wavelets as defined in section 5.2.5. Let $1 < \beta \le 2$. We denote by $\psi_\lambda^{\beta}$ the family $2^{j/\beta}\, \psi^{u}(2^{j}x - k)$. A simple observation leads to the following result.

LEMMA 5.1.– We have:
\[ M(dx) = \sum_{\Lambda^{d}} \psi_\lambda^{\alpha}\, M(\psi_\lambda^{\alpha'}) \stackrel{\mathrm{def}}{=} \sum_{\Lambda^{d}} \psi_\lambda^{\alpha}\, \xi_\lambda^{\alpha} \]
Moreover, let us define:
\[ \varphi_\lambda^{\alpha}(x) = \int \frac{e^{ixy} - 1}{\rho(y)}\, TF(\psi_\lambda^{\alpha})(y)\, dy \]
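where the function $\rho$ is the denominator of (5.12) or (5.13). From this, we deduce the following decomposition:
\[ X(x) = \sum_{\lambda \in \Lambda^{d}} 2^{-jH}\, \varphi_\lambda^{\alpha}\, \xi_\lambda^{\alpha} \]
where the $(\xi_\lambda^{\alpha})$ verify the following stationarity properties. For any $j$:
\[
\begin{aligned}
\forall \ell: \quad & \xi_{j+\ell,k,u}^{\alpha} \stackrel{\mathcal{L}}{=} \xi_{j,k,u}^{\alpha} \\
\forall r: \quad & \xi_{j,k+r,u}^{\alpha} \stackrel{\mathcal{L}}{=} \xi_{j,k,u}^{\alpha}
\end{aligned}
\tag{5.15}
\]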
This property can be compared with that given in [CAM 95] (for second order processes) or in [AVE 98] (general case) for the wavelet coefficients of self-similar processes with stationary increments.

5.3.5. Process subordinated to Brownian measure

Let $(\Omega, \mathcal{F}, P)$ be a probability space and $W_d$ the Brownian measure on $L^{2}(\mathbb{R}^{d})$. The space $L^{2}(\Omega, \mathcal{F}, P)$ is characterized by its decomposition in chaos [NEV 68]. Let us briefly recall this theory. Let $\Sigma_n$ be the symmetric group of order $n$. For any function of $n$ variables, we define:
\[ f^{\circ n}(x_1, \ldots, x_n) \stackrel{\mathrm{def}}{=} \frac{1}{n!} \sum_{\sigma \in \Sigma_n} f(x_{\sigma(1)}, \ldots, x_{\sigma(n)}) \]
Let us also define the symmetric stochastic measure of order $n$, i.e. $W_d^{(n)}(dx_1, \ldots, dx_n)$ on $L^{2}((\mathbb{R}^{d})^{n})$, by:
\[ W_d^{(n)}(A_1 \times \cdots \times A_n) \stackrel{\mathrm{def}}{=} W_d(A_1) \times \cdots \times W_d(A_n) \]
where any two $A_i$ are disjoint. In addition, it is imposed that the expectation of $W_d^{(n)}(f_1 \ldots f_n)$ is always zero. As an example, with $n = 2$, we obtain $W_d^{(2)}(fg) = W_d(f)\, W_d(g) - \langle f, g \rangle$. We set:
\[ W_d^{(n)}(f) \stackrel{\mathrm{def}}{=} W_d^{(n)}(f^{\circ n}) \]
The following properties are established:
\[ E\left[ W_d^{(n)}(f)\, W_d^{(m)}(g) \right] = \delta_{n,m}\, \langle f, g \rangle \]
For any $F \in L^{2}(\Omega, \mathcal{F}, P)$, there exists a sequence $f_n \in L^{2}((\mathbb{R}^{d})^{n})$, $n \ge 0$, with:
\[ F = \sum_{n \ge 0} W_d^{(n)}(f_n) \]
and this decomposition is unique. Moreover, we have $EF = W_d^{(0)}$.
THEOREM 5.16.– Let $0 < H < 1$:
– the process $Y^{n}$ defined by:
\[ Y^{n}(x_1, \ldots, x_n) \stackrel{\mathrm{def}}{=} \int_{\mathbb{R}^{dn}} \frac{e^{ix \cdot \xi} - 1}{|\xi|^{\frac{dn}{2}+H}}\, TF(W_d^{(n)})(d\xi) \]
is self-similar (of order $H$), with stationary increments;
– the process $X^{n}$ defined by:
\[ X^{n}(x) \stackrel{\mathrm{def}}{=} Y^{n}(x, \ldots, x) \]
is self-similar (of order $H$), with stationary increments;
– if $(a_n)$ is a square-summable sequence, the process $X$ defined by:
\[ X(x) \stackrel{\mathrm{def}}{=} \sum_{n \ge 0} a_n X^{n}(x) \]
is self-similar (of order $H$), with stationary increments.

This theorem shows how difficult a general classification of self-similar processes with stationary increments can be. Let us note, moreover, that we considered the elliptic case only. We could, for example, also think of combinations of hyperbolic and elliptic cases. This difficulty is clearly indicated by Dobrushin in the issues raised in the comments of [DOB 79, Theorem 6.2, p. 24].

5.4. Regularity and long-range dependence

5.4.1. Introduction

As opposed to what the title of this section may suggest, we will address this question only by means of filtered white noise and for the Gaussian case. Despite its restricted character, this class of examples allows us to question the connection between the regularity of trajectories on the one hand and the long-range correlation5 on the other hand. To begin with, let us once again consider fractional Brownian motion $B_H$, with parameter $H$. The sample paths of $B_H$ are Hölderian with exponent $h$ (a.s.), for any
5. The analysis of the decorrelation and mixing properties of processes is a long-established subject (see, for example, [DOU 94, IBR 78]).
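$h < H$, but are not Hölderian with exponent $H$ (a.s.). In addition, we can verify that, for $\Delta > 0$ and $k \in \mathbb{N}$, as $k \to +\infty$:
\[ E\left[ B_H(\Delta) \left( B_H\big((k+1)\Delta\big) - B_H(k\Delta) \right) \right] \sim c\, \frac{|\Delta|^{2H}}{|k|^{2(1-H)}} \]
The decrease, with respect to the lag $k$, of the correlation of the increments of $B_H$ is slow. It is often wrongly assumed that the Hölderian character and the slow decrease of the correlation of the increments are tied together.

5.4.2. Two examples

5.4.2.1. A signal plus noise model

Let $H$ and $K$ be such that $0 < K < H < 1$. Let $S_1$ and $S_2$ be two functions on the sphere $\Sigma_{d-1}$ with values in $[a, b]$, with $0 < a \le b < +\infty$. Then, let us consider the process $X$ defined by:
\[
\begin{aligned}
X(x) &\stackrel{\mathrm{def}}{=} \int_{\mathbb{R}^{d}} \frac{e^{ix\xi} - 1}{|\xi|^{\frac{d}{2}+H}\, S_1\!\left(\frac{\xi}{|\xi|}\right)}\, TF(W_1)(d\xi) + \int_{\mathbb{R}^{d}} \frac{e^{ix\xi} - 1}{|\xi|^{\frac{d}{2}+K}\, S_2\!\left(\frac{\xi}{|\xi|}\right)}\, TF(W_2)(d\xi) \\
&\stackrel{\mathrm{def}}{=} Y_H(x) + Z_K(x)
\end{aligned}
\]
where it is assumed that $W_1$ and $W_2$ are two independent Wiener processes. The process $X$ can be viewed as a signal $Y_H$ corrupted by a noise $Z_K$. Indeed, $Z_K$ is more irregular than $Y_H$. It is shown in Chapter 6 that $X$ is locally self-similar, with parameter $K$. The local behavior is, indeed, dominated by $K$:
\[ \lim_{\lambda \to 0} \left\{ \frac{X(\lambda x)}{\lambda^{K}} \right\}_{x} \stackrel{\mathcal{L}}{=} \{Z_K(x)\}_{x} \]
However, we can also verify:
\[ \lim_{\lambda \to +\infty} \left\{ \frac{X(\lambda x)}{\lambda^{H}} \right\}_{x} \stackrel{\mathcal{L}}{=} \{Y_H(x)\}_{x} \]
The global behavior is hence dominated by $H$.

5.4.2.2. Filtered white noise

Now, let us consider the process:
\[ T(x) \stackrel{\mathrm{def}}{=} \int_{\mathbb{R}^{d}} \frac{e^{ix\xi} - 1}{|\xi|^{\frac{d}{2}+H}\, S_1\!\left(\frac{\xi}{|\xi|}\right) + |\xi|^{\frac{d}{2}+K}\, S_2\!\left(\frac{\xi}{|\xi|}\right)}\, TF(W)(d\xi) \]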
The process $T$ satisfies the following properties:
\[ \lim_{\lambda \to 0} \left\{ \frac{T(\lambda x)}{\lambda^{H}} \right\}_{x} \stackrel{\mathcal{L}}{=} \{Y_H(x)\}_{x}, \qquad \lim_{\lambda \to +\infty} \left\{ \frac{T(\lambda x)}{\lambda^{K}} \right\}_{x} \stackrel{\mathcal{L}}{=} \{Z_K(x)\}_{x} \]

5.4.2.3. Long-range correlation

Previous results show that $X$ is Hölderian with exponent $k < K$ (a.s.) and is not Hölderian with exponent $K$, and also that $T$ is Hölderian with exponent $h < H$ (a.s.) and is not Hölderian with exponent $H$. The long-range correlation of $X$ is dominated by $H$:
\[ \lim_{|h| \to \infty} \frac{E\left[ X(kh) \left( X\big((k+1)h\big) - X(kh) \right) \right]}{|h|^{2H}} = \frac{c}{|k|^{2(1-H)}} \]
while the long-range correlation of $T$ is dominated by $K$:
\[ \lim_{|h| \to \infty} \frac{E\left[ T(kh) \left( T\big((k+1)h\big) - T(kh) \right) \right]}{|h|^{2K}} = \frac{c}{|k|^{2(1-K)}} \]
Thus, from these generic examples, we can see that long-range correlation and Hölderian regularity are two distinct concepts.

5.5. Bibliography

[AVE 98] AVERKAMP R., HOUDRÉ C., “Some distributional properties of the continuous wavelet transform of random processes”, IEEE Trans. on Info. Theory, vol. 44, no. 3, p. 1111–1124, 1998.
[BAR 95] BARLOW M., “Diffusion on fractals”, in Ecole d’été de Saint-Flour, Springer, 1995.
[BAS 62] BASS J., “Les fonctions pseudo-aléatoires”, in Mémorial des sciences mathématiques, fascicule 153, Gauthier-Villars, Paris, 1962.
[BEN 97] BENASSI A., JAFFARD S., ROUX D., “Elliptic Gaussian random processes”, Rev. Math. Iber., vol. 13, p. 19–90, 1997.
[BEN 98] BENASSI A., COHEN S., ISTAS J., “Identifying the multifractional function of a Gaussian process”, Stat. Proba. Let., vol. 39, p. 337–345, 1998.
[BEN 99] BENASSI A., ROUX D., “Elliptic self-similar stochastic processes”, in DEKKING M., LÉVY VÉHEL J., LUTTON E., TRICOT C. (Eds.), Fractals: Theory and Applications in Engineering, Springer, 1999.
[BEN 00] BENASSI A., COHEN S., DEGUY S., ISTAS J., “Self-similarity and intermittency”, in Wavelets and Time-frequency Signal Analysis (Cairo, Egypt), EPH, 2000.
[BRE 68] BREIMAN L., Probability, Addison-Wesley, 1968.
[CAM 95] CAMBANIS S., HOUDRÉ C., “On the continuous wavelet transform of second-order random processes”, IEEE Trans. on Info. Theory, vol. 41, no. 3, p. 628–642, 1995.
[DOB 79] DOBRUSHIN R.L., “Gaussian and their subordinated self-similar random fields”, Ann. Proba., vol. 7, no. 3, p. 1–28, 1979.
[DOU 94] DOUKHAN P., “Mixing: properties and examples”, in Lecture Notes in Statistics 85, Springer-Verlag, 1994.
[DUB 97] DUBRULLE B., GRANER F., SORNETTE D. (Eds.), Scale Invariance and Beyond: Proceedings of the CNRS School (Les Houches, France), EDP Sciences and Springer, 1997.
[DUR 97] DURAND J., Sables, poudres et grains, Eyrolles Sciences, Paris, 1997.
[FRI 97] FRISCH U., Turbulence, Cambridge University Press, 1997.
[GRI 89] GRIMMETT G., Percolation, Springer-Verlag, 1989.
[GUY 89] GUYON X., LEON J., “Convergence en loi des H-variations d’un processus Gaussien stationnaire”, Annales de l’Institut Henri Poincaré, vol. 25, p. 265–282, 1989.
[GUY 94] GUYON E., TROADEC J.P., Du sac de billes au tas de sable, Editions Odile Jacob, 1994.
[HAV 87] HAVLIN S., BEN-AVRAHAM D., “Diffusion in disordered media”, Advances in Physics, vol. 36, no. 6, p. 695–798, 1987.
[HER 90] HERMANN H., ROUX S., Statistical Models for the Feature of Disordered Media – Random Materials and Processes, North-Holland, 1990.
[IBR 78] IBRAGIMOV I., ROZANOV Y., “Gaussian random processes”, in Applications of Mathematics 9, Springer-Verlag, 1978.
[IST 97] ISTAS J., LANG G., “Quadratic variations and estimation of the Hölder index of a Gaussian process”, Annals of the Institute Henri Poincaré, vol. 33, p. 407–436, 1997.
[KOL 40] KOLMOGOROV A., “Wienersche Spiralen und einige andere interessante Kurven im Hilbertsche Raum”, Comptes rendus (Dokl.) de l’Académie des sciences de l’URSS, vol. 26, p. 115–118, 1940.
[LAM 62] LAMPERTI J., “Semi-stable stochastic processes”, Trans. Am. Math. Soc., vol. 104, p. 62–78, 1962.
[LEG 99] LÉGER S., PONTIER M., “Drap brownien fractionnaire”, Note aux Comptes rendus de l’Académie des sciences, S. I, vol. 329, p. 893–898, 1999.
[MAN 68] MANDELBROT B.B., VAN NESS J.W., “Fractional Brownian motions, fractional noises, and applications”, SIAM Review, vol. 10, no. 4, p. 422–437, 1968.
[MEY 90] MEYER Y., Ondelettes et opérateurs, Hermann, Paris, 1990.
[NEV 68] NEVEU J., Processus Gaussiens, Montreal University Press, 1968.
[SAM 94] SAMORODNITSKY G., TAQQU M.S., Stable Non-Gaussian Random Processes, Stochastic Models with Infinite Variance, Chapman and Hall, New York and London, 1994.
[TAQ 75] TAQQU M.S., “Weak convergence to fractional Brownian motion and the Rosenblatt process”, Z.W.G., vol. 31, p. 287–302, 1975.
[WEI 72] WEIERSTRASS K., “Ueber continuirliche Functionen eines reellen Arguments, die fuer keinen Werth des letzteren einen bestimmten differentialquotienten besitzen”, Koenigl. Akad. Know. Mathematical Works II, vol. 31, p. 71–74, 1872.
[WIL 98] WILLINGER W., PAXSON V., TAQQU M.S., “Self-similarity and heavy tails: structural modeling of network traffic”, in ADLER R.J., FELDMAN R.E., TAQQU M.S. (Eds.), A Practical Guide to Heavy Tails: Statistical Techniques and Applications, Springer-Verlag, p. 27–53, 1998.
Chapter 6
Locally Self-similar Fields
6.1. Introduction

Engineers and mathematicians interested in applications have to use many different models to describe reality. The objective of this chapter is to explain the usefulness of locally self-similar fields. First, we will show how the traditional concept of self-similarity often proves too narrow to model certain phenomena. Then, given the diversity of existing locally self-similar models, we will present the panorama of relations that are found among them. Finally, we will familiarize the reader with the techniques used in this field.

In order to understand the genesis of locally self-similar fields, it is necessary to go back to their common ancestor: the fractional Brownian motion (FBM). Historically, the popularity of simple random models with self-similarity properties can be traced back to [MAN 68]. In particular (if restricted to Gaussian processes), Mandelbrot and Van Ness show that there exists a single process with stationary increments that is self-similar of order $H$ (for $0 < H < 1$). This property implies that a change of scale on the index amounts to a scaling of the process value:
\[ B_H(\lambda x) \stackrel{\mathcal{L}}{=} \lambda^{H} B_H(x) \]
See Chapter 5 for more precise explanations. We will hereafter denote by $B_H$ the FBM of order $H$. One of the most interesting properties of fractional Brownian motion of order $H$ is the Hölderian regularity of order $H$, noted by $C^{H}$ (up to
Chapter written by Serge COHEN.
logarithmic factors) of the trajectories. Indeed, FBM is a good candidate for modeling phenomena which, after statistical processing, are found to have $C^{H}$ trajectories and are supposed, for theoretical reasons, to be self-similar. The identification of $H$ from samples of the phenomenon, necessarily taken in discrete time, is thus crucial. At the same time, it is necessary to remember that FBMs are processes with stationary increments, which simplifies the spectral study of the process but is too restrictive for certain applications. Indeed, in many fields (for instance, when we want to simulate textures in an image), it is expected, a priori, that the order $H$ depends on the point at which the process is observed. For example, if, using a random field, we want to model an aerial photograph of a forest, we would like to have a model where the parameter of self-similarity around the point $x$, noted by $h(x)$, depends on the geological nature of the ground in the vicinity of $x$. However, a spatial modulation of the field law is generally incompatible with the property of self-similarity, which is a global property. Consequently, the problem consists of arriving at a concept of locally self-similar field that is flexible enough for the parameters which define the field law to vary with the position, yet simple enough to enable the identification of these parameters. Unfortunately, the simple approach, which consists of replacing $H$ by a function $h(x)$ in the formula giving the covariance of a FBM, is not satisfactory: generally, we can show that there does not exist any Gaussian field having this generalized covariance. We will thus have to introduce the mathematical tools which will make it possible to build models generalizing FBMs and also to identify the functional parameters of these models.
These theoretical recaps are dealt with in section 6.2, where each concept introduced is illustrated on the example of fractional Brownian motion. In this context, we discuss traditional techniques for the study of Gaussian fields as well as wavelet analysis tools. Using this theoretical framework, in section 6.3 we formally define the property of local self-similarity and present two examples which form the basis of all the later models. Having established that these models are not sufficiently general for applications, we enter the multifractional world in section 6.4. For each of these models, specific attention is given to the regularity of the trajectories and, in section 6.5, we develop the statistical methods which make it possible to estimate this regularity. This is what we call model identifiability. At this point, it is necessary to clarify that the term “fields” is used for families of random variables indexed by $\mathbb{R}^d$ for $d \geq 1$. In applications, the most interesting cases correspond to $d > 1$ (for example, $d = 2$ for images). However, certain statements, particularly those concerning identification, relate only to processes (i.e., fields where $d = 1$).
6.2. Recap of two representations of fractional Brownian motion

We begin by presenting tools for the study of locally self-similar fields based on the concept of reproducing kernel Hilbert space. From this we derive a Karhunen-Loeve expansion of the FBM. We then find a spectral representation of fractional Brownian motion, called harmonizable. This will be an occasion to recall some concepts of multiresolution analysis.

6.2.1. Reproducing kernel Hilbert space

The study of Gaussian fields is largely facilitated by an analysis tool which is traditionally associated with these fields: the reproducing kernel Hilbert space. From a physical point of view, the reproducing kernel Hilbert space can be regarded as a space which describes the energy associated with a Gaussian field, in the sense of a spectral energy. Mathematically, a reproducing kernel Hilbert space is a Hilbert space of deterministic functions whose norm characterizes all the properties of the field. See [NEV 68] for a detailed study. Let us now recall its formal definition.

DEFINITION 6.1.– Let $(X_x)_{x \in \mathbb{R}^d}$ be a centered Gaussian field (i.e., $E(X_x) = 0$). We call Gaussian space associated with $X$ the subspace of the space of square integrable random variables (denoted $L^2(\Omega, \mathcal{A}, P)$) made up of the linear combinations of the variables $X_x$ and their limits, that is:

$$H_X = \operatorname{adh}\Big\{ Z, \text{ such that } \exists n \in \mathbb{N} \text{ and } \exists \lambda_i \text{ for } i = 1 \text{ to } n \text{ and } Z = \sum_{i=1}^{n} \lambda_i X_{x_i} \Big\} \qquad (6.1)$$

where adh means that we take the closure of the set for the topology defined by $L^2(\Omega, \mathcal{A}, P)$. The space:

$$\mathcal{H}_X = \big\{ h_Z, \text{ such that } \exists Z \in H_X \,/\, h_Z(x) = E(Z X_x^*) \big\}$$

equipped with the Hermitian product:

$$\forall Z_1, Z_2 \in H_X, \qquad \langle h_{Z_1}, h_{Z_2} \rangle = E(Z_1 Z_2^*) \qquad (6.2)$$

is the reproducing kernel Hilbert space associated with the field $X$. It is verified, according to (6.2), that the mapping:

$$h : H_X \longrightarrow \mathcal{H}_X, \qquad Z \longmapsto h_Z \qquad (6.3)$$
is an isometry between the Gaussian space and the reproducing kernel Hilbert space, since the Hermitian product on $\mathcal{H}_X$ was chosen for that very purpose. In particular, this mapping is bijective, meaning that to every function $h \in \mathcal{H}_X$ there corresponds exactly one random variable $Z$ of $H_X$. Moreover, $\mathcal{H}_X$ contains the functions of $y$:

$$h_{X_x}(y) = R(x, y)$$

resulting from the covariance of $X$:

$$R(x, y) = E(X_x X_y^*)$$

and we can describe the reproducing kernel Hilbert space as the closure of the finite linear combinations of the functions $R(x, \cdot)$ for the norm of $\mathcal{H}_X$. Lastly, the name reproducing kernel Hilbert space comes from the property verified by its scalar product:

$$\langle R(x, \cdot), R(y, \cdot) \rangle_{\mathcal{H}_X} = R(x, y) \qquad (6.4)$$
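The reproducing property (6.4) can be checked numerically on a finite set of points. The following Python sketch is an illustration only: it assumes that the field is restricted to a finite grid, where the reproducing kernel Hilbert space product reduces to $f^T R^{-1} g$ with $R$ the covariance matrix, and it takes the covariance $\min(x, y)$ of standard Brownian motion as an example. The names `brownian_cov` and `rkhs_inner` are hypothetical helpers introduced here.

```python
import numpy as np

def brownian_cov(points):
    """Covariance R(x, y) = min(x, y) of standard Brownian motion on a grid."""
    x = np.asarray(points, dtype=float)
    return np.minimum.outer(x, x)

def rkhs_inner(f, g, R):
    """Finite-grid analogue of the RKHS product: <f, g> = f^T R^{-1} g."""
    return f @ np.linalg.solve(R, g)

grid = np.linspace(0.1, 1.0, 10)   # strictly positive points keep R invertible
R = brownian_cov(grid)

# Gram matrix of the functions R(x_i, .) for the RKHS product;
# by the reproducing property it should coincide with R itself.
gram = np.array([[rkhs_inner(R[i], R[j], R) for j in range(len(grid))]
                 for i in range(len(grid))])
print(np.allclose(gram, R))
```

On a finite grid the identity is pure linear algebra ($R R^{-1} R = R$), which is why the check holds for any invertible covariance matrix and not only for this example.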
However, the most important aspect of the reproducing kernel Hilbert space is that the choice of an orthonormal base of this space yields a series representation of the field, which is often called a Karhunen-Loeve expansion.

THEOREM 6.1.– Any orthonormal base $(e_n(x))_{n \in \mathbb{N}}$ of $\mathcal{H}_X$ is associated with an orthonormal base $(\eta_n)_{n \in \mathbb{N}}$ of $H_X$ by the relation:

$$h_{\eta_n} = e_n$$

The random variables $(\eta_n)_{n \in \mathbb{N}}$ are independent centered Gaussian variables of variance 1 and the field can be represented by:

$$X_x(\omega) = \sum_{n=0}^{+\infty} \eta_n(\omega)\, e_n(x) \qquad (6.5)$$

where $\omega$ denotes an outcome of the probability space $\Omega$ and the convergence in (6.5) is in the sense of $L^2(\Omega)$.

The preceding theorem is true for any such field and for any orthonormal base of $\mathcal{H}_X$. In fact, a martingale type argument shows that the convergence is almost sure, which is important, particularly when simulating the fields of interest here. Nevertheless, a judicious choice of the orthonormal base of $\mathcal{H}_X$ is necessary for conveniently studying the regularity of the trajectories of these fields. We will illustrate these ideas on the fundamental example of FBM in the next section.

6.2.2. Harmonizable representation

By way of example, let us seek the reproducing kernel Hilbert space of the FBM: we will deduce from it a Karhunen-Loeve expansion which will form the basis for
studying the trajectory regularity of the generalizations of FBM. Let us begin with the definition of the FBM, starting from its covariance.

DEFINITION 6.2.– We call FBM of order $H$ the real centered Gaussian field $B_H$ given by the covariance:

$$E\big(B_H(x) B_H(y)\big) = \frac{1}{2} \Big\{ \|x\|^{2H} + \|y\|^{2H} - \|x - y\|^{2H} \Big\} \qquad (6.6)$$
where $0 < H < 1$ and where $\|\cdot\|$ is the Euclidean norm on $\mathbb{R}^d$.

Let us begin with some elementary comments explaining this presentation. To simplify our study, we assume a real-valued field. In addition, it is sometimes more telling to define the FBM through the variance of its increments, which is:

$$E\big(B_H(x) - B_H(y)\big)^2 = \|x - y\|^{2H}$$

In fact, this property characterizes the FBM if we additionally impose $B_H(0) = 0$ a.s. If $H = \frac{1}{2}$ and $d = 1$, the FBM is a standard Brownian motion and the increments are then independent if they are taken on disjoint intervals; however, this case is exceptional, and the majority of methods used for standard Brownian motion do not apply to other values of $H$. On the other hand, the FBM is a field with stationary increments whatever the value of $H$, which constitutes the starting point for representing its covariance. Indeed, it is traditional to represent fields with stationary increments through a spectral measure. Following the example of stationary processes (see [YAG 87] for a general presentation), we obtain:

$$R(x, y) = \int_{\mathbb{R}^d} (e^{ix\cdot\xi} - 1)(e^{-iy\cdot\xi} - 1)\, \mu(d\xi) \qquad (6.7)$$
where $\mu$ is the spectral measure. In the case of FBM, we can guess the spectral measure from the formula:

$$\int_{\mathbb{R}^d} \frac{|e^{ix\cdot\xi} - 1|^2}{\|\xi\|^{d+2H}}\, \frac{d\xi}{(2\pi)^{d/2}} = C_H^2\, \|x\|^{2H} \qquad (6.8)$$
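where $C_H$ is a strictly positive constant; the preceding formula gives:

$$R(x, y) = \frac{1}{C_H^2} \int_{\mathbb{R}^d} \frac{(e^{ix\cdot\xi} - 1)(e^{-iy\cdot\xi} - 1)}{\|\xi\|^{d+2H}}\, \frac{d\xi}{(2\pi)^{d/2}} = \langle k_x, k_y \rangle_{L^2(\mathbb{R}^d)} \qquad (6.9)$$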
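where we have:

$$\langle f, g \rangle_{L^2(\mathbb{R}^d)} = \int_{\mathbb{R}^d} f(\xi)\, g^*(\xi)\, \frac{d\xi}{(2\pi)^{d/2}}$$

The covariance can also be written by using Parseval’s formula, which expresses that the Fourier transform is an isometry of $L^2$:

$$R(x, y) = \langle \hat{k}_x, \hat{k}_y \rangle_{L^2(\mathbb{R}^d)} \qquad (6.10)$$

where $\hat{f}$ is the Fourier transform of $f$. By using the Fourier inversion theorem:

$$k_x = \frac{e^{ix\cdot\xi} - 1}{C_H\, \|\xi\|^{\frac{d}{2}+H}}$$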
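Formula (6.8) says that the integral on its left-hand side scales like $\|x\|^{2H}$. A minimal numerical sketch for $d = 1$ is given below; it is a rough check under stated assumptions, not a reference implementation. The function name `lhs_68`, the truncation point `upper` and the number of midpoints `n` are choices made here, and the oscillating part of the tail beyond `upper` is simply neglected.

```python
import numpy as np

def lhs_68(x, H, upper=200.0, n=400_000):
    """Midpoint-rule approximation of the left-hand side of (6.8) for d = 1."""
    step = upper / n
    xi = (np.arange(n) + 0.5) * step           # midpoints avoid xi = 0
    # |exp(i x xi) - 1|^2 = 2 (1 - cos(x xi))
    integrand = 2.0 * (1.0 - np.cos(x * xi)) / xi ** (1.0 + 2.0 * H)
    body = integrand.sum() * step
    # Beyond `upper`, keep only the non-oscillating part 2 / xi^(1 + 2H).
    tail = 2.0 * upper ** (-2.0 * H) / (2.0 * H)
    # Factor 2: the integrand is even in xi; then the (2 pi)^(-1/2) weight.
    return 2.0 * (body + tail) / np.sqrt(2.0 * np.pi)

H = 0.3
ratio = lhs_68(2.0, H) / lhs_68(1.0, H)
print(ratio, 2.0 ** (2.0 * H))   # the two numbers should be close
```

Only the ratio is compared, so the value of the constant $C_H$ itself plays no role in this check.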
Equation (6.9) is an attempt to associate the covariance with a functional scalar product and thus with (6.4). This will enable us, following [BEN 97b], to obtain a convenient description of the reproducing kernel Hilbert space of the FBM. To this end, let us define the isometry operator $J$ between $\mathcal{H}_{B_H}$ and $L^2(\mathbb{R}^d)$.

DEFINITION 6.3.– We define the linear operator $J$ from $L^2(\mathbb{R}^d)$ onto $\mathcal{H}_{B_H}$ by setting:

$$J(k_x) = R(x, \cdot)$$

For $\psi \in L^2(\mathbb{R}^d)$:

$$J(\psi)(y) = \int_{\mathbb{R}^d} \frac{e^{-iy\cdot\xi} - 1}{C_H\, \|\xi\|^{\frac{d}{2}+H}}\, (\hat{\psi})^*(\xi)\, \frac{d\xi}{(2\pi)^{d/2}} \qquad (6.11)$$

The reproducing kernel Hilbert space of the FBM can be written:

$$\mathcal{H}_{B_H} = \big\{ f, \ \exists \psi \in L^2(\mathbb{R}^d) \text{ such that } f = J(\psi) \big\}$$

Moreover, $J$ is an isometry:

$$\langle J(\psi_1), J(\psi_2) \rangle_{\mathcal{H}_{B_H}} = \langle \psi_1, \psi_2 \rangle_{L^2(\mathbb{R}^d)} \qquad (6.12)$$
The properties of $J$ contained in Definition 6.3 are proved on pages 24 and 25 of [BEN 97b]. This presentation of the reproducing kernel Hilbert space makes it easy to build an orthonormal base of $\mathcal{H}_{B_H}$ such that the associated Karhunen-Loeve expansion converges almost surely. The first stage consists of choosing an
orthonormal base of $L^2(\mathbb{R}^d)$ which is adapted to our problem. For this, let us start from a multiresolution analysis of $L^2(\mathbb{R}^d)$ (see [MEY 90a]): it is known that there are functions $\psi^{(l)} \in L^2(\mathbb{R}^d)$, for $l$ belonging to $L = \{0, 1\}^d \setminus \{(0, \ldots, 0)\}$, such that their Fourier transforms $\hat{\psi}^{(l)}(\xi)$ are $C^\infty$ and vanish outside the region $\frac{2\pi}{3} \leq \|\xi\| \leq \frac{8\pi}{3}$. We then set:

$$\psi_{j,k}^{(l)}(x) = 2^{\frac{dj}{2}}\, \psi^{(l)}(2^j x - k), \qquad j, k \in \mathbb{Z} \qquad (6.13)$$
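Below, we write $\lambda = (j, k, l)$, $\Lambda = \mathbb{Z}^2 \times L$ and $\psi_{j,k}^{(l)} = \psi_\lambda$. Conventionally, in a multiresolution analysis, $(\psi_\lambda)_{\lambda \in \Lambda}$ is an orthonormal base of $L^2(\mathbb{R}^d)$ and the function $\psi_\lambda$ is localized around $k 2^{-j}$, which, by a slight abuse of language, we identify with $\lambda$. Here, in particular, $\psi_\lambda$ decreases rapidly:

$$|\psi_\lambda(x)| \leq \frac{C\, 2^{dj/2}}{1 + |2^j x - k|^K} \qquad \forall K \in \mathbb{N} \qquad (6.14)$$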
From the base $(\psi_\lambda)_{\lambda \in \Lambda}$, we build an orthonormal base of the reproducing kernel Hilbert space of the FBM by setting:

$$\varphi_\lambda(x) = \int_{\mathbb{R}^d} \frac{e^{-ix\cdot\xi} - 1}{C_H\, \|\xi\|^{\frac{d}{2}+H}}\, (\hat{\psi}_\lambda)^*(\xi)\, \frac{d\xi}{(2\pi)^{d/2}} \qquad (6.15)$$

When $d = 1$, the functions $\varphi_\lambda$ are “morally” the fractional integrals of the functions $\psi_\lambda$, in the sense of Chapter 7. To be convinced of this, it suffices to express the fractional integration operator of that chapter in terms of the Fourier transform of the function to which it is applied. Note, however, that the correspondence is not exact. Nevertheless, the main interest of the functions $\varphi_\lambda$ is that they “inherit”, in a certain manner, the localization properties of the functions $\psi_\lambda$: these are “wavelet” type functions in the terminology of [MEY 90b], which results in:

$$|\varphi_\lambda(x)| \leq C(K)\, 2^{-Hj} \Big( \frac{1}{1 + |2^j x - k|^K} + \frac{1}{1 + |k|^K} \Big) \qquad (6.16)$$

$$|\varphi_\lambda(x) - \varphi_\lambda(y)| \leq C(K)\, 2^{-Hj}\, \frac{2^j |x - y|}{1 + |2^j x - k|^K} \qquad \forall K \in \mathbb{N} \qquad (6.17)$$
where we suppose that $j \geq 0$ and $(j, l) \neq (0, 0)$. Consequently, the series expansion (6.5) of the FBM becomes:

$$B_H(x) = \sum_{\lambda \in \Lambda} \eta_\lambda\, \varphi_\lambda(x) \qquad (6.18)$$

where $(\eta_\lambda)_{\lambda \in \Lambda}$ is a sequence of independent centered Gaussian random variables of variance 1. Thanks to the localization (6.16), it is possible to say that, roughly, the
behavior of the FBM $B_H$ in the vicinity of $x$ depends mainly on the random variables $\eta_\lambda$ for $\lambda$ close to $x$; in particular, the series (6.18) converges almost surely.

In the rest of this chapter we will use the so-called harmonizable representation of the FBM of order $H$. In this representation, the FBM appears as a white noise to which a filter is applied, in the sense of signal theory. Although FBM can be defined by its harmonizable representation (see Chapter 7 of [SAM 94]), we will here deduce it from (6.18). To this end, we set:

$$\hat{W}^*(d\xi) = \sum_{\lambda \in \Lambda} \eta_\lambda\, \hat{\psi}_\lambda^*(\xi)\, \frac{d\xi}{(2\pi)^{d/2}} \qquad (6.19)$$
This is a Gaussian random measure: it integrates the deterministic, possibly complex-valued, functions $f$ of $L^2(\mathbb{R}^d)$ and yields a Gaussian random variable:

$$\int_{\mathbb{R}^d} f(\xi)\, \hat{W}^*(d\xi) = \sum_{\lambda \in \Lambda} \eta_\lambda \int_{\mathbb{R}^d} f(\xi)\, \hat{\psi}_\lambda^*(\xi)\, \frac{d\xi}{(2\pi)^{d/2}} \qquad (6.20)$$

The left-hand side of (6.20) must be understood as a notation for the Gaussian random variable defined by the series on the right-hand side, which converges in $L^2(\Omega)$. Since the variables $\eta_\lambda$ are independent, it is deduced that:

$$\Big\| \int_{\mathbb{R}^d} f(\xi)\, \hat{W}^*(d\xi) \Big\|^2_{L^2(\Omega)} = \|f\|^2_{L^2(\mathbb{R}^d)} \qquad (6.21)$$

On the basis of (6.18) and (6.20), we obtain:

$$B_H(x) = \int_{\mathbb{R}^d} \frac{e^{-ix\cdot\xi} - 1}{C_H\, \|\xi\|^{\frac{d}{2}+H}}\, \hat{W}^*(d\xi) \qquad (6.22)$$
The FBM is thus a white noise filtered through the filter:

$$g(x, \xi) = \frac{e^{-ix\cdot\xi} - 1}{C_H\, \|\xi\|^{\frac{d}{2}+H}} \qquad (6.23)$$

It should be noted that there are other filters leading to fields which have the same law as the one defined in (6.22). In the next section, we will see that it is possible to define generalizations of FBM which do not have stationary increments, starting either from the reproducing kernel Hilbert space of the FBM or from its harmonizable representation.
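The series representation (6.5) has a simple finite-dimensional analogue which is convenient for simulation. The sketch below, given under stated assumptions, restricts the FBM to a grid in dimension $d = 1$, factorizes the covariance (6.6) as $R = L L^T$ and draws a path as $L \eta$ with independent standard Gaussian variables $\eta$. The columns of $L$ then play the role of the functions $e_n$: they are orthonormal for the finite-grid product $f^T R^{-1} g$. The Cholesky factorization is a choice made here for illustration, not the wavelet base (6.15), and `fbm_cov` and `simulate_fbm` are hypothetical names.

```python
import numpy as np

def fbm_cov(x, H):
    """Covariance (6.6) in dimension d = 1 on the grid x."""
    x = np.asarray(x, dtype=float)
    a = np.abs(x)[:, None] ** (2 * H)
    b = np.abs(x)[None, :] ** (2 * H)
    c = np.abs(x[:, None] - x[None, :]) ** (2 * H)
    return 0.5 * (a + b - c)

def simulate_fbm(x, H, rng):
    """Draw one FBM path on x (x must not contain 0, where B_H = 0 a.s.)."""
    R = fbm_cov(x, H)
    L = np.linalg.cholesky(R)            # R = L L^T
    eta = rng.standard_normal(len(x))    # independent N(0, 1) variables
    return L @ eta, L, R

rng = np.random.default_rng(0)
x = np.linspace(0.01, 1.0, 100)
path, L, R = simulate_fbm(x, H=0.7, rng=rng)

# Columns of L are orthonormal for the finite-grid product f^T R^{-1} g.
gram = L.T @ np.linalg.solve(R, L)
# Variance of the increments implied by the covariance matrix.
var_inc = np.diag(R)[:, None] + np.diag(R)[None, :] - 2 * R
```

The cost of the factorization grows like the cube of the number of grid points, so this approach is only reasonable for short paths.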
6.3. Two examples of locally self-similar fields

6.3.1. Definition of the local asymptotic self-similarity (LASS)

In this section, we precisely define the property of local self-similarity which we are seeking. To this end, let us recall the property of self-similarity verified by the FBM:

$$\forall \epsilon \in \mathbb{R}^+, \quad \forall x \in \mathbb{R}^d, \qquad B_H(\epsilon x) \overset{\mathcal{L}}{=} \epsilon^H B_H(x) \qquad (6.24)$$

where $\overset{\mathcal{L}}{=}$ means that, for all $n \in \mathbb{N}$ and any choice of $(x_1, \ldots, x_n)$ in $(\mathbb{R}^d)^n$, the vector $(B_H(\epsilon x_1), \ldots, B_H(\epsilon x_n))$ has the same law as $\epsilon^H (B_H(x_1), \ldots, B_H(x_n))$. As we saw in the introduction, it is not easy to localize this global property while preserving an identifiable model for which the trajectories of the process have locally, in the vicinity of a point, the desired Hölder regularity. The asymptotic definition, presented initially in [BEN 97b], meets these objectives.

DEFINITION 6.4.– Let $h: \mathbb{R}^d \to ]0, 1[$ be a function. A field $X$ is called locally self-similar (LASS) with multifractional function $h$ if:

$$\forall x \in \mathbb{R}^d, \qquad \lim_{\epsilon \to 0^+} \Big( \frac{X(x + \epsilon u) - X(x)}{\epsilon^{h(x)}} \Big)_{u \in \mathbb{R}^d} \overset{\mathcal{L}}{=} a(x)\, \big( B_H(u) \big)_{u \in \mathbb{R}^d} \qquad (6.25)$$

where $a$ is a strictly positive function and $B_H$ is a FBM of order $H = h(x)$. The topology with which we equip the trajectory space to define the convergence in law is that of uniform convergence on each compact set.

This definition can be reformulated qualitatively by saying that a locally self-similar process admits at each point $x \in \mathbb{R}^d$, up to a normalization of the variance given by $a(x)$, a tangent fractional Brownian motion $B_H$. It is a satisfactory generalization of the self-similarity property. Indeed, it is easy to verify that a FBM is locally self-similar with constant multifractional function equal to its order. In the case of FBM, we have:

$$X(x + \epsilon u) - X(x) \overset{\mathcal{L}}{=} X(\epsilon u)$$

because of the stationarity of the increments and it is noted, by applying (6.24), that for a FBM the term:

$$\frac{X(x + \epsilon u) - X(x)}{\epsilon^H}$$

is constant in law. This elementary verification explains the denominator of (6.25) as well as the localizing role of the limit $\epsilon \to 0^+$. The last advantage of Definition 6.4 is that it enables the construction of non-trivial examples of locally self-similar processes, as we will see in the following section.
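The elementary verification above can be reproduced numerically. Since the fields are centered and Gaussian, equality in law reduces to equality of covariances, so the sketch below (dimension $d = 1$, hypothetical helper names `fbm_cov` and `rescaled_increment_cov`) computes the covariance of the rescaled increments of a FBM from (6.6) and compares it with the covariance of $B_H$ itself for several values of $\epsilon$.

```python
import numpy as np

def fbm_cov(s, t, H):
    """Covariance (6.6) for d = 1, evaluated on all pairs (s_i, t_j)."""
    s = np.asarray(s, dtype=float)[:, None]
    t = np.asarray(t, dtype=float)[None, :]
    return 0.5 * (np.abs(s) ** (2 * H) + np.abs(t) ** (2 * H)
                  - np.abs(s - t) ** (2 * H))

def rescaled_increment_cov(x, eps, u, H):
    """Covariance of (B_H(x + eps*u_i) - B_H(x)) / eps**H over the lags u_i."""
    p = x + eps * np.asarray(u, dtype=float)
    x0 = np.array([x])
    # Bilinearity of the covariance applied to the two increments.
    cov = (fbm_cov(p, p, H) - fbm_cov(p, x0, H)
           - fbm_cov(x0, p, H) + fbm_cov(x0, x0, H))
    return cov / eps ** (2 * H)

u = np.array([-1.0, -0.5, 0.5, 1.0, 2.0])
H = 0.4
target = fbm_cov(u, u, H)                      # covariance of (B_H(u_i))
covs = [rescaled_increment_cov(x=3.0, eps=e, u=u, H=H)
        for e in (1.0, 0.1, 0.001)]
print(all(np.allclose(c, target) for c in covs))
```

For a FBM the agreement holds for every $\epsilon$, not only in the limit, which is exactly the statement that the rescaled increment is constant in law.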
6.3.2. Filtered white noise (FWN)

In [BEN 98b], we propose to generalize the harmonizable representation (6.22) by calling filtered white noise (FWN) any process corresponding to a filter $g(x, \xi)$ of the form:

$$g(x, \xi) = (e^{-ix\cdot\xi} - 1) \Big( \frac{a(x)}{\|\xi\|^{\frac{d}{2}+H_1}} + \frac{b(x)}{\|\xi\|^{\frac{d}{2}+H_2}} + R(x, \xi) \Big) \qquad (6.26)$$

for $0 < H_1 < H_2 < 1$. The term in parentheses in (6.26) is an asymptotic expansion at high frequency and the following definition gives the precise assumptions which express that $R(x, \xi)$ is negligible compared with:

$$\frac{b(x)}{\|\xi\|^{\frac{d}{2}+H_2}}$$

when $\|\xi\| \to +\infty$.

DEFINITION 6.5.– We call second order filtered white noise a process $X$ which admits the harmonizable representation:

$$\forall t \in \mathbb{R}, \qquad X(t) = \int_{\mathbb{R}} (e^{-it\xi} - 1) \Big( \frac{a(t)}{|\xi|^{\frac{1}{2}+H_1}} + \frac{b(t)}{|\xi|^{\frac{1}{2}+H_2}} + R(t, \xi) \Big)\, \hat{W}^*(d\xi) \qquad (6.27)$$

under the two following hypotheses.

HYPOTHESIS 6.1.– In the preceding definition, we have $0 < H_1 < H_2 < 1$ and $R(t, \xi) \in C^{1,2}([0, 1] \times \mathbb{R})$ is such that:

$$\Big| \frac{\partial^{m+n}}{\partial t^m\, \partial \xi^n} R(t, \xi) \Big| \leq \frac{C}{|\xi|^{\frac{1}{2}+\eta+n}}$$

for $m = 0, 1$ and $n = 0, 1, 2$ with $\eta > H_2$. The symbol $C$ denotes a generic constant.

HYPOTHESIS 6.2.– In the preceding definition, we have $a, b \in C^1([0, 1])$ and, for every $t \in [0, 1]$, we have $a(t) b(t) \neq 0$.

Limiting ourselves to an expansion of order 2 is a convention adopted to simplify the presentation of the identification algorithms. For the same reason, we suppose that filtered white noises are processes indexed by $t \in \mathbb{R}$. For a better understanding of the relationship between filtered white noises and FBM, it is enough to suppose that $R(t, \xi)$ is identically zero. We find $X_t = C_{H_1} a(t) B_{H_1}(t) + C_{H_2} b(t) B_{H_2}(t)$, where $B_{H_1}$, $B_{H_2}$ are two fractional Brownian motions and $C_H$ is the constant defined in (6.8). It should, however, be noted that, even in this simplified example, $B_{H_1}$ and $B_{H_2}$ are not independent and therefore
the law of $X$ is not trivially deduced from that of the FBM. This last example illustrates an additional virtue of filtered white noises: their definition allows not only the functional parameters $a(t)$ and $b(t)$ to vary with position, but also the superposition of self-similar phenomena of orders $H_1$ and $H_2$. In addition, an interpretation of $H_2$ in terms of long-range dependence can be found in Chapter 5. On the other hand, starting from formula (6.27), it is difficult to find a convenient expression of the reproducing kernel Hilbert space of a filtered white noise. Moreover, filtered white noises do not retain the global properties of FBMs, that is to say, self-similarity and stationarity of the increments, which is an advantage when modeling certain phenomena. Since the filter of a filtered white noise is asymptotically equivalent at high frequency to that of a FBM, only the local properties remain. For example, filtered white noises verify a property of local self-similarity of the following type.

PROPOSITION 6.1.– A second order filtered white noise $X$ associated with a filter of form (6.26) is locally self-similar with constant multifractional function equal to $H_1$:

$$\lim_{\epsilon \to 0^+} \Big( \frac{X(t + \epsilon u) - X(t)}{\epsilon^{H_1}} \Big)_{u \in \mathbb{R}} \overset{\mathcal{L}}{=} C_{H_1}\, a(t)\, \big( B_{H_1}(u) \big)_{u \in \mathbb{R}} \qquad (6.28)$$
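As a worked check of the scaling in (6.28) in the simplest case, assume $R \equiv 0$ and $a$, $b$ constant (an assumption made here only for illustration). Applying the isometry (6.21) to the representation (6.27), and then (6.8) with $d = 1$ to each of the three resulting integrals, gives the variance of an increment:

```latex
\begin{aligned}
E\,\big|X(t+\epsilon)-X(t)\big|^2
 &= \int_{\mathbb{R}} \big|e^{-i\epsilon\xi}-1\big|^2
    \Big(\frac{a}{|\xi|^{\frac12+H_1}}+\frac{b}{|\xi|^{\frac12+H_2}}\Big)^2
    \frac{d\xi}{(2\pi)^{1/2}} \\
 &= a^2\,C_{H_1}^2\,\epsilon^{2H_1}
   + 2ab\,C_{\bar H}^2\,\epsilon^{H_1+H_2}
   + b^2\,C_{H_2}^2\,\epsilon^{2H_2},
 \qquad \bar H = \tfrac{H_1+H_2}{2}.
\end{aligned}
```

Dividing by $\epsilon^{2H_1}$ and letting $\epsilon \to 0^+$ leaves $a^2 C_{H_1}^2$, which is the variance of $C_{H_1} a\, B_{H_1}(1)$; the two other terms vanish because $H_1 < H_2$. This agrees with the statement of Proposition 6.1.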
A proof of this result can be found in [BEN 98a], in the context of multifractional processes.

6.3.3. Elliptic Gaussian random fields (EGRP)

Another way of generalizing FBM, which is the approach adopted in [BEN 97b], consists of starting from the reproducing kernel Hilbert space. Returning to Definition 6.3, we can already express the reproducing kernel Hilbert space norm by means of the operator $J$ and the formula:

$$\langle J(\psi_1), J(\psi_2) \rangle_{\mathcal{H}_{B_H}} = \langle \psi_1, \psi_2 \rangle_{L^2(\mathbb{R}^d)}$$

However, this formula can be presented differently. For all functions $f, g$ of the space $\mathcal{D}_0$ of $C^\infty$ functions with compact support which vanish at 0:

$$\langle f, g \rangle_{\mathcal{H}_{B_H}} = \langle A_H f, g \rangle_{L^2(\mathbb{R}^d)} \qquad (6.29)$$

where $A_H$ is a pseudo-differential operator of symbol $C_H^2 \|\xi\|^{1+2H}$ ($C_H$ denotes the constant defined in (6.8)), i.e.:

$$A_H(f) = \int_{\mathbb{R}^d} e^{ix\cdot\xi}\, C_H^2\, \|\xi\|^{1+2H}\, \hat{f}(\xi)\, \frac{d\xi}{(2\pi)^{d/2}} \qquad (6.30)$$
A proof of (6.29) is found in Lemma 1.1 of [BEN 97b]. This equation is also equivalent to:

$$A_H = J^{-2} \qquad (6.31)$$

It should be noted that the symbol of the operator $A_H$, $\sigma(\xi) = C_H^2 \|\xi\|^{1+2H}$, is homogeneous in $\xi$ and does not depend on $x$, which correspond respectively to the self-similarity and to the stationarity of the increments of the process. Consequently, it is natural to consider Gaussian processes associated with symbols $\sigma(x, \xi)$ which also depend on the position. The stationarity of the increments is then lost. However, if we impose that $\sigma(x, \xi)$ is elliptic of order $H$, i.e., controlled by $\|\xi\|^{1+2H}$ when $\|\xi\| \to +\infty$, in the precise sense that there exists $C > 0$ such that, for all $x, \xi \in \mathbb{R}^d$:

$$C\, (1 + \|\xi\|)^{2H+1} \leq \sigma(x, \xi) \leq \frac{1}{C}\, (1 + \|\xi\|)^{2H+1} \qquad (6.32)$$
we then obtain processes called elliptic Gaussian random processes (EGRP), which locally preserve many properties of the FBM. In this chapter, we define elliptic Gaussian random processes in a less general manner than in [BEN 97b].

DEFINITION 6.6.– Let $A_X$ be the pseudo-differential operator defined by:

$$\forall f \in \mathcal{D}_0, \qquad A_X(f) = \int_{\mathbb{R}} e^{it\xi}\, \sigma(t, \xi)\, \hat{f}(\xi)\, \frac{d\xi}{(2\pi)^{1/2}} \qquad (6.33)$$

with symbol $\sigma(t, \xi)$ verifying, for $0 < H < 1$, the following hypothesis:

HYPOTHESIS 6.3 (H).– There exists $R > 0$:
– for every $t \in \mathbb{R}$ and for $i = 0$ to 3:

$$\Big| \frac{\partial^i \sigma(t, \xi)}{\partial \xi^i} \Big| \leq C_i\, (1 + |\xi|)^{2H+1-i} \qquad \text{for } |\xi| > R$$
– > such that:
|σ(s, ξ) − σ(t, ξ)| (1 + |ξ|)2α+1+ |s − t|
– it is elliptic of order $H$ (see (6.32)).

We then call elliptic Gaussian random processes of order $H$ the Gaussian processes whose reproducing kernel Hilbert space is the closure of $\mathcal{D}_0$ for the norm $(\langle A_X f, f \rangle_{L^2(\mathbb{R})})^{1/2}$, equipped with the Hermitian product:

$$\langle f, g \rangle_{\mathcal{H}_X} = \langle A_X f, g \rangle_{L^2(\mathbb{R})}$$
Let us make some comments to clarify Definition 6.6. First, we restrict ourselves to dimension one mainly for the same reasons of simplicity as for the filtered white noises. In addition, let us note that if $\sigma$ does not depend on $t$, then $A_X$ still verifies:

$$A_X = J_X^{-2} \qquad (6.34)$$

for the isometry:

$$J_X(\psi)(y) \overset{\text{def}}{=} \int_{\mathbb{R}} \frac{e^{-iy\xi} - 1}{\sqrt{\sigma(\xi)}}\, (\hat{\psi})^*(\xi)\, \frac{d\xi}{(2\pi)^{1/2}} \qquad \forall \psi \in L^2(\mathbb{R})$$

This amounts to saying that we have a harmonizable representation of $X$:

$$X(t) = \int_{\mathbb{R}} \frac{e^{-it\xi} - 1}{\sqrt{\sigma(\xi)}}\, \hat{W}^*(d\xi)$$

It is therefore enough to have an asymptotic expansion of $(e^{-it\xi} - 1)/\sqrt{\sigma(\xi)}$ of the type (6.26) for $X$ to be a filtered white noise. On the other hand, FBMs are not elliptic Gaussian random processes of the type defined above, since the lower inequality of ellipticity (6.32) is not verified. Moreover, if the symbol depends on $t$, relation (6.34) is no longer true and elliptic Gaussian random processes are no longer filtered white noises. Let us reconsider Hypothesis 6.3. The first two points are necessary so that the symbol of $A_X$ behaves asymptotically at high frequency “as if” it did not depend on $t$. To distinguish the two models roughly, we can consider that elliptic Gaussian random processes have more regular trajectories, whereas filtered white noises lend themselves better to identification.

Consequently, let us return to the way of determining the local regularity of an elliptic Gaussian random process and very briefly summarize the reasoning of [BEN 97b]. The starting point for the study of the regularity of elliptic Gaussian random processes is, as for FBM, a Karhunen-Loeve expansion in adapted bases. The selected orthonormal base is built from the base (6.13) of $L^2(\mathbb{R})$ by setting:

$$\phi_\lambda = (A_X)^{-\frac{1}{2}} (\psi_\lambda)$$

where the fractional power of $A_X$ is defined by means of a symbolic calculus on operators. This leads to:

$$X = \sum_{\lambda} \eta_\lambda\, \phi_\lambda \qquad \text{a.s. and in } L^2(\Omega) \qquad (6.35)$$

The regularity of $X$ is the consequence of “wavelet” type estimates which relate to $\phi_\lambda$ and its first derivative and which resemble (6.16). A precise statement is
found in Theorem 1.1 of [BEN 97b]; the essential point is the decay in $2^{-Hj}$ of the numerator with respect to the scale index $j$: it is indeed this exponent which governs the almost sure regularity of the process. Thanks to traditional techniques on random series (see Chapters 15 and 16 of [KAH 85]), we find that, “morally”, the trajectories are Hölder continuous of exponent $H$. On page 34 of [BEN 97b], we find many results describing very precisely the properties of the local and global moduli of continuity of elliptic Gaussian random processes; we only mention here, by way of example, a local law of the iterated logarithm.

THEOREM 6.2.– If $X$ is an elliptic Gaussian random process of order $H$, then, for all $t \in \mathbb{R}$, we have:

$$\limsup_{\varepsilon \to 0} \frac{|X(t + \varepsilon) - X(t)|}{|\varepsilon|^H \sqrt{\log \log(\frac{1}{\varepsilon})}} = C(t) \qquad \text{(a.s.)} \qquad (6.36)$$

with $0 < C(t) < +\infty$.

Thus, considering that “the trajectories are Hölder continuous of exponent $H$” amounts to ignoring the iterated logarithm factor. On the other hand, it should be noted that if we are interested only in the modulus of continuity of elliptic Gaussian random processes, without wanting to specify the limit $C(t)$, then metric entropy techniques (see [LIF 95]), valid for all Gaussian processes, are applicable. Lastly, elliptic Gaussian processes are locally self-similar, subject to a convergence property of their symbol at high frequency.

PROPOSITION 6.2.– If an elliptic Gaussian random process $X$ of order $H$ is associated with a symbol verifying:

$$\lim_{|\xi| \to +\infty} \frac{\sigma(t, \xi)}{|\xi|^{1+2H}} = a(t) \qquad \forall t \in \mathbb{R}$$

then $X$ is locally self-similar with constant multifractional function equal to $H$:

$$\lim_{\epsilon \to 0^+} \Big( \frac{X(t + \epsilon u) - X(t)}{\epsilon^{H}} \Big)_{u \in \mathbb{R}} \overset{\mathcal{L}}{=} a(t)\, \big( B_H(u) \big)_{u \in \mathbb{R}} \qquad (6.37)$$

6.4. Multifractional fields and trajectorial regularity

The examples of the preceding section lead to the main objection to both elliptic Gaussian processes and filtered white noises: the property of local self-similarity shows that, in spite of the modulations introduced by the symbol or filter, the multifractional function remains constant. The multifractional Brownian motion (MBM), introduced independently by [BEN 97b, PEL 96], is a model in which a non-trivial multifractional function appears. It can be defined by its harmonizable representation.
DEFINITION 6.7.– Let $h: \mathbb{R}^d \to ]0, 1[$ be a measurable function. We call MBM of function $h$ any field admitting the harmonizable representation:

$$B_h(x) = \frac{1}{v\big(h(x)\big)} \int_{\mathbb{R}^d} \frac{e^{-ix\cdot\xi} - 1}{\|\xi\|^{\frac{d}{2}+h(x)}}\, W(d\xi) \qquad (6.38)$$

where $W(d\xi)$ is a Brownian measure and, for every $s \in ]0, 1[$:

$$v(s) = \Big( \int_{\mathbb{R}^d} \frac{2\big(1 - \cos(\xi_1)\big)}{\|\xi\|^{d+2s}}\, \frac{d\xi}{(2\pi)^{d/2}} \Big)^{1/2} \qquad (6.39)$$

where $\xi_1$ is the first coordinate of $\xi$.

As in (6.19), we define a general Brownian measure starting from an orthonormal base $(g_n(x))_{n \in \mathbb{N}}$ of $L^2(\mathbb{R}^d)$ and a sequence $(\eta_n)_{n \in \mathbb{N}}$ of independent centered Gaussian variables of variance 1, by setting:

$$\int_{\mathbb{R}^d} f(\xi)\, W(d\xi) = \sum_{n \in \mathbb{N}} \eta_n \int_{\mathbb{R}^d} f(\xi)\, g_n^*(\xi)\, \frac{d\xi}{(2\pi)^{d/2}} \qquad (6.40)$$

for any function $f$ of $L^2(\mathbb{R}^d)$. In addition, the normalization function $v$ corresponds to the constant $C_H$ of (6.8), since $v(H)^2 = C_H^2$, and it is seen immediately that, if the function $h$ is a constant equal to $H$, the MBM is a fractional Brownian motion of order $H$ (a unifractional Brownian motion). Before studying the properties of the MBM, we establish the link between its harmonizable representation and the moving average representation obtained by [PEL 96].

6.4.1. Two representations of the MBM

To summarize the link between the harmonizable representation and the moving average representation of the MBM, we can say that they are Fourier transforms of each other. To be more precise, let us start from Definition 6.7 for a process indexed by $\mathbb{R}$ (which is the case considered in [PEL 96]). Let us suppose that the Brownian measure of (6.38) is $\hat{W}^*(d\xi)$; we have a series expansion of the MBM:

$$B_h(t) = \frac{1}{v\big(h(t)\big)} \sum_{\lambda \in \Lambda} \eta_\lambda \Big\langle \frac{e^{-it\cdot} - 1}{|\cdot|^{\frac{1}{2}+h(t)}},\ \hat{\psi}_\lambda \Big\rangle_{L^2(\mathbb{R})} \qquad (6.41)$$

However, Parseval’s identity leads to:

$$\Big\langle \frac{e^{-it\cdot} - 1}{|\cdot|^{\frac{1}{2}+h(t)}},\ \hat{\psi}_\lambda \Big\rangle_{L^2} = \Big\langle \widehat{\frac{e^{it\cdot} - 1}{|\cdot|^{\frac{1}{2}+h(t)}}},\ \psi_\lambda \Big\rangle_{L^2} \qquad (6.42)$$
The Fourier transform of $(e^{it\cdot} - 1)/|\cdot|^{\frac{1}{2}+h(t)}$ is calculated by noticing that the transform of a homogeneous distribution is also homogeneous:

$$\widehat{\frac{e^{it\cdot} - 1}{|\cdot|^{\frac{1}{2}+h(t)}}}(s) = C\big(h(t)\big) \Big( |t - s|^{h(t)-\frac{1}{2}} - |s|^{h(t)-\frac{1}{2}} \Big) \qquad (6.43)$$
We deduce from it the following theorem, whose proof is found in [COH 99].

THEOREM 6.3.– Let the Brownian measure be $\hat{W}^*(d\xi)$ of (6.19). The MBM with harmonizable representation:

$$B_h(t) = \frac{1}{v\big(h(t)\big)} \int_{\mathbb{R}} \frac{e^{-it\xi} - 1}{|\xi|^{\frac{1}{2}+h(t)}}\, \hat{W}^*(d\xi) \qquad (6.44)$$

is almost surely equal, up to a deterministic multiplicative function, to the symmetric moving average:

$$\int_{-\infty}^{+\infty} \Big[ |t - s|^{h(t)-\frac{1}{2}} - |s|^{h(t)-\frac{1}{2}} \Big]\, W(ds) \qquad (6.45)$$

where the Brownian measure is given by:

$$W(ds) = \sum_{\lambda \in \Lambda} \eta_\lambda\, \psi_\lambda(s)\, ds \qquad (6.46)$$

This theorem calls for several comments. First of all, when $h(t) = \frac{1}{2}$:

$$\Big[ |t - s|^{h(t)-\frac{1}{2}} - |s|^{h(t)-\frac{1}{2}} \Big]$$

is not well defined, but the proof of the theorem shows that in (6.45) we must set:

$$\Big[ |t - s|^0 - |s|^0 \Big] \overset{\text{def}}{=} \log\frac{1}{|t - s|} - \log\frac{1}{|s|}$$

Now that we know that there is essentially only one MBM, we can state the local self-similarity associated with its multifractional function. On this subject, let us recall Theorem 1.7 of [BEN 97b], whose counterpart is Proposition 5 of [PEL 96].

PROPOSITION 6.3.– A MBM of function $h$ of Hölder class $C^r$, with $r > \sup_t h(t)$, is locally self-similar with multifractional function $h$.
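The symmetric moving average (6.45) lends itself to a rough numerical sketch. The Python code below is only an illustration under stated assumptions: the integral is truncated to $[-s_{\max}, s_{\max}]$, the Brownian measure is replaced by independent Gaussian increments on the midpoints of a regular grid, and the deterministic multiplicative function of Theorem 6.3 is omitted. The function name `mbm_moving_average` and its parameters are hypothetical. The evaluation times must not coincide with a grid midpoint, where the kernel is singular when $h(t) < \frac{1}{2}$.

```python
import numpy as np

def mbm_moving_average(t, h, s_max=20.0, n=8000, rng=None):
    """Riemann-sum sketch of (6.45), up to a deterministic normalization.

    t : 1-D array of evaluation times (kept away from the grid midpoints)
    h : callable, the multifractional function with values in ]0, 1[
    """
    rng = np.random.default_rng() if rng is None else rng
    ds = 2.0 * s_max / n
    s = -s_max + (np.arange(n) + 0.5) * ds          # midpoints, never 0
    dW = rng.standard_normal(n) * np.sqrt(ds)       # discretized W(ds)
    out = np.empty(len(t))
    for i, ti in enumerate(t):
        p = h(ti) - 0.5
        if abs(p) < 1e-12:
            # Convention stated in the text for h(t) = 1/2.
            kernel = np.log(1.0 / np.abs(ti - s)) - np.log(1.0 / np.abs(s))
        else:
            kernel = np.abs(ti - s) ** p - np.abs(s) ** p
        out[i] = kernel @ dW                        # same noise for every t
    return out

t = np.linspace(0.0, 1.0, 101)                      # multiples of 0.01
path = mbm_moving_average(t, lambda u: 0.3 + 0.4 * u,
                          rng=np.random.default_rng(1))
```

Using the same increments `dW` for every time is what ties the values of the path together; drawing fresh noise for each $t$ would produce independent variables instead of a process.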
6.4.2. Study of the regularity of the trajectories of the MBM

This section recalls the known results on the regularity of the trajectories of the MBM. To carry out this study, both [BEN 97b] and [PEL 96] state a regularity hypothesis on the multifractional function $h$ itself. In this section, we assume that the following hypothesis is verified:

HYPOTHESIS 6.4.– The function $h$ is Hölder continuous of exponent $r$ (denoted $h \in C^r$) with:

$$r > \sup_{t \in \mathbb{R}} h(t)$$

This surprisingly formulated hypothesis was long considered to be an artifact of the technique of proof. In fact, we will see by outlining the proof of [BEN 97b] that we cannot do better for a MBM and that the obstruction comes from the “low frequencies”. Let us begin with the random series representation (6.41) of the MBM, which we write differently:

$$B_h(t) = \frac{1}{v\big(h(t)\big)} \sum_{\lambda \in \Lambda} \eta_\lambda\, \chi_\lambda\big(t, h(t)\big) \qquad (6.47)$$

where the function:

$$\chi_\lambda(x, y) = \int_{\mathbb{R}} \frac{e^{-ix\xi} - 1}{|\xi|^{\frac{1}{2}+y}}\, \hat{\psi}_\lambda^*(\xi)\, \frac{d\xi}{(2\pi)^{1/2}} \qquad (6.48)$$
is analytic in its two variables (the fact that the support of the functions $\hat{\psi}_\lambda$ does not contain 0 is used here). Similarly, the normalization function $v$ is analytic and does not vanish on $]0, 1[$. It follows that, if we truncate the series (6.47) by keeping only a finite number of dyadic indices $\lambda$, then the resulting random function has the same regularity as the multifractional function $h$. In addition, the irregularity of the trajectories of the MBM is a consequence of high frequency phenomena (i.e., it depends on $\chi_\lambda(t, h(t))$ for $|\lambda| \to +\infty$). We can find in [BEN 97b] the high frequency estimates for the MBM, which are generalizations of (6.16). For every $K \in \mathbb{N}$:

$$\big|\chi_\lambda\big(t, h(t)\big)\big| \leq C(K)\, 2^{-h(t)j} \Big( \frac{1}{1 + |2^j t - k|^K} + \frac{1}{1 + |k|^K} \Big) \qquad (6.49)$$

$$\big|\chi_\lambda\big(t, h(t)\big) - \chi_\lambda\big(s, h(s)\big)\big| \leq C(K)\, 2^{-h(s,t)j} \Big( \frac{2^j |t - s| + j\,|h(t) - h(s)|}{1 + |2^j t - k|^K} + \frac{j\,|h(t) - h(s)|}{1 + |k|^K} \Big) \qquad (6.50)$$

with $h(s, t) = \min(h(s), h(t))$. We notice, in particular, the factor $2^{-h(t)j}$ which leads, for reasons identical to those of section 6.3.3, to the conclusion that the
MBM is, up to a logarithmic factor, almost surely Hölder continuous of exponent $h(t)$. If Hypothesis 6.4 is omitted, it is not difficult to see that “the Hölder exponent” of the MBM at $t$ is given by $\min(h(t), r)$, which amounts to saying that it is the more irregular of the high and low frequency parts of the MBM which imposes the overall regularity. Let us recall one of the results of [BEN 97b].

THEOREM 6.4.– If $X$ is a MBM whose multifractional function verifies Hypothesis 6.4, then, for all $t \in \mathbb{R}$, we have:

$$\limsup_{\varepsilon \to 0} \frac{|X(t + \varepsilon) - X(t)|}{|\varepsilon|^{h(t)} \sqrt{\log \log(\frac{1}{\varepsilon})}} = C(t) \qquad \text{a.s.} \qquad (6.51)$$

with $0 < C(t) < +\infty$.

For the simulation of the MBM, see [AYA 00], which gives some indications on this question. We also note the existence of the FracLab toolbox in this field.

6.4.3. Towards more irregularities: generalized multifractional Brownian motion (GMBM) and step fractional Brownian motion (SFBM)

We saw in the preceding section that the MBM provides a model for locally self-similar processes with varying multifractional functions and pointwise exponents. However, Hypothesis 6.4, essential within the strict framework of the MBM, is cumbersome for certain applications. Let us quote two examples where we would like a multifractional function that is less regular than Hölder continuous. First, in rupture models, which are important in image segmentation, we want the Hölder exponent to have discontinuities. Let us make this problem clearer through an example. Suppose that we have an aerial image on which we want to distinguish the boundary between a field and a forest. It is usual to model the texture of a forest by a FBM of Hölder exponent $H_1$. In the same way, for the portion covered by the field, we can think of a FBM of exponent $H_2$. The question arises as to how “to connect” the two processes on the border. One possibility is to consider a MBM in the sense of Definition 6.7, for which the function $h$ takes the two values $H_1$ and $H_2$. However, this MBM does not correspond to the image. Indeed, it shows a discontinuity at the place where the function $h$ jumps, whereas on an image only the regularity changes suddenly and, most often, the field remains continuous. To model this type of rupture, we recall the construction of the step fractional Brownian motion (SFBM) of [BEN 00].

In addition, the Hölder exponent of the MBM varies too slowly for applications to so-called developed turbulence (see [FRI 97] for an introduction to this subject). Indeed, the physics of turbulence teaches us that the data accessible from measurements are not the Hölder exponents of the studied quantities, but their multifractal spectrum.
The description of the multifractal spectrum is beyond the scope of this chapter (see Chapter 1, Chapter 3 and Chapter 4), but it is enough to know that, for a function whose Hölder exponent is itself $C^r$, this spectrum is trivial. We can thus be convinced that the MBM is not a realistic model for developed turbulence. In order to obtain processes whose trajectories have Hölder exponents which vary abruptly, Ayache and Lévy Véhel have proposed a model called the generalized multifractional Brownian motion (GMBM) in [AYA 99]. We present their model in the second part of this section.

6.4.3.1. Step fractional Brownian motion

The multifractional functions associated with the SFBMs are very simple, which makes it possible to have a reasonable model for the identification of ruptures. We limit ourselves to step multifractional functions:
\[ h(t) = \sum_{i=0}^{K} \mathbf{1}_{[a_i, a_{i+1}[}(t)\, H_i \tag{6.52} \]
with $a_0 = -\infty$ and $a_{K+1} = +\infty$, and where $(a_i)$ is an increasing sequence of real numbers. Starting from (6.47), we arrive at the following definition.

DEFINITION 6.8.– Let $\Lambda^+ = \{ k/2^j,\ k \in \mathbb{Z},\ j \in \mathbb{N} \}$. By SFBM we mean the process with multifractional function $h$ defined by (6.52):
\[ Q_h(t) = \sum_{\lambda \in \Lambda^+} \eta_\lambda\, \chi_\lambda\bigl(t, h(\lambda)\bigr) \tag{6.53} \]
where the functions $\chi_\lambda$ are defined by (6.48).

In the preceding definition, there are some differences as compared to (6.47). Some of them are technical, like the suppression of the normalization $v(h(t))$ or the absence of negative frequencies. On the other hand, the SFBM has continuous trajectories, whereas the MBM which corresponds to a piecewise constant multifractional function is discontinuous. This phenomenon is due to the replacement of $\chi_\lambda(t, h(t))$ by $\chi_\lambda(t, h(\lambda))$. Indeed, the first function is discontinuous, like $h$, at the points $a_i$, whereas this jump disappears in $\chi_\lambda(t, h(\lambda))$. However, the fast decay of the functions $t \mapsto \chi_\lambda(t, h(\lambda))$ when $|t - \lambda| \to +\infty$ gives the SFBMs local properties very close to those of the MBM away from the jump times of the multifractional function. The following theorem, which describes the regularity of the SFBM more precisely, can be found in [BEN 00].

THEOREM 6.5.– For any open interval $I$ of $\mathbb{R}$, we set:
\[ H^*(I) = \inf\{ h(t),\ t \in I \} \]
If $Q_h$ is a SFBM with multifractional function $h$, then $Q_h$ is globally Hölderian of exponent $H$, for all $0 < H < H^*(I)$, on any compact interval $J \subset I$.

Thus, in terms of regularity, the SFBM is a satisfactory model. We will see, in section 6.5, that we can completely identify the multifractional function: times and amplitudes of the jumps.

6.4.3.2. Generalized multifractional Brownian motion

Let us now outline the work of [AYA 99]. The authors propose to circumvent the “low frequency” problems encountered in the definition of the MBM by replacing the multifractional function $h$ with a sequence of regular functions $h_n$ whose limit, which will play the role of the multifractional function, can be very irregular. Let us first specify the technical conditions relating to the sequence $(h_n)_{n \in \mathbb{N}}$.

DEFINITION 6.9.– A function $h$ is said to be locally Hölderian of exponent $r$ and of constant $c > 0$ on $\mathbb{R}$ if, for all $t_1$ and $t_2$ such that $|t_1 - t_2| \le 1$, we have:
\[ |h(t_1) - h(t_2)| \le c\,|t_1 - t_2|^r \]
Such a function will be called $(r, c)$ Hölderian.

We can then define the multifractional sequences, which generalize the multifractional functions for the GMBM.

DEFINITION 6.10.– We will call a multifractional sequence a sequence $(h_n)_{n \in \mathbb{N}}$ of $(r, c_n)$ Hölderian functions with values in an interval $[a, b] \subset ]0, 1[$, and we will call its lower limit a generalized multifractional function (GMF):
\[ h(t) = \liminf_{n \to +\infty} h_n(t) \]
if $(h_n)_{n \in \mathbb{N}}$ verifies the following properties:
– for all $\varepsilon > 0$ and all $t_0$, there exist $n_0(t_0, \varepsilon)$ and $h_0(t_0, \varepsilon) > 0$ such that, for all $n > n_0$ and $|h| < h_0$, we have $h_n(t_0 + h) > h(t_0) - \varepsilon$;
– for all $t$, we have $h(t) < r$ and $c_n = O(n)$.

In the preceding definition, it is essential that the generalized multifractional function is a limit when the index $n$ tends towards $+\infty$; we will see that this reflects the high-frequency portion of the information contained in the multifractional sequence. In addition, the set of GMFs contains very irregular functions like, for example, for $0 < a < b < 1$:
\[ t \longmapsto b + (a - b)\,\mathbf{1}_F(t) \]
where $F$ is a set of Cantor type. A proof of this result, as well as the beginning of a discussion on the set of GMFs, can be found in [AYA 99]. Lastly, a process can be associated with a multifractional sequence in the following manner.

DEFINITION 6.11.– We will call a GMBM associated with a multifractional sequence $(h) = (h_n)_{n \in \mathbb{N}}$ any process admitting the harmonizable representation:
\[ Y_{(h)}(t) = \int_{|\xi| < 1} \frac{e^{-it\cdot\xi} - 1}{|\xi|^{\frac12 + h_0(t)}}\, W(d\xi) + \sum_{n=1}^{+\infty} \int_{2^{n-1} \le |\xi| < 2^n} \frac{e^{-it\cdot\xi} - 1}{|\xi|^{\frac12 + h_n(t)}}\, W(d\xi) \tag{6.54} \]

The comparison of this definition with that of the MBM is instructive: if the functions $(h_n)$ of the multifractional sequence are all equal to a function $h$ verifying Hypothesis 6.4, then the GMBM is a MBM up to normalization:
\[ B_h(t) = \frac{1}{v(h(t))}\, Y_{(h)}(t) \]
Note also that the law of a GMBM depends on the whole multifractional sequence and not only on its limit, the generalized multifractional function. In fact, by writing the GMBM in series form, we can guess that the $n$th function of the multifractional sequence “governs” the behavior of the GMBM at scale $1/2^{n-1}$. To clarify this idea, let us suppose that the white noise which appears in formula (6.54) is $\hat W^*(d\xi)$ and let us consider the series expansion of the GMBM which is deduced from it:
\[ Y_{(h)}(t) \approx \sum_{\lambda \in \Lambda^+} \sum_{n=1}^{+\infty} \eta_\lambda \int_{2^{n-1} \le |\xi| < 2^n} \frac{e^{-it\cdot\xi} - 1}{|\xi|^{\frac12 + h_n(t)}}\, \hat\psi^*_\lambda(\xi)\, d\xi \tag{6.55} \]
The expression above is only approximate (hence the symbol ≈) because it does not take low-frequency phenomena into account: we have omitted the integral over $\{|\xi| < 1\}$ in (6.54). This minor inaccuracy is not detrimental to the heuristic reasoning to come, which seeks to explain why the regularity of the GMBM depends on $h(t) = \liminf_{n \to +\infty} h_n(t)$. Still in order to eliminate technical problems, let us suppose that the functions $\hat\psi_\lambda$ of the definition of $\hat W^*(d\xi)$ vanish outside $[2^j, 2^{j+1}]$ if $\lambda = k/2^j$ (this amounts to neglecting the constants $2\pi/3$ and $8\pi/3$ which appear when describing the support of the function $\hat\psi^{(l)}$ in (6.13)).

Under these assumptions, formula (6.55) simplifies because the integrals are zero unless the scale index $j$ of $\lambda = k/2^j$ is equal to $n - 1$; we obtain:
\[ Y_{(h)}(t) \approx \sum_{j=0}^{+\infty} \sum_{k} \eta_{k/2^j}\, \chi_{k/2^j}\bigl(t, h_{j+1}(t)\bigr) \tag{6.56} \]
This clarifies the natural correspondence between $h_{j+1}$ and the scale $j$. In particular, we understand why the regularity of the GMBM depends on the behavior of $h_n(t)$ when $n$ becomes large. This result is Theorem 2 of [AYA 99], the precise statement of which we now recall.

THEOREM 6.6.– Let:
\[ \alpha_{Y_{(h)}}(t) \stackrel{\text{def}}{=} \sup\Bigl\{ \alpha,\ \lim_{\varepsilon \to 0} \frac{Y_{(h)}(t+\varepsilon) - Y_{(h)}(t)}{|\varepsilon|^{\alpha}} = 0 \Bigr\} \]
be the pointwise Hölder exponent at $t \in \mathbb{R}$ of a GMBM $Y_{(h)}$. Then:
\[ \forall t \in \mathbb{R}, \quad \alpha_{Y_{(h)}}(t) = h(t) \quad \text{(a.s.)} \]
where $h$ is the generalized multifractional function of the GMBM.

To conclude this section on the GMBM, it should be added that, under additional assumptions on the multifractional sequence of a GMBM, the generalized multifractional function of a GMBM has been identified in [AYA 04a, AYA 04b].

6.5. Estimate of regularity

In this section, we will estimate the regularity of the processes by means of quadratic variations. It is of course not the only method: see Chapter 2 and Chapter 9 for alternative approaches.

6.5.1. General method: generalized quadratic variation

First, we fix the general framework of the identification methods. In this section, we identify processes indexed by $\mathbb{R}$. The hypothesis that the processes are Gaussian allows us to carry out the identification from a single trajectory of the process, which we suppose to have been observed at discrete times: we will assume $X(p/N)$ to be known for $0 \le p \le N$. The various estimators which we use are built from the generalized quadratic variation of the observations $X(p/N)$, which we can write:
\[ V_N(w) = \sum_{p=0}^{N-2} w\Bigl(\frac{p}{N}\Bigr) \Bigl[ X\Bigl(\frac{p+2}{N}\Bigr) - 2X\Bigl(\frac{p+1}{N}\Bigr) + X\Bigl(\frac{p}{N}\Bigr) \Bigr]^2 \tag{6.57} \]
where $w$ is a weight function which serves to localize the quadratic variation when we seek to estimate functions (for example, the multifractional function of a MBM). In what follows, we will write $V_N(w) = V_N$ if $w$ is constant and equal to 1.
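As an illustration, the generalized quadratic variation (6.57) can be computed directly from the samples. The following Python sketch is ours and is not taken from the references: the function name `generalized_quadratic_variation` and the convention that `samples[p]` holds $X(p/N)$ are assumptions made for the example. It is checked on the deterministic path $X(t) = t^2$, whose second-order increments are all equal to $2/N^2$.

```python
def generalized_quadratic_variation(samples, weight=None):
    """Sketch of V_N(w) in (6.57).

    samples[p] is assumed to hold X(p/N) for p = 0, ..., N.
    weight is an optional function w(t); None means w = 1.
    """
    n = len(samples) - 1
    total = 0.0
    for p in range(n - 1):  # p = 0, ..., N - 2
        second_diff = samples[p + 2] - 2.0 * samples[p + 1] + samples[p]
        w = 1.0 if weight is None else weight(p / n)
        total += w * second_diff ** 2
    return total


# Deterministic check: X(t) = t^2 sampled with N = 4.
N = 4
parabola = [(p / N) ** 2 for p in range(N + 1)]
print(generalized_quadratic_variation(parabola))  # 3 * (2 / N**2)**2 = 0.046875
```

Passing a weight such as `lambda t: 1.0 if t >= 0.5 else 0.0` keeps only the terms with $p/N \ge 1/2$, which is the localization mechanism used later for the MBM.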
We will try to bring out, in the identification method, the techniques which apply to all the models already presented. All these models have trajectories of Hölderian regularity, and it is this which will guide us in building estimators. From this point of view, the introduction of variations to quantify the irregularity of the trajectories is natural. Let us outline an example of the use of formula (6.57) in the simplest case: the identification of the order $H$ of a FBM. Since the order is a global parameter, we can take $w$ constant and equal to 1. Recalling that the trajectories of our processes are nearly $C^H$ Hölderian, we deduce from (6.57) that:
\[ V_N \approx N^{1-2H} \Bigl[ \frac{1}{N} \sum_{p=0}^{N-2} C\Bigl(\frac{p}{N}, \omega\Bigr) \Bigr] \tag{6.58} \]
where $C(p/N, \omega)$ is the random Hölder constant associated with the trajectory $X$ at the point $p/N$. Indeed, the second-order increment is a difference of two increments:
\[ X\Bigl(\frac{p+2}{N}\Bigr) - 2X\Bigl(\frac{p+1}{N}\Bigr) + X\Bigl(\frac{p}{N}\Bigr) = \Bigl[ X\Bigl(\frac{p+2}{N}\Bigr) - X\Bigl(\frac{p+1}{N}\Bigr) \Bigr] - \Bigl[ X\Bigl(\frac{p+1}{N}\Bigr) - X\Bigl(\frac{p}{N}\Bigr) \Bigr] \]
and thus, when $N \to +\infty$:
\[ \Bigl[ X\Bigl(\frac{p+2}{N}\Bigr) - 2X\Bigl(\frac{p+1}{N}\Bigr) + X\Bigl(\frac{p}{N}\Bigr) \Bigr]^2 \approx N^{-2H} \]
which gives (6.58). We would then like to apply a law of large numbers to the term between brackets in (6.58). However, the random variables $C(p/N, \omega)$ are not, in general, independent, and only the asymptotic decorrelation of the fractional Brownian increments allows the use of a principle of the “law of large numbers” type. Formula (6.58) explains the expression of the estimator of $H$:
\[ \hat H_N = \frac12 \Bigl( \log_2 \frac{V_{N/2}}{V_N} + 1 \Bigr) \tag{6.59} \]
Thanks to a central limit theorem for the term within brackets in (6.58), we obtain the rate of convergence of the estimators as a function of the discretization step $1/N$. It is at this point of the reasoning that we find the reason which forces us to choose a generalized quadratic variation rather than a traditional quadratic variation:
\[ \tilde V_N(w) = \sum_{p=1}^{N-1} w\Bigl(\frac{p}{N}\Bigr) \Bigl[ X\Bigl(\frac{p+1}{N}\Bigr) - X\Bigl(\frac{p}{N}\Bigr) \Bigr]^2 \]
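The estimator (6.59), built on the generalized quadratic variation, can be sketched numerically as follows. This is only a minimal illustration under our own assumptions: the test path is a standard Brownian motion (a FBM with $H = 1/2$) simulated by cumulative sums of independent Gaussian increments, $V_{N/2}$ is computed from every other sample of the same trajectory, and the names `generalized_qv` and `hurst_estimate` are hypothetical. A single run only gives one realization of the estimator, which fluctuates around $1/2$.

```python
import math
import random


def generalized_qv(x):
    """V_N for w = 1: sum of squared second-order increments of the samples."""
    return sum((x[p + 2] - 2.0 * x[p + 1] + x[p]) ** 2 for p in range(len(x) - 2))


def hurst_estimate(x):
    """Estimator (6.59): 0.5 * (log2(V_{N/2} / V_N) + 1).

    x[p] is assumed to hold X(p/N); V_{N/2} uses the samples X(2p/N).
    """
    v_fine = generalized_qv(x)
    v_coarse = generalized_qv(x[::2])
    return 0.5 * (math.log2(v_coarse / v_fine) + 1.0)


# Toy trajectory: Brownian motion on [0, 1] observed at the times p/N.
rng = random.Random(1)
N = 2 ** 14
x = [0.0]
for _ in range(N):
    x.append(x[-1] + rng.gauss(0.0, math.sqrt(1.0 / N)))

h_hat = hurst_estimate(x)
print("estimated H:", round(h_hat, 3))  # expected to be close to 0.5
```

Only the ratio $V_{N/2}/V_N$ enters the estimator, so a multiplicative constant in front of the process has no effect on the result.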
Indeed, in their article [LEO 89], the authors note that, for $H > 3/4$, no central limit theorem exists for the traditional quadratic variation. We will see that the methodology presented for a FBM remains valid for the other models.

6.5.2. Application to the examples

The starting point, and the model best suited to identification, is that of the filtered white noises. For these processes, it is possible to identify not only the first-order parameters, $H_1$ and the modulation function $a$ (thanks to weighted quadratic variations), but also the parameters $(H_2, b(x))$, which are of second order as regards their influence on the local regularity of the trajectories. First, we will discuss in detail the arguments and the estimators valid for filtered white noises; then we will explain how the principles developed within this framework apply to more sophisticated models.

6.5.2.1. Identification of filtered white noise

Let us quickly describe the stages in the identification of the parameters of a filtered white noise given by the formula of Definition 6.5:
\[ X(t) = \int \bigl(e^{-it\xi} - 1\bigr) \Bigl[ \frac{a(t)}{|\xi|^{\frac12 + H_1}} + \frac{b(t)}{|\xi|^{\frac12 + H_2}} + R(t, \xi) \Bigr] \hat W^*(d\xi) \]
By using the isometry property (6.21) of the Brownian measure $\hat W^*(d\xi)$, we show that:
\[ \lim_{N \to +\infty} N^{2H_1 - 1}\, E\, V_N(w) = F(2H_1) \int_0^1 a^2(t)\, w(t)\, dt \tag{6.60} \]
where:
\[ F(x) = 16 \int_{\mathbb{R}} \frac{\sin^4(t/2)}{|t|^{x+1}}\, dt \]
is defined for $x \in ]0, 2[$. Consequently, the estimate of the first-order parameters, which follows from a fine study of the variance of $V_N(w)$, is described by the following theorem, taken from [BEN 98b].

THEOREM 6.7.– Let $X$ be a filtered white noise given by formula (6.27). If the weight function $w$ is of class $C^2$ with support in $]0, 1[$, then the estimators:
\[ \hat H_N = \frac12 \Bigl( \log_2 \frac{V_{N/2}}{V_N} + 1 \Bigr) \tag{6.61} \]
and:
\[ \hat I_N(w) = \frac{N^{2\hat H_N - 1}\, V_N(w)}{F(2\hat H_N)} \tag{6.62} \]
converge almost surely, when $N \to +\infty$, towards $H_1$ and:
\[ I(w) = \int_0^1 a^2(t)\, w(t)\, dt \]
Moreover:
– if $H_2 - H_1 > \frac12$, then $\sqrt{N}\,\bigl(\hat H_N - H_1\bigr)$ and $\frac{\sqrt{N}}{\operatorname{Log} N}\,\bigl(\hat I_N(w) - I(w)\bigr)$ converge in distribution towards a centered Gaussian random variable;
– if $H_2 - H_1 \le \frac12$, then:
\[ E\bigl(\hat H_N - H_1\bigr)^2 \le C N^{2(H_1 - H_2)} \]
and:
\[ E\bigl(\hat I_N(w) - I(w)\bigr)^2 \le C \operatorname{Log}^2(N)\, N^{2(H_1 - H_2)} \]

As regards the estimate of the functional parameter, the preceding theorem may seem disappointing, since it only provides an estimate of integrals of $a^2$ against the weight functions $w$. A general method for rebuilding a pointwise estimator of $a(t)$ from these integrals can be found in [IST 96].

To understand the convergence rates given by Theorem 6.7, it is necessary to know that the convergence of the estimators involves two types of error. One comes from the central limit theorem and appears in the estimate of the first-order parameters; we will call it the stochastic error. The other is a distortion created by the second-order perturbation in the filter defining the filtered white noise. If $H_2 - H_1 > \frac12$, the stochastic error dominates the distortion. Otherwise, the convergence rate is imposed by the distortion.

The estimate of the second-order factors $(b, H_2)$ is more difficult because it requires building functionals which do not depend asymptotically on the first-order factors. An example of such a functional is given by:
\[ V_{N/2} - 2^{2H_1 - 1}\, V_N \]
The factor $2^{2H_1 - 1}$ is necessary to compensate exactly for the influence of the first-order parameters. However, it must itself be estimated, and for this we will use the convergence:
\[ \lim_{N \to +\infty} \frac{V_{N^2/2}}{V_{N^2}} = 2^{2H_1 - 1} \]
which is sufficiently rapid for the compensation to remain effective. An estimator of the parameter $H_2$ is thus obtained.

THEOREM 6.8.– Let:
\[ W_N \stackrel{\text{def}}{=} V_{N/2} - \frac{V_{N^2/2}}{V_{N^2}}\, V_N \tag{6.63} \]
Then the estimator:
\[ \hat H_2(N) = \frac12 - \frac12 \log_2 \frac{V_{N^2/2}}{V_{N^2}} + \log_2 \frac{W_{N/2}}{W_N} \tag{6.64} \]
converges a.s. towards $H_2$ when $N \to +\infty$.

6.5.2.2. Identification of elliptic Gaussian random processes

It is possible to directly identify the symbol of an elliptic Gaussian random process of the form:
\[ \sigma(t, \xi) = a(t)\,|\xi|^{\frac12 + H_1} + b(t)\,|\xi|^{\frac12 + H_2} + p(\xi) \tag{6.65} \]
when $0 < H_2 < H_1 < 1$, for $a$ and $b$ two strictly positive $C^1$ functions, and for $p$ a $C^\infty$ function such that $p(\xi) = 1$ if $|\xi| \le 1$ and $p(\xi) = 0$ if $|\xi| > 2$ (see [BEN 94]). However, a comparison carried out in [BEN 97a] between filtered white noises and elliptic Gaussian random processes makes it possible to obtain the result more easily. Let us make several comments on the symbols which we identify. The symbols of form (6.65) verify Hypothesis 6.3 ($H_1$). In particular, the function $p$ was introduced so that the elliptic inequality of order $H_1$ would be satisfied at $\xi = 0$. In fact, (6.65) should be understood as the expansion in fractional powers, at high frequency ($|\xi| \to +\infty$), of a general symbol. The identification of the symbol of an elliptic Gaussian random process $X$ comes from the comparison of $X$ with the filtered white noise:
\[ Y_t = \int \frac{e^{-it\xi} - 1}{\sigma(t, \xi)}\, \hat W^*(d\xi) \tag{6.66} \]
This explains why the order of the powers for an elliptic Gaussian random process is reversed compared to that of a filtered white noise. The identification results for elliptic Gaussian processes are summarized in the following theorem.
THEOREM 6.9.– If $X$ is an elliptic Gaussian random process with symbol verifying (6.65) and:
\[ \sup\Bigl( 0, \frac{3H_1 - 1}{2} \Bigr) < H_2 \]
then the estimators:
\[ \tilde H_N = \frac12 \Bigl( \log_2 \frac{V_{N/2}}{V_N} + 1 \Bigr) \tag{6.67} \]
\[ \tilde J_N(w) = \frac{N^{2\tilde H_N - 1}\, V_N(w)}{F(2\tilde H_N)} \tag{6.68} \]
for $w \in C^2[0, 1]$ with support included in $]0, 1[$ and:
\[ (\hat H_2)_N = \frac12 + \frac12 \log_2 \frac{V_{N^2/2}}{V_{N^2}} - \log_2 \frac{W_{N/2}}{W_N} \tag{6.69} \]
where $W_N$ is defined by (6.63), converge almost surely, when $N \to +\infty$, towards respectively:
\[ H_1, \qquad J(w) = \int_0^1 \frac{w(t)}{a(t)}\, dt, \qquad H_2 \]

It should be noted that, for elliptic Gaussian random processes, an additional condition is needed for the identification of the parameter $H_2$, namely $(3H_1 - 1)/2 < H_2$. This hypothesis is not only technical; a similar hypothesis is found in [INO 76] for Markovian Gaussian fields of order $p$: only the monomials of highest degree of a polynomial symbol are identifiable. We can thus only hope, within our framework, to identify the principal part of the symbol $\sigma$.

6.5.2.3. Identification of MBM

To identify the multifractional function of a MBM, the generalized quadratic variations must be suitably localized. In this case, a pointwise estimator of $h$ is proposed in [BEN 98a]. We must, however, insist on the fact that we can prove the convergence of the estimators only for regular multifractional functions: in this section, we will suppose that the following hypothesis is verified.

HYPOTHESIS 6.5.– Function $h$ is of class $C^1$.

This hypothesis is, of course, more restrictive than Hypothesis 6.4. Let us specify the principles of the localization of the generalized quadratic variations. A natural method consists of using the weight function:
\[ w = \mathbf{1}_{[t_0, t_1]} \quad \text{for } 0 < t_0 < t_1 < 1 \]
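We thus obtain a variation localized in the interval $[t_0, t_1]$, which we will denote:
\[ V_N(t_0, t_1) = V_N(w) \tag{6.70} \]
\[ \phantom{V_N(t_0, t_1)} = \sum_{\{p \in \mathbb{Z},\ t_0 \le p/N \le t_1\}} \Bigl[ X\Bigl(\frac{p+2}{N}\Bigr) - 2X\Bigl(\frac{p+1}{N}\Bigr) + X\Bigl(\frac{p}{N}\Bigr) \Bigr]^2 \tag{6.71} \]
We can now define an estimator:
\[ \hat h_N(t_0, t_1) = \frac12 \Bigl( \log_2 \frac{V_{N/2}(t_0, t_1)}{V_N(t_0, t_1)} + 1 \Bigr) \]
which converges, when $N \to +\infty$, towards:
\[ \inf\{ h(s),\ s \in ]t_0, t_1[ \} \quad \text{(a.s.)} \tag{6.72} \]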
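The localized estimator $\hat h_N(t_0, t_1)$ can be sketched as follows. The example is deliberately crude and rests on our own assumptions: the path is not a MBM but a Brownian motion (exponent $1/2$) to which independent Gaussian noise is added on $[1/2, 1]$, so that this half behaves, for the estimator, like a path of exponent close to 0. The function names are hypothetical. The point is only to observe that a window lying in the regular part returns a value near $1/2$, whereas a window straddling both parts is driven by the most irregular one.

```python
import math
import random


def local_qv(x, t0, t1):
    """Localized variation (6.71): x[p] holds X(p/n) with n = len(x) - 1."""
    n = len(x) - 1
    lo = max(0, math.ceil(t0 * n))
    hi = min(n - 2, math.floor(t1 * n))
    return sum((x[p + 2] - 2.0 * x[p + 1] + x[p]) ** 2 for p in range(lo, hi + 1))


def local_exponent(x, t0, t1):
    """0.5 * (log2(V_{N/2}(t0, t1) / V_N(t0, t1)) + 1), V_{N/2} from every other sample."""
    return 0.5 * (math.log2(local_qv(x[::2], t0, t1) / local_qv(x, t0, t1)) + 1.0)


rng = random.Random(2)
N = 2 ** 14

# Brownian motion on [0, 1] observed at the times p/N.
bm = [0.0]
for _ in range(N):
    bm.append(bm[-1] + rng.gauss(0.0, math.sqrt(1.0 / N)))

# Toy irregular part (assumption of this sketch): independent noise for p/N >= 1/2.
x = [b + (rng.gauss(0.0, 0.1) if p >= N // 2 else 0.0) for p, b in enumerate(bm)]

h_left = local_exponent(x, 0.05, 0.45)   # window inside the Brownian part
h_mixed = local_exponent(x, 0.25, 0.75)  # window straddling both parts
print("regular window:", round(h_left, 2), " straddling window:", round(h_mixed, 2))
```

In this toy setting the straddling window is expected to return a value far below $1/2$, in line with the limit $\inf\{h(s)\}$ stated above.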
Indeed, it is the worst Hölderian regularity which dominates in this estimate. We deduce from this intermediate stage that we must reduce the size of the observation interval $[t_0, t_1]$ as $N$ increases if we want to estimate $h(t)$. Let us set:
\[ V_{N,\varepsilon}(t) \stackrel{\text{def}}{=} V_N(t - \varepsilon, t + \varepsilon) \tag{6.73} \]
\[ \hat h_{\varepsilon,N}(t) \stackrel{\text{def}}{=} \frac12 \Bigl( \log_2 \frac{V_{N/2,\varepsilon}(t)}{V_{N,\varepsilon}(t)} + 1 \Bigr) \tag{6.74} \]
and let us apply the general principles of section 6.5 to $V_{N,\varepsilon}(t)$:
\[ V_{N,\varepsilon}(t) \approx \frac{1}{N} \sum_{p/N \in [t-\varepsilon, t+\varepsilon]} C^2\Bigl(\frac{p}{N}, \omega\Bigr)\, N^{1 - 2h(p/N)} \tag{6.75} \]
It is clear that the larger $\varepsilon$ is, the smaller the stochastic error due to the law of large numbers, since a great number of variables are added up; however, for a large $\varepsilon$, we introduce a significant distortion by having replaced $N^{1-2h(t)}$ by $N^{1-2h(p/N)}$. The choice of the rate of convergence of $\varepsilon$ towards 0 is made in the following theorem, extracted from [BEN 98a].

THEOREM 6.10.– Let $X$ be a MBM with harmonizable representation:
\[ B_h(t) = \frac{1}{v(h(t))} \int \frac{e^{-it\cdot\xi} - 1}{|\xi|^{\frac12 + h(t)}}\, W(d\xi) \]
associated with a multifractional function $h$ verifying Hypothesis 6.5. For $\varepsilon = N^{-\alpha}$ with $0 < \alpha < \frac12$ and $N \to \infty$:
\[ \hat h_{\varepsilon,N}(t) \longrightarrow h(t) \quad \text{(a.s.)} \]
For $\varepsilon = N^{-1/3}$:
\[ E\bigl(\hat h_{\varepsilon,N}(t) - h(t)\bigr)^2 = O\bigl(\operatorname{Log}^2(N)\, N^{-2/3}\bigr) \]

In the preceding statement, we used the normalization $1/v(h(t))$, but the result is unchanged if this factor is replaced by any $C^1$ function of $t$. In addition, the choice $\varepsilon = N^{-1/3}$ makes the contributions of the stochastic error and of the distortion of the same order and thus, in a certain sense, asymptotically minimizes the upper bound obtained for the quadratic risk.

6.5.2.4. Identification of SFBMs

In the case of a SFBM, the multifractional function to estimate is a piecewise constant function $h$ given by (6.52) and we will build an estimator of:
\[ \Theta_0 = (a_1, \dots, a_K; H_0, \dots, H_K) \]
starting from the quadratic variation:
\[ \tilde V_N(s, t) = \frac{1}{N} \sum_{\{p:\ s \le p/N \le t\}} \Bigl[ Q_h\Bigl(\frac{p+2}{N}\Bigr) - 2Q_h\Bigl(\frac{p+1}{N}\Bigr) + Q_h\Bigl(\frac{p}{N}\Bigr) \Bigr]^2 \tag{6.76} \]
The principles evoked for the MBM about the estimator $\hat h_N(t_0, t_1)$ also apply here:
\[ \lim_{N \to +\infty} \frac{\log \tilde V_N(s, t)}{-2 \log(N)} = \inf\{ h(u),\ u \in ]s, t[ \} \quad \text{(a.s.)} \tag{6.77} \]
We will write:
\[ f_N(s, t) \stackrel{\text{def}}{=} \frac{\log \tilde V_N(s, t)}{-2 \log(N)} \]
Following a technique of [BER 00] for rupture detection, we form the difference between the estimates $f_N$ on an interval of length $A > 0$ to the right of $t$ and to the left of $t$:
\[ D_N(A, t) = f_N(t, t + A) - f_N(t - A, t) \tag{6.78} \]
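The statistic $D_N(A, t)$ of (6.78) can be explored on a toy example. The sketch below is ours and makes several assumptions that are not in the text: instead of the SFBM of Definition 6.8, it uses a continuous path obtained by gluing the increments of two independent FBMs (exponent 0.5 before $a = 0.5$, exponent 0.8 after), each simulated exactly by a Cholesky factorization of the covariance of its increments; the window is given in samples, and a threshold $\eta = 0.2$ is used to report the first index at which $D_N$ becomes large. All function names are hypothetical.

```python
import math
import random


def fgn(n, hurst, step, rng):
    """n increments of an FBM of index `hurst` on a grid of mesh `step`
    (exact simulation by Cholesky factorization of their covariance)."""
    two_h = 2.0 * hurst
    cov = [0.5 * ((k + 1) ** two_h - 2.0 * k ** two_h + abs(k - 1) ** two_h)
           for k in range(n)]
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = cov[i - j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    scale = step ** hurst
    return [scale * sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(n)]


def f_n(x, lo, hi):
    """f_N over the sample indices lo <= p <= hi, with the 1/N factor of (6.76)."""
    n = len(x) - 1
    hi = min(hi, n - 2)
    v = sum((x[p + 2] - 2.0 * x[p + 1] + x[p]) ** 2 for p in range(lo, hi + 1)) / n
    return math.log(v) / (-2.0 * math.log(n))


def d_n(x, width, l):
    """D_N(A, l/N) of (6.78) with A = width / N."""
    return f_n(x, l, l + width) - f_n(x, l - width, l)


rng = random.Random(0)
N, WIDTH = 400, 80  # A = WIDTH / N = 0.2

# Toy path (not an SFBM): exponent 0.5 on [0, 0.5[, exponent 0.8 on [0.5, 1].
increments = fgn(N // 2, 0.5, 1.0 / N, rng) + fgn(N - N // 2, 0.8, 1.0 / N, rng)
x = [0.0]
for inc in increments:
    x.append(x[-1] + inc)

d_values = {l: d_n(x, WIDTH, l) for l in range(WIDTH, N - WIDTH + 1)}
eta = 0.2
first = min((l for l, v in d_values.items() if v >= eta), default=None)
print("D_N at the jump:", round(d_values[N // 2], 2))
print("first l/N with D_N >= eta:", None if first is None else first / N)
```

With only $N = 400$ samples the crossing point is a rough indication of the jump time, and it tends to fall slightly before $a$ because a few irregular terms are enough to dominate the right-hand window.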
Let us suppose that we know $\nu_0 = \min_{i=1,\dots,K-1} |a_{i+1} - a_i|$, the minimal distance between two jumps of $h$, as well as a lower bound $\eta_0$ of the absolute value of the magnitude of the jumps $\delta_i = H_i - H_{i-1}$. By taking $A < \nu_0$, we obtain the convergence of $D_N(A, t)$ towards:
\[ D_\infty(A, t) = \sum_{i \text{ such that } \delta_i > 0} \delta_i\, \mathbf{1}_{[a_i, a_i + A[}(t) + \sum_{i \text{ such that } \delta_i < 0} \delta_i\, \mathbf{1}_{[a_i - A, a_i[}(t) \tag{6.79} \]
Since $A < \nu_0$, the various ruptures appear separately in the function $D_\infty(A, t)$, each as a rectangular pulse of width $A$, located to the right of the rupture time $a_i$ in the case of a positive jump (i.e., $\delta_i > 0$), or to the left of the rupture time $a_i$ in the case of a negative jump (i.e., $\delta_i < 0$). Consequently, for any threshold $\eta \in ]\eta_0/2, \eta_0[$ and any window size $A < \nu_0$, we estimate the time of the first positive jump of $D_\infty(A, \cdot)$ from the first time $l/N$ such that $D_N(A, l/N) \ge \eta$, then the second by the same method, moving at least $A$ beyond the first one found. More formally, we set:
\[ \hat\tau_1^{(N)} = \frac{1}{N} \min\{ l \in \mathbb{Z},\ D_N(A, l/N) \ge \eta \} \]
where $\hat\tau_1^{(N)} = +\infty$ when $D_N(A, l/N) < \eta$ for all $l \in \mathbb{Z}$. If $\hat\tau_m^{(N)} < +\infty$:
\[ \hat\tau_{m+1}^{(N)} = \frac{1}{N} \min\{ l \in \mathbb{Z},\ l/N \ge \hat\tau_m^{(N)} + A \text{ and } D_N(A, l/N) \ge \eta \} \]
and:
\[ \hat\varsigma_1^{(N)} = \frac{1}{N} \max\{ l \in \mathbb{Z},\ D_N(A, l/N) \le -\eta \} \]
where $\hat\varsigma_1^{(N)} = -\infty$ when $D_N(A, l/N) > -\eta$ for all $l \in \mathbb{Z}$. If $\hat\varsigma_m^{(N)} > -\infty$:
\[ \hat\varsigma_{m+1}^{(N)} = \frac{1}{N} \max\{ l \in \mathbb{Z},\ l/N \le \hat\varsigma_m^{(N)} - A \text{ and } D_N(A, l/N) \le -\eta \} \]
By uniting the two preceding families and then sorting in ascending order, we obtain a family of estimators of the jump times of $h$: $(\hat a_1^{(N)}, \dots, \hat a_{\kappa_N}^{(N)})$, and we estimate the values of $h$ between the jump times by setting:
\[ \hat H_0^{(N)} = f_N\bigl(\hat a_1^{(N)} - 10A,\ \hat a_1^{(N)} - 5A\bigr) \]
\[ \hat H_\iota^{(N)} = f_N\bigl(\hat a_\iota^{(N)} + A/3,\ \hat a_{\iota+1}^{(N)} - A/3\bigr) \quad \text{for } \iota = 1, \dots, \kappa_N - 1 \]
\[ \hat H_{\kappa_N}^{(N)} = f_N\bigl(\hat a_{\kappa_N}^{(N)} + 5A,\ \hat a_{\kappa_N}^{(N)} + 10A\bigr) \]
To finish, we build the estimator $\Theta_N = (\hat a_1^{(N)}, \dots, \hat a_{\kappa_N}^{(N)}; \hat H_0^{(N)}, \dots, \hat H_{\kappa_N}^{(N)})$, whose consistency is established in [BEN 00].
THEOREM 6.11.– Let $Q_h$ be a step fractional Brownian motion with multifractional function $h(\cdot)$ verifying (6.52). If, moreover, $A < \nu_0$ and $\eta \in ]\eta_0/2, \eta_0[$, then we have $\lim_{N \to +\infty} \Theta_N = \Theta_0$ a.s., with $\Theta_0 = (a_1, \dots, a_K; H_0, \dots, H_K)$.

6.6. Bibliography

[AYA 99] AYACHE A., LÉVY VÉHEL J., “Generalised multifractional Brownian motion: definition and preliminary results”, in DEKKING M., LÉVY VÉHEL J., LUTTON E., TRICOT C. (Eds.), Fractals: Theory and Applications in Engineering, Springer-Verlag, p. 17–32, 1999.
[AYA 00] AYACHE A., COHEN S., LÉVY VÉHEL J., “The covariance structure of multifractional Brownian motion, with application to long range dependence”, in Proceedings of ICASSP (Istanbul, Turkey), 2000.
[AYA 04a] AYACHE A., BENASSI A., COHEN S., LÉVY VÉHEL J., “Regularity and identification of generalized multifractional Gaussian processes”, in Séminaire de Probabilités XXXVIII – Lecture Notes in Mathematics, Springer-Verlag Heidelberg, vol. 1857, p. 290–312, 2004.
[AYA 04b] AYACHE A., LÉVY VÉHEL J., “On the identification of the pointwise Hölder exponent of the generalized multifractional Brownian motion”, in Stoch. Proc. Appl., vol. 111, p. 119–156, 2004.
[BEN 94] BENASSI A., COHEN S., JAFFARD S., “Identification de processus Gaussiens elliptiques”, C. R. Acad. Sc. Paris, series, vol. 319, p. 877–880, 1994.
[BEN 97a] BENASSI A., COHEN S., ISTAS J., JAFFARD S., “Identification of elliptic Gaussian random processes”, in LÉVY VÉHEL J., TRICOT C. (Eds.), Fractals and Engineering, Springer-Verlag, p. 115–123, 1997.
[BEN 97b] BENASSI A., JAFFARD S., ROUX D., “Gaussian processes and pseudodifferential elliptic operators”, Revista Mathematica Iberoamericana, vol. 13, no. 1, p. 19–89, 1997.
[BEN 98a] BENASSI A., COHEN S., ISTAS J., “Identifying the multifractional function of a Gaussian process”, Statistic and Probability Letters, vol. 39, p. 337–345, 1998.
[BEN 98b] BENASSI A., COHEN S., ISTAS J., JAFFARD S., “Identification of filtered white noises”, Stoch. Proc. Appl., vol. 75, p. 31–49, 1998.
[BEN 00] BENASSI A., BERTRAND P., COHEN S., ISTAS J., “Identification of the Hurst exponent of a step multifractional Brownian motion”, Statistical Inference for Stochastic Processes, vol. 3, p. 101–110, 2000.
[BER 00] BERTRAND P., “A local method for estimating change points: the hat-function”, Statistics, vol. 34, no. 3, p. 215–235, 2000.
[COH 99] COHEN S., “From self-similarity to local self-similarity: the estimation problem”, in DEKKING M., LÉVY VÉHEL J., LUTTON E., TRICOT C. (Eds.), Fractals: Theory and Applications in Engineering, Springer-Verlag, p. 3–16, 1999.
[FRI 97] FRISCH U., Turbulence, Cambridge University Press, 1997.
[INO 76] INOUÉ K., “Equivalence of measures for some class of Gaussian random fields”, J. Multivariate Anal., vol. 6, p. 295–308, 1976.
[IST 96] ISTAS J., “Estimating the singularity function of a Gaussian process with applications”, Scand. J. Statist., vol. 23, no. 5, p. 581–596, 1996.
[KAH 85] KAHANE J.P., Some Random Series of Functions, Cambridge University Press, second edition, 1985.
[LEO 89] LEON J.R., ORTEGA J., “Weak convergence of different types of variation for biparametric Gaussian processes”, in Colloquia Math. Soc. J. Bolayi no. 57, 1989.
[LIF 95] LIFSHITS M.A., Gaussian Random Functions, Kluwer Academic Publishers, 1995.
[MAN 68] MANDELBROT B.B., VAN NESS J.W., “Fractional Brownian motions, fractional noises, and applications”, SIAM Review, vol. 10, p. 422–437, 1968.
[MEY 90a] MEYER Y., Ondelettes et operateurs, Hermann, Paris, vol. 1, 1990.
[MEY 90b] MEYER Y., Ondelettes et operateurs, Hermann, Paris, vol. 2, 1990.
[NEV 68] NEVEU J., Processus aléatoires Gaussiens, Montreal University Press, SMS, 1968.
[PEL 96] PELTIER R.F., LÉVY VÉHEL J., “Multifractional Brownian motion: definition and preliminary results”, 1996 (available at http://www-syntim.inria.fr/fractales).
[SAM 94] SAMORODNITSKY G., TAQQU M.S., Stable Non-Gaussian Random Processes, Chapman & Hall, 1994.
[YAG 87] YAGLOM A.M., Correlation Theory of Stationary and Related Random Functions. Volume I: Basic Results, Springer, 1987.
Chapter 7
An Introduction to Fractional Calculus
7.1. Introduction

7.1.1. Motivations

We give some traditional examples of applications of fractional calculus and then briefly point out the theoretical references.

7.1.1.1. Fields of application

The modeling of certain physical phenomena, described as having long memory, can be carried out by introducing integro-differential terms with weakly singular kernels (i.e., locally integrable but not necessarily continuous, like $t^{\alpha-1}$ when $0 < \alpha < 1$) in the equations of the dynamics of materials. This is very frequent, for example, in linear viscoelasticity with long memory, where a fractional stress-strain dynamic relation can be proposed: see [BAG 86] for viscoelasticity; [KOE 84, KOE 86] for a somewhat more formal presentation; [BAG 91] for a rich and quite detailed example; [BAG 83a] for a modal analysis in forced mode or [BAG 85] for a modal analysis in the transient state; and finally [CAP 76] for a model which uses a partial differential equation with a fractional derivative in time.

There are also applications to modeling in polymer chemistry [BAG 83b] and to the modeling of dynamics at the interface of fractal structures: see [LEM 90] for the applied physics aspect and [GIO 92] for the theoretical physics aspect.
Chapter written by Denis MATIGNON.
Moreover, fractional derivatives can appear naturally when a dynamic phenomenon is strongly conditioned by the geometry of the problem: a simple, very instructive example is presented in [TOR 84]. See in particular [CARP 97] for examples in continuum mechanics and [POD 99] for many applications in engineering sciences.

7.1.1.2. Theories

A detailed historical overview of the theory of fractional derivatives is given in [OLD 74]; moreover, this work is undoubtedly one of the first attempts to assemble scattered results. More recently, a theoretical synthesis was proposed in [MIL 93], where certain algebraic aspects of fractional differential equations of rational order are completely developed.

In mathematics, the Russian work [SAM 87] is authoritative; it compiles a set of unique definitions and theories. Pseudo-differential operators are mentioned in Chapter 7 of [TAY 96] and the first article on the concept of diffusive representation was, as far as we know, section 5 of [STA 94].

During the last 10 years, a number of themes have developed: see, in particular, the book [MAT 98c] for a general theoretical framework [MON 98], and for several applications derived from them.

7.1.2. Problems

From a mathematical point of view, these integro-differential relations, or convolutions with locally integrable kernels ($L^1_{\text{loc}}$, i.e., absolutely integrable on any interval $[a, b]$), are not simple to treat. Analytically, the singular character of the kernel $t^{\alpha-1}$ (with $0 < \alpha < 1$) complicates the use of theorems based on the regularity of the kernel (in [DAUT 84a], for example, the kernels are always supposed to be continuous). Numerically, it is not simple to treat this singularity at the temporal origin (although that is a priori possible by carrying out an integration by parts, which artificially increases the order of derivation of the unknown function while keeping a convolution with a more regular kernel). In the theories mentioned in section 7.1.1.2, several problems appear.
First, the definition of fractional derivatives poses problems for orders higher than 1 (in particular, fractional derivatives do not commute, which is extremely awkward and, in addition, the composition of integration and fractional derivatives of the same order do not necessarily give the identity); this leads, in practice, to the use of rather strict calculations and a reintroduction a posteriori of the formal solution in the
starting equation, to check the coherence of the result. Second, the question of initial conditions for fractional differential equations is not truly solved: we are obliged to define zero or infinite initial values. Lastly, the true analytical nature of solutions can be masked by closed-form solutions involving a great number of special functions, which facilitates neither the characterization of important analytical properties of these solutions nor their numerical simulation.

The focus of our work is the theory of fractional differential equations (FDE): first, we clarify various definitions by using the framework of causal distributions (i.e., generalized functions whose support is the positive real axis) and by interpreting results on functions expandable in fractional power series of order α (α-FPSE); second, we clarify problems related to fractional differential equations by formulating solutions in a compact general form; and third, we establish a strong link with diffusive representations of pseudo-differential operators (DR of PDO), a concept which is almost unavoidable when derivation orders are arbitrary. Finally, we study the extension to several variables by treating a fractional partial differential equation (FPDE), which in fact constitutes a modal analysis of fractional order.

7.1.3. Outline

This chapter is composed of four distinct sections.

First, in section 7.2, we give definitions of the fundamental concepts necessary for the study and handling of fractional formalism. We recall the definition of fractional integration in section 7.2.1. We show in section 7.2.2 that the inversion of this functional relation can be correctly defined within the framework of causal distributions and we examine the fundamental solutions directly connected to this operator. Lastly, we adopt a definition which is easier to handle, i.e., a “mild” fractional derivative, so as to be able to use fractional derivatives on regular causal functions.
We examine in section 7.2.3 the eigenfunctions of this new operator and show its structural relationship with a generalization of Taylor expansions for non-differentiable functions at the temporal origin, like the functions expandable in fractional power series. Then, in section 7.3, we are interested in the fractional differential equations. These are linear relations in an operator of fractional derivative and its successive powers; it appears naturally that the rational orders play an important role, since certain powers are in direct relationship with the usual derivatives of integer orders. We thus examine fractional differential equations in the context of causal distributions (in section 7.3.2) and functions expandable into fractional power series (in section 7.3.3). We then tackle, in section 7.3.4, the asymptotic behavior of the fundamental solutions of these fractional differential equations, i.e., the divergence in modulus, the pseudo-periodicity or convergence towards zero of the eigenfunctions
of fractional derivatives (which play a role similar to that of the exponential function in the case of integer order). Finally, in section 7.3.5, we examine a class of controlled-and-observed linear dynamic systems of fractional order and address some typical issues in automatic control.

Then, in section 7.4, we consider fractional differential equations in one variable when the orders of derivation are not commensurate: no simple algebraic tools are at our disposal in the frequency domain, and the work carried out for commensurate orders no longer applies. We further examine the strong bond which exists with diffusive representations of pseudo-differential operators. We give some simple ideas and elementary properties, and then present a general result on the decomposition of the solutions of fractional differential equations into a localized (or integer order) part and a diffusive part.

Finally, in section 7.5, we show that the preceding theory in the time variable (which appeals, in the commensurate case, to polynomials and rational fractions in the frequency domain) can be extended to several variables in the case of fractional partial differential equations (we obtain more general meromorphic functions which are not rational fractions). To this end, we treat one example in full: the wave equation with viscothermal losses at the walls of acoustic pipes, i.e. a partial differential equation which involves a time derivative of order three halves.

Throughout this chapter, we will treat the half-order as an example, in order to clarify our intention. This chapter has been inspired by several articles, particularly [AUD 00, MAT 95a]. This personal work is also the fruit of collaborations with various researchers including d’Andréa-Novel, Audounet, Dauphin, Heleschewitz and Montseny. More recently, new co-authors have helped enlarge the perspective of our work: let us mention Hélie, Haddar, Prieur and Zwart.

7.2. Definitions

7.2.1. Fractional integration

The primitive $I^n f$ of an integrable function $f$, iterated an integer number $n$ of times and vanishing at the initial time $t = 0$, is nothing other than the convolution of $f$ with the polynomial kernel $Y_n(t) = t_+^{n-1}/(n-1)!$:

$$ I^n f(t) \overset{\text{def}}{=} \int_0^t d\tau_1 \int_0^{\tau_1} d\tau_2 \cdots \int_0^{\tau_{n-1}} f(\tau_n)\, d\tau_n = (Y_n \star f)(t). $$

By extension, we define [MIL 93] the primitive $I^\alpha f$ of any order $\alpha > 0$ by using Euler's $\Gamma$ function, which extends the factorial.
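DEFINITION 7.1.– The primitive of order $\alpha > 0$ of a causal, locally integrable function $f$ is given by:

$$ I^\alpha f(t) \overset{\text{def}}{=} (Y_\alpha \star f)(t), \quad \text{where we have set} \quad Y_\alpha(t) = \frac{t_+^{\alpha-1}}{\Gamma(\alpha)}. \tag{7.1} $$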
PROPOSITION 7.1.– The property $Y_\alpha \star Y_\beta = Y_{\alpha+\beta}$ makes it possible to write the fundamental composition law $I^\alpha \circ I^\beta = I^{\alpha+\beta}$ for $\alpha > 0$ and $\beta > 0$.

Proof. To establish the property, it is enough to check that the exponents coincide, the numerical coefficient coming from the properties of the $\Gamma$ function:

$$ (Y_\alpha \star Y_\beta)(t) \propto \int_0^t (t-\tau)^{\alpha-1} \tau^{\beta-1}\, d\tau = t^{\alpha+\beta-1} \int_0^1 (1-x)^{\alpha-1} x^{\beta-1}\, dx \propto Y_{\alpha+\beta}(t). $$

To establish the fundamental composition law, it is enough to use the associativity of the convolution of functions, whence:

$$ I^{\alpha+\beta} f \overset{\text{def}}{=} Y_{\alpha+\beta} \star f = (Y_\alpha \star Y_\beta) \star f = Y_\alpha \star (Y_\beta \star f) = I^\alpha \{ I^\beta f \}. $$

PROPOSITION 7.2.– The Laplace transform of $Y_\alpha$ for $\alpha > 0$ is:

$$ \mathcal{L}[Y_\alpha](s) = s^{-\alpha} \quad \text{for } \mathrm{Re}(s) > 0 \tag{7.2} $$

i.e., with the right-half complex plane as a convergence strip.

Proof. A direct calculation for real $s > 0$ provides:

$$ \mathcal{L}[Y_\alpha](s) \overset{\text{def}}{=} \int_0^{+\infty} \frac{t^{\alpha-1}}{\Gamma(\alpha)}\, e^{-st}\, dt = s^{-\alpha}\, \frac{1}{\Gamma(\alpha)} \int_0^{+\infty} x^{\alpha-1} e^{-x}\, dx = s^{-\alpha} $$

according to the definition of $\Gamma$; the result is continued to $\mathrm{Re}(s) > 0$ by analyticity.

NOTE 7.1.– We see in particular that the delicate meaning given to a fractional power of the complex variable $s$ is perfectly defined: $s \mapsto s^\alpha$ indicates the analytic continuation of the power function on the positive reals. It is the principal determination (principal branch) of the multiform function $s \mapsto s^\alpha$; it has Hermitian symmetry.
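The kernel identity of Proposition 7.1 can be checked numerically. The following is a minimal sketch, not part of the original text: the function names `Y` and `convolve_kernels` and the sample orders 3/2 and 5/2 are our own assumptions, and the orders are taken greater than 1 so that the midpoint rule does not meet an endpoint singularity.

```python
import math


def Y(alpha: float, t: float) -> float:
    """Kernel Y_alpha(t) = t_+^(alpha-1) / Gamma(alpha), for alpha > 0."""
    return t ** (alpha - 1) / math.gamma(alpha) if t > 0 else 0.0


def convolve_kernels(alpha: float, beta: float, t: float, n: int = 100_000) -> float:
    """Midpoint-rule approximation of (Y_alpha * Y_beta)(t) on [0, t]."""
    h = t / n
    total = 0.0
    for i in range(n):
        tau = (i + 0.5) * h  # midpoint of the i-th cell
        total += Y(alpha, t - tau) * Y(beta, tau)
    return h * total


# Illustrative check: Y_{3/2} * Y_{5/2} should agree with Y_4 at t = 2.
approx = convolve_kernels(1.5, 2.5, 2.0)
exact = Y(4.0, 2.0)
```

Here `approx` and `exact` should agree to within the quadrature error; for orders below 1 a quadrature adapted to the integrable endpoint singularity would be needed.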
PROPOSITION 7.3.– For a causal function $f$ which has a Laplace transform in $\mathrm{Re}(s) > a_f$, we have:

$$ \mathcal{L}[I^\alpha f](s) = s^{-\alpha}\, \mathcal{L}[f](s) \quad \text{for } \mathrm{Re}(s) > \max(0, a_f). \tag{7.3} $$

Proof. This follows from Proposition 7.2 and from the fact that the Laplace transform turns a convolution into a product of Laplace transforms. In particular, we can prove Proposition 7.1 very simply by noting that $s^{-\alpha-\beta} = s^{-\alpha}\, s^{-\beta}$ for $\mathrm{Re}(s) > 0$.
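As an illustration of Definition 7.1 with $\alpha = 1/2$, the half-order primitive can be evaluated numerically. This is a sketch under our own assumptions: the name `half_integral` and the change of variable $\tau = u^2$, which removes the integrable singularity of $Y_{1/2}$ at $\tau = 0$, are illustrative choices and not taken from the text.

```python
import math


def half_integral(f, t: float, n: int = 20_000) -> float:
    """Approximate I^{1/2} f(t) = int_0^t f(t - tau) / sqrt(pi * tau) dtau.

    With tau = u**2 the integrand becomes 2 * f(t - u**2) / sqrt(pi),
    which is regular on [0, sqrt(t)]; a midpoint rule is then applied.
    """
    if t <= 0:
        return 0.0
    upper = math.sqrt(t)
    h = upper / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * h
        total += f(t - u * u)
    return 2.0 * h * total / math.sqrt(math.pi)


# Since t = Y_2(t) for t > 0, Proposition 7.1 predicts I^{1/2} Y_2 = Y_{5/2}.
t0 = 2.0
numeric = half_integral(lambda s: s, t0)
closed_form = t0 ** 1.5 / math.gamma(2.5)
```

The two values `numeric` and `closed_form` should coincide up to the quadrature error.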
EXAMPLE 7.1.– For $f$ locally integrable, i.e. $f \in L^1_{\mathrm{loc}}$, we thus obtain:

$$ I^{1/2} f(t) = \int_0^t \frac{1}{\sqrt{\pi\tau}}\, f(t-\tau)\, d\tau. $$

7.2.2. Fractional derivatives within the framework of causal distributions

7.2.2.1. Motivation

The idea of fractional derivatives of causal functions (or signals) is to obtain an inverse formula to that of fractional integration defined by (7.1), i.e.:

$$ f = I^\alpha (D^\alpha f). $$

This is a rather delicate problem; it can be solved by calling upon the theory of Volterra integral equations (see, for example, [KOE 84]). However, one of the major difficulties is the composition law (or law of exponents), particularly because fractional derivatives and integrals do not always commute, which poses delicate practical problems. That is why we propose to carry out the inversion of (7.1) within the more general framework of causal distributions (i.e., the space $\mathcal{D}'_+$ of distributions whose support is the positive real axis of the time variable), referring to [SCH 65] in particular, even if it means returning later, in section 7.2.3, to an interpretation in terms of an internal operation in a particular class of functions.

7.2.2.1.1. Passage to the distributions

Following Definition 7.1, we naturally set $I^0 f \overset{\text{def}}{=} f$, which gives, according to (7.1), $f = Y_0 \star f$. It is clear that no locally integrable function $Y_0$ can be a solution of this convolution equation; on the other hand, the Dirac distribution is the neutral element of the convolution of distributions [SCH 65]. Hence, necessarily:

$$ Y_0 \overset{\text{def}}{=} \delta \tag{7.4} $$

and the convolution in equation (7.1) is to be taken in the sense of distributions.
7.2.2.1.2. Framework of causal distributions

We could place ourselves within the general framework of distributions, but there convolution (which is the basic functional relation for invariant linear systems) is not, in general, associative. When the supports are bounded from below (usually the case when we are interested in causal signals), we obtain the property known as convolutive supports, which makes the convolution associative. Therefore, in $\mathcal{D}'_+$, which is a convolution algebra, the convolution product is associative, and the convolutive inverse of a distribution, if it exists, is unique (see lesson 32 of [GAS 90]), which allows a direct use of the impulse response $h$ of a causal linear system.

Indeed, let us consider the general convolution equation in the unknown $y \in \mathcal{D}'_+$:

$$ P \star y = x \tag{7.5} $$

where $P \in \mathcal{D}'_+$ represents the system and $x \in \mathcal{D}'_+$ the known causal input; let $h$ be the impulse response of the system, defined by:

$$ P \star h = \delta. $$

Then $y = h \star x$ is the solution of equation (7.5); indeed:

$$ P \star y = P \star (h \star x) = (P \star h) \star x = \delta \star x = x $$

thanks to the associativity of the convolution of causal distributions. We thus follow [GUE 72] to define the fractional derivatives $D^\alpha$.

DEFINITION 7.2.– The derivative of $f \in \mathcal{D}'_+$ in the sense of causal distributions is:

$$ D^\alpha f \overset{\text{def}}{=} Y_{-\alpha} \star f \quad \text{where we have} \quad Y_{-\alpha} \star Y_\alpha = \delta \tag{7.6} $$

i.e., $Y_{-\alpha}$ is the convolutive inverse of $Y_\alpha$ in $\mathcal{D}'_+$.

At this stage, the problem is thus to identify the causal distribution $Y_{-\alpha}$, which we know cannot be a function belonging to $L^1_{\mathrm{loc}}$. Let us give its characterization by Laplace transform.

PROPOSITION 7.4.– The Laplace transform of $Y_{-\alpha}$ for $\alpha > 0$ is:

$$ \mathcal{L}[Y_{-\alpha}](s) = s^{\alpha} \quad \text{for } \mathrm{Re}(s) > 0 \tag{7.7} $$

i.e., with the right-half complex plane as the convergence strip.
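Proof. We use the fact that, within the framework of causal distributions, the Laplace transform of a convolution product is the product of the Laplace transforms. We apply it to the definition of $Y_{-\alpha}$ given by (7.6), taking into account Proposition 7.2 and $\mathcal{L}[\delta](s) = 1$, i.e.:

$$ s^{-\alpha}\, \mathcal{L}[Y_{-\alpha}](s) = 1 \quad \forall s,\ \mathrm{Re}(s) > 0 $$

which proves, on the one hand, the existence and, on the other hand, the stated result.

NOTE 7.2.– We read from the behavior at infinity of the Laplace transform that $Y_{-\alpha}$ becomes less regular as $\alpha$ increases.

PROPOSITION 7.5.– The property $Y_{-\alpha} \star Y_{-\beta} = Y_{-\alpha-\beta}$ makes it possible to write the fundamental composition law $D^\alpha \circ D^\beta = D^{\alpha+\beta}$ for $\alpha > 0$ and $\beta > 0$.

PROPOSITION 7.6.– For a causal distribution $f$ which has a Laplace transform in $\mathrm{Re}(s) > a_f$, we have:

$$ \mathcal{L}[D^\alpha f](s) = s^{\alpha}\, \mathcal{L}[f](s) \quad \text{for } \mathrm{Re}(s) > \max(0, a_f). \tag{7.8} $$

Finally, we obtain the following fundamental result.

PROPOSITION 7.7.– For $\alpha$ and $\beta$ two real numbers, we have:
– the property $Y_\alpha \star Y_\beta = Y_{\alpha+\beta}$;
– the fundamental composition law $I^\alpha \circ I^\beta = I^{\alpha+\beta}$;
with the notation convention $I^\alpha = D^{-\alpha}$ when $\alpha < 0$.

EXAMPLE 7.2.– We seek to make the half-order derivation explicit: we first calculate the distribution $Y_{-1/2}$, then we calculate $D^{1/2}[f\,Y_1]$ where $f$ is a regular function. From the point of view of distributions, we can write $Y_{-1/2} = D^1 Y_{1/2}$, where $D^1$ is the derivative in the sense of distributions; that is, taking $\varphi \in C_0^\infty$ as a test function:

$$ \begin{aligned}
\langle Y_{-1/2}, \varphi \rangle = \langle D^1 Y_{1/2}, \varphi \rangle &= -\langle Y_{1/2}, \varphi' \rangle = -\frac{1}{\Gamma(1/2)} \int_0^\infty \frac{1}{\sqrt{t}}\, \varphi'(t)\, dt \\
&= -\frac{1}{\Gamma(1/2)} \lim_{\varepsilon \to 0} \left\{ \left[ \frac{1}{\sqrt{t}}\, \varphi(t) \right]_\varepsilon^\infty + \frac{1}{2} \int_\varepsilon^\infty \frac{1}{t^{3/2}}\, \varphi(t)\, dt \right\} \\
&= -\frac{1}{2\,\Gamma(1/2)} \lim_{\varepsilon \to 0} \left\{ \int_\varepsilon^\infty \frac{1}{t^{3/2}}\, \varphi(t)\, dt - \frac{2}{\sqrt{\varepsilon}}\, \varphi(\varepsilon) \right\} \\
&= \left\langle \frac{1}{\Gamma(-1/2)}\, \mathrm{pf}(t_+^{-3/2}), \varphi \right\rangle
\end{aligned} $$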
where pf indicates the finite part of a divergent integral in the sense of Hadamard. We thus obtain the result, which is not very easy to handle in practice:

$$ Y_{-1/2} = \frac{\mathrm{pf}(t_+^{-3/2})}{\Gamma(-1/2)}. $$

Let us now calculate the half-order derivative of a causal distribution $f\,Y_1$, where $Y_1$ is the Heaviside distribution and $f \in C^1$. We have $D^1[f\,Y_1] = f'\,Y_1 + f(0)\,\delta$, whence, by taking into account $D^{1/2} = I^{1/2} \circ D^1$:

$$ \begin{aligned}
D^{1/2}[f\,Y_1] &\overset{\text{def}}{=} Y_{-1/2} \star [f\,Y_1] = Y_{1/2} \star D^1[f\,Y_1] = Y_{1/2} \star [f'\,Y_1] + f(0)\,Y_{1/2} \\
&= \int_0^t \frac{1}{\sqrt{\pi\tau}}\, f'(t-\tau)\, d\tau + f(0)\, \frac{1}{\sqrt{\pi t}}
\end{aligned} $$

where two terms appear: the first is a convolution of $L^1_{\mathrm{loc}}$ functions and is a regular term, i.e., continuous at $t = 0^+$; the second is a function which diverges at $t = 0^+$, while remaining $L^1_{\mathrm{loc}}$. Moreover, the preceding formulation remains valid if we have $f \in C^0$ and $f' \in L^1_{\mathrm{loc}}$: for example, for $t \mapsto Y_{3/2}(t) \propto \sqrt{t}$, for which it is easy to check that $D^{1/2} Y_{3/2} = Y_1$, in other words the constant 1 for $t > 0$.

PROPOSITION 7.8.– In general, for $f \in C^0$ such that $f' \in L^1_{\mathrm{loc}}$ and $0 < \alpha < 1$:

$$ D^\alpha[f\,Y_1] = Y_{1-\alpha} \star [f'\,Y_1] + f(0)\, Y_{1-\alpha}. $$

7.2.2.2. Fundamental solutions

We have defined the operator $D^\alpha$ in the space $\mathcal{D}'_+$ of causal distributions. Let us now seek the fundamental solution of the operator $D^\alpha - \lambda$ (see footnote 1).
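DEFINITION 7.3.– The quantity $\mathcal{E}_\alpha(\lambda, t)$ is the fundamental solution of $D^\alpha - \lambda$ for the complex value $\lambda$; it fulfills by definition:

$$ D^\alpha \mathcal{E}_\alpha(\lambda, t) = \lambda\, \mathcal{E}_\alpha(\lambda, t) + \delta. \tag{7.9} $$

PROPOSITION 7.9.– The quantity $\mathcal{E}_\alpha(\lambda, t)$ is given by:

$$ \mathcal{E}_\alpha(\lambda, t) = \mathcal{L}^{-1}\!\left[ (s^\alpha - \lambda)^{-1},\ \mathrm{Re}(s) > a_\lambda \right] = \sum_{k=0}^{\infty} \lambda^k\, Y_{(1+k)\alpha}(t). \tag{7.10} $$

1. It is the extension of the property $D^1 \left[ e^{\lambda t}\, Y_1(t) \right] = \lambda\, e^{\lambda t}\, Y_1(t) + \delta$ of the integer order case.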
Proof. Let us take the Laplace transform of (7.9); it is:

$$ (s^\alpha - \lambda)\, \mathcal{L}[\mathcal{E}_\alpha(\lambda, t)](s) = 1 \quad \text{for } \mathrm{Re}(s) > 0 $$

whence, for $\mathrm{Re}(s) > a_\lambda$:

$$ \begin{aligned}
\mathcal{L}[\mathcal{E}_\alpha(\lambda, t)](s) = (s^\alpha - \lambda)^{-1} &= s^{-\alpha} (1 - \lambda s^{-\alpha})^{-1} \\
&= s^{-\alpha} \sum_{k=0}^{+\infty} (\lambda s^{-\alpha})^k \quad \text{for } |s| > |\lambda|^{1/\alpha} \\
&= \sum_{k=0}^{+\infty} \lambda^k\, s^{-(1+k)\alpha}.
\end{aligned} $$

By taking the inverse Laplace transform term by term, see [KOL 69], and by using Proposition 7.2, we then obtain the announced result in the time domain (the series of functions (7.10) is normally convergent on every compact subset).

EXAMPLE 7.3.– Let us examine the particular cases of integer and half-integer orders. On the one hand, for $\alpha = 1$, we obtain the causal exponential:

$$ \mathcal{E}_1(\lambda, t) = e^{\lambda t}\, Y_1(t) $$

as fundamental solution of the operator $D^1 - \lambda$ within the framework of causal distributions. On the other hand, for $\alpha = \frac{1}{2}$, we obtain:

$$ \mathcal{E}_{1/2}(\lambda, t) = \sum_{k=0}^{+\infty} \lambda^k\, Y_{\frac{1+k}{2}} = \sum_{k=0}^{+\infty} \lambda^k\, \frac{t^{\frac{k-1}{2}}}{\Gamma\!\left(\frac{k+1}{2}\right)} = Y_{1/2} + \lambda \sum_{k=0}^{+\infty} \frac{(\lambda\sqrt{t})^k}{\Gamma\!\left(1 + \frac{k}{2}\right)} $$

which is the sum of an $L^1_{\mathrm{loc}}$ function (i.e. $Y_{1/2}$) and of a power series in the variable $\lambda\sqrt{t}$, which is thus a continuous function.

7.2.3. Mild fractional derivatives, in the Caputo sense

7.2.3.1. Motivation

For $0 < \alpha < 1$, we saw, according to Proposition 7.8, that $D^\alpha f$ is not continuous at $t = 0^+$, even when we have $f \in C^1$, which might at first seem slightly
paradoxical; it would be preferable for $D^\alpha f$ to be defined, in a certain sense, between $f$ and $f'$. Moreover, we have just seen that the fundamental solutions $\mathcal{E}_\alpha(\lambda, t)$ are not continuous at the origin $t = 0^+$; from the analytical point of view, that is likely a priori to be awkward when initial values are given in a fractional differential equation. For $\alpha > 1$, the analytical situation worsens, since the objects which are handled very rapidly become distributions which move away from regular functions; for example:

$$ Y_{-n} = \delta^{(n)} \quad \text{for } n \text{ a natural integer.} \tag{7.11} $$

These considerations justify the description of “mild” applied to the fractional derivatives $d^\alpha$, which we now define.

7.2.3.2. Definition

We are naturally led to extract from the preceding definitions the more regular or milder parts, according to the example introduced in [BAG 91]. The definition we propose coincides with that given by Caputo in [CAP 76].

DEFINITION 7.4.– For a causal function $f$, continuous from the right at $t = 0$:

$$ d^\alpha f \overset{\text{def}}{=} D^\alpha f - f(0^+)\, Y_{1-\alpha}. \tag{7.12} $$

In particular, if $f' \in L^1_{\mathrm{loc}}$, then $d^\alpha f = Y_{1-\alpha} \star [f'\,Y_1]$. For $0 < \alpha < 1$, to some extent, we extract from $f$ (continuous but non-differentiable) an intermediate degree of regularity (connected to the Hölder exponent of $f$ at 0). It appears that $d^\alpha f$ can be a continuous function, for example when $f \in C^0$ and $f' \in L^1_{\mathrm{loc}}$, which was not the case for the fractional derivative $D^\alpha f$ in the sense of distributions. Moreover, we can note that, in the case of the integer order, the second derivative of a function $f$ is defined as the derivative of $f'$, i.e., exactly as the iterate of the derivation operator; in other words, $d^2 \overset{\text{def}}{=} (d^1)^{\circ 2}$. In view of these remarks, we propose the following definition.

DEFINITION 7.5.– For $0 < \alpha \leq 1$, we will say that $f$ is of class $C_\alpha^n$ if all the $n$ sequentially mild derivatives of order $\alpha$ of $f$ exist and are continuous, even at $t = 0$, i.e.:

$$ (d^\alpha)^{\circ k} f \in C^0 \quad \text{for } 0 \leq k \leq n. $$

The idea of sequentiality is introduced in a completely formal manner in Chapter 6 of [MIL 93], more as a curiosity than as something fundamentally coherent. Moreover, it is not the same $d^\alpha$ which is used there, but a definition which coincides with $D^\alpha$ for certain classes of functions, for $0 < \alpha < 1$. However, one of the inherent difficulties in the definition used is that the fundamental composition law is lost, whereas $D^{n\alpha} = (D^\alpha)^{\circ n}$ is obtained immediately according to Proposition 7.5.
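The relation between the mild derivative of Definition 7.4 and the distributional derivative of Proposition 7.8 can be illustrated for $\alpha = 1/2$. The sketch below is ours, not the authors': the function names are hypothetical, the derivative $f'$ is supplied by the caller, and the substitution $\tau = u^2$ is an implementation choice used to regularize the kernel $Y_{1/2}$.

```python
import math


def mild_half_derivative(f_prime, t: float, n: int = 20_000) -> float:
    """d^{1/2} f(t) = (Y_{1/2} * f')(t) for t > 0, computed with tau = u**2."""
    upper = math.sqrt(t)
    h = upper / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * h
        total += f_prime(t - u * u)
    return 2.0 * h * total / math.sqrt(math.pi)


def distributional_half_derivative(f0: float, f_prime, t: float) -> float:
    """D^{1/2}[f Y_1](t) = d^{1/2} f(t) + f(0) * Y_{1/2}(t), for t > 0."""
    return mild_half_derivative(f_prime, t) + f0 / math.sqrt(math.pi * t)


# Illustrative case f(t) = 1 + t, so that f(0) = 1 and f'(t) = 1.
t0 = 0.25
mild = mild_half_derivative(lambda s: 1.0, t0)
full = distributional_half_derivative(1.0, lambda s: 1.0, t0)
```

For this choice of $f$, `mild` approximates $Y_{3/2}(t) = 2\sqrt{t/\pi}$, which stays bounded as $t \to 0^+$, whereas `full` also carries the term $1/\sqrt{\pi t}$, which diverges at the origin.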
EXAMPLE 7.4.– For $\alpha = \frac{1}{2}$, let us apply $D^{1/2}$ successively to the causal function $f$ given by:

$$ f = b_0 + b_1 \sqrt{t} + b_2\, t + b_3\, t^{3/2} + b_4\, t^2. $$

Let us reformulate this expansion on the basis of the $Y_{k/2}$; it becomes:

$$ \begin{aligned}
f &= a_0 Y_1 + a_1 Y_{3/2} + a_2 Y_2 + a_3 Y_{5/2} + a_4 Y_3 \\
D^{1/2} f &= a_0 Y_{1/2} + \underbrace{a_1 Y_1 + a_2 Y_{3/2} + a_3 Y_2 + a_4 Y_{5/2}}_{d^{1/2} f} \\
D^{1} f &= a_0 Y_0 + \underbrace{a_1 Y_{1/2} + \underbrace{a_2 Y_1 + a_3 Y_{3/2} + a_4 Y_2}_{(d^{1/2})^2 f}}_{d^1 f} \\
D^{3/2} f &= a_0 Y_{-1/2} + a_1 Y_0 + a_2 Y_{1/2} + \underbrace{a_3 Y_1 + a_4 Y_{3/2}}_{(d^{1/2})^3 f} \\
D^{2} f &= a_0 Y_{-1} + a_1 Y_{-1/2} + a_2 Y_0 + \underbrace{a_3 Y_{1/2} + \underbrace{a_4 Y_1}_{(d^{1/2})^4 f}}_{(d^1)^2 f \ \text{iff}\ a_1 = 0}
\end{aligned} $$

The interest of the operator $d^{1/2}$ and its successive powers (noted from now on $(d^{1/2})^k$ instead of $(d^{1/2})^{\circ k}$ to make the writing less cumbersome) is manifest here. Indeed, with the choice of $f$ considered, the $(d^{1/2})^k f$ are continuous functions at $t = 0^+$; according to Definition 7.5, we see that $f \in C^4_{1/2}$, and even $f \in C^\infty_{1/2}$ since we have $(d^{1/2})^5 f \equiv 0$. Moreover, we obtain the property $a_k = [(d^{1/2})^k f](t = 0^+)$ and thus the following formula:

$$ f = \sum_{k=0}^{4} a_k\, Y_{1+\frac{k}{2}} \quad \text{with } a_k = [(d^{1/2})^k f](t = 0^+). $$

This is a kind of fractional Taylor expansion of the causal function $f$ in the vicinity of 0, which we will generalize in section 7.2.3.4.

7.2.3.3. Mittag-Leffler eigenfunctions

We have defined the operator $d^\alpha$ and noted that it acts in an internal way in the class of $C_\alpha^\infty$ functions: we can naturally seek the eigenfunctions of this operator in this class of functions.
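DEFINITION 7.6.– For $0 < \alpha \leq 1$, $E_\alpha(\lambda, t)$ is the eigenfunction of $d^\alpha$ for the complex eigenvalue $\lambda$, initialized at 1; it fulfills, by definition:

$$ \begin{cases} d^\alpha E_\alpha(\lambda, t) = \lambda\, E_\alpha(\lambda, t), \\ E_\alpha(\lambda, 0^+) = 1. \end{cases} \tag{7.13} $$

PROPOSITION 7.10.– The quantity $E_\alpha(\lambda, t)$ is given by:

$$ E_\alpha(\lambda, t) = I^{1-\alpha}\, \mathcal{E}_\alpha(\lambda, t) = \mathcal{L}^{-1}\!\left[ s^{\alpha-1} (s^\alpha - \lambda)^{-1},\ \mathrm{Re}(s) > a_\lambda \right] = \sum_{k=0}^{\infty} \lambda^k\, Y_{1+\alpha k}(t) = E_\alpha(\lambda\, t_+^\alpha) \tag{7.14} $$

where $E_\alpha(z)$ is the Mittag-Leffler monogenic function defined by the power series:

$$ E_\alpha(z) \overset{\text{def}}{=} \sum_{k=0}^{+\infty} \frac{z^k}{\Gamma(1 + \alpha k)}. \tag{7.15} $$

Proof. It is enough to express (7.13) by using Definition 7.4 of $d^\alpha$, i.e.:

$$ D^\alpha E_\alpha(\lambda, t) = \lambda\, E_\alpha(\lambda, t) + 1 \cdot Y_{1-\alpha} $$

whence, by Definition 7.3 of $\mathcal{E}_\alpha(\lambda, t)$ as the fundamental solution of the operator $(D^\alpha - \lambda)$, we obtain the solution of the preceding equation with right-hand side as the a priori causal distribution:

$$ \begin{aligned}
E_\alpha(\lambda, t) &= \left( Y_{1-\alpha}(\cdot) \star \mathcal{E}_\alpha(\lambda, \cdot) \right)(t) \\
&= \left( Y_{1-\alpha} \star \sum_{k=0}^{\infty} \lambda^k\, Y_{(1+k)\alpha} \right)(t) \\
&= \sum_{k=0}^{\infty} \lambda^k\, Y_{1+\alpha k}(t) \\
&= \sum_{k=0}^{\infty} \lambda^k\, \frac{t_+^{\alpha k}}{\Gamma(1 + \alpha k)} \\
&= E_\alpha(z = \lambda\, t_+^\alpha).
\end{aligned} $$

The result sought in the causal distributions is thus, as stated, a continuous function directly connected to the Mittag-Leffler monogenic functions [MIT 04].

EXAMPLE 7.5.– Let us examine the particular cases of the integer and half-integer orders.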
On the one hand, for $\alpha = 1$, we obtain the causal exponential:

$$ E_1(\lambda, t) = e^{\lambda t}\, Y_1(t) $$

as eigenfunction of the usual derivation $d^1$ (which actually belongs to the class of $C_1^\infty$ functions). On the other hand, for $\alpha = \frac{1}{2}$, we obtain:

$$ E_{1/2}(\lambda, t) = \sum_{k=0}^{+\infty} \lambda^k\, Y_{1+\frac{k}{2}} = \sum_{k=0}^{+\infty} \frac{(\lambda\sqrt{t})^k}{\Gamma\!\left(1 + \frac{k}{2}\right)} = \exp(\lambda^2 t) \left[ 1 + \operatorname{erf}(\lambda\sqrt{t}) \right] $$

where erf is the error function (to be evaluated in the whole complex plane).

7.2.3.4. Fractional power series expansions of order α (α-FPSE)

The series expansion of the functions $E_\alpha(\lambda, t)$ highlighted in Proposition 7.10 naturally suggests the following definition.

DEFINITION 7.7.– For $0 < \alpha \leq 1$, the sequence $(a_k)_{k \geq 0}$ of complex numbers makes it possible to define the formal series:

$$ f(t) = \sum_{k=0}^{\infty} a_k\, Y_{1+\alpha k}(t) \tag{7.16} $$

which takes an analytical meaning as a fractional power series expansion of order $\alpha$ (α-FPSE) as soon as, for example, the $|a_k|$ are bounded from above by a geometric sequence; the uniform convergence of the series of functions then takes place on every compact subset of $[0, +\infty[$.

PROPOSITION 7.11.– Any function expandable in fractional power series of order $\alpha$ is of class $C_\alpha^\infty$; and it fulfills, in particular:

$$ a_k = [(d^\alpha)^k f](0^+) \quad \text{for } k \geq 0. \tag{7.17} $$

Proof. The proof is similar to the calculation in the example studied in section 7.2.3.2 for a series comprising a finite number of terms. There is no problem of commutation between the operator $d^\alpha$ and the infinite summation, since the series of functions of class $C_\alpha^\infty$ converges uniformly on every compact subset: in other words, term by term derivation by $d^\alpha$ is perfectly licit.
NOTE 7.3.– Just as a function of class $C^\infty$ is not necessarily expandable in power series (PSE), a function of class $C_\alpha^\infty$ is not necessarily expandable in fractional power series of order $\alpha$. We will introduce later on the function:

$$ \psi^1(t) = \mathcal{L}^{-1}\!\left[ e^{-\sqrt{s}},\ \mathrm{Re}(s) > 0 \right] \propto t^{-3/2} \exp\!\left( -\frac{1}{4t} \right) Y_1(t), $$

which is of class $C^\infty_{1/2}$ but which is not expandable in fractional power series of half-order.
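The power series (7.15) lends itself to a direct numerical check of the two closed forms of Example 7.5. This is a minimal sketch: the function name `mittag_leffler` and the truncation at 150 terms are our assumptions, and plain partial sums are only adequate for moderate $|z|$ (for large arguments, cancellation and overflow would call for the asymptotic results of section 7.3.4 instead).

```python
import math


def mittag_leffler(alpha: float, z, terms: int = 150):
    """Partial sum of E_alpha(z) = sum_k z**k / Gamma(1 + alpha * k).

    Intended for 0 < alpha <= 1 and moderate |z|; z may be real or complex.
    """
    return sum(z ** k / math.gamma(1.0 + alpha * k) for k in range(terms))


def eigenfunction(alpha: float, lam, t: float):
    """E_alpha(lam, t) = E_alpha(lam * t**alpha) for t >= 0, as in (7.14)."""
    return mittag_leffler(alpha, lam * t ** alpha)


# Integer order: E_1(z) is the exponential.
value_exp = mittag_leffler(1.0, 0.3)
# Half order, real argument: E_{1/2}(z) = exp(z**2) * (1 + erf(z)).
z = -1.0
value_half = mittag_leffler(0.5, z)
closed_half = math.exp(z * z) * (1.0 + math.erf(z))
```

Note that `math.erf` only accepts real arguments, so the closed-form comparison is restricted here to real $\lambda\sqrt{t}$; the series itself accepts complex values.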
7.3. Fractional differential equations

In Chapter 5 of [MIL 93], examples of fractional differential equations are examined and we note, in particular, problems of initial value (0 or $+\infty$). In Chapter 6 of [MIL 93], the vectorial aspect is considered (we will start with that) and the idea of sequentiality is present, from a rather formal point of view about which we have already stated reservations of an analytical nature. In section 7.3, we begin by treating and analyzing an example, which justifies the resolution of fractional differential equations within two quite distinct frameworks: causal distributions in section 7.3.2 and functions with a fractional power series expansion of order $\alpha$ in section 7.3.3; finally, in section 7.3.4, we are concerned with the asymptotic behavior of the solutions of fractional differential equations, which is a question connected with the basic concept of stability.

7.3.1. Example

An integro-differential equation (in $y$, $Y_{1/2} \star y'$, $y'$ for example, where $y$ of class $C^1$ is sought) can be written either with derivatives in the sense of distributions ($D^{1/2} y$, $D^1 y$), or with mild derivatives ($d^{1/2} y$, $(d^{1/2})^2 y$), which makes $d^1 y$ disappear. Let us make the passage explicit in a particular framework, for the integro-differential equation with right-hand side:

$$ y'(t) + c_1\, (Y_{1/2} \star y')(t) + c_2\, y(t) = x(t) \quad \text{for } t > 0, \tag{7.18} $$

$$ \text{with } y(0) = a_0. \tag{7.19} $$

7.3.1.1. Framework of causal distributions

In $\mathcal{D}'_+$, (7.18)-(7.19) is written as a single equation which incorporates the initial condition, i.e.:

$$ D^1[y\,Y_1] + c_1\, D^{1/2}[y\,Y_1] + c_2\, y\,Y_1 = x\,Y_1 + a_0 \{ Y_0 + c_1 Y_{1/2} \} \tag{7.20} $$

which can be vectorially formulated in the following manner:

$$ D^{1/2} \begin{pmatrix} y \\ D^{1/2} y \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ -c_2 & -c_1 \end{pmatrix} \begin{pmatrix} y \\ D^{1/2} y \end{pmatrix} + \begin{pmatrix} 0 \\ x\,Y_1 + a_0 \{ Y_0 + c_1 Y_{1/2} \} \end{pmatrix} \tag{7.21} $$
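and is thus solved simply by defining $\mathcal{E}_{1/2}(\Lambda, t)$ by the power series in the square matrix $\Lambda$ (exactly as for the matrix exponential):

$$ \mathcal{E}_{1/2}(\Lambda, t) \overset{\text{def}}{=} \sum_{k=0}^{+\infty} \Lambda^k\, Y_{(k+1)/2} = Y_{1/2}\, I + \Lambda \sum_{k=0}^{+\infty} \frac{(\Lambda\sqrt{t})^k}{\Gamma\!\left(1 + \frac{k}{2}\right)} $$

whence the solution of (7.21) in $\mathcal{D}'_+$, which we will develop in section 7.3.2, is obtained:

$$ D^{1/2} \mathbf{y} = \Lambda \mathbf{y} + \mathbf{x}_D \iff \mathbf{y}(t) = \left( \mathcal{E}_{1/2}(\Lambda, \cdot) \star \mathbf{x}_D \right)(t). \tag{7.22} $$

The notations are obvious; let us specify only that the index $D$ of $\mathbf{x}_D$ means that the vector contains not only the right-hand side $x$, but also distributions related to the initial condition $a_0$.

7.3.1.2. Framework of fractional power series expansion of order one half

We now work with the mild derivative of order one half and seek a function $y$ of class $C^2_{1/2}$ which is also of class $C^1$; equation (7.20) is then written in an equivalent manner:

$$ (d^{1/2})^2 y + c_1\, d^{1/2} y + c_2\, y = x \quad \text{for } t > 0, \tag{7.23} $$

$$ \text{with } \begin{cases} y(0) = a_0 \\ [d^{1/2} y](0) = 0 \end{cases} \tag{7.24} $$

which can be vectorially formulated in the following way:

$$ d^{1/2} \begin{pmatrix} y \\ d^{1/2} y \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ -c_2 & -c_1 \end{pmatrix} \begin{pmatrix} y \\ d^{1/2} y \end{pmatrix} + \begin{pmatrix} 0 \\ x \end{pmatrix} \quad \text{for } t > 0, \tag{7.25} $$

$$ \text{with } \begin{pmatrix} y \\ d^{1/2} y \end{pmatrix}(0) = \begin{pmatrix} a_0 \\ 0 \end{pmatrix} \tag{7.26} $$

and is solved by defining $E_{1/2}(\Lambda, t)$ by the power series in the square matrix $\Lambda$:

$$ E_{1/2}(\Lambda, t) \overset{\text{def}}{=} \sum_{k=0}^{+\infty} \Lambda^k\, Y_{1+\frac{k}{2}} = E_{1/2}(\Lambda\sqrt{t}) $$

whence, with obvious notations, the solution of (7.25) and (7.26), which we will develop in section 7.3.3, is obtained:

$$ d^{1/2} \mathbf{y} = \Lambda \mathbf{y} + \mathbf{x} \iff \mathbf{y}(t) = E_{1/2}(\Lambda, t)\, \mathbf{y}(0) + \left( \mathcal{E}_{1/2}(\Lambda, \cdot) \star \mathbf{x} \right)(t). \tag{7.27} $$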
7.3.1.3. Notes

Under the vectorial initial condition (7.26), or under the two scalar initial conditions (7.24), we set the non-integer order initial condition to zero to ensure the $C^1$ regularity of the solution of the physical starting problem (7.18), which has only one physical initial condition, given by (7.19). We obtain a response to the initial condition which is of class $C^\infty_{1/2}$, but of class $C^k_1$ for $k = 0, 1$ only; the same holds for the impulse response $h$ to the input $x$. However, in presenting a general theory of fractional differential equations, nothing prevents us from considering $[d^{1/2} f](0) = a_1$ as an independent parameter, which will make it possible to speak about the response to the integer or non-integer initial conditions $a_i$. Thus, the problem, in general, would be, instead of (7.23)-(7.24):

$$ (d^{1/2})^2 y + c_1\, d^{1/2} y + c_2\, y = x \quad \text{for } t > 0, \quad \text{with } \begin{cases} y(0) = a_0 \\ [d^{1/2} y](0) = a_1 \end{cases} $$

or, instead of (7.20):

$$ D^1[y\,Y_1] + c_1\, D^{1/2}[y\,Y_1] + c_2\, y\,Y_1 = x\,Y_1 + a_0 \{ Y_0 + c_1 Y_{1/2} \} + a_1 Y_{1/2}. $$

NOTE 7.4.– In terms of application to physics, the rational case $\alpha = \frac{1}{p}$ is interesting; the relations between $(d^{1/p})^{np} y$ and $(d^1)^n y$ will indeed have to be clarified. However, it is rather the case of commensurate orders of derivation which is suitable for an algebraic treatment in general, a treatment in every respect analogous to that carried out for $\frac{1}{2}$.

PROPOSITION 7.12.– Any scalar fractional differential equation of orders commensurate with $\alpha$ and of degree $n$ can be brought back to a vectorial fractional differential equation of order $\alpha$ and of degree 1 in dimension $n$.

Proof. It is valid for a fractional differential equation in $D^\alpha$ as well as for a fractional differential equation in $d^\alpha$, since we have the crucial property of sequentiality. We have just seen it on an example of order $\frac{1}{2}$ and degree 2; the proof, in general, is straightforward.

We thus give results directly in vectorial form later on; the solution of the scalar problem is recovered by extracting the first component of the vector solution.
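7.3.2. Framework of causal distributions

DEFINITION 7.8.– By definition, we have:

$$ \mathcal{E}_\alpha(\Lambda, t) \overset{\text{def}}{=} \sum_{k=0}^{+\infty} \Lambda^k\, Y_{(1+k)\alpha}(t). \tag{7.28} $$

The matrix $\mathcal{E}_\alpha(\Lambda, t)$ is expressed as a power series in the matrix $\Lambda$ which, after reduction of the latter (eigenvalues $\lambda_i$ of multiplicity $m_i$), is made explicit on the basis of the fundamental solutions and their successive convolutions $\mathcal{E}_\alpha^{\star j}(\lambda_i, t)$, with $1 \leq j \leq m_i$. It is a first extension to the fractional case of the matrix exponential concept.

PROPOSITION 7.13.– For the $j$-fold convolution of fundamental solutions, we have:

$$ \begin{aligned}
\mathcal{E}_\alpha^{\star j}(\lambda, t) &= \mathcal{L}^{-1}\!\left[ (s^\alpha - \lambda)^{-j},\ \mathrm{Re}(s) > a_\lambda \right] \\
&= \frac{1}{(j-1)!} \left( \frac{\partial}{\partial\lambda} \right)^{j-1} \mathcal{L}^{-1}\!\left[ (s^\alpha - \lambda)^{-1},\ \mathrm{Re}(s) > a_\lambda \right] \\
&= \frac{1}{(j-1)!} \left( \frac{\partial}{\partial\lambda} \right)^{j-1} \mathcal{E}_\alpha(\lambda, t) \\
&= \sum_{k=0}^{+\infty} \binom{j-1+k}{j-1}\, \lambda^k\, Y_{(j+k)\alpha}(t).
\end{aligned} $$

Proof. It is a formal calculation without much interest; let us note the use of the parametric derivative with respect to the complex parameter $\lambda$. In the integer case ($\alpha = 1$), it is written simply $\mathcal{E}_1^{\star j}(\lambda, t) = Y_j(t)\, \mathcal{E}_1(\lambda, t)$, which can be proved directly by using the following convolution property of causal functions $f$ and $g$:

$$ \left[ f(\tau)\, e^{\lambda\tau} \star g(\tau)\, e^{\lambda\tau} \right](t) = \left[ f(\tau) \star g(\tau) \right](t)\; e^{\lambda t}. $$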
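The series of Proposition 7.13 can be evaluated directly and compared with the integer-order closed form quoted in the proof. The sketch below assumes moderate values of $|\lambda| t^\alpha$ and $t > 0$; the function name `fundamental_conv` and the truncation at 120 terms are illustrative choices of ours.

```python
import math


def fundamental_conv(alpha: float, j: int, lam: float, t: float, terms: int = 120) -> float:
    """Partial sum of sum_k C(j-1+k, j-1) * lam**k * Y_{(j+k)alpha}(t), for t > 0."""
    total = 0.0
    for k in range(terms):
        order = (j + k) * alpha                   # index of the kernel Y_order
        kernel = t ** (order - 1.0) / math.gamma(order)
        total += math.comb(j - 1 + k, j - 1) * lam ** k * kernel
    return total


# Integer case alpha = 1: the j-fold convolution should equal Y_j(t) * exp(lam * t).
lam, t, j = -0.5, 2.0, 3
series_value = fundamental_conv(1.0, j, lam, t)
closed_value = t ** (j - 1) / math.factorial(j - 1) * math.exp(lam * t)
```

With $j = 1$ and $\alpha = 1/2$ the same function returns the fundamental solution of Example 7.3, i.e. $Y_{1/2}(t) + \lambda\, e^{\lambda^2 t}[1 + \operatorname{erf}(\lambda\sqrt{t})]$ for real $\lambda$.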
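PROPOSITION 7.14.– We have:

$$ D^\alpha \mathbf{y} = \Lambda \mathbf{y} + \mathbf{x}_D \iff \mathbf{y}(t) = \left( \mathcal{E}_\alpha(\Lambda, \cdot) \star \mathbf{x}_D \right)(t) \tag{7.29} $$

where the vector $\mathbf{x}_D$ contains, on the one hand, distributions related to the initial conditions of the vector $\mathbf{y}$ and, on the other hand, a regular function $\mathbf{x}$ or right-hand side.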
Proof. It derives from the fact that $\mathcal{E}_\alpha(\Lambda, t)$ is the fundamental solution of the matrix operator $(D^\alpha I - \Lambda)$; this can be shown by Laplace transform, as for Proposition 7.9. The fundamental relation is established (see Definition 7.3):

$$ D^\alpha \mathcal{E}_\alpha(\Lambda, t) = \Lambda\, \mathcal{E}_\alpha(\Lambda, t) + I\, \delta \tag{7.30} $$

whence the announced result is obtained.

7.3.3. Framework of functions expandable into fractional power series (α-FPSE)

DEFINITION 7.9.– By definition, we have:

$$ E_\alpha(\Lambda, t) \overset{\text{def}}{=} \sum_{k=0}^{+\infty} \Lambda^k\, Y_{1+k\alpha}(t). \tag{7.31} $$

The matrix $E_\alpha(\Lambda, t)$ is expressed as a power series in the matrix $\Lambda$ which, in the case where $\Lambda$ is diagonalizable (eigenvalues $\lambda_i$), is made explicit on the basis of the eigenfunctions $E_\alpha(\lambda_i, t)$, with $1 \leq i \leq n$. It is the other extension to the fractional case of the matrix exponential concept.

PROPOSITION 7.15.– We have:

$$ \begin{cases} d^\alpha \mathbf{y} = \Lambda \mathbf{y} + \mathbf{x} \\ \mathbf{y}(0^+) = \mathbf{y}_0 \end{cases} \iff \mathbf{y}(t) = E_\alpha(\Lambda, t)\, \mathbf{y}_0 + \left( \mathcal{E}_\alpha(\Lambda, \cdot) \star \mathbf{x} \right)(t) \tag{7.32} $$

where, this time, the vector $\mathbf{x}$ is a continuous function (or input) which controls the fractional differential system.

Proof. By using Definition 7.4, the left-hand side of (7.32) becomes:

$$ D^\alpha \mathbf{y} = \Lambda \mathbf{y} + \mathbf{x} + Y_{1-\alpha}\, \mathbf{y}_0. $$

We then use Proposition 7.14, by taking:

$$ \mathbf{x}_D(t) = \mathbf{x}(t) + Y_{1-\alpha}(t)\, \mathbf{y}_0, $$

and by noting that:

$$ E_\alpha(\Lambda, t) = \left( Y_{1-\alpha}(\cdot) \star \mathcal{E}_\alpha(\Lambda, \cdot) \right)(t). \tag{7.33} $$

We then find the right-hand side of (7.32) to be the solution.

By reinterpreting this result on a scalar fractional differential equation of degree $n$, it appears that $\mathbf{y}_0$ is the vector of the first $n$ coefficients of the expansion in fractional power series of order $\alpha$ of the solution $y$; in other words, the vector of the fractional order initial conditions.
To establish the link with physics, when $\alpha = \frac{1}{p}$, it is advisable to initialize the fractional order initial conditions to 0 and to give the values of the traditional initial position and velocity to the integer terms (in the example in $y$, $Y_{1/2} \star y'$, $y'$, of order $\frac{1}{2}$ and degree 2, we took $y(0) = y_0$ and $d^{1/2} y(0) = 0$; thus, the response to the initial conditions alone is a $C^\infty_{1/2}$ function which is $C^1$ without being $C^2$).

Within this framework, it is then possible to treat fractional differential equations of order $\alpha$ in an entirely algebraic way, by introducing the characteristic polynomial in the variable $\sigma$:

$$ P(\sigma) = \sigma^n + c_{n-1}\, \sigma^{n-1} + \cdots + c_0 = \prod_{i=1}^{r} (\sigma - \lambda_i)^{m_i} $$

of the fractional differential equation with right-hand side:

$$ (d^\alpha)^n y + c_{n-1}\, (d^\alpha)^{n-1} y + \cdots + c_0\, y = x. $$

The responses $h_k(t)$ to the various initial conditions $a_k = [(d^\alpha)^k y](0)$ for $0 \leq k \leq n-1$ and the impulse response $h(t)$ of the system are linear combinations of the $\mathcal{E}_\alpha^{\star j}(\lambda_i, t)$, with $1 \leq j \leq m_i$ and $1 \leq i \leq r$, which can be made explicit by the method of undetermined coefficients, or by algebraic means; in the generic case of distinct roots, for example, we obtain:

$$ h(t) \overset{\text{def}}{=} \mathcal{L}^{-1}\!\left[ \frac{1}{P(s^\alpha)} \right] = \left( \mathcal{E}_\alpha(\lambda_1, \cdot) \star \cdots \star \mathcal{E}_\alpha(\lambda_n, \cdot) \right)(t) = \sum_{i=1}^{n} \frac{1}{P'(\lambda_i)}\, \mathcal{E}_\alpha(\lambda_i, t). $$

EXAMPLE 7.6.– Let us take again the example stated in a general way in section 7.3. Let $\lambda_1$, $\lambda_2$ be the roots of $P(\sigma) = \sigma^2 + c_1 \sigma + c_2$. The general solution of the system is given as:

$$ y(t) = (h \star x)(t) + a_1\, h_1(t) + a_0\, h_0(t) $$

with the impulse response:

$$ h(t) = \mathcal{L}^{-1}\!\left[ \frac{1}{P(s^{1/2})} \right] = \sum_{i=1}^{2} \frac{1}{P'(\lambda_i)}\, \mathcal{E}_{1/2}(\lambda_i, t) = \sum_{i=1}^{2} \frac{\lambda_i}{P'(\lambda_i)}\, E_{1/2}(\lambda_i \sqrt{t}), $$

the response to the (half-integer) initial condition $a_1$:

$$ h_1(t) = I^{1/2} h = \sum_{i=1}^{2} \frac{1}{P'(\lambda_i)}\, E_{1/2}(\lambda_i \sqrt{t}) $$

and the response to the (integer) initial condition $a_0$:

$$ h_0(t) = D^{1/2} h_1 + c_1 h_1 = h + c_1 h_1 = \sum_{i=1}^{2} \frac{\lambda_i + c_1}{P'(\lambda_i)}\, E_{1/2}(\lambda_i \sqrt{t}). $$
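The modal formulas of Example 7.6 can be cross-checked against the matrix series $E_{1/2}(\Lambda, t)$ of section 7.3.1.2: with $\mathbf{y}(0) = (a_0, a_1)^T$ and $x = 0$, the first row of $E_{1/2}(\Lambda, t)$ should be $(h_0(t), h_1(t))$. The sketch below uses hypothetical coefficients $c_1 = 3$, $c_2 = 2$ (roots $-1$ and $-2$) chosen by us for illustration, and plain partial sums of the series.

```python
import math


def ml_half(z: float, terms: int = 120) -> float:
    """E_{1/2}(z) from the power series (7.15) with alpha = 1/2."""
    return sum(z ** k / math.gamma(1.0 + k / 2.0) for k in range(terms))


def matmul2(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][m] * b[m][j] for m in range(2)) for j in range(2)]
            for i in range(2)]


def ml_half_matrix(lam_matrix, t: float, terms: int = 120):
    """E_{1/2}(Lambda, t) = sum_k Lambda**k * Y_{1+k/2}(t) for a 2x2 matrix."""
    result = [[0.0, 0.0], [0.0, 0.0]]
    power = [[1.0, 0.0], [0.0, 1.0]]              # Lambda**0
    for k in range(terms):
        coeff = t ** (k / 2.0) / math.gamma(1.0 + k / 2.0)
        for i in range(2):
            for j in range(2):
                result[i][j] += coeff * power[i][j]
        power = matmul2(power, lam_matrix)
    return result


# Hypothetical example: P(sigma) = sigma**2 + 3*sigma + 2, roots -1 and -2.
c1, c2 = 3.0, 2.0
roots = [-1.0, -2.0]
companion = [[0.0, 1.0], [-c2, -c1]]


def p_prime(sigma: float) -> float:
    return 2.0 * sigma + c1


def h1(t: float) -> float:
    """Response to the half-integer initial condition a_1."""
    return sum(ml_half(r * math.sqrt(t)) / p_prime(r) for r in roots)


def h0(t: float) -> float:
    """Response to the integer initial condition a_0."""
    return sum((r + c1) * ml_half(r * math.sqrt(t)) / p_prime(r) for r in roots)


E = ml_half_matrix(companion, 1.0)   # E[0][0] ~ h0(1), E[0][1] ~ h1(1)
```

At $t = 0$ the formulas give $h_0(0) = 1$ and $h_1(0) = 0$, as expected of responses to unit initial conditions on $y$ and $d^{1/2} y$ respectively.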
NOTE 7.5.– When there is a double root $\lambda_1$, the preceding expressions use $\mathcal{E}^{\star j}_{1/2}(\lambda_1, t)$ for $j = 1, 2$; they also give rise to algebraic simplifications which finally reveal $E_{1/2}(\lambda_1, t)$ and $\sqrt{t}\, E_{1/2}(\lambda_1, t)$.

7.3.4. Asymptotic behavior of fundamental solutions

7.3.4.1. Asymptotic behavior at the origin

We saw that $\mathcal{E}_\alpha(\lambda, t)$ has an integrable singularity at the origin; the general result is as follows.

PROPOSITION 7.16.– When $t \to 0^+$:

$$ \mathcal{E}_\alpha^{\star j}(\lambda, t) \sim Y_{j\alpha}(t) = \frac{t_+^{j\alpha-1}}{\Gamma(j\alpha)} \in L^1_{\mathrm{loc}}. $$

Proof. This follows from Proposition 7.13; the equivalent at $0^+$ is deduced from it immediately.

7.3.4.2. Asymptotic behavior at infinity

At the beginning of the 20th century, the mathematician Mittag-Leffler was interested in the functions $E_\alpha(z)$ (for reasons unconnected with fractional calculus [MIT 04]): concerning the asymptotic behavior when $|z| \to +\infty$ for $\alpha < 2$, he established exponential divergence in the sector $|\arg z| < \alpha\frac{\pi}{2}$ of the complex plane and convergence towards 0 outside it; the nature of the convergence towards 0 and the asymptotic behavior on the boundary were not examined. The precise nature of the convergence towards 0 for $|\arg z| > \alpha\frac{\pi}{2}$ was found later, in the middle of the last century [BAT 54]. We reuse similar results on the fundamental solutions and extend them, on the one hand, to the boundary $|\arg \lambda| = \alpha\frac{\pi}{2}$ and, on the other hand, to the successive convolutions $\mathcal{E}_\alpha^{\star j}(\lambda, t)$.

PROPOSITION 7.17.– The asymptotic behavior (when $t \to +\infty$) of the fundamental solutions of $D^\alpha - \lambda$ and of their convolutions, which structurally appear in the solutions of fractional differential equations of order $\alpha$ (as the basis {polynomials in $t$} × {$\exp(\lambda t)$} does in the case of the integer order $\alpha = 1$), is given by the position of the eigenvalues $\lambda$ in the complex plane, which plays the role of a fractional spectral domain:
– for $|\arg(\lambda)| < \alpha\frac{\pi}{2}$, $\mathcal{E}_\alpha^{\star j}(\lambda, t)$ diverges in an exponential way (more precisely as {polynomial in $t^\alpha$} × {$\exp(\lambda^{1/\alpha} t)$});
– for $|\arg(\lambda)| = \alpha\frac{\pi}{2}$, $\mathcal{E}_\alpha^{\star 1}(\lambda, t)$ is asymptotically oscillatory and $\mathcal{E}_\alpha^{\star j}(\lambda, t)$ with $j \geq 2$ diverges in an oscillating polynomial way (in $t^\alpha$);
– for $|\arg(\lambda)| > \alpha\frac{\pi}{2}$, we obtain $\mathcal{E}_\alpha^{\star j}(\lambda, t) \sim k_{j,\alpha}\, \lambda^{-1-j}\, t^{-1-\alpha}$.

In this latter case, we note that $\mathcal{E}_\alpha^{\star j}(\lambda, t) \in L^1(]0, +\infty[)$, which is crucial for the impulse responses (the notion of a bounded input-bounded output (BIBO) system is related to the integrable character of the impulse response; in short, $L^1 \star L^\infty \subset L^\infty$).

Proof. See [MAT 96b, MAT 98a] for these tricky calculations of residues and of the asymptotic behavior of indefinite integrals depending on a parameter. Let us note that the analysis in the Laplace plane provides only one pole when $|\arg(\lambda)| < \alpha\pi$, and that the latter (if it exists) is accompanied by an integral term (or aperiodic multimode, according to [OUS 83]) resulting from the cut on the negative real semi-axis imposed by the multiform character of $s \mapsto s^\alpha$: this is our first encounter with diffusive representations, which will be detailed further in section 7.4.1.

EXAMPLE 7.7.– In the half-integer case, we can illustrate the asymptotic behavior in the two panels of Figure 7.1: the eigenvalue $\lambda$ describes the plane in $\sigma$ (which is only the “unfolded” Riemann surface of $\sqrt{s}$). This is translated in the Laplace plane either by a pole and a cut (or a “pole” in the first layer of the Riemann surface), or by the cut alone (or a “pole” in the second layer of the Riemann surface).
[Figure 7.1: two panels with axes Re(s), Im(s) and Re(σ), Im(σ), each ranging from −2 to 2, with regions labeled “stable” and “unstable”]
Figure 7.1. Half-integer case: (a) Laplace plane in s; (b) plane in σ
In Figures 7.2 to 7.8, we represent the real and imaginary parts of the eigenfunctions $E_{1/2}(\lambda\sqrt{t})$ (whose integral term decreases like $t^{-1/2}$). Note the asymptotically oscillatory character in Figure 7.4 and the absence of an oscillatory term (or residue) in Figures 7.6 to 7.8, where only the integral, or diffusive, part is present.
Figure 7.2. For $\lambda = 1$, $E_{1/2}(\lambda\sqrt{t})$. Exponentially divergent real behavior
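As a rough numerical illustration of Figure 7.2, the sketch below evaluates $E_{1/2}(\lambda\sqrt{t})$ for $\lambda = 1$. It assumes that $E_{1/2}$ denotes the classical Mittag-Leffler function $E_\alpha(z) = \sum_{k \geq 0} z^k / \Gamma(1 + \alpha k)$ with $\alpha = 1/2$, and compares a truncated series with the standard identity $E_{1/2}(x) = e^{x^2}\,\mathrm{erfc}(-x)$ for real $x$. The function names are illustrative only.

```python
import math

def mittag_leffler_half(z, terms=200):
    """Partial sum of E_{1/2}(z) = sum_{k>=0} z**k / Gamma(1 + k/2).

    Only reliable for moderate |z| (here |z| <= 1): for large negative or
    complex z the series suffers from severe cancellation.
    """
    return sum(z ** k / math.gamma(1.0 + 0.5 * k) for k in range(terms))

def closed_form_real(x):
    # For real x, E_{1/2}(x) = exp(x^2) * erfc(-x) (standard identity).
    return math.exp(x * x) * math.erfc(-x)

lam = 1.0
for t in (0.0, 0.25, 0.5, 1.0):
    z = lam * math.sqrt(t)
    print(t, mittag_leffler_half(z), closed_form_real(z))
```

Under these assumptions the value at $t = 1$ is close to 5, which is consistent with the vertical range of Figure 7.2.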
Figure 7.3. For $\lambda = 3(1 + 0.9i)$, (a): $\Re e[E_{1/2}(\lambda\sqrt{t})]$, (b): $\Im m[E_{1/2}(\lambda\sqrt{t})]$. Oscillatory exponentially divergent behavior
Figure 7.4. For $\lambda = 4(1 + i)$, (a): $\Re e[E_{1/2}(\lambda\sqrt{t})]$, (b): $\Im m[E_{1/2}(\lambda\sqrt{t})]$. Asymptotically oscillating behavior
Figure 7.5. For $\lambda = 4(0.8 + i)$, (a): $\Re e[E_{1/2}(\lambda\sqrt{t})]$, (b): $\Im m[E_{1/2}(\lambda\sqrt{t})]$. Behavior converging in two stages: first oscillatory exponential, then diffusive in $t^{-1/2}$
Figure 7.6. For $\lambda = 5i$, (a): $\Re e[E_{1/2}(\lambda\sqrt{t})]$, (b): $\Im m[E_{1/2}(\lambda\sqrt{t})]$. Diffusive behavior only, in $t^{-1/2}$
Figure 7.7. For $\lambda = 4(-1 + i)$, (a): $\Re e[E_{1/2}(\lambda\sqrt{t})]$, (b): $\Im m[E_{1/2}(\lambda\sqrt{t})]$. Diffusive behavior in $t^{-1/2}$
Figure 7.8. For $\lambda = -10$, $E_{1/2}(\lambda\sqrt{t})$. Pure diffusive behavior in $t^{-1/2}$
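The regimes illustrated in Figures 7.2 to 7.8 can be sorted mechanically from the argument of $\lambda$. The following sketch applies the sector conditions of Proposition 7.17 together with the pole condition $|\arg\lambda| < \alpha\pi$ mentioned in its proof; the function names and the tolerance used to detect the boundary case are assumptions made for illustration.

```python
import cmath
import math

def classify(lam, alpha, tol=1e-12):
    """Asymptotic regime according to the sectors of Proposition 7.17."""
    abs_arg = abs(cmath.phase(lam))
    threshold = alpha * math.pi / 2
    if math.isclose(abs_arg, threshold, abs_tol=tol):
        return "asymptotically oscillatory"
    if abs_arg < threshold:
        return "exponentially divergent"
    return "converging to 0"

def has_pole(lam, alpha, tol=1e-12):
    """True when a pole accompanies the cut, i.e. |arg(lam)| < alpha*pi."""
    return abs(cmath.phase(lam)) < alpha * math.pi - tol

# Eigenvalues used in Figures 7.2 to 7.8 (half-integer case, alpha = 1/2)
for lam in (1, 3 * (1 + 0.9j), 4 * (1 + 1j), 4 * (0.8 + 1j), 5j, 4 * (-1 + 1j), -10):
    print(lam, classify(lam, 0.5), "pole" if has_pole(lam, 0.5) else "cut only")
```

With these conventions, $\lambda = 4(0.8 + i)$ is reported as converging with a pole (Figure 7.5), whereas $\lambda = 5i$, $4(-1+i)$ and $-10$ are reported as converging with the cut only (Figures 7.6 to 7.8).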
7.3.5. Controlled-and-observed linear dynamic systems of fractional order

Let us consider the following class of fractional linear dynamic systems:

$$d^\alpha x = Ax + Bu, \qquad y = Cx + Du.$$

We can study them from the standpoint of asymptotic stability, controllability, observability, stabilization by state feedback, construction of an asymptotic observer and stabilization by an observer-based controller. The results relating to controlled-and-observed linear dynamic systems of integer order [DAN 94, SON 90] can be generalized to the fractional order. In particular, a system in this class will have the property of:
– stability if and only if $|\arg \mathrm{spec}(A)| > \alpha\frac{\pi}{2}$;
– stabilizability by state feedback if and only if there exists $K$ such that $|\arg \mathrm{spec}(A + BK)| > \alpha\frac{\pi}{2}$, which is fulfilled if the “uncontrollable” modes of $(A, B)$ in the traditional sense are stable in the $\alpha$-sense and thus, in particular, if the pair $(A, B)$ is controllable in the traditional sense;
– construction of an asymptotic observer if and only if there exists $L$ such that $|\arg \mathrm{spec}(A + LC)| > \alpha\frac{\pi}{2}$,
which is fulfilled if the “unobservable” modes of $(C, A)$ in the traditional sense are stable in the $\alpha$-sense and thus, in particular, if the pair $(C, A)$ is observable in the traditional sense;
– stabilizability by an observer-based controller if and only if it is stabilizable by state feedback and an asymptotic observer can be built, which is in particular the case when the triplet $(C, A, B)$ is minimal in the traditional sense.

Further details can be found in [MAT 96c] for the concepts of observability and controllability and in [MAT 97] for observer-based control.

NOTE 7.6.– We should, however, be aware that the range of application of the preceding approach is rather limited, because it relies heavily on the commensurate character of the derivation orders and therefore distinguishes between rational orders and others, which is theoretically restrictive and completely impracticable for digital simulation, for example.

7.4. Diffusive structure of fractional differential systems

We now turn to the study of fractional differential systems of incommensurate orders, linear and with constant coefficients in time, i.e., the pseudo-differential input ($u$)-output ($y$) systems of the form:

$$\sum_{k=0}^{K} a_k D^{\alpha_k} y(t) = \sum_{l=0}^{L} b_l D^{\beta_l} u(t)$$

corresponding, by Laplace transform, to the symbol:

$$H(s) = \frac{\sum_{l=0}^{L} b_l\, s^{\beta_l}}{\sum_{k=0}^{K} a_k\, s^{\alpha_k}}. \qquad (7.34)$$
NOTE 7.7.– Strictly speaking, the term fractional should be reserved for systems of commensurate orders ($\beta_l = l\alpha_1$ and $\alpha_k = k\alpha_1$), whereas the term non-integer would, in truth, be more suitable; we conform here to English usage (fractional calculus).

In section 7.4.2 we give a general structure result which shows to what extent fractional differential systems are also diffusive pseudo-differential systems. In section 7.4.3, a characterization of the concept of long memory is given. Finally, in section 7.4.4, we recall the particular case of fractional differential systems of commensurate orders, to which the general structure result naturally applies but which, moreover, allows an explicit characterization of stability (in the BIBO sense). First of all, however, in section 7.4.1, we recall some basic ideas on diffusive representations of pseudo-differential operators.
7.4.1. Introduction to diffusive representations of pseudo-differential operators

A first-order system, or autoregressive filter of order 1 (AR-1) in other contexts, is undoubtedly the simplest linear dynamic system imaginable which does not oscillate but has a behavior of pure relaxation. A discrete superposition of such systems, for various time constants $\tau_k$ (or, equivalently, for various relaxation constants $\xi_k = \tau_k^{-1}$) and various weights $\mu_k$, gives a simple idea (without being simplistic; see footnote 2) of the diffusive pseudo-differential operators required to simulate fractional differential equations. When the superposition is discrete and finite, the resulting system is of integer order, with poles (real negative, $s_k = -\xi_k$) and zeros; on the other hand, if the superposition is either discrete and infinite, or continuous over all relaxation constants $\xi > 0$ with a weight function $\mu(\xi)$, we obtain a pseudo-differential system said to be of diffusive type, the function $\mu$ being called the diffusive representation of the associated pseudo-differential operator. In the sense of systems theory, a realization of such a system is:

$$\partial_t \psi(t, \xi) = -\xi\, \psi(t, \xi) + u(t) \qquad (7.35)$$

$$y(t) = \int_0^{+\infty} \mu(\xi)\, \psi(t, \xi)\, d\xi \qquad (7.36)$$

which is mathematically meaningful within a suitable functional framework (see e.g. [STA 94, MON 98, MAT 08] for technical details, the last reference making the link with the class of well-posed linear systems). A simple calculation then shows that the impulse response of the input $u$-output $y$ system is:

$$h_\mu(t) = \int_0^{+\infty} \mu(\xi)\, e^{-\xi t}\, d\xi. \qquad (7.37)$$

Its transfer function, or symbol, is then, for $\Re e(s) > 0$:

$$H_\mu(s) = \int_0^{+\infty} \frac{\mu(\xi)}{s + \xi}\, d\xi. \qquad (7.38)$$
EXAMPLE 7.8.– A simple case of a diffusive pseudo-differential operator is that of the fractional integrator $I^\alpha$, whose diffusive representation is $\mu_\alpha(\xi) = \frac{\sin(\alpha\pi)}{\pi}\, \xi^{-\alpha}$ for $0 < \alpha < 1$.
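To make the idea of a superposition of first-order systems concrete, the following sketch discretizes the diffusive representation of the fractional integrator on a logarithmic grid of relaxation constants and checks the discrete analogs of (7.37) and (7.38) against the known closed forms $t^{\alpha-1}/\Gamma(\alpha)$ and $s^{-\alpha}$. The grid bounds, the step and the function names are illustrative assumptions, not a recommended simulation scheme.

```python
import math

def diffusive_nodes(alpha, x_min=-60.0, x_max=60.0, step=0.1):
    """Discretize mu_alpha(xi) = sin(alpha*pi)/pi * xi**(-alpha) on a log grid.

    Returns (xi_k, mu_k) pairs such that sums over k approximate integrals
    against mu_alpha(xi) d(xi) (trapezoidal rule in the variable x = log xi).
    """
    n = int(round((x_max - x_min) / step))
    nodes = []
    for k in range(n + 1):
        xi = math.exp(x_min + k * step)
        weight = step * (0.5 if k in (0, n) else 1.0)
        # The extra factor xi is the Jacobian of the change of variable xi = exp(x)
        mu_k = math.sin(alpha * math.pi) / math.pi * xi ** (-alpha) * xi * weight
        nodes.append((xi, mu_k))
    return nodes

def transfer(nodes, s):
    # Finite sum of first-order transfer functions, discrete analog of (7.38)
    return sum(mu_k / (s + xi) for xi, mu_k in nodes)

def impulse(nodes, t):
    # Finite sum of decaying exponentials, discrete analog of (7.37)
    return sum(mu_k * math.exp(-xi * t) for xi, mu_k in nodes)

alpha = 0.5
nodes = diffusive_nodes(alpha)
print(transfer(nodes, 2.0), 2.0 ** (-alpha))
print(impulse(nodes, 1.0), 1.0 / math.gamma(alpha))
```

Each pair $(\xi_k, \mu_k)$ plays the role of one first-order system with pole $-\xi_k$ and weight $\mu_k$.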
2. Indeed, on the one hand, it is by completion of this family within a suitable topological framework that we obtain the space of diffusive pseudo-differential operators; on the other hand, it is ultimately these simple systems which are programmed numerically, using standard numerical approximation procedures; see e.g. [HÉL 06b].
We see that one of the advantages of diffusive representations is to transform non-local problems in time, of hereditary nature, into local problems, which in particular enables standard and efficient numerical approximation (see e.g. [HEL 00]). Moreover, when the diffusive representation $\mu$ is positive, the suggested realization has the important property of dissipativity of the pseudo-differential operator (a natural energy functional is then given by $E_\psi(t) = \int_0^{+\infty} \mu(\xi)\, |\psi(t, \xi)|^2\, d\xi$), which is in this case of positive type; this has important consequences, particularly for the study of the stability of coupled systems (see [MON 97], and also [MON 00] for non-linear systems, time-varying systems, systems with hysteresis, etc.). As far as stability is concerned, it is important to notice that some technicalities must be taken care of in an infinite-dimensional setting: LaSalle’s invariance principle does not apply when the pre-compactness of trajectories in the energy space has not been proved a priori. This is the reason why we have to analyze the spectrum of the infinitesimal generator of the semigroup of the augmented system and resort to the Arendt-Batty stability theorem, as was done in [MAT 05].

7.4.2. General decomposition result

By re-using the notation $Y_\alpha(t) = \frac{1}{\Gamma(\alpha)}\, t_+^{\alpha-1}$ and by limiting ourselves to the case of strictly proper systems ($\beta_L < \alpha_K$), the following significant result is obtained (see [MAT 98a, AUD 00]).
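THEOREM 7.1 (DECOMPOSITION RESULT).– The impulse response $h$ of system (7.34) of symbol $H$ has the structure:

$$h(t) = \sum_{i=1}^{r} \sum_{j=1}^{\nu_i} r_{ij}\, Y_j(t)\, e^{s_i t} + \int_0^{+\infty} \mu(\xi)\, e^{-\xi t}\, d\xi \qquad (7.39)$$

where the $s_i$ are complex poles in $\mathbb{C} \setminus \mathbb{R}^-$ and where $\mu$ is a distribution. Moreover, in the case of a density, the analytical form of $\mu$ is given by:

$$\mu(\xi) = \frac{1}{\pi}\, \frac{\sum_{k=0}^{K} \sum_{l=0}^{L} a_k b_l \sin\big((\alpha_k - \beta_l)\pi\big)\, \xi^{\alpha_k + \beta_l}}{\sum_{k=0}^{K} a_k^2\, \xi^{2\alpha_k} + \sum_{0 \leq k < l \leq K} 2 a_k a_l \cos\big((\alpha_k - \alpha_l)\pi\big)\, \xi^{\alpha_k + \alpha_l}}. \qquad (7.40)$$

For the proof, the idea is to apply the residue theorem to the function $H(s)$, which is meromorphic in the cut plane $\mathbb{C} \setminus \mathbb{R}^-$. The diffusive term then follows naturally from the discontinuity of $H$ across the cut on $\mathbb{R}^-$; precisely, it is shown that:

$$\mu(\xi) = \lim_{\varepsilon \to 0^+} \frac{1}{2i\pi} \big[ H(-\xi - i\varepsilon) - H(-\xi + i\varepsilon) \big].$$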
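The jump formula used in the proof of Theorem 7.1 can be checked numerically on a small example. The sketch below takes the hypothetical system $H(s) = 1/(1 + s^{1/2})$ (an illustrative choice, not taken from the text), evaluates the jump of $H$ across the cut with a small $\varepsilon$, and compares it with a closed form for the density in which the denominator is $|\sum_k a_k (\xi e^{i\pi})^{\alpha_k}|^2$ written out term by term; for this example both reduce to $\sqrt{\xi} / (\pi(1 + \xi))$. The principal branch of the complex logarithm is assumed for $s^\alpha$.

```python
import cmath
import math

# Hypothetical example: H(s) = 1 / (1 + s**0.5)
A = [(1.0, 0.0), (1.0, 0.5)]   # pairs (a_k, alpha_k) of the denominator
B = [(1.0, 0.0)]               # pairs (b_l, beta_l) of the numerator

def power(s, alpha):
    # Principal branch of s**alpha, with the cut on the negative real axis
    return cmath.exp(alpha * cmath.log(s))

def H(s):
    num = sum(b * power(s, beta) for b, beta in B)
    den = sum(a * power(s, alpha) for a, alpha in A)
    return num / den

def mu_from_jump(xi, eps=1e-9):
    """Approximate the limit of [H(-xi - i*eps) - H(-xi + i*eps)] / (2*i*pi)."""
    jump = H(complex(-xi, -eps)) - H(complex(-xi, eps))
    return (jump / (2j * math.pi)).real

def mu_closed_form(xi):
    """Density obtained by expanding the jump of H for real coefficients."""
    num = sum(a * b * math.sin((al - be) * math.pi) * xi ** (al + be)
              for a, al in A for b, be in B)
    den = sum(a * a * xi ** (2 * al) for a, al in A)
    den += sum(2 * A[k][0] * A[l][0]
               * math.cos((A[k][1] - A[l][1]) * math.pi)
               * xi ** (A[k][1] + A[l][1])
               for k in range(len(A)) for l in range(k + 1, len(A)))
    return num / (math.pi * den)

for xi in (0.1, 1.0, 7.0):
    print(xi, mu_from_jump(xi), mu_closed_form(xi))
```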
In other words, the impulse response $h$ of a fractional differential system breaks up into a localized part $h_n$ of integer order $n = \sum_{i=1}^{r} \nu_i$ and a part $h_\mu$ of purely diffusive nature. A great number of examples illustrating this decomposition result on some non-standard oscillators can be found in [DAUP 00, HEL 00].

7.4.3. Connection with the concept of long memory

Finally, let us recall that such systems are said to have long memory in so far as the decay of the impulse response (in the stable case) is not of exponential type. This decay is determined by a generalized expansion of the distribution $\mu$ at $\xi = 0$, followed by the application of the following lemma.

LEMMA 7.1 (Watson).– For $-1 < \gamma_1$ and $\gamma_m < \gamma_{m+1}$, we have:

$$\mu(\xi) = \sum_{m=1}^{M-1} \mu_m\, \frac{\xi^{\gamma_m}}{\Gamma(1 + \gamma_m)} + O(\xi^{\gamma_M}) \;\Longrightarrow\; h_\mu(t) = \sum_{m=1}^{M-1} \mu_m\, \frac{1}{t^{1+\gamma_m}} + O(t^{-1-\gamma_M}) \qquad (7.41)$$
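A quick numerical check of the lemma, under stated assumptions: the sketch below uses the illustrative density $\mu(\xi) = \sqrt{\xi} / (\pi(1 + \xi))$, which behaves like $\xi^{1/2}/\pi$ near $\xi = 0$ (so $\gamma_1 = 1/2$ and $\mu_1 = \Gamma(3/2)/\pi$), and compares the integral (7.37) with the leading term $\mu_1 t^{-3/2}$ for large $t$. The quadrature parameters are arbitrary choices.

```python
import math

def mu(xi):
    # Illustrative density; mu(xi) ~ xi**0.5 / pi as xi -> 0
    return math.sqrt(xi) / (math.pi * (1.0 + xi))

def h_mu(t, x_min=-60.0, x_max=10.0, step=0.05):
    """Trapezoidal rule in x = log(xi) for the integral of mu(xi)*exp(-xi*t)."""
    n = int(round((x_max - x_min) / step))
    total = 0.0
    for k in range(n + 1):
        xi = math.exp(x_min + k * step)
        w = 0.5 if k in (0, n) else 1.0
        total += w * mu(xi) * math.exp(-xi * t) * xi   # xi = Jacobian of xi = exp(x)
    return total * step

gamma_1 = 0.5
mu_1 = math.gamma(1.0 + gamma_1) / math.pi   # mu ~ mu_1 * xi**gamma_1 / Gamma(1 + gamma_1)

def leading_term(t):
    return mu_1 * t ** (-1.0 - gamma_1)

for t in (10.0, 100.0, 1000.0):
    print(t, h_mu(t) / leading_term(t))   # ratio tends to 1 as t grows
```

The slow algebraic decay in $t^{-3/2}$, rather than an exponential one, is what the text calls long memory.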
Thus, by juxtaposing the decomposition result (7.39), the expression (7.40) of $\mu$ and the asymptotic analysis (7.41), the following characterization of stability is obtained.

THEOREM 7.2.– System (7.34) is BIBO stable if and only if the two following conditions are satisfied:
– in (7.39), we have $\Re e(s_i) < 0$ for all $i$;
– the first exponent $\gamma_1$ in (7.41) is strictly positive.

It should be noted that the poles $s_i$, although finite in number (see [BON 00]), are not a priori known in a simple way in the general case. The situation is quite different when the system is more structured, as is emphasized below.

7.4.4. Particular case of fractional differential systems of commensurate orders

The general result given above can then be expressed differently, by using the strong algebraic structure induced by the commensurate character of the derivation orders. Setting $\sigma = s^\alpha$, $R$ is defined such that $H(s) = R(\sigma)$. It is then enough to decompose the rational fraction $R$ into simple elements $(\sigma - \lambda)^{-m}$, to define
by inverse Laplace transform the corresponding basic elements $E_\alpha^{m}(\lambda, t)$ and to characterize their stability by using an asymptotic analysis similar to the preceding one. The function $E_\alpha^{m}(\lambda, t)$ is the fundamental solution of the operator $(D^\alpha - \lambda)^m$; it belongs to the family of Mittag-Leffler functions, which is a subset of the hypergeometric special functions. Using these functions, we obtain the following structure result.

PROPOSITION 7.18.– We have:

$$h(t) = \sum_{n=1}^{N} \sum_{m=1}^{m_n} r_{nm}\, E_\alpha^{m}(\lambda_n, t) \quad \text{with} \quad R(\sigma) = \sum_{n=1}^{N} \sum_{m=1}^{m_n} r_{nm}\, (\sigma - \lambda_n)^{-m}.$$
A refined asymptotic analysis of the functions $E_\alpha^{m}(\lambda_n, t)$ makes it possible to deduce the following fundamental result for BIBO stability when $R = Q/P$, with $P$, $Q$ two coprime polynomials and $0 < \alpha < 1$.

THEOREM 7.3.– We have:

$$\text{BIBO stability} \iff |\arg \sigma| > \alpha\frac{\pi}{2}, \quad \forall \sigma \in \mathbb{C},\ P(\sigma) = 0. \qquad (7.42)$$

In this latter case, the impulse response has the asymptotic behavior:

$$h(t) \sim K\, t^{-1-\alpha} \quad \text{when } t \to +\infty. \qquad (7.43)$$
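Criterion (7.42) is straightforward to test once the roots of $P$ are known. The sketch below, written for a quadratic $P$ only (to stay within the standard library), checks the sector condition and also maps those roots lying in $|\arg\sigma| < \alpha\pi$ to Laplace-plane poles $s = \sigma^{1/\alpha}$. The two example systems, $H(s) = 1/(s - 2\sqrt{s} + 5)$ and $H(s) = 1/(s - 4\sqrt{s} + 5)$, and all function names are illustrative assumptions.

```python
import cmath
import math

def quadratic_roots(a, b, c):
    """Roots of a*sigma**2 + b*sigma + c (complex arithmetic)."""
    disc = cmath.sqrt(b * b - 4 * a * c)
    return [(-b + disc) / (2 * a), (-b - disc) / (2 * a)]

def is_bibo_stable(sigma_roots, alpha):
    """Sector test (7.42): every root must satisfy |arg sigma| > alpha*pi/2."""
    return all(abs(cmath.phase(sig)) > alpha * math.pi / 2 for sig in sigma_roots)

def laplace_poles(sigma_roots, alpha):
    """Poles s = sigma**(1/alpha), kept only when |arg sigma| < alpha*pi."""
    return [sig ** (1.0 / alpha) for sig in sigma_roots
            if abs(cmath.phase(sig)) < alpha * math.pi]

alpha = 0.5
for coeffs in ((1, -2, 5), (1, -4, 5)):
    roots = quadratic_roots(*coeffs)
    print(coeffs, roots, is_bibo_stable(roots, alpha), laplace_poles(roots, alpha))
```

In the first example the roots are $\sigma = 1 \pm 2i$ and the poles $s = \sigma^2 = -3 \pm 4i$ lie in the left half-plane; in the second, $\sigma = 2 \pm i$ gives $s = 3 \pm 4i$ in the right half-plane. The sector test in $\sigma$ and the usual half-plane test in $s$ agree on these two cases.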
NOTE 7.8.– In this case, the poles of the system appearing in decomposition (7.39) are known analytically; they are exactly $s_n = \lambda_n^{1/\alpha}$, but only for those $\lambda_n$ which satisfy $|\arg \lambda_n| < \alpha\pi$.

NOTE 7.9.– In the integer case $\alpha = 1$, (7.42) gives back the traditional stability result: absence of poles in the closed right-half plane.

7.5. Example of a fractional partial differential equation

An example of a propagation phenomenon with long memory, very similar to the one we will now deal with, is mentioned in [DAUT 84b], which refers to the original Russian articles [LOK 78a, LOK 78b]. The fundamental difference between the case presented there and ours is that, there, space is unbounded; hence there are no discrete spectra or resonance modes of the physical system and, moreover, no relationship with the eigenfunctions of fractional derivation appears.
We thus examine the example of an acoustic pipe of finite length (consequently, space is bounded), as studied in [MAT 94] and summarized in [MAT 95b]. We present the physical problem in section 7.5.1 and begin by studying the controlled problem: the perturbation of the classical 1D wave equation by a fractional derivative term in time is examined from the perspective of its spectral consequences in section 7.5.2 and of its time-domain consequences in section 7.5.3. Lastly, we examine the response to initial conditions, i.e., the free problem, in section 7.5.4.

7.5.1. Physical problem considered

The propagation of pressure waves in air, regarded as a real medium with viscous and thermal losses, has already been studied in acoustics, either in a closed space (bounded domain) or in an open space (unbounded domain); the approach generally taken is in the frequency domain. A fractional partial differential equation (FPDE) was proposed in [POL 91] for the so-called wide-pipe approximation and is found again, within a very general framework, as a high-frequency approximation in e.g. [FEL 00]. It is a wave equation in which a fractional derivative term appears as a perturbation of the classical wave equation: the perturbation parameter is inversely proportional to the radius of the cylindrical tube considered. We normalize the speed of sound to 1 and the length of the tube to 1, and we denote by $\varepsilon$ the perturbation parameter.
Within this framework, we consider the following linear dynamic system, written in the sense of distributions, where $u(t)$ is the input or boundary control (the pressure signal introduced at the left end of the tube, at $x = 0$), $X(t, x)$ is the internal state, of infinite dimension (since it is in fact a function of the abscissa $x \in [0, 1]$), and $y(t)$ is the output or observation (the pressure signal that we listen to at the output of the tube, at $x = 1$):

$$\big( \partial_t^2 + 2\varepsilon\, \partial_t^{3/2} + \varepsilon^2\, \partial_t^{1} \big) X - \partial_x^2 X = 0, \qquad t > 0, \quad x \in\, ]0, 1[ \qquad (7.44)$$

The initial conditions are identically zero, $X(t = 0, x) = 0$ and $\partial_t X(t = 0, x) = 0$, and the dynamic boundary conditions are of absorbing type (with $a_0 b_0 > 0$, $a_1 b_1 > 0$):

$$\begin{cases} \big[ a_0 (\partial_t + \varepsilon\, \partial_t^{1/2}) + b_0\, \partial_{-x} \big] X(t, x = 0) = a_0 (\partial_t + \varepsilon\, \partial_t^{1/2})\, u(t) \\ \big[ a_1 (\partial_t + \varepsilon\, \partial_t^{1/2}) + b_1\, \partial_x \big] X(t, x = 1) = 0 \end{cases} \qquad (7.45)$$

where $\{a_1, b_1\}$ define a reflection coefficient of waves at the output of the tube, $r_1 = (b_1 - a_1)/(a_1 + b_1)$, and where $\{a_0, b_0\}$ define a reflection coefficient of waves at its input, $r_0 = (a_0 - b_0)/(a_0 + b_0)$. The absorbing property is given by $|r_i| < 1$, i.e., $a_i b_i > 0$; in the limiting case $|r_i| = 1$, these are traditional boundary conditions
of Dirichlet or Neumann type. The total reflection coefficient is given by $\rho = -r_0 r_1$. Lastly, the output of the system is:

$$y(t) = X(t, x = 1). \qquad (7.46)$$
7.5.2. Spectral consequences

Following [KOL 69], we apply the Laplace transform in the sense of causal distributions to (7.44); we are then led to the following characteristic equation for the poles of the system:

$$s_n^\varepsilon + \varepsilon \sqrt{s_n^\varepsilon} = s_n^0 \qquad (7.47)$$

with $\Re e(\sqrt{s}) > 0$, where $s_n^0 = -\alpha_0 + i\omega_n^0$ are the poles of the uncontrolled ($u(t) = 0$) system without loss ($\varepsilon = 0$), with the boundary conditions (7.45): $\alpha_0 = -\ln \sqrt{|\rho|}$ satisfies $\alpha_0 > 0$ when $a_i b_i > 0$, i.e., when there is loss of energy at the boundaries. Moreover, the $\omega_n^0 = \omega_0^0 + n\pi$ are equally spaced with interval $\pi$, symmetrically distributed with respect to 0, and include 0 or not according to whether they are even modes (if $\rho > 0$, $\omega_0^0 = 0$) or odd modes (if $\rho < 0$, $\omega_0^0 = \frac{\pi}{2}$). This classical result can be found in [RUS 78], for example. At this stage, writing $s_n^\varepsilon = -\alpha_n^\varepsilon + i\omega_n^\varepsilon$, we can draw the following three lessons:
– damping is more significant and depends upon the frequency, $\alpha_n^\varepsilon > \alpha_0$, and the angular eigenfrequencies are reduced, $|\omega_n^\varepsilon| < |\omega_n^0|$;
– the negative real axis becomes an integral part of the “spectrum”, with a weight $o(\varepsilon)$ (see section 7.5.3.2);
– a finite number of poles located at low frequency can even disappear if the perturbation parameter becomes too large, with $\alpha_0$ fixed.

In Figure 7.9, we represent in the Laplace plane the poles $s_n^0$ without loss ($\varepsilon = 0$) and the poles $s_n^\varepsilon$ with losses ($\varepsilon = 0.25$), for the configurations of odd modes (o: $\rho = -1$) and of even modes (∗: $\rho = +0.8$).

7.5.3. Time-domain consequences

Now, we calculate the impulse response of the system using three methods which take as a starting point the transfer function of the system, i.e., up to a multiplicative factor:

$$H(s) = \frac{e^{-\Gamma(s)}}{1 - \rho\, e^{-2\Gamma(s)}} \qquad (7.48)$$

where $\Gamma(s) = s + \varepsilon\sqrt{s}$ is the propagation constant and $\rho$ the global reflection coefficient.
Figure 7.9. Position in the Laplace plane of the poles of the models with losses ($\varepsilon = 0.25$) and without loss ($\varepsilon = 0$). Legend: o: $\rho = -1$, ∗: $\rho = +0.8$
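The pole positions of Figure 7.9 can be approximated by solving (7.47) in the variable $\sigma = \sqrt{s}$, where it becomes the quadratic $\sigma^2 + \varepsilon\sigma = s_n^0$, and keeping only the roots with $\Re e(\sigma) > 0$. The sketch below assumes $\alpha_0 = -\ln\sqrt{|\rho|}$, which is what the poles of (7.48) give when $\varepsilon = 0$; the function names are hypothetical.

```python
import cmath
import math

def lossless_pole(n, rho):
    """s_n^0 = -alpha_0 + i*(omega_0^0 + n*pi), assuming alpha_0 = -ln sqrt(|rho|)."""
    alpha0 = -math.log(math.sqrt(abs(rho)))
    omega00 = 0.0 if rho > 0 else math.pi / 2
    return complex(-alpha0, omega00 + n * math.pi)

def lossy_poles(n, rho, eps):
    """Solutions s = sigma**2 of sigma**2 + eps*sigma = s_n^0 with Re(sigma) > 0."""
    s0 = lossless_pole(n, rho)
    disc = cmath.sqrt(eps * eps + 4 * s0)
    sigmas = [(-eps + disc) / 2, (-eps - disc) / 2]
    return [sig * sig for sig in sigmas if sig.real > 0]

rho, eps = 0.8, 0.25
for n in range(0, 4):
    print(n, lossless_pole(n, rho), lossy_poles(n, rho, eps))
```

With $\rho = 0.8$ and $\varepsilon = 0.25$, the mode $n = 0$ has no admissible root (an instance of a low-frequency pole disappearing), while for $n = 1$ the lossy pole is more damped and has a smaller imaginary part than $s_1^0$, in line with the three lessons listed in section 7.5.2.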
7.5.3.1. Decomposition into wavetrains

This is undoubtedly the most physically meaningful decomposition. By expanding the fraction in (7.48) into a power series, which is legitimate for $\Re e(s) > 0$ in the case $|\rho| < 1$, then by applying the inverse Laplace transform term by term, we obtain the following wavetrain decomposition of the impulse response:

$$h(t) = \sum_{k=0}^{\infty} \rho^k\, \psi^{(2k+1)\varepsilon}\big( t - (2k + 1) \big) \qquad (7.49)$$

where $\psi^\varepsilon(t)$ is the fundamental solution of a 3D diffusion process (i.e., parabolic heat equation):

$$\psi^\varepsilon(t) \overset{\text{def}}{=} \mathcal{L}^{-1}\big[ e^{-\varepsilon\sqrt{s}},\ \Re e(s) > 0 \big] \propto \varepsilon\, t^{-3/2}\, e^{-\frac{\varepsilon^2}{4t}} \quad \text{for } t > 0. \qquad (7.50)$$

In Figure 7.10a, we represent the function $\psi^1$, that is, the elementary lossy wave: it is a function of class $C^\infty$, all of whose derivatives are zero at $t = 0^+$, and which decreases like $t^{-3/2}$ at infinity; it is integrable and even belongs to $L^1, L^2, \ldots, L^\infty$, with norm $\|\psi^\varepsilon\|_p = \varepsilon^{-2(1 - \frac{1}{p})}\, \|\psi^1\|_p$. An interesting property is that we recover the lossless case of the classical wave equation: $\psi^0(t) = \delta(t)$ (as a limit in the sense of distributions), which has none of the regularity properties of $\psi^\varepsilon$ for $\varepsilon > 0$. Moreover, the family of functions obeys the following scaling law:

$$\psi^\varepsilon(t) = \varepsilon^{-2}\, \psi^1\!\left( \frac{t}{\varepsilon^2} \right) \qquad (7.51)$$
Figure 7.10. (a) Fundamental solution $\psi^1$ of a 3D diffusion process; (b) scaling law of functions $\psi^\varepsilon$ for $\varepsilon = 0.25$ (continuous line), 0.75 (dotted line) and 1.25 (dashed line)

Figure 7.11. Impulse response of the model with losses: (a) for $\varepsilon = 0.01$, (b) for $\varepsilon = 0.25$
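The curves of Figures 7.10 and 7.11 can be approximated with a few lines of code. The sketch below uses the standard inverse Laplace transform $\mathcal{L}^{-1}[e^{-\varepsilon\sqrt{s}}](t) = \frac{\varepsilon}{2\sqrt{\pi}}\, t^{-3/2} e^{-\varepsilon^2/(4t)}$, which fixes the proportionality constant left implicit in (7.50), then sums the wavetrains of (7.49) that have already arrived at time $t$. Function names are illustrative.

```python
import math

def psi(eps, t):
    """Inverse Laplace transform of exp(-eps*sqrt(s)) for eps > 0 (zero for t <= 0)."""
    if t <= 0.0:
        return 0.0
    # Evaluated through logarithms to avoid overflow of t**-1.5 for tiny t
    log_value = (math.log(eps) - 0.5 * math.log(4.0 * math.pi)
                 - 1.5 * math.log(t) - eps * eps / (4.0 * t))
    return math.exp(log_value)

def impulse_response(t, rho, eps):
    """Finite part of the sum (7.49): only waves with 2k + 1 < t contribute."""
    total, k = 0.0, 0
    while 2 * k + 1 < t:
        total += rho ** k * psi((2 * k + 1) * eps, t - (2 * k + 1))
        k += 1
    return total

# Scaling law (7.51): psi^eps(t) = eps**-2 * psi^1(t / eps**2)
eps, t = 0.25, 0.3
print(psi(eps, t), eps ** -2 * psi(1.0, t / eps ** 2))

# A few samples of the impulse response for rho = 0.8, eps = 0.25
for time in (0.5, 1.5, 3.5, 5.5):
    print(time, impulse_response(time, 0.8, 0.25))
```

Since each wave $\psi^{(2k+1)\varepsilon}$ is delayed by $2k + 1$, the response is identically zero before $t = 1$, the travel time along the normalized tube.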
The scaling law (7.51) is shown in Figure 7.10b. Thus, it is clear that the waves which appear successively in $h(t)$ in decomposition (7.49) have an increasingly low amplitude, but a temporal support (or a width at half height) which is increasingly large. We illustrate this phenomenon as follows. In Figure 7.11a, $\varepsilon = 0.01$ is very small and the supports of successive waves remain separate. To some extent this resembles the lossless case, but with successive amplitudes which decrease in a way nearly independent of the total reflection coefficient $\rho$, whereas in Figure 7.11b, $\varepsilon = 0.25$ can no longer be regarded as a perturbation, and the increasing spreading of the supports is clearly visible.

7.5.3.2. Quasi-modal decomposition

To apply the inverse Laplace transform to (7.48) directly by using residue calculus, we must take into account the cut along the negative real axis, which is imposed by the multivalued character of $s \mapsto \sqrt{s}$.
It is precisely this cut which creates an integral term in the modal decomposition, sometimes called aperiodic multimode, since we can regard it as the superposition of a continuous infinity of damped exponentials; the structure of the impulse response is thus the following:

$$h(t) = \sum_{n \in S^\varepsilon} c_n^\varepsilon\, e^{s_n^\varepsilon t} + \int_0^{\infty} \mu^\varepsilon(\xi)\, e^{-\xi t}\, d\xi \qquad (7.52)$$

where some of the $s_n^\varepsilon$ may have disappeared from the discrete sum over the indices in $S^\varepsilon$ (this occurs exactly when equation (7.47) has no solution such that $\Re e(\sqrt{s}) > 0$). In the diffusive part, $\mu^\varepsilon = o(\varepsilon)$ shows that, in a certain way, the integral term is a perturbation of order $\varepsilon$.

NOTE 7.10.– It should be noted that we observe here a generalization of the decomposition result (7.39), the only difference being the countably infinite number of poles. For a detailed study of the family of diffusive representations $\xi \mapsto \mu^\varepsilon(\xi)$ indexed by $\varepsilon$, see Chapter 9 of [HEL 00].

7.5.3.3. Fractional modal decomposition

In (7.48), we expand the meromorphic function into a series of elementary meromorphic functions which converges normally on every compact subset of the complex plane (see Chapter 5 of [CART 61]), by using:

$$\frac{1}{\sinh(z)} = \sum_{n=-\infty}^{+\infty} \frac{(-1)^n}{z - in\pi}$$

which gives us:

$$H(s) = \frac{1}{2\sqrt{\rho}} \sum_{n=-\infty}^{+\infty} \frac{(-1)^n}{\Gamma(s) + \alpha_0 - i(n\pi + \omega_0^0)}.$$

Lastly, we break up each term of the series into rational fractions of the variable $\sqrt{s}$:

$$\frac{1}{s + \varepsilon\sqrt{s} - s_n^0} = \frac{1}{\sigma_n^{\varepsilon+} - \sigma_n^{\varepsilon-}} \left[ \frac{1}{\sqrt{s} - \sigma_n^{\varepsilon+}} - \frac{1}{\sqrt{s} - \sigma_n^{\varepsilon-}} \right]$$

where $\sigma_n^{\varepsilon\pm}$ are the solutions of (7.47) written in the variable $\sigma = \sqrt{s}$: $\sigma^2 + \varepsilon\sigma = s_n^0$
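then, we carry out an inverse Laplace transform, i.e.:

$$h(t) = \sum_{n=-\infty}^{+\infty} c_n^\varepsilon \Big[ E_{\frac{1}{2}}(\sigma_n^{\varepsilon+}, t) - E_{\frac{1}{2}}(\sigma_n^{\varepsilon-}, t) \Big]. \qquad (7.53)$$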
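The two algebraic identities behind (7.53) can be sanity-checked numerically. The sketch below compares a symmetric partial sum of the expansion of $1/\sinh(z)$ with the exact value, and checks the partial-fraction identity in the variable $\sqrt{s}$ at one sample point. The sample values of $z$, $s$, $s_n^0$ and $\varepsilon$ are arbitrary, and the symmetric summation order (pairing $n$ with $-n$) is an assumption made so that the series converges as written.

```python
import cmath
import math

def inv_sinh_series(z, n_max=20000):
    """Symmetric partial sum of sum_n (-1)**n / (z - i*n*pi), |n| <= n_max."""
    total = 1.0 / z
    for n in range(1, n_max + 1):
        sign = -1.0 if n % 2 else 1.0
        total += sign * (1.0 / (z - 1j * n * math.pi) + 1.0 / (z + 1j * n * math.pi))
    return total

def partial_fraction_sides(s, s0, eps):
    """Both sides of the decomposition of 1 / (s + eps*sqrt(s) - s0)."""
    disc = cmath.sqrt(eps * eps + 4 * s0)
    sig_p, sig_m = (-eps + disc) / 2, (-eps - disc) / 2   # roots of sigma**2 + eps*sigma = s0
    root = cmath.sqrt(s)
    lhs = 1.0 / (s + eps * root - s0)
    rhs = (1.0 / (root - sig_p) - 1.0 / (root - sig_m)) / (sig_p - sig_m)
    return lhs, rhs

z = 0.7 + 0.3j
print(inv_sinh_series(z), 1.0 / cmath.sinh(z))
print(partial_fraction_sides(2.0 + 1.0j, complex(-0.11, math.pi), 0.25))
```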
This is exactly the fractional modal decomposition of order one half of the impulse response $h(t)$; it has the impulse response structure of a fractional differential equation, with the difference that finite sums become series in the case of our fractional partial differential equation (strictly speaking, one should examine in which sense the series of functions (7.53) converges, so as to give a precise analytical meaning to this formal expression).

7.5.4. Free problem

In addition, to analyze the suggested model completely, we considered the free problem, i.e., with zero control ($u \equiv 0$) but with non-zero initial conditions:

$$\Big[ \big(\partial_t^{1/2}\big)^4 + 2\varepsilon \big(\partial_t^{1/2}\big)^3 + \varepsilon^2 \big(\partial_t^{1/2}\big)^2 \Big] X - \partial_x^2 X = 0 \quad \text{for } t > 0 \text{ and } x \in\, ]0, 1[$$

with given integer (physical) initial conditions:

$$X(t = 0, x) = f(x) \quad \text{and} \quad \big(\partial_t^{1/2}\big)^2 X(t = 0, x) = g(x),$$

half-integer (abstract) initial conditions equal to zero:

$$\big(\partial_t^{1/2}\big)^1 X(t = 0, x) = \big(\partial_t^{1/2}\big)^3 X(t = 0, x) = 0$$

and homogeneous absorbing boundary conditions (7.45) ($u \equiv 0$). In [MAT 96a], we extended to infinite dimension the results established for fractional differential equations; we obtained the following half-integer modal decomposition of the solution:

$$X(t, x) = \sum_{n=-\infty}^{+\infty} \Big[ c_n^{\varepsilon+}\, E_{1/2}(\sigma_n^{\varepsilon+}, t) + c_n^{\varepsilon-}\, E_{1/2}(\sigma_n^{\varepsilon-}, t) \Big]\, \psi_n(x)$$

where $(\sigma_n^{\varepsilon\pm})_n$ constitutes the half-integer temporal spectrum and where the $\psi_n$ are the spatial modes, given, as for the lossless equation ($\varepsilon = 0$), by $\psi_n(x) = r_0 \exp(-s_n^0 x) - \exp(s_n^0 x)$. In other words, the solution of the problem is a sum of space × time products but, contrary to the case of integer temporal order, the time evolution of each spatial vibration mode is not exponential, but monogenic in nature and of half-order.
Lastly, with regard to the temporal decay of the energy of the waves, it is once again by using the equivalent diffusive formulations that this question can be answered positively (see [MAT 98b]). We would like to give some more insight into the use of diffusive representations for this type of FPDE:
– in the case of a varying cross-section, that is, when curvature is present, the above model becomes a bit more complex; it is sometimes called the Webster-Lokshin model (see [HÉL 06a] for the model itself and its resolution in the Laplace domain in the case of piecewise-constant coefficients);
– using the Hille-Yosida theory, the well-posedness of this system was first proved in [HAD 03], with full technical details in Chapter 2 of [HAD 08b];
– numerical schemes taking advantage of this equivalent reformulation have been proposed and analyzed in [HAD 08a], with full technical details in Chapter 3 of [HAD 08b];
– using the Arendt-Batty stability theorem, the asymptotic stability of this system was first proved in [MAT 06].

7.6. Conclusion

This chapter is only an introduction to fractional calculus and its multiple applications. Through the meticulous examination of the connections which exist between two definitions of fractional derivatives and the detailed asymptotic analysis of the fundamental solutions of these operators (the decay at infinity in $t^{-1-\alpha}$ of stable solutions is the analytical expression of the physical phenomenon of long memory), we have shown the unquestionable contribution of a complex-analytic approach, which always establishes the link between the time and frequency domains. We also hope to have illustrated the simplicity and elegance of diffusive representations which, in our eyes, present promising, if not decisive, advantages in the fields of modeling, analysis, approximation and identification, without being restricted to the fractional case.

7.7. Bibliography

[AUD 00] AUDOUNET J., MATIGNON D., MONTSENY G., “Diffusive representations of fractional and pseudo-differential operators”, in Research Trends in Science and Technology (Beirut, Lebanon), Lebanese American University, p. 171–180, March 2000.
[BAG 83a] BAGLEY R.L., TORVIK P.J., “Fractional calculus – A different approach to the analysis of viscoelastically damped structures”, AIAA J., vol. 21, no. 5, p. 741–748, 1983.
[BAG 83b] BAGLEY R.L., TORVIK P.J., “A theoretical basis for the application of fractional calculus to viscoelasticity”, J. Rheology, vol. 27, no. 3, p. 201–210, 1983.
[BAG 85] BAGLEY R.L., TORVIK P.J., “Fractional calculus in the transient analysis of viscoelastically damped structures”, AIAA J., vol. 23, no. 6, p. 918–925, 1985.
[BAG 86] BAGLEY R.L., TORVIK P.J., “On the fractional calculus model of viscoelastic behavior”, J. Rheology, vol. 30, no. 1, p. 133–155, 1986.
[BAG 91] BAGLEY R.L., CALICO R.A., “Fractional order state equations for the control of viscoelastically damped structures”, J. Guidance, Control, and Dynamics, vol. 14, no. 2, p. 304–311, 1991.
[BAT 54] BATEMAN H., Higher Transcendental Functions, McGraw-Hill, New York, vol. 3, chap. XVIII, p. 206–212, 1954.
[BON 00] BONNET C., PARTINGTON J.R., “Stabilization and nuclearity of fractional differential systems”, in Mathematical Theory of Networks and Systems Symposium (Perpignan, France), MTNS, June 2000.
[CAP 76] CAPUTO M., “Vibrations of an infinite plate with a frequency independent Q”, J. Acoust. Soc. Amer., vol. 60, no. 3, p. 634–639, 1976.
[CARP 97] CARPINTERI A., MAINARDI F. (Eds.), Fractals and Fractional Calculus in Continuum Mechanics, Springer-Verlag, CISM Courses and Lectures 378, 1997.
[CART 61] CARTAN H., Théorie élémentaire des fonctions analytiques d’une ou plusieurs variables complexes, Collection Enseignement des sciences, Hermann, Paris, 1961.
[DAN 94] D’ANDRÉA-NOVEL B., COHEN DE LARA M., Commande linéaire des systèmes dynamiques, Masson, Paris, Collection MASC, 1994.
[DAUP 00] DAUPHIN G., HELESCHEWITZ D., MATIGNON D., “Extended diffusive representations and application to non-standard oscillators”, in Mathematical Theory of Networks and Systems Symposium (Perpignan, France), MTNS, June 2000.
[DAUT 84a] DAUTRAY R., LIONS J.L., Analyse mathématique et calcul numérique pour les sciences et les techniques, vol. 8, chap. XVIII, p. 774–785, Masson, Paris, 1984.
[DAUT 84b] DAUTRAY R., LIONS J.L., Analyse mathématique et calcul numérique pour les sciences et les techniques, vol. 7, chap. XVI, p. 333–337, Masson, Paris, 1984.
[FEL 00] FELLAH Z.E.A., DEPOLLIER C., “Transient acoustic wave propagation in rigid porous media: A time-domain approach”, J. Acoust. Soc. Amer., vol. 107, no. 2, p. 683–688, 2000.
[GAS 90] GASQUET G., WITOMSKI P., Analyse de Fourier et applications. Filtrage, calcul numérique, ondelettes, Masson, Paris, 1990.
[GIO 92] GIONA M., ROMAN H.E., “Fractional diffusion equation on fractals: one-dimensional case and asymptotic behaviour”, Journal of Physics A: Mathematical and General, vol. 25, p. 2093–2105, 1992.
[GUE 72] GUELFAND I.M., CHILOV G.E., Les distributions, volume 1, Dunod, Paris, Monographies universitaires de mathématiques 8, 1972.
[HAD 03] HADDAR H., HÉLIE T., MATIGNON D., “A Webster-Lokshin model for waves with viscothermal losses and impedance boundary conditions: strong solutions”, Mathematical and Numerical Aspects of Wave Propagation, p. 66–71, Springer Verlag, 2003.
[HAD 08a] HADDAR H., LI J.-R., MATIGNON D., “Efficient solution of a wave equation with fractional order dissipative terms”, J. Comput. & Appl. Maths, 2008, forthcoming.
[HAD 08b] HADDAR H., MATIGNON D., Theoretical and numerical analysis of the Webster-Lokshin model, Report, Institut National de la Recherche en Informatique et Automatique (INRIA), 2008, Research Report no. 6558.
[HEL 00] HELESCHEWITZ D., Analyse et simulation de systèmes différentiels fractionnaires et pseudo-différentiels linéaires sous représentation diffusive, PhD Thesis, ENST, December 2000.
[HÉL 06a] HÉLIE T., MATIGNON D., “Diffusive representations for the analysis and simulation of flared acoustic pipes with visco-thermal losses”, Math. Models Meth. Appl. Sci., vol. 16, p. 503–536, January 2006.
[HÉL 06b] HÉLIE T., MATIGNON D., “Representations with poles and cuts for the time-domain simulation of fractional systems and irrational transfer functions”, Signal Processing, vol. 86, p. 2516–2528, July 2006.
[KOE 84] KOELLER R.C., “Applications of fractional calculus to the theory of viscoelasticity”, J. Appl. Mech., vol. 51, p. 299–307, 1984.
[KOE 86] KOELLER R.C., “Polynomial operators, Stieltjes convolution, and fractional calculus in hereditary mechanics”, Acta Mech., vol. 58, p. 251–264, 1986.
[KOL 69] KÖLBIG K.S., Laplace transform, Lectures in the academic training programme, CERN, Geneva, Switzerland, 1968–1969.
[LEM 90] LE MEHAUTÉ A., Les géométries fractales, Hermes, Paris, 1990.
[LOK 78a] LOKSHIN A.A., “Wave equation with singular retarded time”, Dokl. Akad. Nauk SSSR, vol. 240, p. 43–46, 1978.
[LOK 78b] LOKSHIN A.A., ROK V.E., “Fundamental solutions of the wave equation with retarded time”, Dokl. Akad. Nauk SSSR, vol. 239, p. 1305–1308, 1978.
[MAT 94] MATIGNON D., Représentations en variables d’état de modèles de guides d’ondes avec dérivation fractionnaire, PhD Thesis, University of Paris XI, November 1994.
[MAT 95a] MATIGNON D., D’ANDRÉA-NOVEL B., “Décomposition modale fractionnaire de l’équation des ondes avec pertes viscothermiques”, in Journées d’études : les systèmes d’ordre non entier en automatique, Groupe de recherche Automatique du CNRS, April 1995.
[MAT 95b] MATIGNON D., D’ANDRÉA-NOVEL B., “Spectral and time-domain consequences of an integro-differential perturbation of the wave PDE”, in Third International Conference on Mathematical and Numerical Aspects of Wave Propagation Phenomena (Mandelieu, France), INRIA, SIAM, p. 769–771, April 1995.
[MAT 96a] MATIGNON D., “Fractional modal decomposition of a boundary-controlled-and-observed infinite-dimensional linear system”, in Mathematical Theory of Networks and Systems (Saint-Louis, Missouri), MTNS, June 1996.
[MAT 96b] MATIGNON D., “Stability results for fractional differential equations with applications to control processing”, in Computational Engineering in Systems Applications (Lille, France), IMACS, IEEE-SMC, vol. 2, p. 963–968, July 1996.
[MAT 96c] MATIGNON D., D’ANDRÉA-NOVEL B., “Some results on controllability and observability of finite-dimensional fractional differential systems”, in Computational Engineering in Systems Applications (Lille, France), IMACS, IEEE-SMC, vol. 2, p. 952–956, July 1996.
[MAT 97] MATIGNON D., D’ANDRÉA-NOVEL B., “Observer-based controllers for fractional differential systems”, in Conference on Decision and Control (San Diego, California), IEEE-CSS, SIAM, p. 4967–4972, December 1997.
[MAT 98a] MATIGNON D., “Stability properties for generalized fractional differential systems”, in ESAIM: Proceedings, vol. 5, p. 145–158, December 1998 (available at http://www.edpsciences.org/articlesproc/Vol.5/).
[MAT 98b] MATIGNON D., AUDOUNET J., MONTSENY G., “Energy decay for wave equations with damping of fractional order”, in Fourth International Conference on Mathematical and Numerical Aspects of Wave Propagation Phenomena (Golden, Colorado), INRIA, SIAM, p. 638–640, June 1998.
[MAT 98c] MATIGNON D., MONTSENY G. (Eds.), Fractional differential systems: Models, methods, and applications, ESAIM, Proceedings 5, December 1998 (available at http://www.edpsciences.org/articlesproc/Vol.5/).
[MAT 05] MATIGNON D., PRIEUR C., “Asymptotic stability of linear conservative systems when coupled with diffusive systems”, ESAIM Control Optim. Calc. Var., vol. 11, p. 487–507, July 2005.
[MAT 06] MATIGNON D., “Asymptotic stability of the Webster-Lokshin model”, Mathematical Theory of Networks and Systems (MTNS), Kyoto, Japan, 11 p. (CD-Rom), July 2006, (invited session).
[MAT 08] MATIGNON D., ZWART H., “Standard diffusive systems as well-posed linear systems”, 2008, submitted.
[MIL 93] MILLER K.S., ROSS B., An Introduction to the Fractional Calculus and Fractional Differential Equations, John Wiley & Sons, 1993.
[MIT 04] MITTAG-LEFFLER G., “Sur la représentation analytique d’une branche uniforme d’une fonction monogène”, Acta Math., vol. 29, p. 101–168, 1904.
[MON 97] MONTSENY G., AUDOUNET J., MATIGNON D., “Fractional integro-differential boundary control of the Euler-Bernoulli beam”, in Conference on Decision and Control (San Diego, California), IEEE-CSS, SIAM, p. 4973–4978, December 1997.
[MON 98] MONTSENY G., “Diffusive representation of pseudo-differential time-operators”, in ESAIM: Proceedings, vol. 5, p. 159–175, December 1998 (available at http://www.edpsciences.org/articlesproc/Vol.5/).
[MON 00] MONTSENY G., AUDOUNET J., MATIGNON D., “Diffusive representation for pseudo-differentially damped non-linear systems”, in ISIDORI A., LAMNABHI-LAGARRIGUE F., RESPONDEK W. (Eds.), Nonlinear Control in the Year 2000 (Paris, France), Springer-Verlag, vol. 2, p. 163–182, 2000.
[OLD 74] OLDHAM K.B., SPANIER J., The Fractional Calculus, Academic Press, New York and London, 1974.
[OUS 83] OUSTALOUP A., Systèmes asservis linéaires d’ordre fractionnaire, Masson, Paris, Série Automatique, 1983.
[POD 99] PODLUBNY I., Fractional Differential Equations, Academic Press, Mathematics in Science and Engineering 1998, 1999.
[POL 91] POLACK J.D., “Time domain solution of Kirchhoff’s equation for sound propagation in viscothermal gases: A diffusion process”, J. Acoustique, vol. 4, p. 47–67, 1991.
[RUS 78] RUSSELL D.L., “Controllability and stabilizability theory for linear partial differential equations: Recent progress and open questions”, SIAM Rev., vol. 20, no. 4, p. 639–739, 1978.
[SAM 87] SAMKO S.G., KILBAS A.A., MARICHEV O.I., Fractional Integrals and Derivatives: Theory and Applications, Gordon and Breach, 1987.
[SCH 65] SCHWARTZ L., Méthodes mathématiques pour les sciences physiques, Hermann, Paris, Collection Enseignement des sciences, 1965.
[SON 90] SONTAG E.D., Mathematical Control Theory. Deterministic Finite Dimensional Systems, Springer-Verlag, Texts in Applied Mathematics 6, 1990.
[STA 94] STAFFANS O.J., “Well-posedness and stabilizability of a viscoelastic equation in energy space”, Trans. Amer. Math. Soc., vol. 345, no. 2, p. 527–575, 1994.
[TAY 96] TAYLOR M.E., Partial Differential Equations. II: Qualitative Studies of Linear Equations, Springer-Verlag, Applied Mathematical Sciences 116, 1996.
[TOR 84] TORVIK P.J., BAGLEY R.L., “On the appearance of the fractional derivative in the behavior of real materials”, J. Appl. Mech., vol. 51, p. 294–298, 1984.
Chapter 8
Fractional Synthesis, Fractional Filters

Chapter written by Liliane BEL, Georges OPPENHEIM, Luc ROBBIANO and Marie-Claude VIANO.

8.1. Traditional and less traditional questions about fractionals

Linear finite impulse response filters enable the design of the well-known moving average (MA) processes. The corresponding inverse filters are used to construct autoregressive (AR) processes. In this chapter, we study a family of filters that is receiving growing interest: fractional filters. They enable the definition of fractional processes as well as that of fractional Brownian motion.

8.1.1. Notes on terminology

The word fractional is associated with filters and processes. The term is both apt and inapt. On the one hand, it is apt because it refers to the interesting class of non-integer derivative differential equations, which are constant-coefficient equations with non-integer derivatives of orders $n_1, n_2, \ldots, n_p$ that are multiples of a basic fraction [MIL 93, OLD 74]. On the other hand, the term is inappropriate because, for most mathematical or filtering issues, the exponents are not necessarily fractions: they can be irrational real numbers, or even complex numbers. Nevertheless, as the term is commonly used in the literature, and as this is of no major consequence, we will continue to use it.

8.1.2. Short and long memory

Many stochastic models with short memory are known, e.g., independent variables, m-dependent variables, certain Markov processes, moving averages, most autoregressive moving average (ARMA) processes and many linear processes. The key consequence of short memory is that it often implies that many limit theorems hold, such as laws of large numbers, central limit theorems and large deviation theorems.

With long memory, the situation is different. It has been more than a century since astronomers first noticed the existence of empirical series with persistent memory. Since then, similar phenomena have been observed, especially in chemistry, hydrology, climatology and economics. Such series obviously pose interesting statistical problems, and their modeling as well as their statistical processing have long been active issues. See [BER 94] for historical examples modeled with fractional Brownian motion. Two reviews of long-range dependence are available, one written by Cox [COX 84] and the other by Beran [BER 94]. Moreover, a bibliographical guide is given in [TAQ 92].

We focus on second order stationary processes and assume the existence of a spectral density $f$. In this case, it is customary to characterize long memory by the non-summability of the autocovariance or by the unboundedness of the spectral density at the origin. However, the two phenomena are not equivalent. Indeed, there exist processes of the fractional ARMA family whose spectral density is finite and non-zero at the origin, although their autocorrelation is non-summable. Furthermore, some authors have recently emphasized how useful spectral density models with singularities away from the origin are for explaining persistent periodicities. This is why we define long memory as the existence of a frequency $\lambda_0$ in the neighborhood of which the spectral density $f(\lambda)$ behaves as $|\lambda - \lambda_0|^{2d}$, where $d$ is negative. Additionally, the case where $2d$ is positive and non-integer defines the intermediate memory situation.

8.1.3. From integer to non-integer powers: filter based sample path design

Let us begin with discrete time.
The well-known and often studied ARMA processes are defined as solutions of recurrence equations whose right-hand side is a simple process:
\[ X(t+1) - a_0 X(t) - a_1 X(t-1) - \cdots - a_p X(t-p) = b_0\,\varepsilon(t) - b_1\,\varepsilon(t-1) - \cdots - b_q\,\varepsilon(t-q) \]
where $\varepsilon$ is an iid white noise, the innovation of the process. By defining two polynomials and a rational fraction:
\[ A(z) = \sum_{i=0}^{p} a_i z^i, \qquad B(z) = \sum_{j=0}^{q} b_j z^j, \qquad F(z) = \frac{B(z)}{A(z)}, \qquad z \in \mathbb{C} \]
we can rewrite the above equation as $X(t+1) = F(B)\varepsilon(t)$, where $B$ is the usual delay operator: $B\varepsilon(t) = \varepsilon(t-1)$. By factorizing the polynomials, we obtain $F(z) = \prod_{j=1}^{J} (1 - \alpha_j z)^{d_j}$, where the $d_j$ are integers (positive or negative). This is how we can represent processes of the autoregressive moving average family. Such filters $F$ have impulse responses that decay exponentially towards 0.

To design long memory processes, a straightforward approach consists of introducing filters whose impulse response decays slowly towards 0. Granger and Joyeux [GRAN 80], Hosking [HOS 81] and Gonçalves [GON 87] showed that long memory can be obtained by letting $d$ in $F(z) = (1 - \alpha z)^d$ take appropriate negative non-integer values, with $\alpha = \pm 1$. The impulse responses then neither decay exponentially nor are summable; they are only square summable, which corresponds to a slow decay. These processes have unbounded spectral densities at $\lambda = 0$ (for $\alpha = 1$) or at $\lambda = \pi$ (for $\alpha = -1$). We can combine $F$ with another autoregressive moving average filter. The autocovariance function of the corresponding process behaves like $k^{-2d-1}$ when $k$ tends towards $+\infty$ and its spectral density behaves like $\lambda^{2d}$ close to 0, which implies long memory when $d < 0$.

If the exponents $d_j$ in $F(z) = \prod_{j=1}^{J} (1 - \alpha_j z)^{d_j}$ are not all integers, we end up with the family of fractional filters. The processes obtained by filtering a white noise with $F$, when $F$ belongs to this family, are called fractional ARMA processes. Their behavior, as far as memory properties, sample path regularity (for continuous time) and singularities of the spectral density are concerned, is richer and more complex than that of traditional ARMA processes. This chapter studies this family, both in discrete and continuous time. An extension to distribution processes is also proposed.

8.1.4. Local and global properties

In continuous time, the preferred prototype for a fractional process is fractional Brownian motion. Introduced by Mandelbrot in 1968, this process is non-stationary, Gaussian, centered and dependent on a single parameter $H$ in $]0, 1[$. The autocovariance kernel reads $\frac{1}{2}\left(|t|^{2H} + |s|^{2H} - |t - s|^{2H}\right)$. If $H = \frac{1}{2}$, we obtain ordinary Brownian motion. Several properties account for the success of these processes: they are $H$-self-similar in law and almost all their sample paths have a Hausdorff dimension equal to $2 - H$. Parameter $H$ plays a central role. It regulates both global and local properties:
– on the one hand, memory properties: if $H > \frac{1}{2}$, the increment process has long memory; if $H \le \frac{1}{2}$, memory is short;
– on the other hand, sample path regularity: the larger $H$ is, the more regular the sample path.

For continuous time, there exists a fractional ARMA family whose definition is identical to that of discrete time. Its parametric richness enables us to disconnect regularity properties from memory properties: the parameters that regulate trajectories and memory range are no longer the same. Continuous time fractional ARMAs are stationary, but are not self-similar.

8.2. Fractional filters

8.2.1. Desired general properties: association

The family of ARMA filters, i.e. “rational” filters, is closed under association in series, in parallel and even in feedback loops. It is one of their main properties. Fractional filters offer a less advantageous situation. Two fractional filters associated in series yield a fractional filter. However, if we move to parallel association, the resulting filter no longer belongs to the fractional filter family. In general, the sum of two power law transfer functions (of the type $z^d$) is not a power law function, except if all the exponents $d_j$ are integers. In other words, as opposed to the family of ARMA filters, that of fractional ARMA filters is not closed under parallel association.

We wish to compensate for this drawback. We can extend the fractional filter family to a class of filters which remains closed under parallel or series associations. A way of achieving this consists of adding to the fractional filters all the sums of such filters. In fact, a good way of extending the family is to ensure that the two following properties are satisfied. The first relates to the location of the filter’s singular points, which must be situated in a well-chosen area. The second property relates to the growth at infinity of the filter’s transfer function in the complex plane, which must not be too fast. In the discrete case, if they exist, the singularities are required not to be too unstable and, if they are oscillating, $F$ should be regular in their vicinity. This family contains fractional filters and shares a common property with that of traditional ARMA filters: both are closed under serial and parallel associations.
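As a minimal worked illustration of the two association rules (a sketch only: the two single-factor filters sharing the singular point $z = 1$ and the exponents $-\frac{1}{2}$ and $0$ are arbitrary choices made here, not taken from the text), compare the series and parallel cases:
\[ (1 - z)^{d_1}\,(1 - z)^{d_2} = (1 - z)^{d_1 + d_2}, \qquad (1 - z)^{-1/2} + 1 = (1 - z)^{-1/2}\left(1 + (1 - z)^{1/2}\right) \]
The series association is again a fractional filter, with exponent $d_1 + d_2$. In the parallel case, the factor $1 + (1 - z)^{1/2}$ tends to 1 at $z = 1$ without being analytic there. A product of powers $\prod_j (1 - \alpha_j z)^{d_j}$ behaves near $z = 1$ like $(1 - z)^{d} h(z)$ with $h$ analytic and non-zero, so it can have a finite non-zero limit at $z = 1$ only if $d = 0$, in which case it is analytic at that point. Under these assumptions the sum is therefore not of the product form.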
In continuous time, $F$ must be holomorphic in a conical area containing the right half-plane and its growth at infinity must be approximately that of a power law function.

8.2.2. Construction and approximation techniques

The approximation of a filter $F$ by polynomials or rational fractions of the complex variable $z$ is a traditional problem. The approximate filter $F_a$ is used to filter a white noise in order to create an MA type approximation of the initial process.

As for polynomial approximations, authors resort to truncated series expansions of $F(z)$. The expansions are made on the basis of the $z^n$ or on the basis of Gegenbauer polynomials [GRAY 89]. Expansion coefficients are easily calculated by recurrence. Because of this property, it is possible to calculate hundreds of thousands of terms (290,000 in [GRAY 89] to approximate a filter having two roots on the unit circle). Moreover, linear recurrences still exist for the coefficients of the impulse response of general fractional filters. They remain linear, but their coefficients are affine functions of time. In simple cases, a quadratic upper bound for the remainder of the truncated series is easy to determine. Nevertheless, the number of terms increases with the memory range. To construct a very long memory process, a moving average process of gigantic order is necessary. However, the simplicity of this procedure largely accounts for its success.

Traditionally, the analyticity of $F$ and a criterion based on the infinity norm are both used. Other approximations are studied to solve stochastic problems that involve the properties of $F$ on the imaginary axis in continuous time or on the unit circle in discrete time.

Rational approximations are rarer although more promising. The principle lies in the search for an ARMA filter which minimizes a certain criterion, in general of type $L^2$. In [WHI 86], Whitfield brings together several ideas of approximation by a rational fraction. Certain criteria include a weighting function which essentially ensures the approximation in a frequency band. The procedures are built on linear or non-linear least squares algorithms, recursive or not. The integral is replaced by a sum, or approximated by a trapezoid method. Various authors carry out approximations with an interesting intuitive sense. Let us quote Oustaloup [OUS 91], who chooses $2n + 1$ zeros $z_j$ and $2n + 1$ real poles $p_j$, so that the ratios $\frac{p_j}{z_j}$ and $\frac{z_{j+1}}{p_j}$ do not depend on $j$. In [BON 92, CUR 86], approximations of Hankel matrices $H$ were studied, whose first row consists of the desired impulse response.
Calculations are carried out satisfactorily for processes with intermediate memory. We then have access to an $H^\infty$ upper bound for the approximation error.

Baratchart et al. [BARA 91] perfected a powerful algorithm, which we describe briefly. A linear system is considered, with constant coefficients, strictly causal, single input and single output. Let $(f_1, f_2, \ldots, f_m, \ldots)$ be its impulse response and $f$ be defined in the complex plane by $f(z) = \sum_{m=1}^{+\infty} f_m z^{-m}$. We assume that $f$ belongs to the Hardy space $H_2^-$, i.e., it is square integrable on the unit circle. We then seek a rational fraction $r$ of maximum order $n$ (to be determined) in $H_2^-$ that minimizes the criterion $\|f - r\|_2^2$. The problem at hand is that of minimizing a functional $\Psi_n(q)$, where $q$ is a polynomial of $P_n^1$, the space of the real polynomials of maximum degree $n$ whose roots are inside the unit disc and such that the coefficient of the highest degree is equal to one. It is shown that if $f$ is holomorphic in the vicinity of the unit circle, then $\Psi_n$ can be extended to a regular function on $\Delta_n$, the closure of $P_n^1$ in $\mathbb{R}^n$. The closure $\Delta_n$ is the set of polynomials of degree $n$ whose roots are in the closed unit disc. A polynomial of the boundary $\partial\Delta_n$ of $\Delta_n$ can then be factorized into a polynomial of degree $k$, every root of which has modulus 1, and a polynomial $q_i$ interior to $\Delta_{n-k}$. Moreover, if $q_i$ is a critical point of $\Psi_{n-k}$ in $\Delta_{n-k}$, then $\nabla_n(q)$, the gradient of $\Psi_n$ at point $q$, is orthogonal to $\partial\Delta_n$ and points outwards. By supposing, moreover, that $\nabla_k$ is non-zero on $\partial\Delta_k$ and that the critical points of $\Psi_k$ in $\Delta_k$ are non-degenerate for $1 \le k \le n$, the following method can give a local minimum:
1) we choose a point $q_0$ interior to $\Delta_n$ as initial condition and integrate the vector field $-\nabla_n$;
2) either a critical point is reached: if it is a local minimum, the procedure is completed; otherwise, since it is non-degenerate, it is unstable under small disturbances and the procedure can continue;
3) or the boundary $\partial\Delta_n$ is reached at $q_b$: then $q_b$ is decomposed into $q_b = q_u q_i$ and we go back to stage 1) with $q_i$ and $\Delta_{n-k}$.

We end up reaching a minimum $q_m$ of $\Psi_m$ ($1 \le m \le n$) that gives, by a simple transformation, a local minimum of $\Psi_n$. This algorithm never meets the same point twice and convergence towards a local minimum is guaranteed. It was extended to the multivariable case and to time-varying coefficients (see [BARA 98]).

8.3. Discrete time fractional processes

8.3.1. Filters: impulse responses and corresponding processes

Let us consider the fractional filter (of variable $z$) parameterized by $\alpha_j$ and $d_j$:
\[ F(z) = \prod_{j=1}^{J} (1 - \alpha_j z)^{d_j}, \qquad |z| < a \tag{8.1} \]
where $a$ is defined below. The $\alpha_j$ are non-zero complex numbers. Let us note that $\alpha_j^{-1}$ is a singular point of $F$ if $d_j \notin \mathbb{N}$. The following notations will be used:
1) $E^{*}$ is the set of the singular points of $F$;
2) $a = \min\{|\alpha_j|^{-1},\ j \in E^{*}\}$;
3) $E^{**}$ is the set of the indices of the singular points whose modulus is equal to $a$;
4) $d = \min\{\mathrm{Re}(d_j),\ j \in E^{**}\}$;
5) $E^{***}$ is the subset of $E^{**}$ corresponding to the singular points for which $\mathrm{Re}(d_j) = d$.

In (8.1), each factor is selected to satisfy $(1 - \alpha_j z)^{d_j} = 1$ when $z = 0$. In the domain $|z| < a$, $F$ admits the series expansion:
\[ F(z) = 1 + \sum_{j=1}^{+\infty} a_j z^j \]
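where the sequence $(a_j)_{j \ge 1}$ is the convolution product of the expansions of the $J$ factors in (8.1). When $n$ tends to infinity and if $F$ is not a polynomial, i.e., if $E^{*} \neq \emptyset$, then:
\[ a_n = \left( \sum_{j \in E^{***}} C_0^{(j)}\, \alpha_j^{n}\, \frac{\Gamma(n - d_j)}{\Gamma(-d_j)\, n!} \right) \bigl(1 + o(1)\bigr) \quad \text{when } n \to +\infty \tag{8.2} \]
where $C_0^{(j)} = \prod_{m \neq j} \left(1 - \frac{\alpha_m}{\alpha_j}\right)^{d_m}$.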
From this we deduce that, if $F$ is not a polynomial, then:
1) $(a_j)$ belongs to $l^1(\mathbb{N})$ if and only if $a > 1$ or ($a = 1$ and $d > 0$);
2) $(a_j)$ belongs to $l^2(\mathbb{N})$ if and only if $a > 1$ or ($a = 1$ and $d > -\frac{1}{2}$).

Consequently, $\sum_{1}^{\infty} |a_j| = +\infty$ and $\sum_{1}^{\infty} |a_j|^2 < +\infty$ if and only if $a = 1$ and $d \in\, ]-\frac{1}{2}, 0]$. Let us consider a transfer function of the form (8.1) with $\sum_{j \ge 1} |a_j|^2 < +\infty$, meaning that either $F$ is a polynomial, or:
\[ a > 1 \quad \text{or} \quad \left( a = 1 \ \text{and}\ d > -\frac{1}{2} \right) \tag{8.3} \]
Let us suppose now that if $(\alpha_k, d_k) \notin \mathbb{R}^2$, then the “conjugated” factor $(1 - \bar{\alpha}_k z)^{\bar{d}_k}$ also appears in the right-hand side of (8.1). Then, $(a_j)$ is a real sequence. If $(\varepsilon(n))$ is a white noise, the process defined by:
\[ X(n) = \varepsilon(n) + \sum_{j=1}^{\infty} a_j\, \varepsilon(n - j) \tag{8.4} \]
is a second order stationary process, zero-mean, linear and regular, with a spectral density proportional to $f(\lambda) = \left| \prod_{j=1}^{J} (1 - \alpha_j \exp(i\lambda))^{d_j} \right|^2$. Moreover, if $(\varepsilon(n))$ is an iid sequence, $(X(n))$ is strictly stationary and ergodic. This process admits an autoregressive expansion of infinite order:
\[ X(n) + \sum_{j=1}^{\infty} b_j\, X(n - j) = \varepsilon(n) \quad \text{with} \quad \sum_{j=1}^{\infty} b_j^2 < +\infty \]
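The moving average representation (8.4) and its autoregressive counterpart can be illustrated numerically. The following is a minimal sketch, not the authors' procedure, assuming the single-factor filter $F(z) = (1 - z)^d$ with a hypothetical exponent $d = -0.3$, a truncation order of 200 and Gaussian innovations; the function names are illustrative. It computes the coefficients $a_j$ of $F$ and $b_j$ of $1/F$ from their binomial series, checks that their convolution product is the unit sequence, and draws a sample path from the truncated sum.

```python
import random


def binomial_series(d, n_terms):
    """Coefficients of (1 - z)^d up to z^(n_terms - 1), equal to 1 at z = 0."""
    coeffs = [1.0]
    for n in range(1, n_terms):
        # Ratio of successive coefficients: c_n / c_(n-1) = (n - 1 - d) / n.
        coeffs.append(coeffs[-1] * (n - 1 - d) / n)
    return coeffs


def convolve(u, v):
    """Truncated convolution product of two coefficient sequences."""
    size = min(len(u), len(v))
    return [sum(u[k] * v[m - k] for k in range(m + 1)) for m in range(size)]


def simulate_truncated_ma(a, n_samples, seed=0):
    """Sample path of X(n) = sum_{j=0}^{M} a_j eps(n - j), Gaussian iid eps."""
    rng = random.Random(seed)
    order = len(a) - 1
    eps = [rng.gauss(0.0, 1.0) for _ in range(n_samples + order)]
    return [
        sum(a[j] * eps[n + order - j] for j in range(order + 1))
        for n in range(n_samples)
    ]


d = -0.3   # hypothetical exponent in ]-1/2, 0[ (illustrative choice)
M = 200    # hypothetical truncation order
a = binomial_series(d, M + 1)    # truncated MA(infinity) coefficients of F
b = binomial_series(-d, M + 1)   # truncated AR(infinity) coefficients of 1/F
identity = convolve(a, b)        # should be 1, 0, 0, ... up to rounding
x = simulate_truncated_ma(a, 500)
```

With this choice the $a_j$ decay like $j^{-0.7}$, so the truncation order, not the sample length, governs how much of the memory is retained.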
Under the following conditions:
1) $|\alpha_j| \le 1$ for all $j \in \{1, \ldots, J\}$;
2) if $|\alpha_j| = 1$, then $\mathrm{Re}(d_j) \in\, ]-\frac{1}{2}, \frac{1}{2}]$,
when $n$ tends towards infinity, we have:
1) $a_n = O(a^{-n})$ and $b_n = O(b^{-n})$ when $a > 1$;
2) $a_n = O(n^{-d-1})$ and $b_n = O(n^{\delta - 1})$ when $a = 1$,
with $b = \min\{|\alpha_j|^{-1},\ d_j \notin \mathbb{N}\}$ and $\delta = \max\{\mathrm{Re}(d_j),\ |\alpha_j| = 1\}$.

The coefficients $(a_j)$ are the solution of the linear difference equation of order $J$, whose coefficients are affine functions of $n$:
\[ n\, a_n + \sum_{k=1}^{J} \bigl[ (n - k)\, q_k - p_{k-1} \bigr]\, a_{n-k} = 0, \qquad n \ge 1 \tag{8.5} \]
with $a_j = 0$ if $j < 0$ and $a_0 = 1$. The $p_j$ and $q_j$ are respectively the coefficients of the two relatively prime polynomials $P$ and $Q$ defined by $Q(0) = 1$ and:
\[ \frac{P(z)}{Q(z)} = \frac{F'(z)}{F(z)} = \sum_{j=1}^{J} \frac{-\alpha_j d_j}{1 - \alpha_j z} \]
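The recurrence (8.5) can be sketched in a few lines. The code below is an illustration under stated assumptions rather than a reference implementation: it builds $Q$ as the product of the $(1 - \alpha_j z)$ and $P$ as the corresponding numerator of the sum of partial fractions (without reducing common factors, which leaves the identity $Q F' = P F$ intact), then iterates (8.5). The function names and the test parameters are hypothetical. For a single factor, the result is compared with the closed form $\alpha^n \Gamma(n - d) / (\Gamma(-d)\, n!)$ appearing in (8.2).

```python
import math


def poly_mul(u, v):
    """Product of two polynomials given by their coefficient lists."""
    out = [0j] * (len(u) + len(v) - 1)
    for i, ui in enumerate(u):
        for j, vj in enumerate(v):
            out[i + j] += ui * vj
    return out


def log_derivative_polys(factors):
    """Return (P, Q) with P/Q = sum_j -alpha_j d_j / (1 - alpha_j z), Q(0) = 1."""
    q = [1 + 0j]
    for alpha, _ in factors:
        q = poly_mul(q, [1, -alpha])
    p = [0j] * len(factors)
    for j, (alpha_j, d_j) in enumerate(factors):
        term = [-alpha_j * d_j + 0j]
        for m, (alpha_m, _) in enumerate(factors):
            if m != j:
                term = poly_mul(term, [1, -alpha_m])
        for i, c in enumerate(term):
            p[i] += c
    return p, q


def impulse_response(factors, n_terms):
    """Coefficients a_0, ..., a_(n_terms - 1) of prod_j (1 - alpha_j z)^(d_j) via (8.5)."""
    p, q = log_derivative_polys(factors)
    order = len(factors)
    a = [1 + 0j]
    for n in range(1, n_terms):
        acc = 0j
        for k in range(1, min(order, n) + 1):  # a_(n-k) = 0 when n - k < 0
            acc += ((n - k) * q[k] - p[k - 1]) * a[n - k]
        a.append(-acc / n)
    return a


def single_factor_coefficient(alpha, d, n):
    """Closed form alpha^n Gamma(n - d) / (Gamma(-d) n!) for one factor."""
    return alpha ** n * math.gamma(n - d) / (math.gamma(-d) * math.factorial(n))


# Hypothetical example: one factor with alpha = 1, d = -0.3.
a_single = impulse_response([(1.0, -0.3)], 30)
```

Each new coefficient costs $O(J)$ operations, which is what makes very long truncations affordable compared with repeated convolutions.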
The coefficients $b_j$ are the solution of the same difference equation, with $P$ replaced by $-P$. These equations are useful for calculating the coefficients $a_j$ and $b_j$, which enables us to obtain simulations or forecasts for the process $X(n)$.

8.3.2. Mixing and memory properties

We study two characterizations of the memory structure: the rate of decay of the covariance and the mixing coefficients. The covariance of the process $X(n)$ is given by $\sigma(n) = \int_{-\pi}^{\pi} \exp(in\lambda) f(\lambda)\, d\lambda$. When $n$ tends to $+\infty$, if $F$ is a polynomial of degree $k$, then $\sigma(n) = 0$ for $n > k$; otherwise:
1) if $a > 1$, then $\sigma(n) = O(a^{-n})$;
2) if $a = 1$, then $\sigma(n) = \left( \sum_{j \in E^{***}} \gamma_j\, \alpha_j^{n}\, n^{-1-2d_j} \right) (1 + o(1))$.

Consequently, the covariance $(\sigma(n))$ is not absolutely summable, and thus $X(n)$ has long memory, if and only if $a = 1$ and $d \in\, ]-\frac{1}{2}, 0]$, i.e., if a singularity is located on the unit circle.

Another approach is to study mixing coefficients. Mixing coefficients measure a certain form of dependence between sigma-algebras. More precisely, if $\mathcal{A}$ and $\mathcal{B}$ are two sigma-algebras, the strong mixing coefficient $\alpha$ is defined by:
\[ \alpha(\mathcal{A}, \mathcal{B}) = \sup_{A \in \mathcal{A},\, B \in \mathcal{B}} \bigl| P(A \cap B) - P(A) P(B) \bigr| \]
the mixing coefficient of absolute regularity $\beta$ by:
\[ \beta(\mathcal{A}, \mathcal{B}) = \sup_{B \in \mathcal{B}} \bigl| P(B / \mathcal{A}) - P(B) \bigr| \]
and the mixing coefficient of maximum correlation $\rho$ by:
\[ \rho(\mathcal{A}, \mathcal{B}) = \sup_{X \in L^2(\mathcal{A}),\, Y \in L^2(\mathcal{B})} \bigl| \mathrm{corr}(X, Y) \bigr| \]
It is said that a process $X$ is mixing if the sigma-algebras generated by $\{X(t), t \in I\}$ and $\{X(t), t \in J\}$ have a mixing coefficient which tends towards 0 when $d(I, J)$ tends towards infinity. Here, we will use the mixing coefficient:
\[ r(n) = \sup \bigl\{ |\mathrm{cov}(u, v)|,\ u \in H_{-\infty}^{0},\ v \in H_{n}^{+\infty},\ \mathrm{Var}(u) = \mathrm{Var}(v) = 1 \bigr\} \]
where $H_r^s$ is the subspace of $L^2$ generated by $(X(j), j \in \{r, \ldots, s\})$. Then, $(X(n))$ is $r$-mixing if and only if $F$ is a polynomial or if $a > 1$ and, in the latter case, $r(n) = O(a^{-n})$. If, moreover, $(X(n))$ is Gaussian, then $(X(n))$ is strongly mixing if and only if $F$ is a polynomial or if $a > 1$ and, in this last case, $(X(n))$ is $\beta$-mixing with $\beta(n) = O(a^{-n})$. If $(\varepsilon(n))$ is an iid series with a probability density $p$ such that:
\[ \int |p(x) - p(x + h)|\, dx \le C_1 |h| \]
then, if $a > 1$, the process $(X(n))$ is $\beta$-mixing with $\beta(n) = O(a^{-2n/3})$. According to the values of the parameters $a$ and $d$, there are thus cases where the process $X(n)$ is:
– with long memory and not mixing;
– with short memory and not mixing;
– with short memory and mixing.

8.3.3. Parameter estimation

In the context of long memory processes, there are two facets to estimation problems. The first is when long memory behaves like a parasitic phenomenon that tends to invalidate traditional results. Examples of this kind are: estimation of regression parameters with long memory noise, estimation of the marginal laws of a long memory process and change-point detection in a long memory process. The other aspect relates to the estimation of the parameters quantifying the memory length, i.e., in fact, spectral density parameters. Within this last framework, three types of methods have been widely studied, depending on whether the model is completely or partially parameterized:
1) methods related to maximum likelihood, which, by nature, aim at estimating all the parameters of the model;
2) the estimation of the memory exponent $d$ in $\varphi(\lambda) = f^{*}(\lambda) |\lambda|^{-2d}$ where $\varphi$ is the spectral density;
3) the estimation of the self-similarity parameter in fractional Brownian motion.

For fractional ARMA processes whose transfer function has the form:
\[ F(z) = (1 - z)^{d} \prod_{j=1}^{J} (1 - \alpha_j z)^{d_j} \]
where the $d_j$ are integers and where $-\frac{1}{2} < d < \frac{1}{2}$ (referred to as ARFIMA), parametric methods give satisfactory results, provided that the order of the model is known a priori (see, for example, [GON 87]). When the order of the model is not known, semi-parametric methods, as in point 2), are fruitfully used. We can schematically describe these methods as follows. Let us use the flexible framework of models with spectral density $\varphi(e^{i\lambda}) = |1 - e^{i\lambda}|^{-2d} g(\lambda)$, where $g$ can be written $g(\lambda) = \exp\left( \sum_{k=0}^{\infty} \theta_k \cos k\lambda \right)$ and is a very regular function. The unique singularity of $\varphi$ is located at $\lambda = 0$. Then, the natural tool to estimate the spectral density is the periodogram:
\[ I_n^X(x) = \frac{1}{2\pi} \left| \sum_{t=1}^{n} X(t)\, e^{itx} \right|^2 \]
The log of the periodogram estimates $\log \varphi$:
\[ \log \varphi(e^{i\lambda}) = -2d \log |1 - e^{i\lambda}| + \sum_{k=0}^{\infty} \theta_k \cos k\lambda \]
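The log-linear relation above lends itself to an ordinary least squares fit. The sketch below is only an illustration under stated assumptions: the cosine sum is truncated at a hypothetical order $q$, the periodogram follows the displayed formula (many texts divide additionally by $n$, which only shifts $\theta_0$), and the small linear solver and all function names are illustrative. As a sanity check it regresses exact, noise-free values of $\log \varphi$ built from hypothetical parameters, so the fit should return them.

```python
import math


def periodogram(x, lam):
    """|sum_{t=1}^{n} x(t) e^{i t lam}|^2 / (2 pi), following the displayed formula."""
    re = sum(v * math.cos((t + 1) * lam) for t, v in enumerate(x))
    im = sum(v * math.sin((t + 1) * lam) for t, v in enumerate(x))
    return (re * re + im * im) / (2.0 * math.pi)


def regressors(lam, q):
    """Row [-2 log|1 - e^{i lam}|, cos(0 lam), ..., cos((q - 1) lam)], 0 < lam < 2 pi."""
    first = -2.0 * math.log(2.0 * abs(math.sin(lam / 2.0)))  # |1 - e^{i lam}| = 2 |sin(lam / 2)|
    return [first] + [math.cos(k * lam) for k in range(q)]


def solve_linear(matrix, rhs):
    """Gaussian elimination with partial pivoting (assumes a non-singular system)."""
    n = len(rhs)
    m = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * sol[c] for c in range(r + 1, n))
        sol[r] = (m[r][n] - tail) / m[r][r]
    return sol


def fit_log_spectrum(freqs, log_values, q):
    """Least squares estimates (d, [theta_0, ..., theta_(q-1)]) via normal equations."""
    rows = [regressors(lam, q) for lam in freqs]
    p = q + 1
    gram = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    moment = [sum(r[i] * y for r, y in zip(rows, log_values)) for i in range(p)]
    coef = solve_linear(gram, moment)
    return coef[0], coef[1:]


# Noise-free check with hypothetical parameters d = 0.3, theta = (0.5, -0.2).
freqs = [math.pi * j / 200 for j in range(1, 200)]
exact = [0.3 * regressors(lam, 2)[0] + 0.5 - 0.2 * math.cos(lam) for lam in freqs]
d_hat, theta_hat = fit_log_spectrum(freqs, exact, q=2)
```

On data, `log_values` would be replaced by `[math.log(periodogram(x, lam)) for lam in freqs]`; the choice of `q` and the statistical properties of the resulting estimator are the difficulties discussed in the text.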
After truncation of the sum at order $q - 1$, the parameters $d, \theta_0, \theta_1, \ldots, \theta_{q-1}$ are estimated by a regression of $\log I(\lambda)$ against $(-2 \log |1 - e^{i\lambda}|, \cos k\lambda)$. The difficulty resides in two points:
– the choice of $q$;
– the study of the properties of the estimators.

In [MOU 00], recent procedures based on penalization techniques give automatic methods for the choice of $q$ that can be tuned to the function $g$. The asymptotic properties ensure convergence, for example towards a Gaussian law, of the estimator of $d$. Many estimators that rely on this principle were proposed, some of them being compared in [BARD 01, MOU 01].

When the singularities of the transfer function are not located at $z = 1$, another model is used. Let us assume that the transfer function is written:
\[ F(z) = (1 - e^{i\lambda_0} z)^{d}\, (1 - e^{-i\lambda_0} z)^{d} \prod_{j=1}^{J} (1 - \alpha_j z)^{d_j} \]
with $|\alpha_j| < 1$, $\lambda_0 \neq 0$. The spectral density takes the form $\varphi(\lambda) = |1 - e^{i(\lambda + \lambda_0)}|^{2d} g(\lambda)$ with a very regular $g$. If $\lambda_0$ is known, various authors [OPP 00] consider that $d$ can be estimated along the lines developed for $\alpha = 1$. When $\lambda_0$ is not known, it has to be estimated. The idea is to use the frequency at which the periodogram reaches its maximum. Although more sophisticated, the procedures elaborated by Yajima [YAJ 96], Hidalgo [HID 99] and Giraitis et al. [GIR 01] provide convergence in probability of the estimators and convergence in law towards the normal law.

8.3.4. Simulated example

Several methods were proposed to simulate trajectories of fractional processes. Granger and Joyeux [GRAN 80] use an autoregressive approximation of order 100 obtained by truncating the AR($\infty$) representation, combined with an initialization procedure based on the Cholesky decomposition. Geweke and Porter-Hudak [GEW 85] or Hosking [HOS 81] work from the autocovariance and use a Levinson-Durbin-Whittle algorithm to generate an autoregressive approximation. In both cases, quality is not quantified, although it can be. Gray et al. [GRAY 89] approximate $(X(n))$ by a long MA obtained by truncating the MA($\infty$) representation. The second method seems inadequate in our case because there is no expression for the autocovariance function of fractional ARMA processes. The methods based on truncated AR($\infty$) or MA($\infty$) representations are easily implemented using the difference equations (8.5). However, when $(X(n))$ is long-ranged, these methods require very long trajectories of white noise because of the slow decay of $a_n$; for example, Gray et al. [GRAY 89] use moving averages of order around 290,000. Another idea consists of approximating $F$ by a rational fraction $\frac{B}{A}$ and simulating the ARMA process with representation $A(L)Y(n) = B(L)\varepsilon(n)$. This is the chosen approach for the example below. The algorithm used to calculate the polynomials $A$ and $B$ is developed by Baratchart et al. [BARA 91, BARA 98].

In principle, this algorithm, as with the theoretical results, is only valid when $F$ has no singular point on the unit disc. However, it provides satisfactory results, from the perspective detailed below. The studied filter $F$ reads:
\[ F(z) = \bigl(z - \exp(2i\pi\, 0.231)\bigr)^{-0.2}\, \bigl(z - \exp(-2i\pi\, 0.231)\bigr)^{-0.2} \]
It is approximated by a rational fraction whose numerator and denominator are of degrees 8 and 9, respectively. Figure 8.1 shows the location of the poles and the zeros of this rational fraction and the singularities of $F$.

Figure 8.1. Location (polar plot) of the singularities of the filter $F(z) = (z - \exp(2i\pi\, 0.231))^{-0.2} (z - \exp(-2i\pi\, 0.231))^{-0.2}$ and of the poles (◦) and zeros (∗) of the rational fraction $\frac{B}{A}$ approximating $F$
It is worth noting that the rational fraction $\frac{B}{A}$ is stable; however, its poles and zeros are almost superimposed on the two singularities of the function $F$. In Figure 8.2, the amplitude of $F$ on the unit circle (continuous line) is compared to that of the rational fraction (dotted line). The approximation is excellent except in a close vicinity of the singularities of $F$. Figure 8.3 shows a simulated sample path of the ARMA process obtained with the rational approximation of the transfer function. We compare the impulse response of the process, its empirical covariance and spectral density (continuous line) with the impulse response, theoretical covariance and spectral density (dotted line), calculated directly from the function $F$. The impulse response of the ARMA process satisfactorily matches the theoretical impulse response. The estimated spectral density for the ARMA process reasonably matches the theoretical spectral density. For small time lags, the autocorrelation calculated on the simulated process and the autocorrelation calculated as the inverse Fourier transform of the spectral density are very similar. This is no longer true for larger lags, which comes as no surprise because we have, on the one hand, a short memory process, and on the other, a long memory process.
Figure 8.2. Amplitude of $F(z)$ (continuous line) and of the rational fraction $\frac{B}{A}(z)$ (dotted line) approximating $F$ on the unit circle. Panels: transfer function (imaginary part versus real part); modulus of the transfer function (modulus in dB versus frequency in radians/second); Black diagram (modulus in dB versus phase in degrees); phase of the transfer function (phase in degrees versus frequency in radians/second)
Figure 8.3. Simulation of a sample path for a fractional ARMA process obtained from an approximating rational fraction. Comparison between simulated and theoretical impulse responses, spectral densities and autocorrelation functions. Panels: simulated trajectories; impulse response; spectral density; autocorrelation
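The theoretical curves of this example can be approximated directly from the studied filter. The sketch below is an illustration only, assuming a spectral density proportional to $|F(e^{i\lambda})|^2$ and a plain midpoint rule for the inverse Fourier transform; the function names and grid sizes are hypothetical, and the midpoint rule converges slowly near the integrable singularity, so the autocorrelation values are indicative. The amplitude peaks at $\lambda_0 = 2\pi \times 0.231 \approx 1.45$ radians.

```python
import math

LAMBDA0 = 2.0 * math.pi * 0.231  # angular position of the two singularities


def amplitude(lam, d=-0.2, lam0=LAMBDA0):
    """|F(e^{i lam})| for F(z) = (z - e^{i lam0})^d (z - e^{-i lam0})^d, d real."""
    z = complex(math.cos(lam), math.sin(lam))
    s_plus = complex(math.cos(lam0), math.sin(lam0))
    s_minus = s_plus.conjugate()
    # For a real exponent, the modulus of a power is the power of the modulus.
    return (abs(z - s_plus) * abs(z - s_minus)) ** d


def autocorrelation(max_lag, n_grid=20000):
    """Normalized inverse Fourier transform of |F|^2 on ]0, pi[ (midpoint rule).

    The density is even in lam, so only cosines over the half interval are needed.
    """
    h = math.pi / n_grid
    grid = [(i + 0.5) * h for i in range(n_grid)]
    dens = [amplitude(lam) ** 2 for lam in grid]
    total = sum(dens)
    return [
        sum(f * math.cos(k * lam) for f, lam in zip(dens, grid)) / total
        for k in range(max_lag + 1)
    ]


rho = autocorrelation(5, n_grid=2000)
```

Since $|e^{i\lambda} - e^{i\lambda_0}|\,|e^{i\lambda} - e^{-i\lambda_0}| = 2|\cos\lambda - \cos\lambda_0|$, the amplitude depends on $\lambda$ only through $\cos\lambda$, which gives a simple cross-check of the function.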
8.4. Continuous time fractional processes 8.4.1. A non-self-similar family: fractional processes designed from fractional filters FBMs were the first to be introduced as continuous time processes characterized by a fractional parameter [MAN 68]. They are interesting because they generalize ordinary Brownian motion while maintaining its Gaussian nature and self-similarity. However, consequently, they lose the independence of their increments. The key properties of these processes are governed by the unique parameter H. This simplicity
has its advantages and constraints. The design of continuous time processes controlled by several parameters, which make it possible to uncouple the local properties from the long-range memory properties, is the subject of the present section. However, these new processes are not self-similar. Fractional ARMA processes are defined, as in the discrete case, by a fractional filter F, with complex parameters $a_k$ and $d_k$:

$$F(s) = \prod_{k=1}^{K} (s - a_k)^{d_k} \quad \text{for } \operatorname{Re}(s) > a \tag{8.6}$$
The following notations are used:
1) $D = \sum_{k=1}^{K} d_k$;
2) $E^*$ is the set of the singular points of $F$;
3) $a = \max\{\operatorname{Re}(a_k),\ k \in E^*\}$;
4) $E^{**}$ is the set of the indices of the singular points whose real part is equal to $a$;
5) $d = \min\{\operatorname{Re}(d_k),\ k \in E^{**}\}$.

It is assumed that, if $(a_k, d_k) \notin \mathbb{R}^2$, then the factor $(s - \bar{a}_k)^{\bar{d}_k}$ is present in the right-hand side of (8.6). Under this hypothesis, $D$ is real. Let us moreover assume, in this section, that $D < 0$, since the study of the case $D > 0$ is the subject of the next section. Then, the set $E^*$ is not empty and the impulse response $f$, given by the inverse Laplace transform of $F$, is well-defined, real and locally integrable on $\mathbb{R}^+$:

$$f(t) = \frac{1}{2i\pi} \int_{c-i\infty}^{c+i\infty} \exp(st)\,F(s)\,ds \quad \text{for } t > 0,\ c \in\, ]a, +\infty[ \tag{8.7}$$
No closed-form expression for $f$ is available except when $K = 1$. Its behavior in the vicinity of $0^+$ is described by:

$$f(t) \sim \frac{t^{-(1+D)}}{\Gamma(-D)}, \quad t \to 0^+ \tag{8.8}$$

and, in the vicinity of $+\infty$, by:

$$f(t) \sim \sum_{k \in E^{**}} \lambda_k\, t^{-(1+d_k)} \exp(a_k t), \quad t \to +\infty \tag{8.9}$$

where $\lambda_k$ are non-zero complex numbers depending on parameters $a_k$ and $d_k$.
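As an illustration of the case K = 1, the following sketch uses the standard Laplace pair $t^{-(1+d)} e^{at}/\Gamma(-d) \leftrightarrow (s-a)^{d}$ (valid for $d < 0$ and $\operatorname{Re}(s) > a$), which is not written out in the text above and is assumed here. The function names and the midpoint quadrature are hypothetical helpers chosen for this example only.

```python
import math


def impulse_response_k1(t, a, d):
    """Impulse response for K = 1, assuming the classical Laplace pair
    f(t) = t^(-(1+d)) * exp(a*t) / Gamma(-d), with t > 0 and d < 0."""
    if t <= 0 or d >= 0:
        raise ValueError("requires t > 0 and d < 0")
    return t ** (-(1.0 + d)) * math.exp(a * t) / math.gamma(-d)


def small_t_equivalent(t, D):
    """Right-hand side of (8.8): t^(-(1+D)) / Gamma(-D)."""
    return t ** (-(1.0 + D)) / math.gamma(-D)


def laplace_midpoint(f, s, upper, steps):
    """Crude midpoint-rule approximation of the Laplace transform of f at a
    real s, truncated to [0, upper]. Illustrative only."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        total += f(t) * math.exp(-s * t)
    return h * total


# Near t = 0 the ratio f(t) / equivalent equals exp(a*t), hence tends to 1.
ratio = impulse_response_k1(1e-6, -1.0, -1.5) / small_t_equivalent(1e-6, -1.5)

# With a = -1 and d = -2, F(s) = (s + 1)^(-2), so F(1) = 0.25.
approx_F = laplace_midpoint(lambda t: impulse_response_k1(t, -1.0, -2.0),
                            1.0, 40.0, 200_000)
```

For K = 1 we have D = d, so the small-time equivalent (8.8) and the large-time equivalent (8.9), with $\lambda_1 = 1/\Gamma(-d)$, are both read directly off this single expression.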
As in the discrete case, if $D < 1 - K$, then $f$ is the solution of a linear differential equation of order $K$ whose coefficients are affine functions of $t$:

$$\sum_{j=0}^{K} (\nu_j + t\psi_j)\, f^{(j)}(t) = 0 \tag{8.10}$$
where $\nu_j$ and $\psi_j$ are constants depending on parameters $a_k$ and $d_k$. Then, the impulse responses enable us to define the process $(X(t))$ as a stochastic integral with respect to Brownian motion $W(s)$:

$$X(t) = \int_{-\infty}^{t} f(t - s)\, dW(s) \tag{8.11}$$

when $f$ belongs to $L^2(\mathbb{R}^+)$ and is given by (8.8). Then, $(X(t))$ is a zero-mean, stationary Gaussian process with spectral density $g(\lambda) = |F(i\lambda)|^2$ and with covariance function $\sigma(t) = \int_{-\infty}^{+\infty} g(\lambda) \exp(i\lambda t)\, d\lambda$.

8.4.2. Sample path properties: local and global regularity, memory

Let us begin with the properties of the covariance function $\sigma$, from which the other properties are deduced. The memory properties of $(X(t))$ are given by the covariance behavior at infinity:
1) when $a < 0$, $\sigma(t) = o(t^{-n})$ when $t \to +\infty$ for all $n \in \mathbb{N}$;
2) when $a = 0$, $\sigma(t) \sim \sum_{j \in E^{**}} \gamma_j \exp(a_j t)\, t^{-2d_j - 1}$, $t \to +\infty$, where $\gamma_j$ are constants depending on $F$;
3) $\sigma$ is non-integrable if and only if $a = 0$ and $d \in\, ]-\frac{1}{2}, 0[$.

For this latter case, $(X(t))$ is a long memory process. From the covariance behavior at infinity, we can also deduce that the process $(X(t))$ is strongly mixing if and only if $a < 0$. As in the discrete case, there are various memory-mixing scenarios.

The regularity of the sample path is determined by the covariance behavior at the origin:
1) if $D < -\frac{3}{2}$, then $\sigma(t) = \sigma(0) + \gamma_1 t^2 + o(t^{2+\varepsilon})$, where $\gamma_1$ is a non-zero constant and $\varepsilon$ a positive number;
2) if $D > -\frac{3}{2}$, then $\sigma(t) = \sigma(0) + \gamma_2 t^{-2D-1} + o(t^{-2D-1})$, where $\gamma_2$ is a non-zero constant;
3) if $D = -\frac{3}{2}$, then $\sigma(t) = \sigma(0) + \frac{t^2}{2} \log t\,(1 + o(1))$.
From this, we deduce that there is a process $(Y(t))$ equivalent to $(X(t))$, such that all the trajectories are Hölderian of exponent $\gamma$ for all $\gamma \in\, ]0, \min(1, -\frac{1}{2} - D)[$.
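The quantities introduced so far (D, a, d, the spectral density, the long memory criterion and the Hölder bound) can be gathered in a short numerical sketch. The function names are hypothetical, the filter is assumed to satisfy $a \le 0$, and the complex powers use Python's principal branch, which is an assumption made for this illustration.

```python
import math


def summary_parameters(poles, exponents):
    """Return (D, a, d) as defined in section 8.4.1, for singular points
    a_k (poles) with exponents d_k."""
    D = sum(complex(dk) for dk in exponents).real      # real under the pairing hypothesis
    a = max(complex(ak).real for ak in poles)
    d = min(complex(dk).real
            for ak, dk in zip(poles, exponents)
            if complex(ak).real == a)
    return D, a, d


def spectral_density(lam, poles, exponents):
    """g(lambda) = |F(i*lambda)|^2 with F(s) = prod_k (s - a_k)^(d_k)."""
    F = 1.0 + 0.0j
    for ak, dk in zip(poles, exponents):
        F *= (1j * lam - ak) ** dk
    return abs(F) ** 2


def is_long_memory(a, d):
    """Covariance non-integrable: a = 0 and d in ]-1/2, 0[."""
    return a == 0 and -0.5 < d < 0


def is_strongly_mixing(a):
    """Strong mixing holds if and only if a < 0."""
    return a < 0


def holder_bound(D):
    """Upper end of the interval of admissible Hölder exponents,
    min(1, -1/2 - D); meaningful for D < -1/2."""
    if D >= -0.5:
        raise ValueError("requires D < -1/2")
    return min(1.0, -0.5 - D)


# Hypothetical filter F(s) = s^(-0.3) * (s + 1)^(-1).
D, a, d = summary_parameters([0.0, -1.0], [-0.3, -1.0])
```

With this hypothetical filter, the singularity at 0 produces long memory (a = 0, d = −0.3) while the local regularity is governed by D = −1.3, which illustrates how the two properties are uncoupled.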
Moreover, if $D < -\frac{3}{2}$, these trajectories are of class $C^1$. The Hausdorff dimension of the sample paths of the process $Y(t)$ is then equal to 1 if $D < -\frac{3}{2}$ and to $\frac{5}{2} + D$ if $D \in [-\frac{3}{2}, -\frac{1}{2}[$.

In this family, there exist long memory processes $X(t)$ ($a = 0$, $d \in\, ]-\frac{1}{2}, 0[$), which can be as regular ($D < -\frac{3}{2}$) or as irregular ($D$ close to $-\frac{1}{2}$) as desired. Conversely, there exist processes $X(t)$ with short memory ($a < 0$) having arbitrary regularity, in contrast to fractional Brownian motion. However, only the processes obtained from transfer functions $F(s) = s^d$ are self-similar.

8.5. Distribution processes

8.5.1. Motivation and generalization of distribution processes

To complement the survey of continuous time fractional ARMA processes, the idea that spontaneously comes to mind is to study the consequences of relaxing the constraint $D < 0$. Then, the impulse response $f$ is no longer a simple function belonging to $L^2$, but can be defined as the inverse Laplace transform of $F$ in the space $\mathcal{D}'$ of distributions with support on $\mathbb{R}^+$. Expression (8.11) no longer makes sense and it is necessary to define the process $X$ differently.

Distribution processes were introduced independently by Ito [ITO 54], and Gelfand and Vilenkin [GEL 64]. They can be defined in the following way: a second order distribution process $X$ is a continuous linear map from the space of $C^\infty$ test functions with compact support in $\mathbb{R}$ to the space of random variables whose second order moment exists. We note:

$$\forall \varphi \in C_0^\infty(\mathbb{R}), \quad X(\varphi) = \langle X, \varphi \rangle \in L^2(\Omega)$$

and we have, for every compact $K$ of $\mathbb{R}$:

$$\exists C_K,\ \exists k,\ \forall \varphi \in C_0^\infty(K), \quad \|\langle X, \varphi \rangle\|_{L^2(\Omega)} \le C_K \|\varphi\|_k$$
Differentiation, time shift, convolution and Fourier transform (denoted $\mathcal{F}(X)$ or $\hat{X}$) are defined for distribution processes, as they are for distributions. Likewise, expectation, covariance and stationarity for distribution processes are defined as they are for continuous time processes. In particular (see [GEL 64]), if $X$ is a second order stationary distribution process, then there exists $\eta \in \mathbb{C}$ such that, for all $\varphi \in C_0^\infty(\mathbb{R})$, the expectation $m$ of $X$ is written $m(\varphi) = E(X(\varphi)) = \eta \int \varphi(t)\, dt$, and there exists a positive tempered measure $\mu$, called the spectral measure, such that the covariance $B$ of $X$ is written:

$$B(\varphi, \psi) = E\big(X(\varphi)\overline{X(\psi)}\big) = \int \hat{\varphi}(\xi)\, \overline{\hat{\psi}(\xi)}\, d\mu(\xi) = \langle \sigma, \varphi * \bar{\psi} \rangle$$

If $\mu$ admits a density $g$ with respect to the Lebesgue measure, $g$ is called the spectral density of $X$ and then $\sigma = \mathcal{F}^{-1}(g)$.

8.5.2. The family of linear distribution processes

Let us begin by putting forth the definition of this family. Let $f$ be a function of $L^2$ and $X(t)$ the Gaussian process defined by:

$$X(t) = \int f(t - s)\, dW(s)$$
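where $W(s)$ is the ordinary Brownian motion. Let $\check{f}(t) = f(-t)$. Then, $X(t)$ is a distribution process if, for $\varphi \in C_0^\infty(\mathbb{R})$:

$$\langle X, \varphi \rangle = \int \left( \int f(t - s)\, dW(s) \right) \varphi(t)\, dt$$

It is easy to see that we can, in this case, permute the integrals and that, for all $\varphi \in C_0^\infty(\mathbb{R})$:

$$\int \left( \int f(t - s)\, dW(s) \right) \varphi(t)\, dt = \int \left( \int f(t - s) \varphi(t)\, dt \right) dW(s)$$

It is then natural to introduce the following process:

$$\langle X, \varphi \rangle = \int \check{f} * \varphi(s)\, dW(s) \tag{8.12}$$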
with $f \in \mathcal{D}'(\mathbb{R})$ such that $\check{f} * \varphi(s) \in L^2(\mathbb{R}^n)$ for all $\varphi \in C_0^\infty(\mathbb{R}^n)$. Let $H^{-\infty}(\mathbb{R}) = \cup_{s \in \mathbb{R}} H^s(\mathbb{R})$ be the Sobolev space, where $H^s(\mathbb{R}) = \{f \in \mathcal{S}'(\mathbb{R}),\ (1 + \xi^2)^{s/2} |\hat{f}| \in L^2(\mathbb{R})\}$, and $\mathcal{S}'$ is the space of tempered distributions. It is shown that (8.12) properly defines a distribution process when the distribution $f$ belongs to $H^{-\infty}(\mathbb{R})$. The process $X$, with impulse response $f$, is then a zero-mean stationary Gaussian distribution process, with spectral density $|\hat{f}(\xi)|^2$ and covariance $\sigma = \mathcal{F}^{-1}(|\hat{f}|^2)$.

As for continuous time processes, the regularity of a process $X$, with impulse response $f \in H^{-\infty}(\mathbb{R}^n)$, is characterized by the regularity of $f$: if $f \in H^{-\infty}(\mathbb{R})$, the corresponding distribution process $X$ belongs to the Hölder space $C^s(\mathbb{R}, L^2(\Omega))$ if and only if $f$ belongs to the Besov space $B^s_{2,\infty}(\mathbb{R}^n)$ (see [TRI 92] for the definition of these spaces, or Chapter 3).

8.5.3. Fractional distribution processes

We can now define fractional ARMA processes for $D > 0$, i.e. when the inverse Laplace transform of $F$ does not belong to $L^2$. In fact, a more general framework is available: let us suppose that $F$ satisfies the two following assumptions:
1) $F$ is holomorphic in the domain $\mathcal{D} = \mathbb{C} \setminus \{z,\ \operatorname{Re}(z) \le a \text{ and } |\operatorname{Im}(z)| \le K|\operatorname{Re}(z)|\}$;
2) there exist $N > 0$ and $C$ such that $|F(z)| \le C(1 + |z|)^N$ in $\mathcal{D}$.

This includes the functions $F$ which verify (8.6). Under assumptions 1) and 2) above, $f = \mathcal{L}^{-1}(F)$ exists [SCH 66], since $\mathcal{L}$ is defined by:

$$\forall t > 0, \quad \mathcal{L}(f)(s) = F(s) = \langle f(t), e^{-st} \rangle \quad \text{for } s \in \mathbb{C},\ \operatorname{Re}(s) > a$$
The function $f$ belongs to the space $H^{-\infty}(\mathbb{R})$ if $a < 0$ or ($a = 0$ and $F(i\xi) \in L^2_{loc}(\mathbb{R})$), this condition being satisfied in the case of fractional filters as long as $d > -\frac{1}{2}$ if $a = 0$ and whatever the value of $D$. In fact, $f$ belongs to the Besov space $B^{-N-1/2}_{2,\infty}(\mathbb{R})$ and the associated process has regularity of order $-N - \frac{1}{2}$.

Moreover, if $F$ verifies hypotheses 1) and 2) and if $a < 0$, then $f$ is an analytic function for $t > 0$. More precisely, there exists $K > 1$ such that $f$ has a holomorphic extension to $\{|\operatorname{Im}(t)| \le \frac{1}{K} \operatorname{Re}(t)\}$.

If $F$ is a fractional filter, these impulse responses are simple. They are zero on $\mathbb{R}^-$ and, except at $\{0\}$, they are very regular functions. The only serious irregularity is at 0, as can be seen by examining the following formula. Let $\delta$ denote the Dirac mass at $t = 0$, $\delta^{(j)}$ its $j$th derivative in the sense of distributions and $\mathrm{pv}(t^\lambda)$ the principal value of $t^\lambda$. If $D \in \mathbb{N}$, the function $f$ reads:

$$f(t) = \delta^{(D)}(t) + \sum_{j=1}^{D} \gamma_j\, \delta^{(D-j)}(t) + \sum_{j=D+1}^{\infty} \gamma_j\, \frac{t^{-(D-j+1)}}{\Gamma(j - D)}$$

and if $D \in \mathbb{R} \setminus \mathbb{N}$:

$$f(t) = \frac{\mathrm{pv}(t^{-(D+1)})}{\Gamma(-D)} + \sum_{j=1}^{\infty} \gamma_j\, \frac{\mathrm{pv}(t^{-(D-j+1)})}{\Gamma(-D + j)}$$
The function $f$ is characterized by the same asymptotic behavior at infinity as in the continuous case and satisfies the same differential equation of order $K$ with affine coefficients in $t$. Also, the regularity index of the distribution process $X$ is $-D - \frac{1}{2}$.

8.5.4. Mixing and memory properties

For distribution processes, memory properties also derive from the summability of the covariance function. As for the continuous case, a fractional ARMA distribution process has a long memory if and only if $a = 0$ and $d \in\, ]-\frac{1}{2}, 0[$.

Regarding mixing properties, it is necessary to redefine the various coefficients, extending the usual definitions [DOU 94] to distribution processes. To this end, we replace the concept of distance in time by that of time-lag between supports of test functions. Let $X$ be a stationary distribution process; let us denote by $H^0_{-\infty}$ and $H^{+\infty}_T$ the vector subspaces generated by $\{X(\varphi),\ \varphi \in C_0^\infty(]-\infty, 0])\}$ and $\{X(\psi),\ \psi \in C_0^\infty([T, +\infty[)\}$. The linear mixing coefficient of the process $X$ can then be defined by:

$$r_T = \sup\big\{|\mathrm{corr}(Y, Z)|,\ Y \in H^0_{-\infty},\ Z \in H^{+\infty}_T\big\}$$

This coefficient coincides with the usual linear mixing coefficient when $X$ is a time process. The other mixing coefficients can be defined similarly. In particular, linear mixing and $\rho$-mixing coefficients are equal and satisfy the following bounding relation:

$$\alpha_T \le \rho_T \le 2\pi \alpha_T$$
This implies that for Gaussian distribution processes, linear mixing, $\rho$-mixing and $\alpha$-mixing are equivalent notions. Let us now assume that $X$ is a linear distribution process, with transfer function $F$. A sufficient condition for the process $X$ to be $\rho$-mixing reads: if $F$ verifies assumptions 1) and 2), if $F$ is bounded from below for large $|z|$, i.e. there exist $C$, $A$ such that, for $|z| > A$, $C|z|^N \le |F(z)|$ and, moreover, if $a < 0$, then the $\rho$-mixing coefficient of the distribution process $X$ tends towards 0 when $T$ tends towards infinity. In this case, we obtain:

$$\rho_T = O(e^{bT}) \quad \text{for all } b \in\, ]a, 0[.$$
However, if $F$ has a singularity on the imaginary axis, i.e., if $F$ can be written as $F(z) = (z - i\alpha)^d G(z)$ with $d \in \mathbb{C} \setminus \mathbb{N}$, $\alpha \in \mathbb{R}$ and $G$ continuous close to $i\alpha$ with $G(i\alpha) \ne 0$, then the distribution process is not $\rho$-mixing. These two conditions yield the following result for fractional ARMA processes: the fractional distribution process, with transfer function $F$, is $\rho$-mixing if and only if $a < 0$. Then, $\rho_T = O(e^{bT})$ for all $b \in\, ]a, 0[$. This result complements the result obtained for continuous time processes, by providing an explicit mixing rate.

Many authors [DOM 92, HAY 81, IBR 74, ROZ 63] studied the relation between the mixing properties and the spectral density of continuous time stationary processes. Their results either rely on more restrictive hypotheses than ours, but give better convergence speeds, or rely on hypotheses about membership of functional spaces (difficult to verify in our case), but give necessary and sufficient conditions for the mixing coefficient to tend towards 0.

8.6. Bibliography

[BARA 91] BARATCHART L., CARDELLI M., OLIVI M., “Identification and rational L2 approximation: a gradient algorithm”, Automatica, vol. 27, no. 2, p. 413–418, 1991.
[BARA 98] BARATCHART L., GRIMM J., LEBLOND J., OLIVI M., SEYFERT F., WIELONSKY F., Identification d’un filtre hyperfréquences par approximation dans le domaine complexe, Technical Report 219, INRIA, 1998.
[BARD 01] BARDET J.M., LANG G., OPPENHEIM G., TAQQU M., PHILIPPE A., STOEV S., “Semi-parametric estimation of long-range dependence parameter: a survey”, in Long-range Dependence: Theory and Applications, Birkhäuser, Boston, Massachusetts, 2001.
[BER 94] BERAN J., Statistics for Long-memory Processes, Chapman & Hall, New York, 1994.
[BON 92] BONNET C., Réduction de systèmes linéaires discrets de dimension infinie: étude de filtres fractionnaires, RAIRO Automat-Prod. Inform. Ind., vol. 26, no. 5-6, p. 399–422, 1992.
[COX 84] COX D.R., “Long-range dependence: a review”, in DAVID H.A., DAVID H.T. (ed.), Proceedings of the Fiftieth Anniversary Conference Iowa State, Iowa State University Press, p. 55–74, 1984.
[CUR 86] CURTAIN R.F., GLOVER K., “Balanced realisation for infinite-dimensional systems”, in Operator Theory and Systems, Birkhäuser, Boston, Massachusetts, 1986.
[DOM 92] DOMINGUEZ M., “Mixing coefficient, generalized maximal correlation coefficients, and weakly positive measures”, J. Multivariate Anal., vol. 43, no. 1, p. 110–124, 1992.
[DOU 94] DOUKHAN P., Mixing Properties and Examples, Springer-Verlag, Lecture Notes in Statistics, 1994.
[GEL 64] GELFAND I.M., VILENKIN N.Y., Generalized Functions, vol. 4, Academic Press, New York, 1964.
[GEW 85] GEWEKE J., PORTER-HUDAK S., “The estimation and application of long time series models”, J. Time Series Anal., vol. 4, p. 221–238, 1985.
[GIR 01] GIRAITIS L., HIDALGO J., ROBINSON P.M., “Gaussian estimation of parametric spectral density with unknown pole”, Annals of Statistics, vol. 29, no. 4, p. 987–1023, 2001.
[GON 87] GONÇALVÈS E., “Une généralisation des processus ARMA”, Annales d’économie et de statistiques, vol. 5, p. 109–146, 1987.
[GRAN 80] GRANGER C.W.J., JOYEUX R., “An introduction to long-memory time series models and fractional differencing”, J. Time Ser. Anal., vol. 1, p. 15–29, 1980.
[GRAY 89] GRAY H.L., ZHANG N.F., WOODWARD W.A., “On generalized fractional processes”, J. Time Ser. Anal., vol. 10, no. 3, p. 233–256, 1989.
[HAY 81] HAYASHI E., “The spectral density of a strongly mixing stationary Gaussian process”, Pacific Journal of Mathematics, vol. 96, no. 2, p. 343–359, 1981.
[HID 99] HIDALGO J., “Estimation of the pole of long memory processes”, Mimeo, London School of Economics, 1999.
[HOS 81] HOSKING J.R.M., “Fractional differencing”, Biometrika, vol. 68, p. 165–176, 1981.
[IBR 74] IBRAGIMOV I., ROZANOV Y., Processus aléatoires Gaussiens, Mir, Moscou, 1974.
[ITO 54] ITO K., “Stationary random distributions”, Mem. Coll. Sci. Kyoto Univ. Series A, vol. 28, no. 3, p. 209–223, 1954.
[MAN 68] MANDELBROT B.B., VAN NESS J.W., “Fractional Brownian motions, fractional noises, and applications”, SIAM Review, vol. 10, no. 4, p. 422–437, 1968.
[MIL 93] MILLER K.S., ROSS B., An Introduction to Fractional Calculus and Fractional Differential Equations, John Wiley & Sons, 1993.
[MOU 00] MOULINES E., SOULIER P., “Confidence sets via empirical likelihood: broadband log-periodogram regression of time series with long-range dependence”, Annals of Statistics, vol. 27, no. 4, p. 1415–1439, 2000.
[MOU 01] MOULINES E., SOULIER P., “Semiparametric spectral estimation for fractional processes”, in Long-range Dependence: Theory and Applications, Birkhäuser, Boston, Massachusetts, 2001.
[OLD 74] OLDHAM K.B., SPANIER J., The Fractional Calculus, Academic Press, 1974.
[OPP 00] OPPENHEIM G., OULD HAYE M., VIANO M.C., “Long-memory with seasonal effects”, Statistical Inference for Stochastic Processes, vol. 3, p. 53–68, 2000.
[OUS 91] OUSTALOUP A., La commande Crone, commande robuste d’ordre non entier, Hermes, Paris, 1991.
[ROZ 63] ROZANOV Y., Stochastic Random Processes, Holdenday, 1963.
[SCH 66] SCHWARTZ L., Théorie des distributions, Hermann, Paris, 1966.
[TAQ 92] TAQQU M.S., “A bibliographical guide to self-similar processes and long-range dependence”, in Dependence in Probability and Statistics, Birkhäuser, 1992.
[TRI 92] TRIEBEL H., Theory of Function Spaces, Birkhäuser, 1992.
[WHI 86] WHITFIELD A.H., “Transfer function synthesis using frequency response data”, Int. J. Control, vol. 43, no. 5, p. 1413–1426, 1986.
[YAJ 96] YAJIMA Y., “Estimation of the frequency of unbounded spectral densities”, ASA Proc. Business and Economic Statistics, Section 4-7, Amer. Statist. Assoc., Alexandria, VA.
Chapter 9
Iterated Function Systems and Some Generalizations: Local Regularity Analysis and Multifractal Modeling of Signals
9.1. Introduction

There are many ways of carrying out the fractal analysis of a signal: evaluation and comparison of various measures and dimensions (for example, Hausdorff [FAL 90] or packing [TRIC 82], lacunarity [MAND 93], etc.). The objective of this chapter is to describe in detail two types of fractal characterizations:
– analysis of the pointwise Hölderian regularity;
– multifractal analysis and modeling.

The first characterization enables us to describe the irregularities of a function f(t) by associating it with its Hölder function αf(t) which gives, at each point t, the value of the Hölder exponent of f. The smaller αf(t) is, the more irregular the function f is. A negative exponent indicates a discontinuity, whereas if αf(t) is strictly greater than 1, f is differentiable at least once at t. The characterization of signals through their Hölderian regularity has been studied by many authors from a theoretical point of view. For instance, it is related to wavelet decompositions [JAF 89, JAF 91, JAF 92, MEY 90a], signal processing applications [LEV 95, MAL 92] such as denoising, turbulence analysis [BAC 91] and image segmentation [LEV 96]. This approach is particularly relevant when the information
Chapter written by Khalid DAOUDI.
resides in the signal irregularity rather than, for example, in its amplitude or in its Fourier transform (this is notably the case for edge detection in image processing). The first part of this chapter is thus devoted to studying the properties of the Hölder function of signals. The question that naturally arises is the following: given a continuous function f on [0, 1], what is the most general form that can be taken by αf? By generalizing the notion of iterated function systems (IFS), we answer this question by characterizing the class of functions αf and by giving an explicit method to construct a function whose Hölder function is prescribed in this class. This generalization enables us to define a new class of functions, that of generalized iterated function systems (GIFS). This will allow the development of a new approach to estimate the Hölder function of a given signal.

An interesting feature of the Hölder function is that it can be very simple while the signal is irregular. For example, although they are nowhere differentiable, the Weierstrass function [WEI 95] and fractional Brownian motion (FBM) [MAND 68] have a constant Hölder function. However, there are signals with very irregular appearance for which the Hölder function is even more irregular, e.g. continuous signals f such that αf is discontinuous everywhere. The canonical example is that of IFS; in such cases, it turns out to be more interesting to use another description of the signal: the multifractal spectrum. Instead of attributing to each t the value of the Hölder function, all the points with the same exponent α are aggregated in a subset Eα and the irregularity is characterized in a global manner by calculating, for each value of α, the Hausdorff dimension of the set Eα. This yields a geometric estimation of the “size” of the subparts of the support of f where a given singularity appears.

This type of analysis, first referenced in [MAND 72, MAND 74] and in the context of turbulence [FRI 95], has since been used often. It has been studied at a theoretical level (analysis of self-similar measures or functions in a deterministic [BAC 93, OLS 02, RIE 94, RIE 95] and random [ARB 02, FAL 94, GRA 87, MAND 89, MAU 86, OLS 94] context, extension to capacities, higher order spectra [VOJ 95]) and directly applied (study of DLA sequences [EVE 92, MAND 91], analysis of earthquake distribution [HIR 02], signal processing [LEV 95] and traffic analysis). The second part of this chapter deals with multifractal analysis. Self-similar functions constitute the paradigm of “multifractal signals”, as most of the quantities of interest can be explicitly calculated. In particular, it has been demonstrated that the multifractal formalism, which connects the Hausdorff multifractal spectrum to the Legendre transform of a partition function, holds for self-similar measures and functions, and various extensions of the latter were considered [RIE 95]. The multifractal formalism enables us to reduce the calculation of the Hausdorff spectrum
(which is very complex, as it not only entails the determination of all Hölder exponents but also the deduction of an infinity of Hausdorff dimensions) to a simple calculation of a partition function limit, which offers less difficulty, both theoretically and numerically. In practice, self-similar functions and their immediate extensions are, most of the time, too rigid to model real signals (for instance, speech signals) in an appropriate way. In this chapter, we present a generalization, weak self-affine (WSA) functions, that offers a satisfactory trade-off between flexibility and complexity in modeling. Weak self-affine functions are essentially self-similar functions for which the renormalization parameters can differ from one scale to another, while verifying certain conditions which enable us to preserve the multiplicative structure. The Hausdorff multifractal spectrum of WSA functions is calculated and the validity of the multifractal formalism proven. We then explain how to use WSA functions to model and segment real signals. We also show how modeling through WSA functions can be used to estimate non-concave multifractal spectra.

This chapter is organized as follows. In the following section, we give the definition of the Hölder exponent. Then, we describe the concept of iterated function systems (IFS) and analyze the local regularity behavior of affine iterated function systems which generate the graphs of continuous functions. In section 9.4, which constitutes the core of this chapter, we propose some generalizations of IFS which enable us to solve the problem of characterizing Hölder functions. In section 9.5, we introduce a method to estimate the Hölder exponent, based on GIFS, and evaluate its performance from numerical simulations. In section 9.6, we address multifractal analysis and modeling. We introduce WSA functions and prove their multifractal formalism. In sections 9.7 and 9.8, we show how to represent and segment real signals by WSA functions. Section 9.9 is devoted to the estimation of the multifractal spectrum through the WSA approach. Finally, in section 9.10, we present some numerical experiments.

9.2. Definition of the Hölder exponent

The Hölder exponent is a parameter which quantifies the local (or pointwise) regularity of a function around a point.¹ Let us first define the pointwise Hölder space. Let $I$ be an interval of $\mathbb{R}$, $f$ a continuous function from $I$ to $\mathbb{R}$ and $\beta \in \mathbb{R}^{+*} \setminus \mathbb{N}$.
1. The Hölder exponent can also be defined for measures or functions of sets in general.
DEFINITION 9.1.– Let $t_0 \in I$. The function $f$ belongs to the pointwise Hölder space $C^\beta(t_0)$ if and only if there exist a polynomial $P$ of degree less than or equal to the integer part of $\beta$ and a positive constant $C$ such that, for any $t$ in a neighborhood of $t_0$:

$$|f(t) - P(t - t_0)| \le C|t - t_0|^\beta$$

Let us note that if $\beta \in \mathbb{N}^*$, the space $C^\beta$ has to be replaced by the Zygmund $\beta$-class [MEY 90a, MEY 90b].

DEFINITION 9.2.– A function $f$ is said to be of Hölder exponent $\beta$ at $t_0$ if and only if:
1) for any scalar $\gamma < \beta$:

$$\lim_{h \to 0} \frac{|f(t_0 + h) - P(h)|}{|h|^\gamma} = 0$$

2) if $\beta < +\infty$, for any scalar $\gamma > \beta$:

$$\limsup_{h \to 0} \frac{|f(t_0 + h) - P(h)|}{|h|^\gamma} = +\infty$$

where $P$ is a polynomial of degree less than or equal to the integer part of $\beta$.

If $\beta < +\infty$, this is equivalent to:

$$f \in \bigcap_{\varepsilon > 0} C^{\beta - \varepsilon}(t_0) \quad \text{but} \quad f \notin \bigcup_{\varepsilon > 0} C^{\beta + \varepsilon}(t_0)$$
This is also equivalent to:

$$\beta = \sup\{\theta > 0 : f \in C^\theta(t_0)\}$$

9.3. Iterated function systems (IFS)

Let $K$ be a complete metric space, with distance $d$. Given $m$ continuous functions $S_n$ from $K$ to $K$, we call the family $\{K, S_n : n = 1, 2, \dots, m\}$ an iterated function system (IFS). Let $H$ be the set of all non-empty closed parts of $K$. Then the set $H$ is a compact metric space for the Hausdorff distance $h$ [HUT 81] defined, for any $A$, $B$ in $H$, by:

$$h(A, B) = \max\Big( \sup_{x \in A} \inf_{y \in B} d(x, y),\ \sup_{x \in B} \inf_{y \in A} d(x, y) \Big)$$
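For finite point sets, the suprema and infima in the Hausdorff distance become maxima and minima, so the definition can be evaluated directly. The sketch below assumes points of the plane with the Euclidean distance; the function name is a hypothetical choice for this example.

```python
import math


def hausdorff_distance(A, B):
    """Hausdorff distance between two non-empty finite sets of points,
    using the Euclidean distance d (math.dist)."""
    if not A or not B:
        raise ValueError("both sets must be non-empty")
    # sup over x in A of inf over y in B of d(x, y)
    a_to_b = max(min(math.dist(x, y) for y in B) for x in A)
    # sup over x in B of inf over y in A of d(x, y)
    b_to_a = max(min(math.dist(x, y) for y in A) for x in B)
    return max(a_to_b, b_to_a)


A = [(0.0, 0.0), (1.0, 0.0)]
B = [(0.0, 0.0), (3.0, 0.0)]
distance_AB = hausdorff_distance(A, B)
```

Here every point of A is within distance 1 of B, but the point (3, 0) of B is at distance 2 from A, so the two one-sided terms differ and the maximum is needed to make h symmetric.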
Let us consider the operator $W : H \to H$ defined by:

$$W(G) = \bigcup_{n=1}^{m} S_n(G) \quad \text{for all } G \in H$$
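A minimal sketch of the operator W acting on finite sets is given below. The two maps (those of the middle-third Cantor set on the real line) are an assumption chosen purely for illustration and do not come from the text; exact rational arithmetic is used so that set membership is unambiguous.

```python
from fractions import Fraction


def apply_W(maps, points):
    """W(G) = union over n of S_n(G), for a finite set G of points."""
    return {S(x) for S in maps for x in points}


# Hypothetical example: the two contractions generating the Cantor set.
cantor_maps = [
    lambda x: x / 3,
    lambda x: x / 3 + Fraction(2, 3),
]

G = {Fraction(0), Fraction(1)}
W1 = apply_W(cantor_maps, G)    # one application of W
W2 = apply_W(cantor_maps, W1)   # two applications of W
```

Iterating `apply_W` produces finer and finer finite approximations of the same limit set, whatever the non-empty starting set G, provided the maps are contractions.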
We call any set $A \in H$ which is a fixed point of $W$ an attractor of the IFS $\{K, S_n : n = 1, 2, \dots, m\}$, i.e., it verifies:

$$W(A) = A$$

An IFS always possesses at least one attractor. Indeed, given any set $G \in H$, the closure of the accumulation set of points $\{W^{(m)}(G)\}_{m=1}^{\infty}$, with $W^{(m)}(G) = W(W^{(m-1)}(G))$, is a fixed point of $W$.

If all $S_n$ functions are contractions, then the IFS is said to be hyperbolic. In this case, $W$ is also a contraction for the Hausdorff metric; thus, it possesses a single fixed point which is the single attractor of the IFS.

When the IFS is hyperbolic, the attractor can be obtained in the following manner [BAR 85a]: let $p = (p_1, \dots, p_m)$ be a probability vector with $p_n > 0$ and $\sum_n p_n = 1$. From the fixed point $x_0$ of $S_1$, let us define the sequence $x_i$ by successively choosing $x_i \in \{S_1(x_{i-1}), \dots, S_m(x_{i-1})\}$, where $p_n$ is the probability of the event $x_i = S_n(x_{i-1})$. Then, the attractor is the closure of the trajectory $\{x_i\}_{i \in \mathbb{N}}$.

In this chapter, we focus on IFS which make it possible to generate the graphs of continuous functions [BAR 85a]. Given a set of points $\{(x_n, y_n) \in [0; 1] \times [u; v],\ n = 0, 1, \dots, m\}$, with $(u, v) \in \mathbb{R}^2$, let us consider the IFS given by $m$ contractions $S_n$ ($n = 1, \dots, m$) which are defined on $[0; 1] \times [u; v]$ by:

$$S_n(x, y) = \big(L_n(x);\ F_n(x, y)\big)$$

where $L_n$ is a contraction which transforms $[0; 1]$ into $[x_{n-1}; x_n]$ and where $F_n : [0; 1] \times [u; v] \to [u; v]$ is a contraction with respect to the second variable, which satisfies:

$$F_n(x_0, y_0) = y_{n-1}; \quad F_n(x_m, y_m) = y_n \tag{9.1}$$
Then, the attractor of this IFS is the graph of a continuous function which interpolates the points $(x_n, y_n)$. In general, this type of function is called a fractal interpolation function [BAR 85a]. The most studied class of IFS is that of affine iterated function systems, i.e., IFS for which $L_n$ and $F_n$ are affine functions. We will study this class later. We also assume that the interpolation points are equally spaced. Then, $S_n$ ($0 \le n < m$) can be written in matrix form as:

$$S_n \begin{pmatrix} t \\ x \end{pmatrix} = \begin{pmatrix} 1/m & 0 \\ a_n & c_n \end{pmatrix} \begin{pmatrix} t \\ x \end{pmatrix} + \begin{pmatrix} n/m \\ b_n \end{pmatrix}$$
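The affine case can be made concrete with a small sketch. It assumes the index convention $0 \le n < m$ of the matrix form, equally spaced abscissas $n/m$, and the endpoint conditions (9.1) rewritten as $F_n(0, y_0) = y_n$ and $F_n(1, y_m) = y_{n+1}$; the function names are hypothetical. The value of the interpolation function at an m-adic point is obtained from the self-referential relation $f((t + n)/m) = a_n t + c_n f(t) + b_n$.

```python
def affine_coefficients(y, c):
    """Solve the endpoint conditions for a_n and b_n, given the ordinates
    y_0..y_m and the vertical scaling factors c_0..c_{m-1}."""
    m = len(c)
    if len(y) != m + 1:
        raise ValueError("need m + 1 ordinates for m maps")
    b = [y[n] - c[n] * y[0] for n in range(m)]
    a = [y[n + 1] - y[n] - c[n] * (y[m] - y[0]) for n in range(m)]
    return a, b


def fif_at_digits(digits, y, c):
    """Value of the fractal interpolation function at t = 0.i_1 i_2 ... i_k
    (base m, finitely many digits), computed from the last digit backwards."""
    m = len(c)
    a, b = affine_coefficients(y, c)
    t, value = 0.0, y[0]            # start from f(0) = y_0
    for i in reversed(digits):
        value = a[i] * t + c[i] * value + b[i]   # f((t + i) / m)
        t = (t + i) / m
    return value


# Hypothetical data: m = 2, interpolation of (0, 0), (1/2, 1), (1, 0).
y = [0.0, 1.0, 0.0]
c = [0.5, 0.5]
value_at_half = fif_at_digits([1], y, c)         # t = 0.1 in base 2
value_at_quarter = fif_at_digits([0, 1], y, c)   # t = 0.01 in base 2
```

Only the factors $c_n$ are free in this sketch; $a_n$ and $b_n$ follow from the interpolation data, which is why the regularity of the resulting function is driven by the $c_n$ alone.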
Let $f$ be the function whose graph is the attractor of the corresponding IFS. Let us note that once $c_n$ is fixed, $a_n$ and $b_n$ are uniquely determined by (9.1) so as to ensure the continuity of $f$. We are now going to calculate the Hölder function of $f$ and see if we can control the local regularity with these affine iterated function systems.

PROPOSITION 9.1 ([DAO 98]).– Let $t \in [0; 1)$ and $0.i_1 \dots i_k \dots$ be its base $m$ decomposition (when $t$ possesses two decompositions, we select the one with a finite number of digits). Then:

$$\alpha_f(t) = \min\left( \liminf_{k \to +\infty} \frac{\log(c_{i_1} \dots c_{i_k})}{\log(m^{-k})},\ \liminf_{k \to +\infty} \frac{\log(c_{j_1} \dots c_{j_k})}{\log(m^{-k})},\ \liminf_{k \to +\infty} \frac{\log(c_{l_1} \dots c_{l_k})}{\log(m^{-k})} \right)$$

where, for any integer $k$, if we note $t_k = m^{-k}[m^k t]$, the $k$-tuples $(j_1, \dots, j_k)$ and $(l_1, \dots, l_k)$ are given by:

$$t_k^+ = t_k + m^{-k} = \sum_{p=1}^{k} j_p m^{-p}$$

$$t_k^- = t_k - m^{-k} = \sum_{p=1}^{k} l_p m^{-p}$$
COROLLARY 9.1 ([DAO 98]).– Let $t \in [0; 1)$. If, for every $i \in \{0, \dots, m - 1\}$, the proportion $\phi_i(t)$ of $i$ in the base $m$ decomposition of $t$ exists, then:

$$\alpha_f(t) = -\sum_{i=0}^{m-1} \phi_i(t) \log_m c_i$$
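The formula of Corollary 9.1 is simple to evaluate once the digit proportions are known. The sketch below assumes positive factors $c_i$ and takes the proportions either as given or as empirical frequencies over a finite digit string (which only approximates the limiting proportions); the function names are hypothetical.

```python
import math


def digit_frequencies(digits, m):
    """Empirical proportion of each digit 0..m-1 in a finite digit string."""
    counts = [0] * m
    for i in digits:
        counts[i] += 1
    return [count / len(digits) for count in counts]


def holder_from_frequencies(freqs, c):
    """alpha = -sum_i phi_i * log_m(c_i), as in Corollary 9.1,
    assuming every c_i is positive."""
    m = len(c)
    return -sum(phi * math.log(c_i, m) for phi, c_i in zip(freqs, c))


c = [0.5, 0.25]                                   # hypothetical factors, m = 2
alpha_uniform = holder_from_frequencies([0.5, 0.5], c)
alpha_biased = holder_from_frequencies(digit_frequencies([0, 1, 1, 1], 2), c)
```

With equal proportions 1/m, the result is the average of the values $-\log_m c_i$; a point whose expansion favors one digit gets an exponent pulled towards the corresponding $-\log_m c_i$.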
This corollary clearly shows that we cannot control the regularity at each point using IFS. Indeed, the almost sure value of $\phi_i(t)$ with respect to the Lebesgue measure is $\frac{1}{m}$, hence almost all the points have the same Hölder exponent. However, we can easily construct a continuous function whose Hölder function is not constant almost everywhere. In the next section, we propose a generalization of IFS that offers more flexibility in the choice of the Hölder function.

9.4. Generalization of iterated function systems

The principal idea from which the generalization of IFS is inspired lies in the following question: what happens, in terms of regularity, if the $S_i$ contractions are allowed to vary at every iteration in the process of attractor generation? However, raising this question entails that we first answer the preliminary issue: does an “attractor” exist in this case? Andersson [AND 92] studied this problem and found satisfactory conditions for the existence and uniqueness of an attractor when the $S_i$ vary.
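Formally, let us consider the collection of sets $(F^k)_{k \in \mathbb{N}^*}$ where each $F^k$ is a non-empty finite set of contractions $S_i^k$ in $K$ for $i = 0, \dots, N_k - 1$, $N_k \ge 1$ being the cardinality of $F^k$, while $c_i^k$ denotes the contraction factor of $S_i^k$. For $n \in \mathbb{N}^*$, let $\Sigma^n_{N_i}$ be the set of sequences of length $n$, defined by:

$$\Sigma^n_{N_i} = \big\{\sigma = (\sigma_1, \dots, \sigma_n) : \sigma_i \in \{0, \dots, N_i - 1\},\ i \in \mathbb{N}^*\big\}$$

and let:

$$\Sigma^\infty_{N_i} = \big\{\sigma = (\sigma_1, \sigma_2, \dots) : \sigma_i \in \{0, \dots, N_i - 1\},\ i \in \mathbb{N}^*\big\}$$

For any $k$, let us consider the operator $W^k : H \to H$ defined by:

$$W^k(A) = \bigcup_{n=1}^{N_k} S_n^k(A) \quad \text{for } A \in H$$

Let us define the conditions:

$$(c) \quad \lim_{n \to \infty}\ \sup_{(\sigma_1, \dots, \sigma_n) \in \Sigma^n_{N_i}}\ \prod_{k=1}^{n} c^k_{\sigma_k} = 0$$

$$(c') \quad \lim_{n \to \infty}\ \sup_{(\sigma_1, \sigma_2, \dots) \in \Sigma^\infty_{N_i}}\ \sum_{j=n}^{\infty} d\big(S^{j+1}_{\sigma_{j+1}} x, x\big) \prod_{k=1}^{j} c^k_{\sigma_k} = 0 \quad \forall x \in K$$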
Andersson proved the result below.

PROPOSITION 9.2 ([AND 92]).– If conditions (c) and (c′) are satisfied, then there exists a unique compact set $A \subset K$ such that:

$$\lim_{k \to \infty} W^k \circ \dots \circ W^1(G) = A \quad \text{for all } G \in H$$

$A$ is called the attractor of the IFS $(K, \{F^k\}_{k \in \mathbb{N}^*})$.

9.4.1. Semi-generalized iterated function systems

We now consider the case where the $S_i^k$ are affine and where the $N_k$ are constant. Let $F^k$ be a set of affine contractions $S_i^k$ ($0 \le i < m$) whose matrix representation reads:

$$S_i^k \begin{pmatrix} t \\ x \end{pmatrix} = \begin{pmatrix} 1/m & 0 \\ a_i^k & c_i^k \end{pmatrix} \begin{pmatrix} t \\ x \end{pmatrix} + \begin{pmatrix} i/m \\ b_i^k \end{pmatrix}$$

Let us assume that conditions (c) and (c′) are satisfied. Then, if $a_i^k$ and $b_i^k$ satisfy relations similar to (9.1), we can show, by using the same techniques as
in [BAR 85a], that the attractor of the semi-generalized IFS $(K, \{F^k\}_{k \in \mathbb{N}})$ is the graph of a continuous function $f$. As for typical IFS, let us now verify whether the expression of the Hölder function for semi-generalized IFS allows us to control the local regularity.

PROPOSITION 9.3 ([DAO 98]).– Let $t \in [0; 1)$. Then:

$$\alpha_f(t) = \min\left( \liminf_{k \to +\infty} \frac{\log(c^1_{i_1} \dots c^k_{i_k})}{\log(m^{-k})},\ \liminf_{k \to +\infty} \frac{\log(c^1_{j_1} \dots c^k_{j_k})}{\log(m^{-k})},\ \liminf_{k \to +\infty} \frac{\log(c^1_{l_1} \dots c^k_{l_k})}{\log(m^{-k})} \right)$$

where $i_p$, $j_p$ and $l_p$ are defined as in Proposition 9.1.

Although the Hölder function of semi-generalized IFS describes a broader class than that of standard IFS, it still remains very restrictive (as far as the problem at hand is concerned). Indeed, it is easy to observe that two scalars which only differ in a finite number of digits in their base $m$ decomposition have the same Hölder exponent, whereas it is easy to construct a continuous function whose Hölder function does not satisfy this constraint. It thus remains impossible to control the local regularity at each point by using semi-generalized IFS.

9.4.2. Generalized iterated function systems

Let us now consider a more flexible extension than semi-generalized IFS, by allowing the number and support of the $S_i^k$ to vary through iterations. More precisely, let $F^k$ be the set of affine contractions $S_i^k$ ($0 \le i \le m^k - 1$), where each $S_i^k$ only operates on $\big[[\frac{i}{m}]\, m^{-k+1};\ ([\frac{i}{m}] + 1)\, m^{-k+1}\big]$ and has values in $[i\, m^{-k};\ (i + 1)\, m^{-k}]$. Then, the matrix representation of $S_i^k$ becomes:

$$S_i^k \begin{pmatrix} t \\ x \end{pmatrix} = \begin{pmatrix} 1/m & 0 \\ a_i^k & c_i^k \end{pmatrix} \begin{pmatrix} t \\ x \end{pmatrix} + \begin{pmatrix} i/m^k \\ b_i^k \end{pmatrix}$$

We call $(K, (F^k))$ a GIFS. Given $c_i^k$, the following construction yields an attractor which is the graph of a continuous function $f$ that interpolates a set of given points $\{(\frac{i}{m}, y_i),\ i = 0, \dots, m\}$ (for simplicity, we consider the case $m = 2$, although the general case can be treated in a similar way). Consider the graph of a non-affine continuous function $\phi$ on $[0; 1]$; we note:

$$\phi(0) = u, \quad \phi(1) = v$$

then we choose $a_i^k$ and $b_i^k$ so that the following conditions hold. For $i = 0, 1$:

$$S_i^1(0, u) = \Big(\frac{i}{m}, y_i\Big), \quad S_i^1(1, v) = \Big(\frac{i + 1}{m}, y_{i+1}\Big)$$

$$S_0^2(0, y_0) = (0, y_0), \quad S_0^2(1/2, y_1) = S_1^2(0, y_0), \quad S_1^2(1/2, y_1) = (1/2, y_1)$$

$$S_2^2(1/2, y_1) = (1/2, y_1), \quad S_2^2(1, y_2) = S_3^2(1/2, y_1), \quad S_3^2(1, y_2) = (1, y_2)$$
For $k > 2$ and $i = 0, \dots, 2^k - 1$:

1) if $i$ is even, then:

a) if $i < 2^{k-1}$:

\[ S_i^k \circ S_{[i/2]}^{k-1} \circ S_{[i/2^2]}^{k-2} \circ \dots \circ S_{[i/2^{k-2}]}^{2}(0, y_0) = S_{[i/2]}^{k-1} \circ S_{[i/2^2]}^{k-2} \circ \dots \circ S_{[i/2^{k-2}]}^{2}(0, y_0) \]

\[ S_i^k \circ S_{[i/2]}^{k-1} \circ S_{[i/2^2]}^{k-2} \circ \dots \circ S_{[i/2^{k-2}]}^{2}(1/2, y_1) = S_{i+1}^k \circ S_{[(i+1)/2]}^{k-1} \circ S_{[(i+1)/2^2]}^{k-2} \circ \dots \circ S_{[(i+1)/2^{k-2}]}^{2}(0, y_0) \]

b) if $i \ge 2^{k-1}$:

\[ S_i^k \circ S_{[i/2]}^{k-1} \circ S_{[i/2^2]}^{k-2} \circ \dots \circ S_{[i/2^{k-2}]}^{2}(1/2, y_1) = S_{[i/2]}^{k-1} \circ S_{[i/2^2]}^{k-2} \circ \dots \circ S_{[i/2^{k-2}]}^{2}(1/2, y_1) \]

\[ S_i^k \circ S_{[i/2]}^{k-1} \circ S_{[i/2^2]}^{k-2} \circ \dots \circ S_{[i/2^{k-2}]}^{2}(1, y_2) = S_{i+1}^k \circ S_{[(i+1)/2]}^{k-1} \circ S_{[(i+1)/2^2]}^{k-2} \circ \dots \circ S_{[(i+1)/2^{k-2}]}^{2}(1/2, y_1) \]

2) if $i$ is odd, then:

a) if $i < 2^{k-1}$:

\[ S_i^k \circ S_{[i/2]}^{k-1} \circ S_{[i/2^2]}^{k-2} \circ \dots \circ S_{[i/2^{k-2}]}^{2}(1/2, y_1) = S_{[i/2]}^{k-1} \circ S_{[i/2^2]}^{k-2} \circ \dots \circ S_{[i/2^{k-2}]}^{2}(1/2, y_1) \]

b) if $i \ge 2^{k-1}$:

\[ S_i^k \circ S_{[i/2]}^{k-1} \circ S_{[i/2^2]}^{k-2} \circ \dots \circ S_{[i/2^{k-2}]}^{2}(1, y_2) = S_{[i/2]}^{k-1} \circ S_{[i/2^2]}^{k-2} \circ \dots \circ S_{[i/2^{k-2}]}^{2}(1, y_2) \]
This set of conditions, which we call continuity conditions, ensures that $f$ is a continuous function that interpolates the points $(\tfrac{i}{m}, y_i)$. The Hölder function of $f$ is given by the following proposition.

PROPOSITION 9.4.– Let us assume that the conditions (c) and (c') are satisfied. Then, the attractor of the GIFS defined above is the graph of a continuous function $f$ such that:

\[ f\left(\tfrac{i}{m}\right) = y_i \qquad \forall i = 0, \dots, m \]

and:

\[ \alpha_f(t) = \min(\alpha_1, \alpha_2, \alpha_3) \]
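where:

\[
\begin{cases}
\alpha_1 = \liminf\limits_{k \to +\infty} \dfrac{\log\left( c^k_{m^{k-1} i_1 + m^{k-2} i_2 + \dots + m i_{k-1} + i_k} \cdots c^2_{m i_1 + i_2}\, c^1_{i_1} \right)}{\log(m^{-k})} \\[2ex]
\alpha_2 = \liminf\limits_{k \to +\infty} \dfrac{\log\left( c^k_{m^{k-1} j_1 + m^{k-2} j_2 + \dots + m j_{k-1} + j_k} \cdots c^2_{m j_1 + j_2}\, c^1_{j_1} \right)}{\log(m^{-k})} \\[2ex]
\alpha_3 = \liminf\limits_{k \to +\infty} \dfrac{\log\left( c^k_{m^{k-1} l_1 + m^{k-2} l_2 + \dots + m l_{k-1} + l_k} \cdots c^2_{m l_1 + l_2}\, c^1_{l_1} \right)}{\log(m^{-k})}
\end{cases}
\qquad (9.2)
\]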
and where $i_p$, $j_p$ and $l_p$ are defined as in Proposition 9.1.

NOTE 9.1.– Given $m$ real numbers $u_1, \dots, u_m \in\ ]\tfrac{1}{m}; 1[$, let us define, for any $k \ge 1$ and for any $i \in \{0, \dots, m^k - 1\}$, $c_i^k$ as:

\[ c_i^k = u_{i + 1 - m[i/m]} \]

In this case, we recover the original construction of the usual IFS, considered in section 9.3.

We now show that GIFS allow us to solve the problem of characterizing Hölder functions. Indeed, we have the following main result.

THEOREM 9.1.– Let $s$ be a function from $[0; 1]$ to $[0; 1]$. The following conditions are equivalent:

1) $s$ is the Hölder function of a continuous function $f$ from $[0; 1]$ to $\mathbb{R}$;

2) there is a sequence $(s_n)_{n \ge 1}$ of continuous functions such that:

\[ s(x) = \liminf_{n \to +\infty} s_n(x), \qquad \forall x \in [0; 1] \]
The implication 1) ⇒ 2) is relatively easy and can be found in [DAO 98]. Hereafter, we present a constructive proof of the converse implication. To do so, let $H$ denote the set of functions from $[0; 1]$ to $[0; 1]$ which are inferior limits of continuous functions. We need the following lemma.

LEMMA 9.1 ([DAO 98]).– Let $s \in H$. There exists a sequence $\{R_n\}_{n \ge 1}$ of piecewise polynomials such that:

\[
\begin{cases}
s(t) = \liminf_{n \to +\infty} R_n(t) & \forall t \in [0; 1] \\[1ex]
\|R_n'^{+}\|_\infty \le n, \quad \|R_n'^{-}\|_\infty \le n & \forall n \ge 1 \\[1ex]
\|R_n\|_\infty \ge \dfrac{1}{\log n}
\end{cases}
\qquad (9.3)
\]

where $R_n'^{+}$ and $R_n'^{-}$ are the right and left derivatives of $R_n$, respectively.
Let $\{R_n\}_{n \ge 1}$ be the sequence given by (9.3) and $M$ be the set of $m$-adic points of $[0; 1]$. Now let us consider the sequence $\{r_k\}_{k \ge 1}$ of functions from $M$ to $\mathbb{R}$ defined, for any $t \in M$ such that $t = \sum_{p=1}^{k_0} i_p m^{-p}$, by:

\[ r_1(t) = R_1(i_1 m^{-1}) \]

\[ r_k(t) = k R_k(t) - (k-1) R_{k-1}\left( \sum_{p=1}^{k-1} i_p m^{-p} \right) \qquad \text{for } k = 2, \dots, k_0 \]

\[ r_k(t) = k R_k(t) - (k-1) R_{k-1}(t) \qquad \text{for } k > k_0 \]
Thanks to the continuity conditions, finding a GIFS whose attractor satisfies 1) amounts to determining the double sequence $(c_i^k)_{i,k}$. The latter is given by the following result.

PROPOSITION 9.5 ([DAO 98]).– Let $s \in H$ and $\{r_k\}_{k \ge 1}$ be the previously defined sequence. Then, the attractor of the GIFS whose contraction factors are given by:

\[ c_i^k = m^{-r_k(i m^{-k})} \]

is the graph of a continuous function $f$ satisfying:

\[ \alpha_f(t) = s(t) \qquad \forall t \in [0; 1] \]
This result provides an explicit method, fast and easy to implement, for constructing interpolating continuous functions whose Hölder function is prescribed in the class of inferior limits of continuous functions (situated between the first and second Baire classes). Let us underline that there are two other constructive approaches to prescribing Hölder functions. One of them is based on a generalization of the Weierstrass function [DAO 98] and the other is based on the wavelet decomposition [JAF 95].

This section concludes with some numerical simulations. Figures 9.1 and 9.2 show the attractors of GIFS with prescribed Hölder functions. Figure 9.1 (respectively 9.2) shows the graph obtained when $s(t) = t$ (respectively $s(t) = |\sin(5\pi t)|$). In both cases, the set of interpolation points is:

\[ (0, 0);\ \left(\tfrac{1}{5}, 1\right);\ \left(\tfrac{2}{5}, 1\right);\ \left(\tfrac{3}{5}, 1\right);\ \left(\tfrac{4}{5}, 1\right);\ (1, 0) \]

Figure 9.1. Attractor of a GIFS whose Hölder function is s(t) = t

Figure 9.2. Attractor of a GIFS whose Hölder function is s(t) = | sin(5πt)|

9.5. Estimation of pointwise Hölder exponent by GIFS

In this section, we address the problem of estimating the Hölder exponent of a given discrete-time signal. Our approach, based on GIFS, is to be compared with two other methods which make it possible to obtain satisfactory estimations. The first method is based on the wavelet transform and is called the wavelet transform modulus maxima (WTMM) method (cf. Chapter 3 for a detailed description of this method). The second method [GON 92b] uses Wigner-Ville distributions.

9.5.1. Principles of the method

For the sake of simplicity, our study is limited to continuous functions on $[0; 1]$. The algorithm for calculating the Hölder exponent is based on Proposition 9.4. To apply this proposition to the calculation of the Hölder exponents of a real continuous signal $f$, we have to begin by calculating the coefficients $c_k^j$ of a GIFS whose attractor is $f$. This amounts to solving the “inverse problem” for GIFS, which is a generalization of the ordinary inverse problem for IFS. The latter problem was studied by many authors, either from a theoretical point of view [ABE 92, BAR 88, BAR 85a, BAR 85b, BAR 86, CEN 93, DUV 92, FOR 94, FOR 95, VRS 90] or in
physics applications [MANT 89, VRS 91b], image compression [BAR 93a, BAR 93b, CAB 92, FIS 93, JAC 93a, JAC 93b, VRS 91a] or signal processing [MAZ 92].

The inverse problem is defined as follows: “given a signal $f$, find a (generalized) iterated function system whose attractor best approximates $f$, in the sense of a fixed norm”. This problem is extremely complex in the case of IFS, yet becomes much easier in the case of GIFS. In particular, it is shown in [DAO 98] that, for a function $f \in C^0([0; 1])$, we can find a GIFS whose attractor is as close to $f$ as wished, in the $\|\cdot\|_\infty$ sense. This is obviously not the case for IFS, which have a finite number of functions. However, the price to pay for this simplification and improved approximation is that we move from a finite to an infinite number of parameters in the modeling.

A practical solution to the inverse problem for GIFS is detailed in the next section. For now, let us note that in the particular case where:

\[
\begin{cases}
m = 2 \\[1ex]
L_i^k(t) = \dfrac{t}{m} + \dfrac{i}{m^k}
\end{cases}
\]

if we write:

\[ s_k^j = f\left( \left(k + \tfrac{1}{2}\right) 2^{-j} \right) - \tfrac{1}{2} f\left(k 2^{-j}\right) - \tfrac{1}{2} f\left((k+1) 2^{-j}\right) \qquad \forall k \in \{0, \dots, 2^j - 1\} \]

and:

\[ c_k^j = \frac{s_k^j}{s_{[k/2]}^{j-1}} \qquad \forall k = 0, \dots, 2^j - 1 \qquad (9.4) \]

then, since $m$ and $L_i^k$ are fixed, the inverse problem is solved if the $c_k^j$ satisfy Conditions (c) and (c'). Indeed, in this case, the attractor of the GIFS defined by the interpolation points $\{(0, 0);\ (\tfrac{1}{2}, f(\tfrac{1}{2}));\ (1, 0)\}$ and the coefficients $c_k^j$ is the graph of $f$. To calculate the Hölder function of $f$, we write:
\[ S = \left\{ f \in C^0([0; 1]) : |c_k^j| \in \left[ \tfrac{1}{2} + \varepsilon;\ 1 - \varepsilon \right] \right\} \]

where the $c_k^j$ are determined by (9.4) and where $\varepsilon$ is a strictly positive fixed scalar (as small as we wish). If $f \in S$, Proposition 9.4 applies and yields the pointwise Hölder exponents of $f$.

NOTE 9.2.– If we have $f(0) = f(1) = 0$, then:

\[ f(x) = \sum_{j \ge 0} \sum_{0 \le k < 2^j} s_k^j\, \theta_{j,k}(x) \]
where:

\[ \theta_{j,k}(x) = \theta(2^j x - k) \]

and:

\[
\theta(x) =
\begin{cases}
1 - |2x - 1| & \text{if } x \in [0; 1] \\
0 & \text{if } x \notin [0; 1]
\end{cases}
\]

which is simply the decomposition of $f$ in the Schauder basis.

9.5.2. Algorithm

Let $f \in C^0([0; 1])$. To simplify notations, let us suppose that $f(0) = f(1) = 0$. The function $f$ is then written as:

\[ f(x) = \sum_{j \ge 0} \sum_{0 \le k < 2^j} s_k^j\, \theta_{j,k}(x) \]
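As an illustration of the Schauder coefficients $s_k^j$ and of the ratios $c_k^j$ of (9.4), here is a minimal Python sketch. The function names `schauder_coefficients` and `gifs_ratios` are hypothetical and chosen only for this example, and the choice of storing `None` when a parent coefficient vanishes is an assumption, since the text does not discuss that case.

```python
def schauder_coefficients(f, max_level):
    """Return {j: [s_k^j for k in 0..2^j-1]} for j = 0..max_level.

    s_k^j = f((k + 1/2) 2^-j) - (f(k 2^-j) + f((k + 1) 2^-j)) / 2
    """
    coeffs = {}
    for j in range(max_level + 1):
        h = 2.0 ** (-j)
        coeffs[j] = [
            f((k + 0.5) * h) - 0.5 * (f(k * h) + f((k + 1) * h))
            for k in range(2 ** j)
        ]
    return coeffs


def gifs_ratios(coeffs):
    """Return {j: [c_k^j]} with c_k^j = s_k^j / s_{[k/2]}^{j-1}, for j >= 1.

    Assumption of this sketch: a zero parent coefficient makes the ratio
    undefined, so None is stored in that case.
    """
    ratios = {}
    for j in range(1, max(coeffs) + 1):
        parent = coeffs[j - 1]
        ratios[j] = [
            (s / parent[k // 2]) if parent[k // 2] != 0.0 else None
            for k, s in enumerate(coeffs[j])
        ]
    return ratios


if __name__ == "__main__":
    # Smooth test function vanishing at 0 and 1; its ratios all equal 1/4,
    # so it does not belong to the class S (which requires |c| >= 1/2 + eps).
    ratios = gifs_ratios(schauder_coefficients(lambda x: x * (1.0 - x), 4))
    print(ratios[1], ratios[2])
```

For the parabola used in the usage example, every ratio equals 1/4, which reflects the fact that a smooth function is too regular to belong to $S$.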
The method described in the previous section allows us to calculate the Hölder exponents only when $f$ belongs to $S$. In the general case, it is not possible to calculate $\alpha_f$ exactly, and we can only obtain an approximation. For this purpose, we construct a function $\tilde{f}$ satisfying:

\[ \tilde{f} \in S \quad \text{and} \quad \forall x \in [0; 1],\ \forall g \in S : |\alpha_f(x) - \alpha_{\tilde{f}}(x)| \le |\alpha_f(x) - \alpha_g(x)| \]

and we calculate $\alpha_{\tilde{f}}$ instead of $\alpha_f$. This method is reminiscent, in a sense, of filtering a signal before sampling it: if the sampling frequency $\omega_e$ is less than $2\omega_{\max}$, where $\omega_{\max}$ designates the equivalent signal bandwidth, then the direct sampling of the signal will lead to information loss. When $\omega_e < 2\omega_{\max}$, the only part of the signal that can be used is that at “low frequency”, and the best way to proceed is to consider a frequency-domain approximation of the signal prior to carrying out the sampling. In the present case, we search for an approximation $\tilde{f}$ of $f$ in the sense described earlier and then calculate $\alpha_{\tilde{f}}$. Obviously, this approach will only give appropriate results if $f$ can really be approximated according to the criteria that have been defined. Applications show that this is often the case. Let us emphasize that $\tilde{f}$ is, in general, a poor approximation of $f$ in the sense of the $\|\cdot\|_{L^2}$ or $\|\cdot\|_\infty$ norms and that only $\alpha_{\tilde{f}}$ is close to $\alpha_f$.
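We can easily verify, according to the above-mentioned hypothesis, that $\tilde{f}$ is obtained as the attractor of the GIFS defined by:

\[ m = 2, \qquad L_i^k(t) = \frac{t}{m} + \frac{i}{m^k} \]

\[
\tilde{c}_k^j =
\begin{cases}
c_k^j & \text{if } |c_k^j| \in \left[ \tfrac{1}{2} + \varepsilon;\ 1 - \varepsilon \right] \\[1ex]
1 - \varepsilon & \text{if } |c_k^j| > 1 - \varepsilon \\[1ex]
\tfrac{1}{2} + \varepsilon & \text{if } |c_k^j| < \tfrac{1}{2} + \varepsilon
\end{cases}
\]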
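The clipping rule that defines $\tilde{c}_k^j$ can be sketched in a few lines of Python. The names `clip_ratio` and `clip_all` and the default value of `eps` are assumptions made for this illustration; the rule itself follows the three cases stated above, including the fact that clipped values are replaced by the positive bounds.

```python
def clip_ratio(c, eps=0.01):
    """Project one GIFS ratio c onto the admissible range of the class S.

    Values whose modulus lies in [1/2 + eps, 1 - eps] are kept unchanged;
    the others are replaced by the nearest bound, as in the definition above.
    """
    lower, upper = 0.5 + eps, 1.0 - eps
    modulus = abs(c)
    if modulus > upper:
        return upper
    if modulus < lower:
        return lower
    return c


def clip_all(ratios, eps=0.01):
    """Apply clip_ratio to a dictionary {j: [c_k^j]} of ratios."""
    return {j: [clip_ratio(c, eps) for c in level] for j, level in ratios.items()}


if __name__ == "__main__":
    print(clip_all({1: [0.25, -0.7], 2: [3.0, 0.6, 0.99, 0.5]}, eps=0.01))
```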
9.5.3. Application

We now explain how to solve the inverse problem in practice. Let $F = \{f(i),\ i = 0, \dots, 2^J\}$ be a given discrete signal. For $j = 1, \dots, J$, let us consider the set $P_j$ defined by:

\[ P_j = \{ f(i 2^{J-j}),\ i = 0, \dots, 2^j \} \]

The set $P_j$ is simply the signal sub-sampled with the step $2^{J-j}$. A geometric interpretation of the calculation of $c_k^j$ is the following: for $j \in \{1, \dots, J-1\}$, the set of coefficients $\{c_k^j,\ k = 0, \dots, 2^j - 1\}$, which will determine the GIFS, is obtained as the set of slopes of the $2^j$ affine functions which transform the polygon defined by $P_j$ into the polygon defined by $P_{j+1}$. To make the point clearer, an example with $J = 3$ is proposed in Figure 9.3. The signal samples are represented by dots. For $j = 1, 2$, the coefficients of the GIFS are:

\[ c_k^j = \frac{u_k^j}{u_{[k/2]}^{j-1}} \qquad \forall k = 0, \dots, 2^j - 1 \]

The estimation procedure of the pointwise Hölder exponent is as follows: let $i \in \{0, \dots, 2^J - 1\}$, and let $(i_1, \dots, i_{J-1})$ be the tuple such that $i 2^{-J} = \sum_{p=1}^{J-1} i_p 2^{-p}$. Then let us set $k_j = \sum_{p=0}^{j-1} i_{j-p} 2^p$ and let us consider the sequence $(C^j)_{j=1,\dots,J-1}$ defined by:

\[ C^j = -\log_2\left( |c_{k_j}^j| \cdots |c_{k_2}^2|\, |c_{k_1}^1| \right) \]

We define:

\[ \tilde{C}^j = -\log_2\left( |\tilde{c}_{k_j}^j| \cdots |\tilde{c}_{k_2}^2|\, |\tilde{c}_{k_1}^1| \right) \]
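To make the role of the partial sums $\tilde{C}^j$ concrete, the following Python sketch accumulates them along the dyadic path leading to a sample and fits a least-squares slope against $j$, which is how the text goes on to obtain the exponent. It is only a sketch: the name `holder_estimate` is hypothetical, and the shortcut `k_j = i >> (J - j)` (the first $j$ binary digits of the sample index) is an assumption used in place of the digit expansion given above.

```python
import math


def holder_estimate(clipped, i, J):
    """Estimate the Holder exponent at sample i (0 <= i < 2^J).

    clipped: dictionary {j: [c~_k^j for k in 0..2^j-1]} for j = 1..J-1.
    Returns the least-squares slope of C~^j against j (needs J >= 3).
    """
    levels = list(range(1, J))
    partial_sums = []
    acc = 0.0
    for j in levels:
        k_j = i >> (J - j)  # assumed index of the level-j dyadic interval containing i
        acc -= math.log2(abs(clipped[j][k_j]))
        partial_sums.append(acc)

    # Ordinary least-squares slope of partial_sums with respect to levels.
    n = len(levels)
    mean_j = sum(levels) / n
    mean_c = sum(partial_sums) / n
    num = sum((j - mean_j) * (c - mean_c) for j, c in zip(levels, partial_sums))
    den = sum((j - mean_j) ** 2 for j in levels)
    return num / den


if __name__ == "__main__":
    J = 6
    # Synthetic coefficients with constant modulus 2^-0.6 at every scale.
    clipped = {j: [2.0 ** -0.6] * (2 ** j) for j in range(1, J)}
    print(holder_estimate(clipped, 37, J))
```

With coefficients of constant modulus $2^{-h}$, the partial sums grow linearly as $h\,j$, so the fitted slope recovers $h$.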
Figure 9.3. Example of the calculation of $c_k^j$
The exponent $\alpha_f(i)$ is then given as the result of the linear regression of $\tilde{C}^j$ with respect to $j$. Let us observe that if $\tilde{c}_k^j = c_k^j$ for any $k$ and $j$, i.e., if all the GIFS coefficients have their amplitude in $]\tfrac{1}{2}; 1[$, then we are in the framework of the estimation of the Hölder exponent from the Schauder coefficients $s_k^j$. Indeed, in this case, we have $\tilde{C}^j = -\log_2 |s_{k_j}^j|$ and it is known [TRIE 78] that the Schauder basis characterizes the local regularity of continuous and nowhere differentiable functions (and also of more regular functions, under certain conditions [JAF 02b]).

We now present some numerical tests on synthetic signals. We compare the three methods described earlier (wavelets, Wigner-Ville and GIFS) applied to the generalized Weierstrass functions:

\[ W(t) = \sum_{k=0}^{\infty} 2^{-k s(t)} \sin(2^k t) \]
It is shown in [DAO 98] that its Hölder function is $\alpha_W(t) = s(t)$, under certain conditions on $s$. The choice of these functions is motivated by several reasons:
– they are simple to synthesize and it is easy to vary $\alpha_W(t)$;
– they have already been studied by many authors [GON 92a, TRIC 93] and provide a reliable generic model for different phenomena [JAG 90];
– they are synthesized neither from wavelets nor from GIFS, which makes fair comparisons possible.

Figure 9.4 (respectively 9.6) shows the function $W(t)$ on $[0; 1]$ obtained by taking $s(t) = t$ (respectively $s(t) = |\sin(5\pi t)|$). Figure 9.5 (respectively 9.7) shows the estimation of its Hölder function using the methods mentioned earlier, the theoretical curve being represented by a thick line. The standard line represents the GIFS estimation, the thin line represents the Schauder basis estimation and the dashed line represents the WTMM (Morlet wavelet) estimation. Finally, the mixed line represents the time-scale energy distribution (pseudo-Wigner) estimation.
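The generalized Weierstrass function used in these tests can be synthesized numerically by truncating its series. The sketch below is only illustrative: the function name `generalized_weierstrass` and the truncation order `n_terms` are assumptions, and nothing here reproduces the exact settings behind the figures.

```python
import math


def generalized_weierstrass(t, s, n_terms=30):
    """Truncated series W(t) = sum_{k < n_terms} 2^(-k s(t)) sin(2^k t).

    s is a callable giving the prescribed Holder function; the truncation
    order n_terms is an assumption of this sketch.
    """
    exponent = s(t)
    return sum(
        2.0 ** (-k * exponent) * math.sin((2.0 ** k) * t)
        for k in range(n_terms)
    )


if __name__ == "__main__":
    n_samples = 1024
    # The two prescribed regularity profiles considered in the text.
    linear = [generalized_weierstrass(n / n_samples, lambda t: t)
              for n in range(n_samples + 1)]
    sine = [generalized_weierstrass(n / n_samples, lambda t: abs(math.sin(5 * math.pi * t)))
            for n in range(n_samples + 1)]
    print(len(linear), len(sine))
```

Where $s(t)$ is close to zero, the truncated sum is dominated by the number of terms retained, so the truncation order should be chosen with the sampling step in mind.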
Figure 9.4. Generalized Weierstrass function with s(t) = t

Figure 9.5. Estimation of the Hölder function

Figure 9.6. Generalized Weierstrass function with s(t) = | sin(5πt)|

Figure 9.7. Estimation of the Hölder function
Based on these examples, we conclude that the GIFS estimates are more precise than those of the other methods. Thus, GIFS are not only capable of prescribing any Hölder function, but they can also provide accurate estimations of these functions in practice, even for very irregular signals. This makes GIFS a well-adapted framework for the analysis of the local regularity of functions, one which deserves further study.

9.6. Weak self-similar functions and multifractal formalism

With this section we embark upon the second part of the chapter, devoted to multifractal analysis and modeling of signals. Self-similar functions constitute the paradigm of “multifractal signals” (see Chapter 3 or Chapter 4 for the definition and the multifractal properties of these functions). However, in practice, self-similar functions and their immediate extensions are most of the time too rigid to properly model real-world signals, such as speech signals. In what follows, we consider a generalization, the WSA functions, which offers a good compromise between flexibility and complexity in modeling.

WSA functions are defined as a generalization of the self-similar functions, in the sense of Jaffard [JAF 02a], where the renormalization factors are allowed to vary through scales. Formally:

DEFINITION 9.3.– A function $f : [0, 1] \to \mathbb{R}$ is said to be weak self-affine if and only if:

1) there exist an open set $\Omega \subset [0, 1]$ and $d$ ($d \ge 2$) contracting similitudes $S_0, \dots, S_{d-1}$ with contraction factor $\tfrac{1}{d}$, such that:

\[ S_i(\Omega) \subset \Omega \qquad \forall i \in \{0, \dots, d-1\} \]

\[ S_i(\Omega) \cap S_j(\Omega) = \emptyset \qquad \text{if } i \ne j \]
2) there exist $d$ positive sequences $(\lambda_0^j)_{j \in \mathbb{N}^*}, \dots, (\lambda_{d-1}^j)_{j \in \mathbb{N}^*}$ satisfying $0 < \lambda_i^j < 1$ for any $i \in \{0, \dots, d-1\}$ and $j \in \mathbb{N}^*$, and there exists a continuous function $g$ with compact support, such that $f$ verifies:

\[ f(x) = g(x) + \sum_{n=1}^{\infty} \sum_{(i_1, \dots, i_n) \in \{0, \dots, d-1\}^n} \left( \prod_{j=1}^{n} \varepsilon^j_{\sum_{p=1}^{j} i_p 2^{j-p}}\, \lambda^j_{i_j} \right) g\left( S_{i_n}^{-1} \circ \dots \circ S_{i_1}^{-1}(x) \right) \qquad (9.5) \]

where, for any $j \ge 1$ and $k \in \{0, \dots, d^j - 1\}$, we have $\varepsilon_k^j = \pm 1$.

If there exist $d$ scalars $\lambda_0, \dots, \lambda_{d-1}$ such that:

\[ \varepsilon_k^j \lambda_i^j = \lambda_i, \qquad \forall i \in \{0, \dots, d-1\},\ \forall j \ge 1 \text{ and } \forall k \in \{0, \dots, d^j - 1\} \]

then we recover the traditional self-similar functions, in the sense of [JAF 02a]. The (weak) self-affinity of $f$ becomes clear once we note that Definition 9.3 implies that $f$ can be obtained as the limit of the sequence $(f_j)_{j \in \mathbb{N}}$, where $f_0(x) = g(x)$ and, for $j \ge 1$, $f_j$ is recursively calculated as follows:

\[ f_j(x) = \sum_{i=0}^{d-1} \varepsilon_i^j \lambda_i^j\, f_{j-1}\left( S_i^{-1}(x) \right) + g(x) \]
The following theorem, proven in [DAO 96], enables us to calculate the multifractal spectrum $d(\alpha)$ of WSA functions. The corresponding theorem for ordinary self-similar functions can be found in [JAF 02a] (see also Chapter 3 in this volume). Let us define, for any $j \ge 1$, the $d$-tuple $(u_0^j, \dots, u_{d-1}^j)$ by:

\[ (u_0^j, \dots, u_{d-1}^j) = (\lambda_{i_0}^j, \dots, \lambda_{i_{d-1}}^j) \]

where $(i_0, \dots, i_{d-1})$ is the permutation of $(0, \dots, d-1)$ which gives:

\[ \lambda_{i_0}^j \le \dots \le \lambda_{i_{d-1}}^j \]

In other words, for any $j$, $(u_0^j, \dots, u_{d-1}^j)$ is the $d$-tuple $(\lambda_0^j, \dots, \lambda_{d-1}^j)$ rearranged in ascending order. Then, the following theorem holds.

THEOREM 9.2.– Let us suppose that there exist two scalars $a > 0$ and $b > 0$ such that, for any $i \in \{0, \dots, d-1\}$ and $j \ge 1$, we have:

\[ 0 < a \le u_i^j \le b < 1 \]
Let us also suppose that:
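\[ p(x_0, \dots, x_{d-1}) = \lim_{n} \frac{\operatorname{card}\left\{ j \in \{1, \dots, n\} : u_i^j \le x_i;\ \forall i = 0, \dots, d-1 \right\}}{n} \]

exists for any $(x_0, \dots, x_{d-1}) \in [a; b]^d$. Finally, let us suppose that $g$ is uniformly more regular than $f$. We denote by $d(\alpha)$ the (Hausdorff) multifractal spectrum of $f$, i.e., $d(\alpha) = \dim_H \{x : \alpha(x) = \alpha\}$, where $\alpha(x)$ is the Hölder exponent of $f$ at point $x$. Then:

– $d(\alpha) = -\infty$ if $\alpha \notin [\alpha_{\min}; \alpha_{\max}]$ where:

\[
\begin{cases}
\alpha_{\min} = \lim\limits_{n} -\dfrac{\log_d(u_{d-1}^1) + \dots + \log_d(u_{d-1}^n)}{n} \\[2ex]
\alpha_{\max} = \lim\limits_{n} -\dfrac{\log_d(u_0^1) + \dots + \log_d(u_0^n)}{n}
\end{cases}
\]

– if $\alpha \in [\alpha_{\min}; \alpha_{\max}]$, then:

\[ d(\alpha) = \inf_{q \in \mathbb{R}} \left( q\alpha - \tau(q) \right) \]

where:

\[ \tau(q) = \liminf_{n \to \infty} -\frac{\sum_{j=1}^{n} \log_d\left( (\lambda_0^j)^q + \dots + (\lambda_{d-1}^j)^q \right)}{n} \]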
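For a finite list of factors $(\lambda_0^j, \dots, \lambda_{d-1}^j)$, $j = 1, \dots, n$, the quantities appearing in Theorem 9.2 can be approximated numerically. The following Python sketch replaces every limit by its finite-$n$ value and the infimum over $q$ by a minimum over a grid; both replacements are assumptions of the sketch, as are the names `tau`, `alpha_bounds` and `legendre_spectrum`.

```python
import math


def tau(q, lambdas, d):
    """Finite-n proxy for tau(q): -(1/n) sum_j log_d(sum_i (lambda_i^j)^q)."""
    n = len(lambdas)
    return -sum(math.log(sum(lam ** q for lam in level), d) for level in lambdas) / n


def alpha_bounds(lambdas, d):
    """Finite-n proxies for (alpha_min, alpha_max) of Theorem 9.2."""
    n = len(lambdas)
    alpha_min = -sum(math.log(max(level), d) for level in lambdas) / n
    alpha_max = -sum(math.log(min(level), d) for level in lambdas) / n
    return alpha_min, alpha_max


def legendre_spectrum(alpha, lambdas, d, q_grid):
    """Approximate d(alpha) = inf_q (q alpha - tau(q)) by a minimum over q_grid."""
    return min(q * alpha - tau(q, lambdas, d) for q in q_grid)


if __name__ == "__main__":
    # Two alternating pairs of factors, d = 2 (illustrative values only).
    lambdas = [(0.6, 0.8), (0.55, 0.9)] * 10
    q_grid = [q / 10.0 for q in range(-200, 201)]
    a_min, a_max = alpha_bounds(lambdas, 2)
    for alpha in (a_min, 0.5 * (a_min + a_max), a_max):
        print(round(alpha, 3), round(legendre_spectrum(alpha, lambdas, 2, q_grid), 3))
```

When both factors equal $2^{-h}$ at every scale, $\tau(q) = hq - 1$ and the spectrum reduces to the single point $d(h) = 1$, which gives a quick sanity check of the implementation.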
This theorem shows that the multifractal formalism is valid for WSA functions. Therefore, the Hausdorff, large deviation and Legendre multifractal spectra (see Chapter 1 or Chapter 4 for definitions) coincide, are concave and can be easily calculated.

9.7. Signal representation by WSA functions

The method we use to represent a given signal is based on its approximation by one or several WSA functions. In this section, we develop a practical technique to approximate, in the $L^2$ sense, a signal by a WSA function. In practice, only discrete data are available; thus, in what follows, we will suppose that we have a signal $\{f(m),\ m = 0, \dots, 2^J - 1\}$. Our purpose is to find the parameters $(d, g, (S_i)_i, (\varepsilon_k^j)_{k,j}, (\lambda_i^j)_{i,j})$ of the WSA function which provide the best $L^2$-approximation of $f$. In its general form, this problem is difficult to solve. However, it is possible to consider a simplified and less general form, for which a solution can be found by using a fast algorithm based on the wavelet decomposition of $f$. Let us describe this sub-optimal solution. Let $\varphi$
denote the scaling function, $\psi$ the corresponding wavelet and $w_k^j$ the resulting wavelet coefficients of the signal $f$:

\[ f(x) = a_0 \varphi(x) + \sum_{j=0}^{J-1} \sum_{0 \le k < 2^j} w_k^j\, \psi(2^j x - k) \]

Let us assume that, for any $(k, j)$, we have $w_k^j \ne 0$ and let us define for $j \ge 1$:

\[ c_k^j = \left| \frac{w_k^j}{w_{[k/2]}^{j-1}} \right| \]
x+i 2
for i = 0, 1. Then, a simple calculation leads to:
f (x) = a0 φ(x) + w00 ψ(x) +
w00
J−1
n sgn w00 w n
p=1 ip 2
n=1 (i1 ,...,in )∈{0,1}n
$ n ) n−p
j=1
% cj j
p=1 ip 2
j−p
(9.6)
× ψ Si−1 ◦ . . . ◦ Si−1 (x) n 1
where sgn(x) denotes the sign function. This latter equality implies the following sub-optimal choice for the parameters: under the additional constraint d = 2, (f (x) − a0 φ(x))/w00 takes a form similar to (9.5) if we assume that g = ψ, j j j−1 Si (x) = x+i 2 for i = 0, 1 and k = sgn(wk /w[ k ] ). In the following, we thus fix the 2
values of d, S0 and S1 as above and try to find the optimal g and (λji ). For now, let us assume that g is known. Hence, our problem is to find, for any j, two positive scalars λj0 and λj1 such that, if we replace all (cj2k ) (respectively (cj2k+1 )) with λj0 (respectively λj1 ) in (9.6), then we obtain the best L2 -approximation of the original signal f . In other words, for a given couple (φ, ψ), we want to find, at each j scale, one scalar which “best represents” the wavelet coefficient ratio of f with even indexes k and the same for the coefficients with odd indexes k. By using a gradient descent in the time-scale wavelet space, we have shown in [DAO 96] that the two sequences (λn0 )n≥1 and (λn1 )n≥1 , which are solutions of the so-simplified inverse problem, are given by: Pkn cn2k+i (9.7) λ1i = c1i and λni = 0≤k<2n−1
for any n > 1 where, for any k ∈ {0, . . . , 2n−1 − 1}: n−1 j n−j cn−1 j=1 λi (k) k n Pk = n−1 j j j 2 2 j=1 |λ0 | + |λ1 |
where the sequence $(i_1(k), \dots, i_{n-1}(k))$ is the unique sequence of $\{0, 1\}^{n-1}$ such that $k = \sum_{j=1}^{n-1} i_j(k)\, 2^{n-j-1}$. Moreover, for any $n > 1$, we have $\sum_{0 \le k < 2^{n-1}} P_k^n = 1$.

Unfortunately, such a procedure does not provide appropriate representations in practice, for the following reasons. Each $c_k^j$ is defined as the ratio of two wavelet coefficients. While it is assumed that all $w_k^j$ are non-zero, some arbitrarily small values are likely to exist in most real applications. This yields both very large and very small values of $c_k^j$. Obviously, such a large range of variation is an issue for the proposed modeling, since all the coefficients $\{c_k^j,\ k = 0, \dots, 2^j - 1\}$ are replaced by only two values, and thus a control on the dispersion of the $c_k^j$ is needed. However, what draws our interest is the representation of irregular signals (otherwise, a fractal approach would not make sense). For irregular signals, energy is present at most scales, and consequently the majority of the $c_k^j$ should vary within an intermediate range. Moreover, from a fractal analysis viewpoint, large $c_k^j$ are not interesting as they do not contribute to the regularity of $f$. In addition, if we assume that $f$ is nowhere differentiable, then the Hölder exponent at each point is less than 1; thus, “many” $c_k^j$, including those which control the multifractal properties of $f$, will belong to $[\tfrac{1}{2}, 1]$ (see [DAO 98] for details). Thus, it appears reasonable to set aside, in our representation, the “large” $c_k^j$, and to model only those that are less than 1. More precisely, we keep the $c_k^j$ unchanged when they are greater than or equal to 1 and we calculate the $(\lambda_i^j)$ that yield the best $L^2$-approximation by considering only the remaining $c_k^j$. Evidently, for this strategy to make sense, it is necessary that the cardinality of $\{c_k^j : c_k^j \ge 1\}$ is negligible compared with that of $\{c_k^j : c_k^j < 1\}$. This depends on the nature of the signal and on the choice of $g = \psi$.
These constraints lead us to the following criteria for the sub-optimal choice of the analyzing wavelet:

(C1) $\{w_k^j = 0\} = \emptyset$

(C2) the cardinality of $\{c_k^j : c_k^j < 1\}$ is maximal.

In practice, because of edge artifacts, wavelet decomposition is often limited to a certain scale $j_0 > 0$. As a consequence, the $(c_k^j)$ are defined only for $j > j_0$. If we write the signal $f$ as:

\[ f(x) = \sum_{0 \le k < 2^{j_0}} a_k^{j_0}\, \varphi(2^{j_0} x - k) + \sum_{n=j_0}^{J-1} \sum_{0 \le k < 2^n} w_k^n\, \psi(2^n x - k) \]

then the problem at hand is to find, for any $j > j_0$, two positive scalars $\lambda_0^j$ and $\lambda_1^j$ such that, when we replace all $(c_{2k}^j)$ satisfying $c_{2k}^j < 1$ (respectively $(c_{2k+1}^j)$ satisfying
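$c_{2k+1}^j < 1$) by $\lambda_0^j$ (respectively $\lambda_1^j$), we obtain the best $L^2$-approximation of the original signal $f$. The resulting approximate signal $\tilde{f}$ is hence a WSA function defined by:

\[
\begin{aligned}
\tilde{f}(x) = {} & \sum_{0 \le k < 2^{j_0}} a_k^{j_0}\, \varphi(2^{j_0} x - k) + \sum_{0 \le k < 2^{j_0}} w_k^{j_0}\, \psi(2^{j_0} x - k) \\
& + \sum_{n=j_0+1}^{J-1} \sum_{(i_1, \dots, i_n) \in \{0, 1\}^n} \left| w^{j_0}_{\sum_{p=1}^{j_0} i_p 2^{j_0-p}} \right| \operatorname{sgn}\left( w^n_{\sum_{p=1}^{n} i_p 2^{n-p}} \right) \left( \prod_{j=j_0+1}^{n} \tilde{c}^j_{\sum_{p=1}^{j} i_p 2^{j-p}} \right) \psi\left( S_{i_n}^{-1} \circ \dots \circ S_{i_1}^{-1}(x) \right)
\end{aligned}
\]

where:

\[
\tilde{c}^j_{\sum_{p=1}^{j} i_p 2^{j-p}} =
\begin{cases}
c^j_{\sum_{p=1}^{j} i_p 2^{j-p}} & \text{if } c^j_{\sum_{p=1}^{j} i_p 2^{j-p}} \ge 1 \\[1ex]
\lambda^j_{i_j} & \text{otherwise}
\end{cases}
\qquad (9.8)
\]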
Let us observe that, for $n > j_0$ and $k \in \{0, \dots, 2^n - 1\}$, the wavelet coefficient $\tilde{w}_k^n$ of $\tilde{f}$ is given by:

\[ \tilde{w}_k^n = \operatorname{sgn}(w_k^n) \left| w^{j_0}_{[k/2^{n-j_0}]} \right| \prod_{j=j_0+1}^{n} \tilde{c}^j_{[k/2^{n-j}]} \]

Since the study is restricted to orthogonal wavelet transforms, which preserve energy, the goal is to find two positive sequences $(\lambda_0^n)_{n=j_0+1,\dots,J-1}$ and $(\lambda_1^n)_{n=j_0+1,\dots,J-1}$ that satisfy:

\[ \operatorname{argmin} \sum_{n=j_0+1}^{J-1} \sum_{0 \le k < 2^n} |w_k^n - \tilde{w}_k^n|^2 \]

However, finding this global minimum is a difficult problem. A local minimum can nevertheless be obtained by successively calculating, for $n = j_0 + 1, \dots, J - 1$, the pair $(\lambda_0^n, \lambda_1^n)$ that satisfies:

\[ \operatorname{argmin} \sum_{0 \le k < 2^n} |w_k^n - \tilde{w}_k^n|^2 \qquad (9.9) \]
The solution to problem (9.9) is given in the following proposition (see [DAO 02] for the proof).
PROPOSITION 9.6.– For $i = 0, 1$ and $n > j_0$, the $(\lambda_i^n)$ solution of (9.9) are recursively given by:

\[ \lambda_i^{j_0+1} = \frac{ \sum\limits_{\substack{0 \le k < 2^{j_0} \\ |c_{2k+i}^{j_0+1}| < 1}} \left| w_k^{j_0} \right|^2 c_{2k+i}^{j_0+1} }{ \sum\limits_{\substack{0 \le k < 2^{j_0} \\ |c_{2k+i}^{j_0+1}| < 1}} \left| w_k^{j_0} \right|^2 } \qquad (9.10) \]

\[ \lambda_i^n = \frac{ \sum\limits_{\substack{0 \le k < 2^{n-1} \\ |c_{2k+i}^n| < 1}} \left| w^{j_0}_{[\frac{2k+i}{2^{n-j_0}}]} \right|^2 \left( \prod\limits_{j=j_0+1}^{n-1} \tilde{c}^j_{[\frac{2k+i}{2^{n-j}}]}\, c^j_{[\frac{2k+i}{2^{n-j}}]} \right) c_{2k+i}^n }{ \sum\limits_{\substack{0 \le k < 2^{n-1} \\ |c_{2k+i}^n| < 1}} \left| w^{j_0}_{[\frac{2k+i}{2^{n-j_0}}]} \right|^2 \prod\limits_{j=j_0+1}^{n-1} \left( \tilde{c}^j_{[\frac{2k+i}{2^{n-j}}]} \right)^2 } \qquad (9.11) \]

for $n = j_0 + 2, \dots, J - 1$.

Formulae (9.8), (9.10) and (9.11) define the approximation of $f$ by a WSA function.

NOTE 9.3.– Obviously, it is possible to develop a similar algorithm for ordinary self-similar functions. However, such an algorithm would restrict the search to a class of functions smaller than that of WSA functions, and would therefore yield less precise representations in general. Moreover, since ordinary self-similar functions are a particular case of WSA functions, the proposed algorithm is also able to model signals that are properly represented by ordinary self-similar functions: in that case, the $\lambda_i^n$ given by (9.11) would be equal for all $n$.

9.8. Segmentation of signals by weak self-similar functions

In many scenarios, a single WSA function cannot represent a signal on its own. A typical example is the concatenation of two IFS. Consider a signal $X$ on $[0, 1]$ whose restrictions to $[0, \tfrac{1}{2}]$ and $[\tfrac{1}{2}, 1]$ consist of the attractors of two different IFS. The modeling of $X$ by a single WSA function would result in a significant global $L^2$ error, whereas two WSA functions would yield a perfect approximation (using a Schauder basis as wavelet decomposition). It is therefore important to design a procedure that segments a given signal into several parts, each one being appropriately represented by a WSA function. As in the previous section, it is worth mentioning that constructing an optimal algorithm is a difficult task. In what follows, we present a segmentation method which yields good results in practice.

Let us consider the dyadic tree of depth $J$ for which, by convention, the root node lives at level zero and the leaves at level $J - 1$. The nodes, at each level $j \in \{0, \dots, J - 1\}$, are numbered from left to right by $(l, j)$ with $l = 0, \dots, 2^j - 1$. The coefficient $c_l^j$ is associated with each node $(l, j)$
such that $j > j_0$. The segmentation algorithm is based on the fact that the sub-tree extending from node $(l, j)$ completely determines the restriction of $f$ to $I(l, j) = \{l 2^{J-j}, \dots, (l+1) 2^{J-j} - 1\}$. The aim is to define and measure the error associated with each node $(l, j)$ by the $L^2$ distance between the restriction of the original signal $f$ to $I(l, j)$ and the representation of this restriction by a single WSA function. In order to account for the fact that large-scale errors have more impact than small-scale ones, errors are weighted (see below). Starting from the root node, which obviously corresponds to a single WSA function for the whole signal, we recursively divide the tree until the error falls, for each sub-tree, below a given threshold defined a priori. Each resulting sub-tree is “appropriately” represented by a (single) WSA function and the union of the corresponding $I(l, j)$ defines the segmentation of $f$.

More precisely, the set of integers $I(l, j, n) = \{l 2^{n-j}, \dots, (l+1) 2^{n-j} - 1\}$ is associated with each node $(l, j)$ and each $n > j$. Let $\lambda_i^n(l, j)$ denote the ratio of sums similar to those in (9.11), but where the indices are determined by $(k \in I(l, j, n-1),\ c_{2k+i}^n < 1)$. Let us denote:

\[ e_i^n(l, j) = \frac{1}{2^{n-j}} \sum_{\substack{k \in I(l, j, n-1) \\ c_{2k+i}^n < 1}} \left| \lambda_i^n(l, j) - |c_{2k+i}^n| \right|^2 \]

\[ e_i(l, j) = \sum_{n=j+1}^{J-1} e_i^n(l, j) \]

and:

\[ e(l, j) = \frac{e_0(l, j) + e_1(l, j)}{\sigma(j)} \]

where $\sigma$ is a positive increasing function. The quantity $e(l, j)$ is the error function and $\sigma$ is introduced to account for the fact that errors made on coarse scales have more impact than those made on fine scales.

Now, the proposed segmentation algorithm can be formulated (see Algorithm 9.1). The result of this algorithm is the segmentation of $f$ into consecutive parts, each being properly represented by a WSA function. From this point of view, this segmentation approach is a new type of tool. Instead of dividing the original signal into homogenous parts according to usual criteria such as local average or fractal dimension, we use a criterion based on multifractal stationarity. Indeed, each segment has a well-defined multiplicative structure, with a multifractal spectrum given by Theorem 9.2. As an application of this new segmentation scheme, we will examine in the next section how it enables us to estimate non-concave multifractal spectra.
Fix ε > 0; node = root node; (this is the initialization)
function segmentation (node)
Begin
    (l, j) = number of the node;
    If we have e(l, j) < ε, then:
        {f (m), m ∈ I(l, j, J)} is approximated by the weak self-similar function defined by {λ_i^n(l, j), n = j + 1, . . . , J − 1, i = 0, 1}
    Otherwise
        segmentation (left child of the node);
        segmentation (right child of the node);
End

Algorithm 9.1. Segmentation algorithm
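The recursion of Algorithm 9.1 can be sketched in Python as follows. The error $e(l, j)$ is passed in as a callable, so this sketch says nothing about how it is computed. The names `segment` and `node_samples` are hypothetical, the children of node $(l, j)$ are taken to be $(2l, j+1)$ and $(2l+1, j+1)$, and stopping at the leaves of the tree is an assumption added here because the pseudocode does not state a terminal case.

```python
def segment(error, J, eps, l=0, j=0):
    """Return the nodes (l, j) whose sub-tree is accepted as one WSA model.

    error: callable (l, j) -> e(l, j), the weighted error of the text.
    J:     depth of the dyadic tree (levels 0..J-1).
    eps:   threshold fixed a priori.
    """
    # Accept the node if its error is small enough; leaves are always
    # accepted (assumption of this sketch, to guarantee termination).
    if error(l, j) < eps or j == J - 1:
        return [(l, j)]
    left = segment(error, J, eps, 2 * l, j + 1)
    right = segment(error, J, eps, 2 * l + 1, j + 1)
    return left + right


def node_samples(l, j, J):
    """Sample indices I(l, j, J) = {l 2^(J-j), ..., (l+1) 2^(J-j) - 1}."""
    width = 2 ** (J - j)
    return range(l * width, (l + 1) * width)


if __name__ == "__main__":
    # Toy error function: only the root and its left child are badly modeled.
    bad_nodes = {(0, 0), (0, 1)}
    toy_error = lambda l, j: 100.0 if (l, j) in bad_nodes else 0.0
    for node in segment(toy_error, J=4, eps=50.0):
        samples = node_samples(*node, 4)
        print(node, samples[0], samples[-1])
```

Because the recursion visits the left child before the right one, the accepted nodes come out ordered from left to right, so the corresponding sample ranges are consecutive.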
NOTE 9.4.– Our algorithm suffers from a weakness: segmentation can only occur at dyadic points, which results in a significant loss if the “real” segments are not lined up with dyadic points. This difficulty frequently arises when using dyadic wavelets and it can be solved using standard techniques such as non-decimated wavelet transforms.

9.9. Estimation of the multifractal spectrum

Representation by means of weak self-similar functions offers a semi-parametric approach to the estimation of the spectrum $d(\alpha)$. First, $f$ is segmented into homogenous parts by using the algorithm described in the previous section. Each subpart $P_i$, with $i = 1, \dots, p$, is represented by a single WSA function $F_i$ whose spectrum $d_i$ can be calculated using Theorem 9.2. Because the number of segments is finite, the dimension $d(\alpha)$ associated with the Hölder exponent $\alpha$ for the whole signal is the maximum of the $d_i(\alpha)$, with $i = 1, \dots, p$. Therefore, the semi-parametric estimation of $d(\alpha)$ is:

\[ \hat{d}(\alpha) = \max_{i=1,\dots,p} d_i(\alpha) \]

Obviously, each $d_i$ is concave, although, in general, this is not the case for $\hat{d}$. The example of the concatenation of two IFS shows that the estimated spectrum $\hat{d}$ coincides exactly with the theoretical spectrum and exhibits two modes, as is characteristic of a phase transition.

It is important to note that, for a given $\delta > 0$, it is easy to construct two functions $f_1$ and $f_2$ such that $\|f_1 - f_2\|_{L^2} < \delta$ or $\|f_1 - f_2\|_\infty < \delta$, yet with very different spectra. Therefore, we cannot conclude that, in general, the spectrum of the original signal is close to that of the approximating WSA function. However, given criteria which allow us to confirm that the physical properties of the original signal and
the approximating signal are close, then this conclusion can make sense. For example, in the case of speech signals, studied later, an obvious criterion is auditory comparison. For Internet traffic applications, the chosen criterion will be the comparison between our estimate of the spectrum and those proposed by other approaches. Indeed, since these approaches are qualitatively different and yield more or less equivalent spectra (as we will see later), this lends credibility to the proposed estimation.
9.10. Experiments
The first example consists of the representation of the word “welcome” pronounced by a male speaker. The signal contains $2^{15}$ samples; we set $j_0 = 8$ and we use the Daubechies-16 wavelet (as explained before, the choice of wavelet is based on the criteria (C1) and (C2)). In this experiment, as in the following one, we take $\sigma(j) = j^2$. By using a threshold ε = 50, we obtain a representation with seven WSA functions, where 64% of the coefficients are processed (the remaining 36% correspond to the tree levels coarser than $j_0$ or to values of $c_{jk}$ larger than 1). Figure 9.8 shows that the original and the approximating signals are visually almost identical. More significantly, the two signals cannot be distinguished by auditory comparison, as can be checked at: http://www-rocq.inria.fr/fractales. In addition, the segmentation (see the crosses in Figure 9.8) is phonetically consistent, since it coincides almost perfectly with the sounds: silence, /w/, /l/, silence, /k/, /om/, silence. The slight difference between the positions of the segmentation marks and the exact transition points between phonetic units is due to the fact that, in our current implementation, the segmentation is restricted to the dyadic points.
Figure 9.8. The word “welcome” pronounced by a male speaker (in black) with its approximation (superimposed in gray) and the segmentation marks (the crosses)
In the second example, we present an application for Internet traffic signals. We use a signal of 512 traffic samples coming out from Berkeley, measured in bytes by time
steps. The analyzing wavelet is Daubechies-4, $j_0 = 4$ and 65% of the coefficients have been processed. With a threshold ε = 30, we obtain two segments. Figure 9.9 shows the original signal (in black), its approximation (in gray), the segmentation marks (the crosses) and the estimated spectrum on each segment (Figure 9.9b). It is interesting to compare these spectra with those estimated in [LEV 97]: the results are very similar, whereas if we use more segments (or a single segment), a clear difference appears (see Figure 9.10 for the result obtained with four segments). Since the method used here and that of [LEV 97] are very different, this agreement suggests that this particular signal possesses two parts, each being stationary in a multifractal sense. Such information can be useful for a better understanding of the traffic structure.
[Figure 9.9 plots: panel (a), signal versus sample index; panel (b), “Singularity Spectrum” versus Hölder exponent]
Figure 9.9. (a) Original Internet traffic signal (in black), its approximation (in gray) using two segments and the segmentation marks (the crosses), (b) estimated spectrum for each segment. The estimated spectrum for the whole signal is the upper envelope of the two curves
[Figure 9.10 plots: panel (a), signal versus sample index; panel (b), “Singularity Spectrum” versus Hölder exponent]
Figure 9.10. (a) Original Internet traffic signal (in black), its approximation (in gray) using four segments and the segmentation marks (the crosses), (b) estimated spectrum for each segment. The estimated spectrum for the whole signal is the convex hull of the two curves
9.11. Bibliography
[ABE 92] ABENDA S., DEMKO S., TURCHETTI G., “Local moments and inverse problem for fractal measures”, Inverse Problems, vol. 8, p. 739–750, 1992.
[AND 92] ANDERSSON L.M., “Recursive construction of fractals”, Annales Academiae Scientiarum Fennicae, 1992.
[ARB 02] ARBEITER M., PATZSCHKE N., “Random self-similar multifractals”, Advances in Mathematics, 2002.
[BAC 91] BACRY E., ARNÉODO A., FRISCH U., GAGNE Y., HOPFINGER E., Wavelet Analysis of Fully Developed Turbulence Data and Measurement of Scaling Exponents, Kluwer Academic Publishers, 1991.
[BAC 93] BACRY E., MUZY J.F., ARNÉODO A., “Singularity spectrum of fractal signal from wavelet analysis: exact results”, J. Stat. Phys., vol. 70, no. 3-4, p. 635–674, 1993.
[BAR 88] BARNSLEY M.F., DEMKO S., ELTON J., GERONIMO J., “Invariant measures for Markov processes arising from iterated function systems with place-dependent probabilities”, Annales de l’IHP, Probabilité et Statistique, vol. 24, no. 3, p. 367–394, 1988.
[BAR 85a] BARNSLEY M.F., “Fractal functions and interpolation”, Constructive Approximation, 1985.
[BAR 85b] BARNSLEY M.F., DEMKO S., “Iterated function system and the global construction of fractals”, Proceedings of the Royal Society, vol. A399, p. 243–245, 1985.
[BAR 86] BARNSLEY M.F., ERVIN V., HARDIN D., LANCASTER J., “Solution of an inverse problem for fractals and other sets”, Proc. Natl. Acad. Sci. USA, vol. 83, 1986.
[BAR 93a] BARNSLEY M.F., Fractal Image Compression, A.K. Peters, 1993.
[BAR 93b] BARNSLEY M.F., Fractals Everywhere, Academic Press, 1993.
[CAB 92] CABRELLI C.A., FORTE B., MOLTER U.M., VRSCAY E.R., “Iterated fuzzy set systems: A new approach to the inverse problem for fractal and other sets”, Math. Analysis and Applications, vol. 171, no. 1, p. 79–100, 1992.
[CEN 93] CENTORE P.M., VRSCAY E.R., “Continuity of attractors and invariant measures for iterated functions systems”, Canadian Math. Bull., vol. 37, p. 315–329, 1993.
[DAO 96] DAOUDI K., Généralisations des IFS: Applications au Traitement du Signal, PhD Thesis, Paris 9 University, 1996.
[DAO 98] DAOUDI K., LÉVY VÉHEL J., MEYER Y., “Construction of continuous functions with prescribed local regularity”, Constructive Approximation, vol. 14, no. 3, p. 349–386, 1998.
[DAO 02] DAOUDI K., LÉVY VÉHEL J., “Signal representation and segmentation based on multifractal stationarity”, Signal Processing, vol. 82, no. 12, p. 2015–2024, 2002.
[DUV 92] DUVALL P.F., HUSCH L.S., “Attractors of iterated function systems”, Proc. of the Amer. Math. Society, vol. 116, no. 1, 1992.
[EVE 92] EVERTSZ C.J.G., MANDELBROT B.B., “Self-similarity of the harmonic measure on DLA”, Physica A, vol. 185, p. 77–86, 1992.
[FAL 90] FALCONER K.J., Fractal Geometry: Mathematical Foundations and Applications, John Wiley & Sons, 1990.
[FAL 94] FALCONER K.J., “The multifractal spectrum of statistically self-similar measures”, Journal of Theoretical Probability, vol. 7, no. 3, p. 681–702, 1994.
[FIS 93] FISHER Y., JACOBS E.W., BOSS R.D., “Fractal image compression using iterated transforms”, in Data Compression, Kluwer Academic Publishers, 1993.
[FOR 94] FORTE B., LO SCHIAVO M., VRSCAY E.R., “Continuity properties of attractors for iterated fuzzy set systems”, J. Australian Math. Soc., vol. B 36, p. 175–193, 1994.
[FOR 95] FORTE B., VRSCAY E.R., “Solving the inverse problem for measures using iterated function systems: a new approach”, Adv. Appl. Prob, vol. 27, p. 800–820, 1995.
[FRI 95] FRISCH U., PARISI G., “Fully developed turbulence and intermittency”, in Proceedings of the International Summer School on “Turbulence and Predictability in Geophysical Fluid Dynamics and Climate Dynamics”, p. 84–88, 1985.
[GON 92a] GONÇALVES P., FLANDRIN P., “Bilinear time-scale analysis applied to local scaling exponents estimation”, in Progress in Wavelet Analysis and Application (Toulouse, France), p. 271–276, June 1992.
[GON 92b] GONÇALVES P., FLANDRIN P., “Scaling exponents estimation from time-scale energy distributions”, in Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 1992.
[GRA 87] GRAF S., “Statistically self-similar fractals”, Probability Theory and Related Fields, vol. 74, p. 357–392, 1987.
[HIR 02] HIRATA T., IMOTO M., “Multifractal analysis of spatial distribution of microearthquakes in the Kanto region”, Geophys. J. Int., vol. 107, p. 155–162, 2002.
[HUT 81] HUTCHINSON J., “Fractals and self-similarity”, Indiana University Journal of Mathematics, vol. 30, p. 713–747, 1981.
[JAC 93a] JACOBS E.W., BOSS R.D., FISHER Y., Fractal based image compression, Technical Report, Naval Ocean System Center, 1993.
[JAC 93b] JACOBS E.W., BOSS R.D., FISHER Y., “Image compression: a study of the iterated transform method”, Signal Processing, vol. 29, 1993.
[JAF 89] JAFFARD S., “Exposants de Hölder en des points donnés et coefficients d’ondelettes”, Comptes rendus de l’Académie des sciences de Paris, vol. 308, no. 1, p. 79–81, 1989.
[JAF 91] JAFFARD S., “Pointwise smoothness, two-microlocalization, and wavelet coefficients”, Publications Mathématiques, vol. 35, p. 155–168, 1991.
[JAF 92] JAFFARD S., MEYER Y., Pointwise Regularity of Functions and Wavelet Coefficients, Masson, 1992.
[JAF 95] JAFFARD S., “Functions with prescribed Hölder exponent”, Applied and Computational Harmonic Analysis, vol. 2, no. 4, p. 400–401, 1995.
[JAF 02a] JAFFARD S., “Multifractal formalism functions: Parts 1 and 2”, SIAM Journal of Mathematical Analysis, 2002.
[JAF 02b] JAFFARD S., MANDELBROT B.B., “Local regularity of non-smooth wavelet expansions and applications to the Polya function”, Advances in Mathematics, 2002.
[JAG 90] JAGGARD D.L., “On fractal electrodynamics”, in Recent Advances in Electromagnetic Research, Springer-Verlag, 1990.
[LEV 95] LÉVY VÉHEL J., “Fractal approaches in signal processing”, Fractals, vol. 3, no. 4, p. 755–775, 1995.
[LEV 96] LÉVY VÉHEL J., “Introduction to the multifractal analysis of images”, in FISHER Y. (Ed.), Fractal Image Encoding and Analysis, Springer-Verlag, 1996.
[LEV 97] LÉVY VÉHEL J., RIEDI R., “Fractional Brownian motion and data traffic modeling: the other end of the spectrum”, in LÉVY VÉHEL J., LUTTON E., TRICOT C. (Eds.), Fractals in Engineering, Springer-Verlag, 1997.
[LEV 01] LÉVY VÉHEL J., “Weakly self-affine functions and applications in signal processing”, Cuadernos del Instituto Matematica Beppo Levi, vol. 30, p. 35–49, 2001.
[MAL 92] MALLAT S., HWANG W.L., “Singularity detection and processing with wavelets”, IEEE Trans. on Information Theory, vol. 38, no. 2, 1992.
[MAND 68] MANDELBROT B.B., “Fractional Brownian motions, fractional noises, and applications”, SIAM Review, vol. 10, no. 4, p. 422–437, 1968.
[MAND 72] MANDELBROT B.B., “Possible refinement of the lognormal hypothesis concerning the distribution of energy dissipation in intermittent turbulence”, in ROSENBLATT M., VAN ATTA C. (Eds.), Statistical Models and Turbulence (La Jolla, California), Springer, Lecture Notes in Physics 12, p. 331–351, 1972.
[MAND 74] MANDELBROT B.B., “Intermittent turbulence in self-similar cascades: Divergence of high moments and dimension of the carrier”, J. Fluid Mech., vol. 62, p. 331–358, 1974.
[MAND 89] MANDELBROT B.B., “A class of multinomial multifractal measures with negative (latent) values for the dimension f (α)”, in PIETRONERO L. (Ed.), Fractals’ Physical Origin and Properties (Erice), Plenum, New York, p. 3–29, 1989.
[MAND 91] MANDELBROT B.B., EVERTSZ C.J.G., “Multifractality of the harmonic measure on fractal aggregates and extended self-similarity”, Physica A, vol. 177, p. 386–393, 1991.
[MAND 93] MANDELBROT B.B., “A fractal’s lacunarity, and how it can be tuned and measured”, Fractals in Biology and Medicine, p. 8–21, 1993.
[MANT 89] MANTICA G., SLOAN A., “Chaotic optimization and the construction of fractals: Solution of an inverse problem”, Complex Systems, vol. 3, p. 37–62, 1989.
[MAU 86] MAULDIN R.D., WILLIAMS S.C., “Random recursive constructions: Asymptotic geometry and topological properties”, Trans. Amer. Math. Soc., vol. 295, p. 325–346, 1986.
[MAZ 92] MAZEL D.S., HAYES M.H., “Using iterated function systems to model discrete sequences”, IEEE Trans. on Signal Processing, vol. 40, no. 7, 1992.
[MEY 90a] MEYER Y., Ondelettes et opérateurs. Vol. 1: ondelettes, Hermann, 1990.
[MEY 90b] MEYER Y., Ondelettes et opérateurs. Vol. 2: opérateurs de Calderon-Zygmund, Hermann, 1990.
[OLS 94] OLSEN L., “Random geometrically graph directed self-similar multifractals”, in Pitman Research Notes in Mathematics, Series 307, 1994.
[OLS 02] OLSEN L., “A multifractal formalism”, Advances in Mathematics, 2002.
[RIE 94] RIEDI R., “Explicit bounds for the Hausdorff dimension of certain self-similar sets”, in NOVAK M.M. (Ed.), Fractals in the Natural and Applied Sciences, North-Holland, IFIP Transactions, p. 313–324, 1994.
[RIE 95] RIEDI R., MANDELBROT B.B., “Multifractal formalism for infinite multinomial measures”, Advances in Applied Mathematics, vol. 16, p. 132–150, 1995.
[TRIC 82] TRICOT C., “Two definitions of fractional dimension”, Math. Proc. Camb. Phil. Soc., vol. 91, p. 54–74, 1982.
[TRIC 93] TRICOT C., Courbes et dimension fractale, Springer-Verlag, 1993.
[TRIE 78] TRIEBEL H., Spaces of Besov-Hardy-Sobolev Type, Teubner, Texte zur Mathematik, 1978.
[VOJ 95] VOJAK R., LÉVY VÉHEL J., “Higher order multifractal analysis”, SIAM Journal on Mathematical Analysis, 1995.
[VRS 90] VRSCAY E.R., “Moment and collage methods for the inverse problem of fractal construction with iterated function systems”, in Proceedings of the Fractal’90 Conference (Lisbon, Portugal), 1990.
[VRS 91a] VRSCAY E.R., “Iterated function systems: Theory, applications, and the inverse problem”, in Fractal Geometry and Analysis, p. 405–468, 1991.
[VRS 91b] VRSCAY E.R., WEIL D., “‘Missing moment’ and perturbative methods for polynomial iterated function systems”, Physica D, vol. 50, p. 478–492, 1991.
[WEI 95] WEIERSTRASS K., “On continuous function of a real argument that do not have a well-defined differential quotient”, Mathematische Werke, p. 71–74, 1895.
Chapter 10
Iterated Function Systems and Applications in Image Processing
10.1. Introduction
An iterated function system (IFS) makes it possible to generate fractal images from a set of contracting transformations [BARN 88]. Conversely, starting from a simple fractal image, it is possible to determine the parameters of the contracting transformations that allow us to synthesize it (this process is referred to as the inverse problem). The first part of this chapter reviews some basic concepts necessary for understanding IFS theory. An adaptation of this theory, initially proposed in [JACQ 92], makes it possible to generate non-fractal images of natural scenes from a set of local contracting transformations. The purpose of the second part of this chapter is to introduce this method, which can be used for analyzing or coding images. We then describe the principles of natural image coding by fractals, which consists of automatically calculating the parameters of the transformations that generate a given image. Finally, we present various solutions which speed up the automatic calculation of the contracting transformations and improve the quality of reconstructed images.
10.2. Iterated transformation systems
In this section, we recall the main elements of the theory of iterated transformation systems, which enables the coding and synthesis of binary fractal images
Chapter written by Franck DAVOINE and Jean-Marc CHASSERY.
and of gray-level images. The concept of IFS is also discussed, from a broader perspective, in Chapter 9 for the generation of continuous function graphs and the characterization of Hölder functions.
10.2.1. Contracting transformations and iterated transformation systems
10.2.1.1. Lipschitzian transformation
Let $\omega: \mathbb{R}^2 \to \mathbb{R}^2$ be a transformation defined on the metric space $(\mathbb{R}^2, d)$. The symbol $d$ denotes the distance between two points of $\mathbb{R}^2$. The transformation $\omega$ is called Lipschitzian with Lipschitz factor $s$, a strictly positive real number, if:
$$d\big(\omega(x), \omega(y)\big) \le s \cdot d(x, y) \quad \forall x, y \in \mathbb{R}^2 \tag{10.1}$$
10.2.1.2. Contracting transformation
Let $\omega: \mathbb{R}^2 \to \mathbb{R}^2$ be a transformation defined on the metric space $(\mathbb{R}^2, d)$. The transformation $\omega$ is said to be contracting with real contraction factor $s$, $0 < s < 1$, if:
$$d\big(\omega(x), \omega(y)\big) \le s \cdot d(x, y) \quad \forall x, y \in \mathbb{R}^2 \tag{10.2}$$
10.2.1.3. Fixed point
A contracting transformation $\omega$ possesses a single fixed point $x_f \in \mathbb{R}^2$, such that $\omega(x_f) = x_f$. Let $\omega^{\circ n}(\cdot)$ denote the application of $\omega$ iterated $n$ times. For any point $x$ of $\mathbb{R}^2$, the sequence $\{\omega^{\circ n}(x) : n = 0, 1, 2, \dots\}$ converges towards $x_f$:
$$\lim_{n \to \infty} \omega^{\circ n}(x) = x_f \quad \forall x \in \mathbb{R}^2 \tag{10.3}$$
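The convergence stated in (10.3) is easy to observe numerically. The following sketch iterates an arbitrary affine contraction (our own illustrative choice, with contraction factor 0.5 and fixed point (2, −4)) from two different starting points; the stopping rule based on successive iterates is an assumption of the sketch.

```python
from math import dist


def fixed_point(omega, x, tol=1e-12, max_iter=10_000):
    """Iterate x <- omega(x) until two successive iterates are within tol."""
    for _ in range(max_iter):
        nxt = omega(x)
        if dist(nxt, x) <= tol:
            return nxt
        x = nxt
    return x


# Illustrative contraction of factor s = 0.5; its fixed point is (2, -4).
omega = lambda p: (0.5 * p[0] + 1.0, 0.5 * p[1] - 2.0)

xf_a = fixed_point(omega, (10.0, 10.0))
xf_b = fixed_point(omega, (-3.0, 7.0))
print(xf_a, xf_b)
```

Both runs end near the same point, whatever the starting point, as expected for a contracting transformation.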
10.2.1.4. Hausdorff distance
Let us consider the metric space $(\mathbb{R}^2, d)$. The symbol $\mathcal{H}(\mathbb{R}^2)$ denotes the space whose elements are the non-empty compact subsets of $\mathbb{R}^2$. The distance [TRI 93] from a point $x \in \mathbb{R}^2$ to a set $B \in \mathcal{H}(\mathbb{R}^2)$, denoted $d(x, B)$, is defined by:
$$d(x, B) = \min\{d(x, y) : y \in B\}$$
The distance from a set $A \in \mathcal{H}(\mathbb{R}^2)$ to a set $B \in \mathcal{H}(\mathbb{R}^2)$, denoted $d(A, B)$, is defined by:
$$d(A, B) = \max\{d(x, B) : x \in A\}$$
The Hausdorff distance between two sets $A$ and $B$ of $\mathcal{H}(\mathbb{R}^2)$, denoted $h_d(A, B)$, is defined by:
$$h_d(A, B) = \max\{d(A, B), d(B, A)\} \tag{10.4}$$
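For finite sets of points, which are compact, these three definitions translate directly into code. The sketch below uses the Euclidean distance for $d$; the function names and the two small point sets are ours.

```python
from math import dist


def point_to_set(x, B):
    """d(x, B): distance from the point x to the nearest point of B."""
    return min(dist(x, y) for y in B)


def set_to_set(A, B):
    """d(A, B): largest distance from a point of A to the set B."""
    return max(point_to_set(x, B) for x in A)


def hausdorff(A, B):
    """h_d(A, B) as in equation (10.4)."""
    return max(set_to_set(A, B), set_to_set(B, A))


A = [(0.0, 0.0), (1.0, 0.0)]
B = [(0.0, 0.0), (4.0, 0.0)]
print(set_to_set(A, B), set_to_set(B, A), hausdorff(A, B))
```

On this example $d(A, B)$ and $d(B, A)$ differ, which is why the maximum of the two is taken in (10.4).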
Only when applied to closed and bounded sets, also referred to as compact sets, does the Hausdorff distance satisfy all the properties of a distance (in particular, symmetry). It is not to be confused with the concept of the Hausdorff dimension presented in Chapters 1 and 3 of this volume.
10.2.1.5. Contracting transformation on the space $\mathcal{H}(\mathbb{R}^2)$
Let $\omega: \mathbb{R}^2 \to \mathbb{R}^2$ be a contracting transformation defined on the metric space $(\mathbb{R}^2, d)$ with the real number $s$ as contraction factor. The transformation $\omega: \mathcal{H}(\mathbb{R}^2) \to \mathcal{H}(\mathbb{R}^2)$ defined by:
$$\omega(B) = \{\omega(x) : x \in B\}, \quad \forall B \in \mathcal{H}(\mathbb{R}^2) \tag{10.5}$$
is contracting on $(\mathcal{H}(\mathbb{R}^2), h_d)$, with contraction factor $s$. The symbol $h_d$ denotes the Hausdorff distance.
10.2.1.6. Iterated transformation system
An IFS defined on the complete metric space $(\mathbb{R}^2, d)$ is composed of a set of $N$ transformations $\omega_i: \mathbb{R}^2 \to \mathbb{R}^2$ ($i = 1, \dots, N$), each of them associated with a Lipschitz factor $s_i$. From now on, in this section, the $N$ transformations are assumed to be contracting: the transformation system is then called a hyperbolic IFS. The contraction factor of the hyperbolic IFS, denoted $s$, is equal to $\max\{s_i : i = 1, \dots, N\}$.
10.2.2. Attractor of an iterated transformation system
Let us consider an IFS $\{\mathbb{R}^2; \omega_i, i = 1, \dots, N\}$. It has been demonstrated [BARN 93] that the operator $W: \mathcal{H}(\mathbb{R}^2) \to \mathcal{H}(\mathbb{R}^2)$ defined by:
$$W(B) = \bigcup_{i=1}^{N} \omega_i(B), \quad \forall B \in \mathcal{H}(\mathbb{R}^2) \tag{10.6}$$
is contracting and that its contraction factor is that of the IFS. The operator $W$ possesses a single fixed point $A_t \in \mathcal{H}(\mathbb{R}^2)$ given by:
$$A_t = W(A_t) = \lim_{n \to \infty} W^{\circ n}(X), \quad \forall X \in \mathcal{H}(\mathbb{R}^2) \tag{10.7}$$
The object At is also called an IFS attractor. It is invariant under the transformation W and is equal to the union of N copies of itself transformed by ω1 , . . . , ωN . This invariant object is called self-similar or “self-affine” when the elementary transformations ωi are affine.
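EXAMPLE 10.1.– Let us consider the IFS $\{\mathbb{R}^2; \omega_i, i = 1, \dots, 3\}$ composed of the following affine transformations:
$$\begin{aligned}
\omega_1\begin{pmatrix} x \\ y \end{pmatrix} &= \begin{bmatrix} \tfrac{1}{2} & 0 \\ 0 & \tfrac{1}{2} \end{bmatrix} \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} 0 \\ \tfrac{y_0}{2} \end{pmatrix} \\
\omega_2\begin{pmatrix} x \\ y \end{pmatrix} &= \begin{bmatrix} \tfrac{1}{2} & 0 \\ 0 & \tfrac{1}{2} \end{bmatrix} \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} -\tfrac{x_0}{2} \\ -\tfrac{y_0}{2} \end{pmatrix} \\
\omega_3\begin{pmatrix} x \\ y \end{pmatrix} &= \begin{bmatrix} \tfrac{1}{2} & 0 \\ 0 & \tfrac{1}{2} \end{bmatrix} \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} \tfrac{x_0}{2} \\ -\tfrac{y_0}{2} \end{pmatrix}
\end{aligned} \tag{10.8}$$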
Its contraction factor is equal to 0.5, since each transformation halves all distances (areas are thus scaled by 0.25). The attractor coded by the IFS, called the Sierpinski triangle, is represented in Figure 10.1.
Figure 10.1. Attractor of the iterated transformation system of Example 10.1. The initial square, originally centered and with sides of length 2x0 , 2y0 , is transformed into three homothetic squares by contracting transformations ω1 , ω2 and ω3 . This process is then iterated
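The construction of Example 10.1 can be reproduced numerically by applying the operator $W$ of equation (10.6) repeatedly to a finite set of points. The sketch below is only an illustration: it takes $x_0 = y_0 = 1$, starts from the single point (0, 0), and uses function names of our own.

```python
def sierpinski_maps(x0, y0):
    """The three affine maps of Example 10.1, acting on points (x, y)."""
    return [
        lambda p: (0.5 * p[0], 0.5 * p[1] + 0.5 * y0),
        lambda p: (0.5 * p[0] - 0.5 * x0, 0.5 * p[1] - 0.5 * y0),
        lambda p: (0.5 * p[0] + 0.5 * x0, 0.5 * p[1] - 0.5 * y0),
    ]


def apply_w(maps, points):
    """W(B): union of the images of the point set B under every map."""
    return {omega(p) for omega in maps for p in points}


def approximate_attractor(maps, start, n_iter):
    """Apply W n_iter times to the initial set `start`."""
    points = set(start)
    for _ in range(n_iter):
        points = apply_w(maps, points)
    return points


maps = sierpinski_maps(1.0, 1.0)
points = approximate_attractor(maps, {(0.0, 0.0)}, 6)
print(len(points), "points approximating the attractor")
```

Plotting `points` gives a coarse picture of the Sierpinski triangle; starting from another initial set changes the first iterates but not the limit.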
10.2.3. Collage theorem
This theorem, shown in [BARN 93], provides an upper bound on the Hausdorff distance $h_d$ between a point $A$ of $\mathcal{H}(\mathbb{R}^2)$ and the attractor $A_t$ of an IFS.
THEOREM 10.1.– We consider the complete metric space $(\mathbb{R}^2, d)$. Let $A$ be a point of $\mathcal{H}(\mathbb{R}^2)$ and $\{\mathbb{R}^2; \omega_1, \omega_2, \dots, \omega_N\}$ an IFS with a real contraction factor $0 \le s < 1$. The following relation holds:
$$h_d(A, A_t) \le \frac{1}{1 - s}\, h_d\Big(A, \bigcup_{i=1}^{N} \omega_i(A)\Big) \tag{10.9}$$
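The theorem shows that if it is possible to transform an object $A$ so as to satisfy the relation $A \approx W(A)$ while ensuring that $W$ is contracting, then the fixed point $A_t$ of the operator $W$ is close to $A$. In this case, the operator $W$, defined in section 10.2.2, fully characterizes the approximation¹ $A_t$ of object $A$ and codes $A$ exactly if $A = W(A)$ [BARN 86].
EXAMPLE 10.2.– Let us consider the application of four contracting transformations to a square denoted $A$. If the resulting four subsquares cover the initial square $A$ exactly, the collage theorem is satisfied. The attractor of the IFS is therefore a square identical to the square $A$. Figure 10.2 illustrates the attractor coded by the IFS $\{\mathbb{R}^2; \omega_i, i = 1, \dots, 4\}$ composed of the following affine transformations:
$$\begin{aligned}
\omega_1\begin{pmatrix} x \\ y \end{pmatrix} &= \begin{bmatrix} \tfrac{1}{2} & 0 \\ 0 & \tfrac{1}{2} \end{bmatrix} \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} -\tfrac{x_0}{2} \\ \tfrac{y_0}{2} \end{pmatrix} \\
\omega_2\begin{pmatrix} x \\ y \end{pmatrix} &= \begin{bmatrix} \tfrac{1}{2} & 0 \\ 0 & \tfrac{1}{2} \end{bmatrix} \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} \tfrac{x_0}{2} \\ \tfrac{y_0}{2} \end{pmatrix} \\
\omega_3\begin{pmatrix} x \\ y \end{pmatrix} &= \begin{bmatrix} \tfrac{1}{2} & 0 \\ 0 & \tfrac{1}{2} \end{bmatrix} \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} -\tfrac{x_0}{2} \\ -\tfrac{y_0}{2} \end{pmatrix} \\
\omega_4\begin{pmatrix} x \\ y \end{pmatrix} &= \begin{bmatrix} \tfrac{1}{2} & 0 \\ 0 & \tfrac{1}{2} \end{bmatrix} \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} \tfrac{x_0}{2} \\ -\tfrac{y_0}{2} \end{pmatrix}
\end{aligned} \tag{10.10}$$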
The iterative process initialized on a circle converges towards the square A. The same result would be obtained by initializing the process on any other form.
1. The more self-similar the object A, the more effective the coding.
Figure 10.2. Attractor of the iterated transformation system of Example 10.2
10.2.4. Finally contracting transformation
Let us consider a Lipschitzian transformation $\omega$. If there is an integer $n$ such that the transformation $\omega^{\circ n}$ is contracting, then $\omega$ is termed finally contracting. The integer $n$ is called a contraction exponent. The operator $W$, defined by equation (10.6), can be finally contracting even if a limited number of transformations $\omega_i$ are not contracting. In this case the operator $W$ is not contracting, but it can become so at iteration $n$, since products of the transformations $\omega_i$ appear progressively as the transformations are iterated².
Generalized collage theorem
Let us consider a finally contracting operator $W$ with integer contraction exponent $n$. Then there is a single fixed point $x_f \in \mathbb{R}^2$ such that:
$$x_f = W(x_f) = \lim_{k \to \infty} W^{\circ k}(x) \quad \forall x \in \mathbb{R}^2$$
In this case:
$$h_d(A, A_t) \le \frac{1}{1 - s}\, \frac{1 - \sigma^n}{1 - \sigma}\, h_d\big(A, W(A)\big) \tag{10.11}$$
where $\sigma$ is the Lipschitz factor of $W$ and $s$ the contraction factor of $W^{\circ n}$ [FIS 91, LUN 92].
2. For example, $W^{\circ 2} = \omega_1 \circ \omega_1 \cup \omega_1 \circ \omega_2 \cup \omega_2 \circ \omega_1 \cup \omega_2 \circ \omega_2$.
10.2.5. Attractor and invariant measures
This section is only an introduction to the main concepts for generating fractal objects in gray-levels. For more information see Chapter 9 in [BARN 88]. Let $p_i$ be a probability associated with each of the $N$ transformations $\omega_i$ of an IFS:
$$\forall i,\ p_i \ge 0 \quad \text{and} \quad \sum_{i=1}^{N} p_i = 1$$
A fractal object in gray-levels induces a measure³ $\mu$ on its support, which is associated with a Markov operator $M$ of the following form:
$$\mu_n(B) = M\mu_{n-1}(B) = \sum_{i=1}^{N} p_i\, \mu_{n-1}\big(\omega_i^{-1}(B)\big) \tag{10.12}$$
In this expression, $B$ is a Borel subset of $\mathbb{R}^2$ and $\mu_n(B)$ the probability of $B$ at iteration $n$. It is shown that such an operator $M$ is contracting [BARN 88] (with respect to the Hutchinson metric on the space of measures) and thus there exists a single measure $\mu$, called the invariant measure of the IFS, such that:
$$M\mu = \mu = \lim_{k \to \infty} M^{\circ k}(\mu_0), \quad \forall \mu_0 \tag{10.13}$$
Moreover, the support of the invariant measure $\mu$ is the IFS attractor. Let us now consider the practical case in which the fractal object is a digital image. The normalized value of a pixel $B$ of the image corresponds to the probability of the Borel subset $B$ of $\mathbb{R}^2$. According to (10.12), the value of a pixel $B$ of the image $\mu_n$ is equal to the sum of the values of the pixels $\omega_i^{-1}(B)$ in $\mu_{n-1}$, multiplied by the probabilities $p_i$. The invariant measure associated with the IFS attractor can also be obtained by iterating the three following operations a great number of times, initialized on an arbitrary point $x_0$ of $\mathbb{R}^2$:
– choose a transformation $\omega_i$ with the probability $p_i$;
– calculate $x_1 = \omega_i(x_0)$;
– replace $x_0$ by $x_1$.
3. Recall that a measure is, in the physical sense of the term, a measurable quantity (e.g. the light intensity) that allows us to associate weights with the different points of its support (the support of the measure is the set of the points on which it is defined).
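The three-step random iteration described above can be sketched as follows. The maps are those of a Sierpinski-type IFS with $x_0 = y_0 = 1$; the probabilities, the number of discarded initial iterations (`burn_in`), the pixel grid and all function names are assumptions made for the illustration.

```python
import random
from collections import Counter


def chaos_game(maps, probs, n_points, start=(0.0, 0.0), burn_in=20, seed=0):
    """Random iteration: pick a map with probability p_i, apply it, repeat."""
    rng = random.Random(seed)
    x = start
    visited = []
    for k in range(burn_in + n_points):
        omega = rng.choices(maps, weights=probs, k=1)[0]
        x = omega(x)
        if k >= burn_in:  # discard the first iterates (assumed warm-up)
            visited.append(x)
    return visited


def pixel_density(points, n_pixels, lo=-1.0, hi=1.0):
    """Normalized visit frequency of each pixel of a grid over [lo, hi]^2."""
    def cell(v):
        return min(int((v - lo) / (hi - lo) * n_pixels), n_pixels - 1)
    counts = Counter((cell(px), cell(py)) for px, py in points)
    total = sum(counts.values())
    return {pixel: c / total for pixel, c in counts.items()}


maps = [
    lambda p: (0.5 * p[0], 0.5 * p[1] + 0.5),
    lambda p: (0.5 * p[0] - 0.5, 0.5 * p[1] - 0.5),
    lambda p: (0.5 * p[0] + 0.5, 0.5 * p[1] - 0.5),
]
probs = [0.5, 0.25, 0.25]  # illustrative probabilities p_i
points = chaos_game(maps, probs, n_points=5000)
mu = pixel_density(points, n_pixels=32)
print(len(mu), "pixels visited")
```

The dictionary `mu` is an empirical estimate of the invariant measure on the pixel grid; changing `probs` changes the gray-levels but not the set of visited pixels, in agreement with the fact that the support of the measure is the attractor.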
Figure 10.3. Calculation of the measure $\mu_n$
When the number of iterations is sufficiently high, the points are distributed on a compact subset of $\mathbb{R}^2$ defining the attractor of the IFS. The density (frequency of the visits) of each pixel of the attractor defines the invariant measure $\mu$ (see [ELT 87] and the “chaos game” in [BARN 88]), whose form is controlled by the set of the predefined probabilities $p_i$.
10.2.6. Inverse problem
The inverse problem can be stated as follows: given an object $A$ belonging to $\mathcal{H}(\mathbb{R}^2)$ and a measure $\mu$ on $A$, how can we find the IFS and the set of probabilities $p_i$ for which $A$ is the attractor and $\mu$ the invariant measure? Various works have attempted to solve this constrained optimization problem, the difficulty of which lies in its large dimensionality and in the irregularity of the function to be minimized. The proposed solutions use different techniques based on genetic algorithms [LUT 93], wavelets [RIN 94] and other approaches [BARN 86, CAB 92, KRO 92, LEV 90, MAN 89, VRS 90].
10.3. Application to natural image processing: image coding
10.3.1. Introduction
The goal of this section is to introduce the basic methods that make it possible to associate a natural image with a finally contracting transformation $W$ whose attractor is an approximation of the image itself. If this inverse problem is solved, we can
then talk about image compression, since storing the coefficients of $W$ requires “less information” than storing the original image. This is lossy compression, since the attractor is only an approximation of the original image. The compression of an image by fractals relies on a transformation called the fractal transformation, which consists of transforming the image by a finally contracting operator so that its visual aspect remains almost unchanged. To this end, the image transformation is made up of $N$ elementary sub-transformations, each one operating on a block of the image, in the following way (see Figure 10.4). The image is partitioned into $N$ blocks $r_n$ called destination blocks:
$$A = \bigcup_{n=1}^{N} r_n \tag{10.14}$$
Figure 10.4. Destination blocks rn and source blocks dα(n) . The source block transformed by ωn must resemble the smaller size destination block. The set of destination blocks form a partition of the image
We call $R$ the partition of the image support into destination blocks. Each destination block is then put in correspondence with another transformed block $\omega_n(d_{\alpha(n)})$ that “resembles” it with respect to a gray-level based error measure. The block $d_{\alpha(n)}$, called a source block, is sought in a library made up of $Q$ blocks belonging to the image: $\alpha(n)$ is thus a mapping from $[1 \dots N]$ to $[1 \dots Q]$. The $Q$ blocks do not necessarily form a partition of the image but are representative of the entire image.
The transformation of image $A$ by $W$ is formulated using the following equation:
$$W(A) = \bigcup_{n=1}^{N} \omega_n\big(d_{\alpha(n)}\big) = \bigcup_{n=1}^{N} \hat{r}_n \tag{10.15}$$
where $\hat{r}_n$ is the approximation of the destination block $r_n$, obtained by transforming the source block $d_{\alpha(n)}$ by $\omega_n$ (the mapping between block $d_{\alpha(n)}$ and block $\hat{r}_n$ is called a “collage” operation). The calculation of the parameters of the transformations $\omega_n$ and of the positions of the blocks $d_{\alpha(n)}$ is detailed in the following section. It should be noted that the problem described here is different from the inverse problem introduced in section 10.2.6, since the spatial transformations considered do not apply to the whole image directly but to subparts of it, as the image is not fractal. Moreover, no probability $p_i$ is assigned to the transformations $\omega_i$ defining $W$. We will see that, among the proposed methods, the Dudbridge method is the one that comes closest to it.
10.3.2. Coding of natural images by fractals
Jacquin [JACQ 92] proposed in 1989 an approach based on a regular partition $R$ with square geometry. The image is partitioned into square destination blocks⁴ of fixed size equal to $B^2$ pixels ($B = 8$). The algorithm seeks, for each destination block $r_n$, the source block $d_{\alpha(n)}$ of size $D^2$ ($D = 2B$) that minimizes the error $d(r_n, \hat{r}_n)$, where $\hat{r}_n$ is the approximation of $r_n$ calculated from the source block $d_{\alpha(n)}$. The error measure $d$ is given by:
$$d(r_n, \hat{r}_n) = \sum_{j=1}^{B^2} \big(r_n^j - \hat{r}_n^j\big)^2 \tag{10.16}$$
where $r_n^j$ and $\hat{r}_n^j$ are the pixel values of index $j$ inside the original block $r_n$ and the collage block $\hat{r}_n$, respectively. The joining operation, called parent collage, is detailed in the following section.
10.3.2.1. Collage of a source block onto a destination block
The collage operation of a source block $d_{\alpha(n)}$ onto a destination block $r_n$, realized by the transformation $\omega_n$, decomposes into two parts:
– a spatial transformation deforms the support of block $d_{\alpha(n)}$;
– a “mass” transformation acts on the pixel luminance of the deformed $d_{\alpha(n)}$ block.
4. In the formulation of Jacquin, these blocks are called parent blocks.
These two points are further detailed in this section.
Spatial transformation shrinks the source block $d_{\alpha(n)}$ of size $D^2$ and overlays it on the destination block $r_n$ of size $B^2$. The block thus transformed, denoted $b_2^{(n)}$, is obtained by decimating the pixels of the source block: a pixel of coordinates $(x_i, y_j)$ in $b_2^{(n)}$ is given by the following equation:
$$b_2^{(n)}(x_i, y_j) = \frac{1}{4}\Big[d_{\alpha(n)}(x_k, y_l) + d_{\alpha(n)}(x_k, y_{l+1}) + d_{\alpha(n)}(x_{k+1}, y_l) + d_{\alpha(n)}(x_{k+1}, y_{l+1})\Big] \tag{10.17}$$
where $d_{\alpha(n)}(x_k, y_l)$ is the intensity of the pixel of coordinates $(x_k, y_l)$ belonging to the block $d_{\alpha(n)}$.
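Equation (10.17) amounts to averaging groups of four neighboring pixels. The sketch below assumes that the groups are the non-overlapping 2 × 2 cells of the source block, i.e. $k = 2i$ and $l = 2j$; this index relation is our assumption, since it is not spelled out in the text. Blocks are represented as lists of rows.

```python
def shrink_block(source):
    """Average non-overlapping 2x2 cells of a D x D block (D even).

    Returns a (D/2) x (D/2) block, assuming k = 2i and l = 2j in (10.17).
    """
    half = len(source) // 2
    return [
        [
            (source[2 * i][2 * j] + source[2 * i][2 * j + 1]
             + source[2 * i + 1][2 * j] + source[2 * i + 1][2 * j + 1]) / 4.0
            for j in range(half)
        ]
        for i in range(half)
    ]


source = [
    [0, 0, 4, 4],
    [0, 0, 4, 4],
    [8, 8, 2, 6],
    [8, 8, 10, 14],
]
print(shrink_block(source))
```

With $D = 2B$, a 16 × 16 source block is thus reduced to the 8 × 8 size of a destination block.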
Mass transformation acts on block $b_2^{(n)}$ to approximate the destination block $r_n$. The complexity of this transformation depends on the nature of the block $r_n$ under consideration. To do so, Jacquin proposes classifying the square blocks using the method developed by Ramamurthi and Gersho [RAM 86]: all blocks of an image are grouped into three classes: homogeneous blocks, textured blocks and blocks with contours (simple and divided). Depending on the class which the destination block $r_n$ belongs to, a more or less complex mass transformation is associated with it. This depends on the decimated block $b_2^{(n)}$ and/or on a constant block $b_1^{(n)}$ formed of pixels all equal to one. The block $b_2^{(n)}$ will be associated with a scale coefficient denoted $\beta_2^{(n)}$ and the block $b_1^{(n)}$ will be associated with a shift coefficient noted $\beta_1^{(n)}$. The choice of transformation type depends on the following procedure:
– if the block $r_n$ is homogeneous: absorption of the gray-levels of $r_n$. No search for source blocks $d_{\alpha(n)}$ is carried out. The transformation of $r_n$, coded with $I_s$ bits, reads:

$$\hat{r}_n = \beta_1^{(n)} b_1^{(n)}$$

where the integer $\beta_1^{(n)}$ lies between 0 and 255;
– if the block $r_n$ is textured: search for the source block $d_{\alpha(n)}$, then perform contrast change and apply shifts. The transformation of $d_{\alpha(n)}$, coded with $I_m$ bits, reads:

$$\hat{r}_n = \beta_2^{(n)} b_2^{(n)} + \beta_1^{(n)} b_1^{(n)}$$

where $\beta_2^{(n)}$ belongs to the set {0.7, 0.8, 0.9, 1.0} and the integer $\beta_1^{(n)}$ lies between −255 and 255;
– if the block $r_n$ contains contours: search for the source block $d_{\alpha(n)}$, then perform contrast change, apply shifts and discrete isometries $\imath_n$ (rotations of 0, +90, −90 and
+180 degrees, reflections along vertical and horizontal symmetry axes, and reflections along the two diagonal axes). The transformation of $d_{\alpha(n)}$, coded with $I_e$ bits, reads:

$$\hat{r}_n = \imath_n\big(\beta_2^{(n)} b_2^{(n)} + \beta_1^{(n)} b_1^{(n)}\big)$$

where $\beta_2^{(n)}$ belongs to the set {0.5, 0.6, 0.7, 0.8, 0.9, 1.0} and the integer $\beta_1^{(n)}$ lies between −255 and 255.
When the destination block is textured or contains contours, the scale coefficient $\beta_2^{(n)}$ is calculated so that the standard deviations of the two blocks $b_2^{(n)}$ and $r_n$ are equal. It is then rounded to a coefficient belonging to a set of predefined values, all real, positive and at most equal to one. The shift coefficient $\beta_1^{(n)}$ is calculated so that the pixel averages of the two blocks $b_2^{(n)}$ and $r_n$ are equal. It is not quantized.

The exhaustive search of the source block $d_{\alpha(n)}$ is carried out by shifting a square block over the image support with a step width of $\delta_h = \delta_v = 4$ pixels in the horizontal and in the vertical directions. When two blocks are compared, each of the eight discrete isometries is considered. For an image of size 256 × 256 (respectively 512 × 512), such a search is thus carried out through a library made up of 29,768 (respectively 125,000) source blocks.

10.3.2.2. Hierarchical partitioning

In a next step, Jacquin proposes the division of the collage parent blocks $\hat{r}_n$ into four destination sub-blocks of size 4 × 4 pixels called child blocks (see Figure 10.5). The obtained blocks are compared with their equivalents in the original image, through the error measure given by formula (10.16), with $B = 4$. If the error is higher than a given threshold, they are coded separately by seeking the best source block of size 8 × 8 available in the image. The collage process is, in this case, called a child collage.

Figure 10.5. Partitioning formed by parent and child blocks. A parent collage may be complemented by child collages: 1 parent and no child (1 configuration); 1 parent and 1 child (4 configurations); 1 parent and 2 children (6 configurations); no parent and 4 children (1 configuration)
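The library sizes quoted for the exhaustive search can be checked with a short count. The sketch below assumes source blocks of 16 × 16 pixels for the parent collage (this size is an assumption here, chosen as twice the 8 × 8 parent block, in the same way as 8 × 8 source blocks serve 4 × 4 child blocks), a step of 4 pixels in both directions and eight isometries per position. The function name `library_size` is hypothetical.

```python
def library_size(image_side, block_side, step, isometries=8):
    """Count the candidate source blocks visited by an exhaustive search.

    A square window of `block_side` pixels slides over a square image of
    `image_side` pixels with the given step; every position is tried
    under each discrete isometry.
    """
    positions_per_axis = (image_side - block_side) // step + 1
    return isometries * positions_per_axis ** 2


# Assumed parent source block size: 16 x 16 pixels.
size_256 = library_size(256, 16, 4)
size_512 = library_size(512, 16, 4)
print(size_256, size_512)  # 29768 125000
```

Under these assumptions, the counts agree with the 29,768 and 125,000 source blocks given in the text (61 × 61 and 125 × 125 window positions, respectively, each considered under eight isometries).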
If, for a parent block, three or four child collages are necessary, only the four child collages are coded. If one or two child collages are necessary, the parent block is coded by parent collage complemented with child collages.

10.3.2.3. Coding of the collage operation on a destination block

The storing of the collage of a source block (parent or child) $d_{\alpha(n)}$ on a destination block (parent or child) $r_n$ includes:
– the index of the source block $d_{\alpha(n)}$ retained among the $Q$ blocks of the library, provided that those are arranged in a block list and that their organization on the image support is known. Otherwise, it is necessary to store the coordinates $(x_k, y_l)$ of a reference pixel in block $d_{\alpha(n)}$ (for example, the upper left corner in the case of a square block);
– the isometry used during collage (one among eight);
– the coefficients $\beta_1$ and $\beta_2$ of the mass transformation.

This information is associated with each of the $N$ destination blocks of the partition $R$. It is coded on a variable number of bits since it is not always necessary to store the set of the three components. It depends on the mass transformation that is used.

10.3.2.4. Contraction control of the fractal transformation

The contraction control of the fractal transformation is a difficult problem. Jacquin shows that the contraction factor of a mass transformation depends on the scale factor $\beta_2^{(n)}$. The contraction factor of the other mass transformations (shift, absorption) is equal to 1. The author thus ensures the contraction of the fractal transformation by imposing the condition $(\beta_2^{(n)})^2 < 1$ whatever the value of $n$ in $[1, \ldots, N]$. This overly stringent constraint limits the quality of the collage. A more detailed study of the contraction control of the fractal transformation will be presented in section 10.3.3.2 and we shall see in section 10.3.3.3 that it is possible to slightly slacken the constraint while preserving the final contraction of the fractal transformation.

10.3.3. Algebraic formulation of the fractal transformation

Immediately following the work by Jacquin, Lundheim proposed an algebraic formulation aimed at facilitating the comprehension of various theoretical and practical problems raised by the extension of IFS theory to the coding of natural images [LUN 92]. His formulation, applied to 1D signals, was then extended to the case of 2D signals by Øien [ØIE 93] and Lepsøy [LEP 93].

The formulation that is provided here is used in the simple case of a fractal transformation operating on source and destination blocks of fixed size and simple geometry (square, rectangular or triangular). The source blocks do not overlap. A
block is seen as a vector by supposing that the pixels which make it up are connected inside the original image. Let us consider an image as a column vector $x$ made up of $M^2$ pixels. The fractal transformation $T$ of the image $x$, composed of a linear term $L$ and of a translation vector $t$, takes on the following form:

$$T x = L x + t \tag{10.18}$$

By specifying this transformation at the level of each destination block of the partition $R$, equation (10.18) can be written as follows:

$$T x = \left( \sum_{n=1}^{N} L_n \right) x + \sum_{n=1}^{N} t_n \tag{10.19}$$
The elementary transformations associated with each matrix $L_n$ and each vector $t_n$ are detailed below.

The transformation associated with the elementary matrix $L_n : \mathbb{R}^{M^2} \to \mathbb{R}^{M^2}$ operates on the source vector $d_{\alpha(n)}$ to attach it to the destination vector $r_n$. The source and destination vectors belong respectively to $\mathbb{R}^{D^2}$ and $\mathbb{R}^{B^2}$, where $D > B$. The matrix $L_n$ reads:

$$L_n = \beta_2^{(n)} P_n \underbrace{D F_{\alpha(n)}}_{b_2^{(n)}} \tag{10.20}$$

where:
1) $F_{\alpha(n)} : \mathbb{R}^{M^2} \to \mathbb{R}^{D^2}$ selects a source block $d_{\alpha(n)}$ of size $D^2$ pixels of the image. The isometries can be applied inside the block;
2) $D : \mathbb{R}^{D^2} \to \mathbb{R}^{B^2}$ brings the selected source block back to the size of a destination block by sub-sampling or by averaging pixels. The block thus obtained is called block $b_2^{(n)}$. It should be thought of as the decimated source block described by Jacquin;
3) $P_n : \mathbb{R}^{B^2} \to \mathbb{R}^{M^2}$ positions the decimated source block $b_2^{(n)}$ on the destination block $r_n$ and cancels the other pixels of the image.
A column vector $L_n x$ is primarily composed of zeros, except for the part corresponding to the considered destination block of index $n$. The matrix $L$ given by formula (10.21) is thus composed of sub-matrices associated with each block $r_n$ of the image and of zeros elsewhere:

$$L = \begin{bmatrix} \cdots & \cdots & \beta_2^{(1)} D \\ \cdots & \beta_2^{(n)} D & \cdots \\ \beta_2^{(N)} D & \cdots & \cdots \end{bmatrix} \tag{10.21}$$
The vertical position of a sub-matrix $\beta_2^{(n)} D$ in the matrix $L$ corresponds to the index $n$ of the destination block considered. The horizontal position corresponds to the position of the source block which is associated with it.

The elementary translation vector $t_n$ reads:

$$t_n = \beta_1^{(n)} P_n b_1^{(n)} \tag{10.22}$$

where $\beta_1^{(n)}$ is a real coefficient. The constant block $b_1^{(n)}$ of size $B^2$ pixels, not stemming from image $x$, is composed of pixels all equal to one ($b_1^{(n)} = [1, 1, \ldots, 1]^T$). The vector $t$ operating on the whole image is written as follows:

$$t = \Big[ \beta_1^{(1)}, \beta_1^{(1)}, \ldots, \beta_1^{(1)}, \beta_1^{(2)}, \beta_1^{(2)}, \ldots, \beta_1^{(2)}, \ldots, \beta_1^{(N)}, \beta_1^{(N)}, \ldots, \beta_1^{(N)} \Big]^T \tag{10.23}$$
In short, the fractal transformation $T$ of image $x$ reads as follows:

$$T x = \sum_{n=1}^{N} \beta_2^{(n)} P_n \underbrace{D F_{\alpha(n)} x}_{b_2^{(n)}} + \sum_{n=1}^{N} \beta_1^{(n)} P_n b_1^{(n)} \tag{10.24}$$
10.3.3.1. Formulation of the mass transformation

In what follows, and for the sake of clarity, the operator $P_n$ and index $n$ will be omitted in the expressions of fractal transformation $T$. Thanks to this simplified writing, the approximation of the block $r$, noted $\hat{r}$, is provided by a linear combination of two blocks, one of which, $b_2$, is derived from a block $d$ extracted from the image itself. It reads:

$$\hat{r} = \beta_2 b_2 + \beta_1 b_1 \tag{10.25}$$
Figure 10.6. Fractal transformation of the image: the source block $d_{\alpha(n)}$ undergoes an isometry and a decimation to give the decimated source block $b_2^{(n)}$, which is weighted by $\beta_2^{(n)}$ and added to the constant block $b_1$ weighted by $\beta_1^{(n)}$ to approximate the destination block $r_n$

The resulting transformation is that first proposed by Jacquin in 1989. It is of course possible to elaborate expression (10.25) so as to improve the approximation of the destination block. In a more general way, the approximation of $r$ is expressed as follows:

$$\hat{r} = \beta_2 b_2 + \beta_1 b_1 + \sum_{i=3}^{K} \beta_i b_i \tag{10.26}$$
The blocks bi are constant and known by the coder and the decoder. This type of expression is in particular used in [GHA 93, MON 94].
Figure 10.7. Adjunction of fixed vectors for the approximation of a destination block: the block $r$ is approximated from the blocks $b_1$, $b_2$, $b_3$ and $b_4$, weighted by $\beta_1$, $\beta_2$, $\beta_3$ and $\beta_4$, respectively
Vines [VIN 93] proposes another scheme based on square destination blocks and decimated source blocks, both of size 8 × 8 pixels. He builds an orthonormal basis made up of the three fixed vectors illustrated in Figure 10.7, and of 61 other vectors obtained from the decimated source vectors of the image. A destination block is then approximated by a linear combination of some of the basis vectors. The number of vectors considered depends on the complexity of the destination block.
10.3.3.2. Contraction control of the fractal transformation

The Lipschitz factor $s$ of the affine operator $T$ (equation (10.24)) is equal to the norm of the matrix $L$ and thus to the square root of the largest eigenvalue of $L^T L$, if the $L^2$ norm is considered. Based on this remark, Lundheim defines sufficient conditions which ensure the contraction of the operator $T$ [LUN 92], by considering that the source blocks do not overlap: if the collages derive from sub-sampling the square source blocks, then $s$ reads:

$$s = \sqrt{\max_{l=1:Q} \sum_{\alpha(n)=l} \big(\beta_2^{(n)}\big)^2} \tag{10.27}$$

where $Q$ is the number of source blocks used. The sum above encompasses the scale coefficients $\beta_2^{(n)}$ associated with the set of destination blocks $r_n$ which depend on the source block $d_{\alpha(n)}$. If the collages result from averaging the pixels of the source blocks, $s$ reads:

$$s = \frac{B}{D} \sqrt{\max_{l=1:Q} \sum_{\alpha(n)=l} \big(\beta_2^{(n)}\big)^2} \tag{10.28}$$

Figure 10.8. Illustration of equations (10.27) and (10.28): for each source block $d_{\alpha(n)}$, we calculate the sum of the squared scale coefficients $\beta_2^{(n)}$ associated with the destination blocks $r_n$ which depend on $d_{\alpha(n)}$

Equation (10.28) shows that the Lipschitz factor of the operator $T$ is reduced by a factor $B/D$ when the pixels of the source blocks are averaged. It depends on the scale coefficients $\beta_2^{(n)}$ but also on the size difference of the compared blocks. Hence, the “spatial contraction” of the blocks influences, in this case, the contraction factor of $T$.
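Equations (10.27) and (10.28) can be checked numerically on a toy configuration. The following sketch is an illustration only: it assumes an 8 × 8 image, destination blocks with $B = 2$, non-overlapping source blocks with $D = 4$, identity isometries, and an arbitrary choice of the mapping $\alpha(n)$ and of the coefficients $\beta_2^{(n)}$. The matrix $L$ is assembled explicitly and its norm is estimated by power iteration on $L^T L$; all function names are hypothetical.

```python
import math


def build_linear_term(M, B, D, alpha, beta, averaging):
    """Assemble L (M*M x M*M), assuming identity isometries and
    non-overlapping source blocks; pixels are indexed row by row."""
    q = D // B                      # decimation ratio along each axis
    dst_per_row = M // B
    src_per_row = M // D
    L = [[0.0] * (M * M) for _ in range(M * M)]
    for n, (src, b) in enumerate(zip(alpha, beta)):
        r0, c0 = B * (n // dst_per_row), B * (n % dst_per_row)      # destination corner
        s0, t0 = D * (src // src_per_row), D * (src % src_per_row)  # source corner
        for i in range(B):
            for j in range(B):
                row = (r0 + i) * M + (c0 + j)
                if averaging:
                    # Mean of a q x q group of source pixels.
                    for u in range(q):
                        for v in range(q):
                            col = (s0 + q * i + u) * M + (t0 + q * j + v)
                            L[row][col] = b / (q * q)
                else:
                    # Sub-sampling: keep one source pixel per group.
                    L[row][(s0 + q * i) * M + (t0 + q * j)] = b
    return L


def spectral_norm(L, iterations=100):
    """Estimate sqrt(largest eigenvalue of L^T L) by power iteration."""
    size = len(L)
    v = [1.0] * size
    estimate = 0.0
    for _ in range(iterations):
        w = [sum(a * x for a, x in zip(row, v)) for row in L]                 # L v
        u = [sum(L[i][j] * w[i] for i in range(size)) for j in range(size)]   # L^T (L v)
        norm_u = math.sqrt(sum(x * x for x in u))
        norm_v = math.sqrt(sum(x * x for x in v))
        estimate = math.sqrt(norm_u / norm_v)
        v = [x / norm_u for x in u]
    return estimate


def lundheim_factor(alpha, beta, Q, ratio=1.0):
    """Closed form (10.27) with ratio = 1, or (10.28) with ratio = B / D."""
    sums = [0.0] * Q
    for src, b in zip(alpha, beta):
        sums[src] += b * b
    return ratio * math.sqrt(max(sums))


M, B, D, Q = 8, 2, 4, 4
alpha = [n % Q for n in range(16)]               # assumed source block of each destination block
beta = [0.9 if a == 0 else 0.5 for a in alpha]   # assumed scale coefficients

s_subsampling = spectral_norm(build_linear_term(M, B, D, alpha, beta, averaging=False))
s_averaging = spectral_norm(build_linear_term(M, B, D, alpha, beta, averaging=True))
print(s_subsampling, lundheim_factor(alpha, beta, Q))       # both close to 1.8
print(s_averaging, lundheim_factor(alpha, beta, Q, B / D))  # both close to 0.9
```

In this example, four destination blocks with $\beta_2 = 0.9$ share the same source block, so that sub-sampling gives $s = 1.8$ while averaging gives $s = 0.9$: with identical coefficients, only the averaged version is contracting.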
10.3.3.3. Fisher formulation

Fisher [FIS 95a] described the collage operation of a source block onto a destination block using a unique formula:

$$\omega_n \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} a_n & b_n & 0 \\ c_n & d_n & 0 \\ 0 & 0 & \beta_2^{(n)} \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} + \begin{pmatrix} e_n \\ f_n \\ \beta_1^{(n)} \end{pmatrix} \tag{10.29}$$

where $(x, y)$ are the coordinates of a pixel pertaining to the source block $d_{\alpha(n)}$ and $z$ is its gray-level. The symbols $a_n$, $b_n$, $c_n$, $d_n$, $e_n$ and $f_n$ are the coefficients of the affine spatial transformation which brings the pixels of source block $d_{\alpha(n)}$ back inside the destination block $r_n$. The quantities $\beta_1^{(n)}$ and $\beta_2^{(n)}$ are the transformation coefficients of the pixels’ gray-level. Whereas the mass transformation coefficients used by Jacquin are chosen from a predefined set of values, Fisher uses a uniform scalar quantizer. Jacobs et al. showed that for this type of quantizer, the quantization of the translation coefficients and of the scale coefficients, with seven and five bits, respectively, is optimal in terms of visual quality of the rebuilt images [JACO 92].

Optimal coefficients calculation of the mass transformation

For a destination block $r$, the approximation $\hat{r}$ is given by:

$$\hat{r} = \beta_2 b_2 + \beta_1 b_1$$

The calculation of the “mass” transformation coefficients is an optimization problem within a linear subspace $X$ of the vector space $\mathbb{R}^{B^2}$. The purpose is to find, for a destination block $r$ and for a decimated source block $b_2$, the optimal coefficients $\beta_1$ and $\beta_2$ minimizing the least square distance $d$ between $r$ and its collage $\hat{r}$.

THEOREM 10.2 (PROJECTION).– The optimal approximation in the $L^2$ norm of a vector $r$, element of $\mathbb{R}^{B^2}$, in the linear subspace $X$ is the vector $\hat{r}$, element of $X$, which yields the residual vector $r - \hat{r}$ orthogonal to all vectors spanning the subspace $X$.

Let us consider the mean square error (MSE) between two vectors $x$ and $y$ in the space $\mathbb{R}^{B^2}$, defined by the following expression:

$$\mathrm{Err}(x, y) = \sum_{j=1}^{B^2} (x_j - y_j)^2 \qquad \forall x, y \in \mathbb{R}^{B^2}$$

Determining the two optimal coefficients $\beta_1$ and $\beta_2$ leading to the best approximation $\hat{r}$ of vector $r$ in the basis $b_1$, $b_2$ amounts, according to Theorem 10.2, to making the residual vector orthogonal to $b_1$ and $b_2$:

$$\langle r - \beta_1 b_1 - \beta_2 b_2, b_1 \rangle = 0 \qquad \langle r - \beta_1 b_1 - \beta_2 b_2, b_2 \rangle = 0$$

The optimal coefficients $\beta_1$ and $\beta_2$ thus calculated are not constrained and the contraction of the collage operator is not guaranteed. Jacobs et al. empirically show that thresholding the magnitude of the $\beta_2$ coefficient to 1.5 ensures the final convergence of the fractal transformation. Hürtgen proposes a detailed study of the contraction control of the fractal transformation by considering particular cases based on square partitioning [HURT 93a, HURT 94] and on the spectral radius associated with the linear term $L$ introduced by Lundheim (see equation (10.18)).

10.3.4. Experimentation on triangular partitions

It has been shown in section 10.3 that the fractal coding of a natural image consists of approximating each area of the image from other areas of the same image, by means of local transformations. The image is thus partitioned into $N$ blocks $r_n$, forming a partition $R$. Each block is then put in correspondence with another block $d_n$ of the image, from which it is possible to approximate, by an elementary transformation $\omega_n$, the gray-level function of $r_n$. The operator $W$ defining a fractal transformation of the image is composed of the $N$ transformations $\omega_n$. It is finally contracting provided that a sufficient number of transformations $\omega_n$ are contracting.

Fractal transformation coding requires the coefficients of the $N$ transformations $\omega_n$ to be stored. It is thus all the more effective when the partition $R$ contains a reduced number of blocks. The first works of Jacquin, mentioned in section 10.3, showed the advantage of using square regular partitions to build the operator $W$. However, they did not reach large compression rates because the partition $R$ entailed too large a number of blocks $r_n$.
This issue could be circumvented by calculating the fractal transformation on square or rectangular partitionings [FIS 95a] adapted to the image contents (quadtree, HV). We shall now present an approach that consists of calculating the fractal transformation of the image on a triangular partitioning, which is flexible and adapted to the contents of the image. Various algorithms [CHAS 93, DAVO 97] allow us to build such a triangulation.
Figure 10.9. Fractal transformation calculation. The left hand partition (D) encloses the source blocks and the right hand partition (R) contains the destination blocks r
Let us assume that the triangulation $R$ (adapted to the image contents) has $N$ destination blocks $r_i$ and that the (regular) triangulation $D$ contains $Q$ source blocks (see Figure 10.9). The algorithm consists of associating each block $r_i$ with the block $d_j$ that minimizes the error $d$ between the gray-level function of block $r_i$ and that of block $d_j$ transformed by $\omega$. The mass transformation to perform the collage of the source block $d_j$ onto the destination block $r_i$ is the same as the one proposed by Jacquin and Fisher. It uses only two coefficients: the shift coefficient $\beta_1^{(i)}$ and the scale coefficient $\beta_2^{(i)}$.

The decoding algorithm amounts to an iteration of operator $W$, after the partitions $R$ and $D$ were rebuilt, starting from an arbitrary image $f_0$. After $k$ iterations of operator $W$, the gray-level $f_k(x_i, y_i)$ of a pixel in block $r$ reads:

$$f_k(x_i, y_i) = \beta_2 f_{k-1}\big(v^{-1}(x_i, y_i)\big) + \beta_1 \qquad \forall (x_i, y_i) \in r \tag{10.30}$$

In practice, the result converges towards the reconstructed image, an attractor of the fractal transformation, after five to ten iterations (see Figure 10.10). Primarily, the number of iterations depends on the surface ratio between the blocks of partition $D$ and those of partition $R$. As the number of iterations increases, the collage of a block $d_{\alpha(n)}$ (covering several blocks $r$) onto its corresponding block $r_n$ reduces the size of the details within blocks of partition $R$.

Figure 10.10. Decoding of Lena image 512 × 512. At iteration 15: $T_c = 11.2 : 1$, PSNR $= -10 \log_{10}(\mathrm{MSE}/255^2) = 32.29$ dB

10.3.5. Coding and decoding acceleration

10.3.5.1. Coding simplification suppressing the search for similarities

Dudbridge proposed in 1995 [DUD 95b] a fast fractal compression method for images, based on a regular square partitioning. The speed of this compression algorithm is due to the fact that no search for a similar block is made. The image is partitioned into a set of square fixed size blocks, and each block is coded individually by a fractal transformation. According to the author, the method gives less efficient results than, for example, Jacquin’s method. The reasons for this will be explained at the end of the section.
Coding

An image⁵ is coded using a set of contracting spatial transformations (IFS) $\{\omega_1, \ldots, \omega_N\}$ defined on $\mathbb{R}^2$, associated with a contracting transformation $G$ acting on the pixels’ luminance. At resolution $m$, the square support of the IFS transformed image reads:

$$A = \bigcup_{k=1}^{N} \omega_k(A) = \bigcup_{k_1=1}^{N} \cdots \bigcup_{k_m=1}^{N} \underbrace{\omega_{k_1} \circ \cdots \circ \omega_{k_m}(A)}_{A_{k_1 \ldots k_m}} \tag{10.31}$$
It is noteworthy that the spatial transformation is applied to A and not to a subpart of A as it is the case in the traditional approach of coding defined by Jacquin. The quantity p = Ak1 ...km denotes an “element” of the image support at resolution m, which may contain several pixels of the original image. At the maximum resolution, the size of element p is equal to that of an image pixel. The set Pm = {Ak1 ...km ; k1 , . . . , km = 1, . . . , N } contains all the elements of the image at resolution m. In the following section, we will consider that the IFS is composed of N = 4 affine transformations defined by equations (10.10). In these conditions, equation (10.31) is illustrated in Figure 10.11.
Figure 10.11. The square image A is divided into four square elements by four affine contracting transformations ωk1 (k1 = 1 . . . 4). In the center, Ak1 = ωk1 (A) corresponds to one of the four elements of the image at resolution 1. On the right, Ak1 k2 = ωk1 ◦ ωk2 (A) corresponds to one of the 16 elements of the image at resolution 2
5. Within this section, the term image stands for a square block resulting from the regular partitioning of the original image.
The transformation $G$ abides by the following equation [DUD 95b]:

$$G f(p) = \int_p (a_{k_1} x + b_{k_1} y + t_{k_1}) \, dx \, dy + s_{k_1} v(p) \tag{10.32}$$

in which the function $f : P_m \to \mathbb{R}$ gives the gray-level of element $p$ and $v(p)$ is the sum of the gray-levels of the elements included in the block $\omega_{k_1}^{-1}(p)$ (see Figure 10.12):

$$v(p) = \sum_{i=1}^{N} f(A_{k_2 \ldots k_m i})$$
Figure 10.12. Example for $k_1 = 1$ (upper left quadrant), $m = 4$ and $N = 4$: $v(A_{1441}) = \sum_{i=1}^{4} f(A_{441i})$. In this particular case, the size of element $p = A_{1441}$ corresponds to that of an image pixel
In contrast to the mass transformation of Jacquin, which contains only one scale factor and one shift factor on the gray-levels, equation (10.32) contains two additional coefficients, related to the position $(x, y)$ in the image of the element to be approximated. Dudbridge demonstrates that the transformation $G$ is finally contracting at all resolutions $m$, with respect to the Euclidean distance, if $\left| \sum_{k_1=1}^{N} s_{k_1} \right|$ is smaller than 1.
The calculation of the coefficients $a_{k_1}$, $b_{k_1}$, $t_{k_1}$ and $s_{k_1}$ for all $k_1$ of the set $[1 \ldots N]$ is carried out so as to minimize the least squares error at resolution $m$ between the original image $f$ and its $G$-transform. This way, for the collage theorem to be satisfied, it suffices to minimize, for all $k_1$ in the set $[1 \ldots N]$, the following function:

$$\sum_{p \in \omega_{k_1}(P_{m-1})} \left( \int_p (a_{k_1} x + b_{k_1} y + t_{k_1}) \, dx \, dy + s_{k_1} v(p) - f(p) \right)^2 \tag{10.33}$$

The $N$ summations are performed on the sub-block $k_1$ of the original image, noted $\omega_{k_1}(P_{m-1})$ (recall that this is one of the four quadrants of the original image). The sum $v(p)$ depends on the resolution $m$ of approximation of the luminance of element $p$. The minimization of function (10.33) amounts to solving the following system of equations [DUD 95b], in which every sum runs over the elements $p$ of $\omega_{k_1}(P_{m-1})$ and $\int_p g$ stands for $\int_p g \, dx \, dy$:

$$\begin{bmatrix}
\sum_p \left(\int_p x\right)^2 & \sum_p \int_p x \int_p y & \sum_p \int_p x \int_p 1 & \sum_p v(p) \int_p x \\
\sum_p \int_p x \int_p y & \sum_p \left(\int_p y\right)^2 & \sum_p \int_p y \int_p 1 & \sum_p v(p) \int_p y \\
\sum_p \int_p x \int_p 1 & \sum_p \int_p y \int_p 1 & \sum_p \left(\int_p 1\right)^2 & \sum_p v(p) \int_p 1 \\
\sum_p v(p) \int_p x & \sum_p v(p) \int_p y & \sum_p v(p) \int_p 1 & \sum_p v(p)^2
\end{bmatrix}
\begin{bmatrix} a_{k_1} \\ b_{k_1} \\ t_{k_1} \\ s_{k_1} \end{bmatrix}
=
\begin{bmatrix}
\sum_p f(p) \int_p x \\
\sum_p f(p) \int_p y \\
\sum_p f(p) \int_p 1 \\
\sum_p f(p) v(p)
\end{bmatrix}$$

An image (a square block of the original image partition) is then coded by a sequence of 4 × 4 real coefficients.

Decoding

The decoding algorithm allows for a fast and non-iterative reconstruction of the invariant function $g$ associated with the operator $G$. It is only necessary to know the coefficients $a_{k_1}$, $b_{k_1}$, $t_{k_1}$ and $s_{k_1}$ associated with each of the $N$ spatial transformations $\omega_{k_1}$.
Dudbridge demonstrates that the gray-level sum, noted $g_{k_1}$, of the sub-elements included in the element $A_{k_1}$ can be decomposed as follows [DUD 95b, MON 95a]:

$$g_{k_1} = a_{k_1} \int_{A_{k_1}} x \, dx \, dy + b_{k_1} \int_{A_{k_1}} y \, dx \, dy + t_{k_1} \int_{A_{k_1}} 1 \, dx \, dy + s_{k_1} \sum_{k=1}^{N} g_k \tag{10.34}$$

and that, consequently, the sum of the gray-levels of the $N$ elements $A_{k_1}$ reads:

$$\sum_{k=1}^{N} g_k = \frac{\sum_{k=1}^{N} \left( a_k \int_{A_k} x + b_k \int_{A_k} y + t_k \int_{A_k} 1 \right)}{1 - \sum_{k=1}^{N} s_k} \tag{10.35}$$

Similarly, $g_{k_1 k_2}$ stands for the gray-level sum of the sub-elements of $A_{k_1 k_2}$ and reads:

$$g_{k_1 k_2} = a_{k_2} \int_{A_{k_1 k_2}} x \, dx \, dy + b_{k_2} \int_{A_{k_1 k_2}} y \, dx \, dy + t_{k_2} \int_{A_{k_1 k_2}} dx \, dy + s_{k_2} \sum_{k=1}^{N} g_{k_2 k} \tag{10.36}$$

with $\sum_{k=1}^{N} g_{k_2 k} = g_{k_2}$.
The decoding procedure decomposes as follows:
– the sum $\sum_{k=1}^{N} g_k$ is directly calculated from the coefficients of $G$ (formula (10.35)). The result is equal to the gray-level sum of the pixels in the original image;
– according to (10.34), $g_{k_1}$ ($k_1 = 1 \ldots N$) is a function of the variables $a_{k_1}$, $b_{k_1}$, $t_{k_1}$, $s_{k_1}$ and of the value $\sum_{k=1}^{N} g_k$ previously calculated;
– according to (10.36), $g_{k_1 k_2}$ is a function of the variables $a_{k_2}$, $b_{k_2}$, $t_{k_2}$, $s_{k_2}$ and of the previously calculated $g_{k_2}$;
– etc.

This way, each element of the invariant function $g$ at resolution $m$ can be recursively calculated. The reconstruction procedure is not iterative, in contrast to most algorithms of fractal decoding.

Figure 10.13. Illustration of Dudbridge algorithm for decoding, at resolution $m = 3$, an 8 × 8 image. The four values $g_{k_1 k_2}$ ($k_1 = 1 \ldots 4$) at resolution 2 depend on the value $g_{k_2}$ at resolution 1 and on their respective position in the image

The method presented in this section approximates the luminance function $f$ in each square block of an original image partition. Each block is approximated by an invariant function $g$, independently of the rest of the image. At a given resolution $m$, the approximation is performed in the least squares sense, using an IFS associated with a finally contracting transformation $G$ in the luminance space. The expression of $G$ (equation (10.32)) is comparable with that of the mass transformation suggested by Jacquin, since it also relies on a scale factor $s_{k_1}$ and on a shift factor $t_{k_1}$. It also contains two additional coefficients $a_{k_1}$ and $b_{k_1}$ which act on the coordinates of the approximated elements inside the block: a first order approximation, in that case. Equation (10.32) shares similarities with equation (10.26): the coefficients $a_{k_1}$ and $b_{k_1}$ define weights associated with two inclined planes, in the luminance space.

This method is not as efficient as the basic scheme of Jacquin because a block is approximated from itself and not from another block of the image. The transformation $G$ must then be sufficiently complex to achieve a good block approximation. That is why Dudbridge added two more parameters to the mass transformation expression. However, saving these extra parameters results in a lower compression ratio. Nonetheless, the method has the advantage of being very fast. Moreover, the coding-decoding algorithm is evenly balanced regarding calculation time.

Initially, Dudbridge presented this coding technique in his thesis in 1992 [DUD 92]. Since then, the approach has been generalized to square blocks issued from a quadtree partitioning [DUD 95a]. The additional weights of the terms $x^2$, $y^2$, $x^3$ and $y^3$ in the expression of $G$ were studied by Monro et al. [MON 93a, MON 93b, MON 94, WOO 94]. The authors also extended this approach to the compression of video sequences [WIL 94].

10.3.5.2. Decoding simplification by collage space orthogonalization

We now consider that vector $\hat{r}$, an approximation of the destination vector $r$, reads:

$$\hat{r} = \beta_2 b_2 + \beta_1 b_1 \tag{10.37}$$
The vector $\hat{r}$ belongs to the vector subspace spanned by $b_1$ and $b_2$, where $b_1$ is a constant basis vector ($b_1 = [1, 1, \ldots, 1]^T$) and $b_2$ a vector extracted from the image to be coded. Each of the vectors $\hat{r}$, $r$, $b_1$ and $b_2$ is of dimension $B^2$.
De-correlation of the mass transformation coefficients

Øien proposed in [ØIE 93] a solution designed to accelerate the decompression phase by orthogonalization of the basis vectors $b_1$ and $b_2$ that span the collage vector subspace. The second advantage of this approach is that we do not have to impose constraints on the scale coefficients of the fractal transformation.

In signal processing applications, such as data compression, the handled vectors are generally represented by a linear combination of orthogonal functions. The most usual example is that of the Fourier transform where the functions are complex exponentials. We can resort to the Gram-Schmidt procedure to orthogonalize the decimated block $b_2$ with respect to the constant basis vector $b_1$ of the vector subspace. This procedure amounts to multiplying the vector $b_2$ by the orthogonalizing matrix $O = I - b_1 b_1^T$, where $I$ is the identity matrix of dimension $B^2 \times B^2$. The orthogonalized vector $\tilde{b}_2$ reads:

$$\tilde{b}_2 = O b_2$$

Practically, this amounts to removing the $b_2$ component that belongs to the subspace spanned by vector $b_1$, and thus to forcing the mean value of block $b_2$ to zero. The new collage, now performed in the vector subspace spanned by the orthogonal vectors $b_1$ and $\tilde{b}_2$, becomes:

$$\hat{r}_o = \alpha_2 \tilde{b}_2 + \alpha_1 b_1$$

In this case, the coefficients $\alpha_i$ are independent of each other. Øien and Lepsøy show in [ØIE 94b] that it is not necessary to formally orthogonalize the vectors to calculate the coefficients $\alpha_1$ and $\alpha_2$. They are directly obtained from the expressions:

$$\alpha_1 = \langle r, b_1 \rangle = \frac{1}{B^2} \sum_{j=1}^{B^2} r_j \qquad \text{and} \qquad \alpha_2 = \frac{\langle r, b_2 \rangle - \alpha_1 \langle b_1, b_2 \rangle}{\| b_2 \|^2 - \langle b_1, b_2 \rangle^2} \tag{10.38}$$

in which $r_j$ and $b_j$ are the pixels inside blocks $r$ and $b_2$, respectively. The coefficients $\alpha_1$ and $\alpha_2$ are related to the initial coefficients $\beta_1$ and $\beta_2$ through the relations:

$$\beta_2 = \alpha_2 \qquad \text{and} \qquad \beta_1 = \alpha_1 - \langle b_1, b_2 \rangle \beta_2$$

Øien and Lepsøy show in [ØIE 94b] that with the orthogonalization of vectors $b_1$ and $b_2$ prior to the coding phase, it becomes possible to guarantee the exact convergence of the decoder in a finite number of iterations.
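The expressions (10.38) can be tried out on small blocks. The sketch below is an illustration under one explicit assumption: the inner product is taken to be normalized by the number of pixels, $\langle x, y \rangle = \frac{1}{B^2} \sum_j x_j y_j$, which is what the identity $\alpha_1 = \langle r, b_1 \rangle = \frac{1}{B^2} \sum_j r_j$ suggests when $b_1$ is made of ones. Blocks are flattened into lists and the function names are hypothetical.

```python
def inner(x, y):
    """Inner product normalized by the number of pixels (assumed convention)."""
    return sum(a * b for a, b in zip(x, y)) / len(x)


def collage_coefficients(r, b2):
    """Return (alpha1, alpha2, beta1, beta2) following (10.38)."""
    b1 = [1.0] * len(r)
    alpha1 = inner(r, b1)                      # mean of the destination block
    mean_b2 = inner(b1, b2)                    # mean of the decimated source block
    alpha2 = (inner(r, b2) - alpha1 * mean_b2) / (inner(b2, b2) - mean_b2 ** 2)
    beta2 = alpha2
    beta1 = alpha1 - mean_b2 * beta2
    return alpha1, alpha2, beta1, beta2


def residual(r, b2, beta1, beta2):
    """Residual vector r - beta1 * b1 - beta2 * b2."""
    return [rj - beta1 - beta2 * bj for rj, bj in zip(r, b2)]


# A destination block that is exactly an affine function of b2.
b2 = [1.0, 2.0, 3.0, 6.0]
r_affine = [0.6 * b + 20.0 for b in b2]
_, _, beta1, beta2 = collage_coefficients(r_affine, b2)
print(round(beta1, 9), round(beta2, 9))  # 20.0 0.6

# For an arbitrary block, the residual is orthogonal to b1 and b2 (Theorem 10.2).
r_any = [5.0, 9.0, 4.0, 12.0]
_, _, g1, g2 = collage_coefficients(r_any, b2)
e = residual(r_any, b2, g1, g2)
print(abs(sum(e)) < 1e-9, abs(sum(ej * bj for ej, bj in zip(e, b2))) < 1e-9)  # True True
```

With this convention, $\alpha_2$ is the ratio of the covariance of $r$ and $b_2$ to the variance of $b_2$, and $\alpha_1$ is simply the mean of the destination block, which does not depend on the chosen source block.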
10.3.5.3. Coding acceleration: search for the nearest neighbor

In [SAU 95], Saupe proposes a fractal coding procedure of complexity in $O(\log Q)$, where $Q$ is the number of source blocks, which is worth comparing with the usual schemes whose complexity is in $O(Q)$. The traditional procedure, using an affine transformation in the luminance space, searches, for each destination block $r \in \mathbb{R}^{B^2}$, for the source block $b_2 \in \mathbb{R}^{B^2}$, among the $Q$ available, that minimizes:

$$E(\hat{r}, r) = \min_{\beta_1, \beta_2} \| r - \beta_1 b_1 - \beta_2 b_2 \|^2$$

Calculation of the optimal coefficients $\beta_1$ and $\beta_2$ and of the error is costly in terms of calculation time.

Saupe considers the orthonormal basis of the vector subspace of $\mathbb{R}^{B^2}$ spanned by the normalized vectors $b_1$ ($b_1 = \frac{1}{B}(1, \ldots, 1)$) and $\phi(b_2)$. The symbol $\phi$ represents the projection operator making $\phi(b_2)$ orthonormal to $b_1$. It is shown, in this case [SAU 95], that the error $E(\hat{r}, r)$ is proportional to an increasing monotonic function of the distance $D$ given by:

$$D(\hat{r}, r) = \min\Big( d\big(\phi(b_2), \phi(r)\big),\; d\big(-\phi(b_2), \phi(r)\big) \Big)$$

Minimizing $E(\hat{r}, r)$ amounts to minimizing $D(\hat{r}, r)$ and thus to seeking the closest neighbor of $\phi(r)$ among the $2Q$ vectors $\pm\phi(b_2)$. Different fast nearest-neighbor search algorithms are proposed in the literature. In [FRI 77], the authors build a tree of dimension $B^2$ and define a search method of complexity in $O(\log Q)$. The results presented in [SAU 95] show that it is possible to gain a factor 1.3 to 11.5 over the compression time, without degrading the quality of the reconstructed image significantly. The gain depends of course on the nature of the image and on the number of source blocks considered.

10.3.6. Other optimization schemes: hybrid methods

Seminal work by Jacquin, based on the use of a local iterated contracting function system, launched a great deal of research into the approximation or coding of real 1D, 2D and 3D signals by fractals [FIS 95a, JACQ 93, SAU 94, WOH 99]. These mainly concern:
– the construction of an optimal partition to calculate the fractal transformation [DAVO 97, FIS 95b, FIS 95c, HURT 93c, NOV 93, REU 94, THO 95]. They are composed of square, rectangular or polygonal surface blocks locally adapted to the texture of the images (see Figure 10.14);
– the acceleration of the coding algorithm (see [DAVO 96, DUD 92, LEP 93, TRU 00]);
– the use of constant vectors issued from a known dictionary, to approximate the destination blocks from the source blocks [GHA 93, VIN 93];
– the use of non-affine elementary functions $\omega_n$ that allow for coding the spatial redundancy of the images [LIN 94, POP 97];
– the acceleration of the decoding algorithm: iterative, non-iterative, hierarchical [BARA 95, CHAN 00, ØIE 93, ØIE 94a];
– the theoretical study of the fractal transformation [VRS 99] and of the convergence of the decoder [HURT 93a, HURT 94, LUN 92, MUK 00];
– the extension of the method to the coding of video sequences [BART 95, BEA 91, BOG 94, FIS 94, HURD 92, HURT 93b, LAZ 94, MON 95b, WIL 94];
– the use of fractals in hybrid coding-decoding schemes, either based on a discrete cosine transform [BART 94a, BART 94b, BART 95b] or on a wavelet transform of the image [BEL 98, DAVI 98, KRU 95, RIN 95, SIM 95, WALLE 96]. The fractal code is, in this case, calculated on another representation of the original image, which can be more favorable to the search of local similarities.
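Regarding the acceleration of the coding algorithm, the nearest-neighbor reformulation described in section 10.3.5.3 can be illustrated on a handful of candidate blocks. This is only a sketch: the operator $\phi$ is implemented here as "remove the mean, then normalize", the least squares error is computed in closed form, and the data and function names are hypothetical. A linear scan replaces the tree-based search of [FRI 77], so the example shows the equivalence of the two rankings, not the speed-up.

```python
import math


def phi(x):
    """Project x orthogonally to the constant vector, then normalize."""
    mean = sum(x) / len(x)
    centered = [v - mean for v in x]
    norm = math.sqrt(sum(v * v for v in centered))
    return [v / norm for v in centered]


def distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def saupe_distance(r, b2):
    """D = min(d(phi(b2), phi(r)), d(-phi(b2), phi(r)))."""
    pr, pb = phi(r), phi(b2)
    return min(distance(pb, pr), distance([-v for v in pb], pr))


def least_squares_error(r, b2):
    """min over (beta1, beta2) of ||r - beta1 * b1 - beta2 * b2||^2."""
    n = len(r)
    mean_r, mean_b = sum(r) / n, sum(b2) / n
    cov = sum((x - mean_r) * (y - mean_b) for x, y in zip(r, b2))
    var_b = sum((y - mean_b) ** 2 for y in b2)
    var_r = sum((x - mean_r) ** 2 for x in r)
    return var_r - cov * cov / var_b


r = [12.0, 15.0, 11.0, 30.0]          # flattened destination block
candidates = [                         # flattened decimated source blocks
    [1.0, 2.0, 3.0, 4.0],
    [9.0, 1.0, 8.0, 2.0],
    [3.0, 4.0, 2.0, 9.0],
]
errors = [least_squares_error(r, b) for b in candidates]
distances = [saupe_distance(r, b) for b in candidates]
best_by_error = min(range(len(candidates)), key=errors.__getitem__)
best_by_distance = min(range(len(candidates)), key=distances.__getitem__)
print(best_by_error, best_by_distance)  # 2 2
```

In this sketch, the two criteria select the same source block because the error can be written as the squared norm of the centered destination block multiplied by $D^2 (1 - D^2/4)$, which increases with $D$ over its range $[0, \sqrt{2}]$.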
Figure 10.14. Illustration of different partitionings: quadtree (square), HV (rectangular) and Voronoï (polygonal), used for the search of local similarities in the image and for the calculation of the fractal code
Figure 10.15 compares the compression performance of fractal coding on square, rectangular and polygonal partitions with that of standard JPEG compression [WALLA 91]. It shows that, beyond a certain compression rate, compression by fractals outperforms JPEG compression, at least in terms of the visual quality of the images.
[Plot: reconstruction SNR (vertical axis, 26 to 38) versus compression rate (horizontal axis, 10 to 100); curves: JPEG, Delaunay fractal, Voronoi fractal, HV fractal, Quadtree 1 fractal, Quadtree 2 fractal]
Figure 10.15. Signal-to-noise ratios versus the compression rate, calculated on the 512 × 512 Lena image. These ratios compare the quality of the reconstructed images after fractal compression on blocks of variable geometries and after standard JPEG compression
10.4. Bibliography
[BARA 95] BARAHAV Z., MALAH D., KARNIN E., “Hierarchical interpretation of fractal image coding and its applications”, in FISHER Y. (Ed.), Fractal Image Compression: Theory and Application to Digital Images, Springer-Verlag, New York, p. 91–117, 1995.
[BARN 86] BARNSLEY M.F., ERVIN V., HARDIN D., LANCASTER J., “Solution of an inverse problem for fractals and other sets”, Proc. Natl. Acad. Sci. USA, vol. 83, p. 1975–1977, 1986.
[BARN 88] BARNSLEY M.F., Fractals Everywhere, Academic Press, New York, 1988.
[BARN 93] BARNSLEY M.F., HURD L.P., Fractal Image Compression, A.K. Peters, Wellesley, 1993.
[BART 94a] BARTHEL K.U., SCHÜTTEMEYER J., VOYÉ T., NOLL P., “A new image coding technique unifying fractal and transform coding”, in IEEE International Conference on Image Processing (Austin, Texas), p. 112–116, November 1994.
[BART 94b] BARTHEL K.U., VOYÉ T., “Adaptive fractal image coding in the frequency domain”, in Proceedings of International Workshop on Image Processing: Theory, Methodology, Systems, and Applications (Budapest, Hungary), June 1994.
[BART 95] BARTHEL K.U., VOYÉ T., “Three-dimensional fractal video coding”, in ICIP (Washington DC, USA), vol. 3, p. 260–263, 1995.
[BART 95b] BARTHEL K.U., “Entropy constrained fractal image coding”, Fractals, in NATO ASI on Fractal Image Coding, Trondheim, Norway, July 1995.
[BEA 91] BEAUMONT J.M., “Image data compression using fractal techniques”, BT Technol. J., vol. 9, no. 4, p. 93–109, 1991.
[BEL 98] BELLOULATA K., BASKURT A., BENOIT-CATIN H., PROST R., “Fractal coding of subbands with an oriented partition”, Signal Processing: Image Communication, vol. 12, 1998.
[BOG 94] BOGDAN A., “Multiscale (inter/intra frame) fractal video coding”, in Proceedings of the IEEE International Conference on Image Processing (ICIP’94, Austin, Texas), November 1994.
[CAB 92] CABRELLI C.A., FORTE B., MOLTER U.M., VRSCAY E.R., “Iterated fuzzy set systems: A new approach to the inverse problem for fractals and other sets”, Journal of Mathematical Analysis and Applications, vol. 171, no. 1, p. 79–100, 1992.
[CHAN 00] CHANG H.T., KUO C.J., “Iteration-free fractal image coding based on efficient domain pool design”, IEEE Transactions on Image Processing, vol. 9, no. 3, p. 329–339, 2000.
[CHAS 93] CHASSERY J.M., DAVOINE F., BERTIN E., “Compression fractale par partitionnements de Delaunay”, in Quatorzième colloque GRETSI (Juan-les-Pins, France), vol. 2, p. 819–822, 1993.
[DAVI 98] DAVIS G., “A wavelet-based analysis of fractal image compression”, IEEE Transactions on Image Processing, vol. 7, p. 141–154, 1998.
[DAVO 96] DAVOINE F., ANTONINI M., CHASSERY J.M., BARLAUD M., “Fractal image compression based on Delaunay triangulation and vector quantization”, IEEE Transactions on Image Processing: Special Issue on Vector Quantization, February 1996.
[DAVO 97] DAVOINE F., ROBERT G., CHASSERY J.M., “How to improve pixel-based fractal image coding with adaptive partitions”, in LÉVY VÉHEL J., LUTTON E., TRICOT C. (Eds.), Fractals in Engineering, Springer-Verlag, p. 292–307, 1997.
[DUD 92] DUDBRIDGE F., Image approximation by self-affine fractals, PhD Thesis, University of London, 1992.
[DUD 95a] DUDBRIDGE F., “Fast image coding by a hierarchical fractal construction”, University of California, San Diego, 1995.
[DUD 95b] DUDBRIDGE F., “Least-squares block coding by fractal functions”, in FISHER Y. (Ed.), Fractal Image Compression: Theory and Application to Digital Images, Springer-Verlag, New York, p. 229–241, 1995.
[ELT 87] ELTON J.H., “An ergodic theorem for iterated maps”, Ergodic Theory and Dynamical Systems, vol. 7, p. 481–488, 1987.
[FIS 91] FISHER Y., JACOBS E.W., BOSS R.D., Iterated transform image compression, Technical Report 1408, Naval Ocean Systems Center, San Diego, California, April 1991.
[FIS 94] FISHER Y., ROGOVIN D., SHEN T.P., “Fractal (self-VQ) encoding of video sequences”, in Proceedings of the SPIE: Visual Communications and Image Processing (Chicago, Illinois), September 1994.
[FIS 95a] FISHER Y. (Ed.), Fractal Image Compression: Theory and Application to Digital Images, Springer-Verlag, New York, 1995.
[FIS 95b] FISHER Y., “Fractal image compression with Quadtrees”, in FISHER Y. (Ed.), Fractal Image Compression: Theory and Application to Digital Images, Springer-Verlag, New York, p. 55–77, 1995.
[FIS 95c] FISHER Y., MENLOVE S., “Fractal encoding with HV partitions”, in FISHER Y. (Ed.), Fractal Image Compression: Theory and Application to Digital Images, Springer-Verlag, New York, p. 119–136, 1995.
[FRI 77] FRIEDMAN J.H., FINKEL J.L., “An algorithm for finding best matches in logarithmic expected time”, ACM Trans. Math. Software, vol. 3, no. 3, p. 209–226, 1977.
[GHA 93] GHARAVI-ALKHANSARI M., HUANG T.S., “A fractal-based image block-coding algorithm”, in Proceedings of ICASSP, p. 345–348, 1993.
[HURD 92] HURD L.P., GUSTAVUS M.A., BARNSLEY M.F., “Fractal video compression”, in Compcon Spring. Conference 37, p. 41–42, 1992.
[HURT 93a] HÜRTGEN B., “Contractivity of fractal transforms for image coding”, Electronics Letters, vol. 29, no. 20, p. 1749–1750, 1993.
[HURT 93b] HÜRTGEN B., BÜTTGEN P., “Fractal approach to low rate video coding”, in Proceedings of SPIE, vol. 2094, p. 120–131, 1993.
[HURT 93c] HÜRTGEN B., MÜLLER F., STILLER C., “Adaptive fractal coding of still pictures”, in Proceedings of the Picture Coding Symposium, 1993.
[HURT 94] HÜRTGEN B., HAIN T., “On the convergence of fractal transforms”, in Proceedings of ICASSP, p. 561–564, 1994.
[JACO 92] JACOBS E.W., FISHER Y., BOSS R.D., “Image compression: a study of the iterated transform method”, Signal Processing, vol. 29, p. 251–263, 1992.
[JACQ 92] JACQUIN A.E., “Image coding based on a fractal theory of iterated contractive image transformations”, IEEE Transactions on Image Processing, vol. 1, no. 1, p. 18–30, 1992.
[JACQ 93] JACQUIN A.E., “Fractal image coding: a review”, Proceedings of the IEEE, vol. 81, no. 10, p. 1451–1465, 1993.
[KRO 92] KROPATSCH W.G., NEUHAUSSER M.A., LEITGEB I.J., BISCHOF H., “Combining pyramidal and fractal image coding”, in Proceedings of the Eleventh ICPR (The Hague, Netherlands), vol. 3, p. 61–64, 1992.
[KRU 95] KRUPNIK H., MALAH D., KARNIN E., “Fractal representation of images via the discrete wavelet transform”, in IEEE Eighteenth Conference of EE in Israel (Tel Aviv), March 1995.
[LAZ 94] LAZAR M.S., BRUTON L.T., “Fractal block coding of digital video”, IEEE Transactions on Circuits and Systems for Video Technology, vol. 4, no. 3, p. 297–308, 1994.
[LEP 93] LEPSØY S., Attractor image compression – Fast algorithms and comparisons to related techniques, PhD Thesis, Norwegian Institute of Technology, Trondheim, June 1993.
[LEV 90] LÉVY VÉHEL J., GAGALOWICZ A., Fractal approximation of 2-D object, Technical Report 1187, INRIA, Rocquencourt, France, 1990.
[LIN 94] LIN H., VENETSANOPOULOS A.N., “Incorporating nonlinear contractive functions into the fractal coding”, in Proceedings of the International Workshop on Intelligent Signal Processing and Communication Systems (Seoul, Korea), p. 169–172, October 1994.
[LUN 92] LUNDHEIM L., Fractal signal modelling for source coding, PhD Thesis, Norwegian Institute of Technology, Trondheim, September 1992.
[LUT 93] LUTTON E., LÉVY VÉHEL J., “Optimization of fractal functions using genetic algorithms”, in Fractal’93 (London, Great Britain), Springer, 1993.
[MAN 89] MANTICA G., SLOAN A., “Chaotic optimization and the construction of fractals: Solution of an inverse problem”, Complex Systems, vol. 3, p. 37–62, 1989.
[MON 93a] MONRO D.M., “Class of fractal transforms”, Electronics Letters, vol. 29, no. 4, p. 362–363, 1993.
[MON 93b] MONRO D.M., “Fractal transforms: Complexity versus fidelity”, in VERNAZZA G., VENETSANOPOULOS A.N., BRACCINI C. (Eds.), Image Processing: Theory and Applications, Elsevier Science Publishers, p. 45–48, 1993.
[MON 94] MONRO D.M., WOOLLEY S.J., “Fractal image compression without searching”, in Proceedings of ICASSP, vol. 5, p. 557–560, 1994.
[MON 95a] MONRO D.M., DUDBRIDGE F., “Rendering algorithms for deterministic fractals”, IEEE Computer Graphics and Applications, p. 32–41, January 1995.
[MON 95b] MONRO D.M., NICHOLLS J.A., “Low bit rate colour fractal video”, in ICIP (Washington DC, USA), vol. 3, p. 264–267, 1995.
[MUK 00] MUKHERJEE J., KUMAR P., GHOSH S.K., “A graph-theoretic approach for studying the convergence of fractal encoding algorithm”, IEEE Transactions on Image Processing, vol. 9, no. 3, p. 366–377, 2000.
[NOV 93] NOVAK M., Attractor coding of images, PhD Thesis, Department of Electrical Engineering, Linköping University, 1993.
[ØIE 93] ØIEN G.E., L2-optimal attractor image coding with fast decoder convergence, PhD Thesis, Norwegian Institute of Technology, Trondheim, April 1993.
[ØIE 94a] ØIEN G.E., BAHARAV Z., LEPSØY S., KARNIN E., MALAH D., “A new improved collage theorem with applications to multiresolution fractal image coding”, in International Conference on Acoustics, Speech, and Signal Processing, 1994.
[ØIE 94b] ØIEN G.E., LEPSØY S., “Fractal-based image coding with fast decoder convergence”, Signal Processing, vol. 40, p. 105–117, 1994.
[POP 97] POPESCU D.C., DIMCA A., YAN H., “A nonlinear model for fractal image coding”, IEEE Transactions on Image Processing, vol. 6, no. 3, 1997.
[RAM 86] RAMAMURTHI B., GERSHO A., “Classified vector quantization of images”, IEEE Transactions on Communications, vol. 34, no. 11, p. 1105–1115, 1986.
[REU 94] REUSENS E., “Partitioning complexity issue for iterated functions systems based image coding”, in Proceedings of the Seventh European Signal Processing Conference (Edinburgh, Scotland), vol. 1, p. 171–174, September 1994.
[RIN 94] RINALDO R., ZAKHOR A., “Inverse and approximation problem for two-dimensional fractal sets”, IEEE Transactions on Image Processing, vol. 3, no. 6, p. 802–820, 1994.
[RIN 95] RINALDO R., CALVAGNO G., “Image coding by block prediction of multiresolution subimages”, IEEE Transactions on Image Processing, p. 909–920, July 1995.
[SAU 94] SAUPE D., HAMZAOUI R., “A review of the fractal image compression literature”, Computer Graphics, vol. 28, no. 4, p. 268–276, 1994.
[SAU 95] SAUPE D., “Accelerating fractal image compression by multi-dimensional nearest neighbor search”, in STORER J.A., COHN M. (Eds.), Proceedings of the Data Compression Conference (DCC’95, Institute for Information Technology, Freiburg University), IEEE Computer Society Press, March 1995.
[SIM 95] SIMON B., “Explicit link between local fractal transform and multiresolution transform”, in ICIP (Washington DC, USA), vol. 1, p. 278–281, 1995.
[THO 95] THOMAS L., DERAVI F., “Region-based fractal image compression using heuristic search”, IEEE Transactions on Image Processing, vol. 4, no. 6, p. 832–838, 1995.
[TRI 93] TRICOT C., Courbes et dimension fractale, Springer-Verlag, 1993.
[TRU 00] TRUONG T.K., JENG J.H., REED I.S., LEE P.C., LI A.Q., “A fast encoding algorithm for fractal image compression using the DCT inner product”, IEEE Transactions on Image Processing, vol. 9, no. 4, p. 529–535, 2000.
[VIN 93] VINES G., Signal modelling with iterated function systems, PhD Thesis, Georgia Institute of Technology, May 1993.
[VRS 90] VRSCAY E.R., “Moment and collage methods for the inverse problem of fractal construction with iterated function systems”, in Fractal’90 conference, June 1990.
[VRS 99] VRSCAY E.R., SAUPE D., “Can one break the collage barrier in fractal image coding?”, in Fractals in Engineering, Springer-Verlag, 1999.
[WALLA 91] WALLACE G.K., “The JPEG still picture compression standard”, Communications of the ACM, vol. 34, no. 4, p. 30–44, 1991.
[WALLE 96] VAN DE WALLE A., “Merging fractal image compression and wavelet transform methods”, Fractals, 1996.
[WIL 94] WILSON D.L., NICHOLLS J.A., MONRO D.M., “Rate buffered fractal video”, in Proceedings of the ICASSP, vol. V, p. 505–508, 1994.
[WOH 99] WOHLBERG B., DE JAGER G., “A review of the fractal image coding literature”, IEEE Transactions on Image Processing, vol. 8, no. 12, p. 1716–1729, 1999.
[WOO 94] WOOLLEY S.J., MONRO D.M., “Rate-distortion performance of fractal transforms for image compression”, Fractals, vol. 2, no. 6, p. 395–398, 1994.
Chapter 11
Local Regularity and Multifractal Methods for Image and Signal Analysis
Chapter written by Pierrick LEGRAND.

11.1. Introduction
In this chapter, we review some of the important and recent applications of local regularity and multifractal analysis to signal/image processing. We do not aim at a complete coverage of the field, which would require a book of its own: (multi)fractal processing of signals and images is now present in numerous applications. Rather, we concentrate on a few topics, and try to explain in a very concrete manner how tools developed for the study of irregular functions may be applied to solve typical signal processing problems.

This chapter is organized as follows: in section 11.2, we briefly recall the notions that will be used in what follows, namely regularity exponents and multifractal spectra. For more details on these, see Chapters 1 and 3. In section 11.3, we explain how to estimate regularity exponents on numerical data and compare various methods to do so (we do not tackle the problem of multifractal spectrum estimation, which will not be needed here; see Chapters 1 and 3 for more information). Section 11.4 gives a detailed explanation of how to use fractal tools to perform signal and image denoising. We first recall a traditional wavelet-based denoising method, and explain why it is not well suited to processing irregular signals. We then present three different methods based on Hölder exponents and large deviation multifractal spectra that give good results on signals such as fractal functions, road profiles and SAR (radar) images. Section 11.5 explains how Hölder exponents may be used to perform data interpolation: the idea is to refine the resolution in such a way that local regularity is preserved at each point. Again, this approach is well adapted to processing irregular signals and images, as we show in examples. Section 11.6 gives an account of the remarkable applications of fractal tools to ECG analysis: links between the condition of the heart and some features of the multifractal spectrum of its ECG, relation between RR signals and their local regularity, etc. In section 11.7, we briefly describe an application of multifractal analysis to texture classification, and describe an example of well logs. Section 11.8 is devoted to the presentation of an image segmentation method based on characterizing edges through multifractal analysis. The issue of change detection in a sequence of images is dealt with in section 11.9. As in the contour segmentation application, the idea is to characterize relevant changes through their signatures in the multifractal spectrum. As a final image processing application, we describe in section 11.10 a method for reconstructing an image from a specific subset of pixels selected through multifractal analysis.

To end this introduction, we should also mention that many of the methods described in this chapter are implemented in the free software toolbox FracLab [FracLab].

11.2. Basic tools
11.2.1. Hölder regularity analysis
This section focuses on the Hölder characterizations of regularity. To simplify notations, we assume that our signals are nowhere differentiable. Generalization to other signals simply requires the introduction of polynomials in the definitions (see Chapters 1 and 3).
DEFINITION 11.1 (Pointwise Hölder exponent).– Let $\alpha \in (0, 1)$ and $x_0 \in K \subset \mathbb{R}$. A function $f : K \to \mathbb{R}$ is in $C^{\alpha}_{x_0}$ if, for all $x$ in a neighborhood of $x_0$,

$$|f(x) - f(x_0)| \le c\,|x - x_0|^{\alpha} \qquad (11.1)$$

where $c$ is a constant. The pointwise Hölder exponent of $f$ at $x_0$, denoted $\alpha_p(x_0)$, is the supremum of the $\alpha$ for which equation (11.1) holds.

Let us now introduce the local Hölder exponent: let $\alpha \in (0, 1)$, $\Omega \subset \mathbb{R}$. We say that $f \in C^{\alpha}_l(\Omega)$ if:

$$\exists\, C : \forall x, y \in \Omega : \frac{|f(x) - f(y)|}{|x - y|^{\alpha}} \le C$$

Let:

$$\alpha_l(f, x_0, \rho) = \sup\{\alpha : f \in C^{\alpha}_l(B(x_0, \rho))\}$$

and notice that $\alpha_l(f, x_0, \rho)$ is non-increasing as a function of $\rho$. We may thus set the following definition:

DEFINITION 11.2.– Let $f$ be a continuous function. The local Hölder exponent of $f$ at $x_0$ is the real number:

$$\alpha_l(x_0) = \alpha_l(f, x_0) = \lim_{\rho \to 0} \alpha_l(f, x_0, \rho)$$
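To see that the two exponents of Definitions 11.1 and 11.2 can differ, a standard illustration (not taken from this chapter, and given only as a sketch) is the chirp at $x_0 = 0$. The parameters $\gamma \in (0, 1)$ and $\beta > 0$ are assumptions chosen for the example, and the function is smooth away from 0, so only the point $x_0 = 0$ is of interest here:

```latex
f(x) = |x|^{\gamma} \sin\!\left(|x|^{-\beta}\right), \qquad f(0) = 0 .
% Pointwise exponent: |f(x) - f(0)| \le |x|^{\gamma}, with equality wherever the sine equals \pm 1, so
\alpha_p(0) = \gamma .
% Local exponent: consecutive extrema near x are about |x|^{1+\beta} apart
% and differ in value by about 2|x|^{\gamma}, hence for such pairs (x, y)
\frac{|f(x) - f(y)|}{|x - y|^{\alpha}} \sim |x|^{\gamma - \alpha(1+\beta)} ,
% which stays bounded as x \to 0 only if \alpha \le \gamma / (1+\beta), so
\alpha_l(0) = \frac{\gamma}{1+\beta} < \alpha_p(0) .
```

The faster the oscillations (the larger $\beta$), the further the local exponent falls below the pointwise one.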
11.2.2. Reminders on multifractal analysis
We briefly state in this section some basic facts about multifractal analysis. Multifractal analysis is concerned with the study of the regularity structure of functions or processes, both from a local and a global perspective. More precisely, we start by measuring in some way the pointwise regularity, usually with some kind of Hölder exponent. The second step is to give a global description of this regularity. This can be done either in a geometric fashion using Hausdorff dimension, or in a statistical manner using a large deviation analysis.

Formally, let $X(t)$, $t \in I \subset \mathbb{R}$, be a deterministic function or a stochastic process on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$. For ease of notation, we shall assume without loss of generality that $I = [0, 1]$. We define the following functions (these are random functions in general when $X$ is itself random).

11.2.2.1. Hausdorff multifractal spectrum
To simplify notations, set $\alpha(t) = \alpha_p(t)$. The Hausdorff spectrum describes the structure of the function $t \mapsto \alpha(t)$ by evaluating the size of its level sets. More precisely, let:

$$T_{\alpha} = \{t \in I,\ \alpha(t) = \alpha\}$$
The Hausdorff multifractal spectrum is the function:

$$f_h(\alpha) = \dim_H(T_{\alpha})$$

where $\dim_H(T)$ denotes the Hausdorff dimension of the set $T$.

11.2.2.2. Large deviation multifractal spectrum
Let:

$$N_n^{\varepsilon}(\alpha) = \#\{k : \alpha - \varepsilon \le \alpha_n^k \le \alpha + \varepsilon\}$$

where $\alpha_n^k$ is the coarse-grained exponent corresponding to the dyadic interval $I_n^k = [k2^{-n}, (k+1)2^{-n}]$, i.e.:

$$\alpha_n^k = \frac{\log |Y_n^k|}{-\log n}$$

Here, $Y_n^k$ is a quantity that measures the variation of $X$ in the interval $I_n^k$. The choice $Y_n^k := X((k+1)2^{-n}) - X(k2^{-n})$ leads to the simplest analytical calculations. Another possibility is to set $Y_n^k = \mathrm{osc}_X(I_n^k)$, i.e. the oscillation of $X$ inside $I_n^k$. A third choice is to take $Y_n^k$ to be the wavelet coefficient $x_{n,k}$ of $X$ at scale $n$ and location $k$ (note that, in this case, the spectrum will depend on the chosen wavelet). The large deviation spectrum $f_g(\alpha)$ is defined as follows:

$$f_g(\alpha) = \lim_{\varepsilon \to 0} \liminf_{n \to \infty} \frac{\log N_n^{\varepsilon}(\alpha)}{\log n}$$

Note that, whatever the choice of $Y_n^k$, $f_g$ always ranges in $\mathbb{R}^+ \cup \{-\infty\}$. The intuitive meaning of $f_g$ is as follows. For $n$ large enough, one has roughly:

$$P_n(\alpha_n^k \simeq \alpha) \simeq 2^{-n(1 - f_g(\alpha))}$$

where $P_n$ denotes the uniform distribution over $\{0, 1, \dots, 2^n - 1\}$. Thus, for all $\alpha$ such that $f_g(\alpha) < 1$, $1 - f_g(\alpha)$ measures the exponential decay rate of the probability of finding an interval $I_n^k$ with coarse-grained exponent equal to $\alpha$, when $n$ tends to infinity.

When $X$ is a stochastic process, $f_g$ is in general a random function. In applications, it is convenient to consider in this case a “deterministic version” of $f_g$, defined as follows:

$$F_g(\alpha) = 1 + \lim_{\varepsilon \to 0} \limsup_{n \to \infty} \frac{\log \pi_n(\alpha)}{\log(n)}$$

where:

$$\pi_n(\alpha) := \mathbb{P} \times P_n\left[\alpha_n^k \in (\alpha - \varepsilon, \alpha + \varepsilon)\right]$$

Unlike $f_g$, $F_g$ may assume non-trivial negative values.
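The counting behind $N_n^{\varepsilon}(\alpha)$ can be sketched numerically at a single resolution. The following is a minimal illustration, not the estimator used in FracLab: it assumes the increment choice for $Y_n^k$, reads $n$ in the formula above as the number of equal-length intervals covering $[0, 1]$, and adopts the convention that a zero increment gives an infinite exponent. All function names are hypothetical.

```python
import math


def coarse_grained_exponents(path, num_intervals):
    """Coarse-grained exponents of `path` (a function on [0, 1]).

    Assumption: Y_k is the increment of `path` over [k/N, (k+1)/N] and the
    exponent is log|Y_k| / (-log N), with N = num_intervals.
    """
    exponents = []
    for k in range(num_intervals):
        increment = path((k + 1) / num_intervals) - path(k / num_intervals)
        if increment == 0:
            # Convention chosen for this sketch: log 0 = -inf gives alpha = +inf.
            exponents.append(math.inf)
        else:
            exponents.append(-math.log(abs(increment)) / math.log(num_intervals))
    return exponents


def count_near(exponents, alpha, eps):
    """Number of intervals whose exponent lies in [alpha - eps, alpha + eps]."""
    return sum(1 for a in exponents if alpha - eps <= a <= alpha + eps)


def spectrum_at_resolution(exponents, alpha, eps):
    """Finite-resolution value log N(alpha) / log N, before any limit is taken."""
    count = count_near(exponents, alpha, eps)
    if count == 0:
        return -math.inf
    return math.log(count) / math.log(len(exponents))


# A smooth ramp has exponent 1 on every interval, so the value at alpha = 1 is 1.
ramp_exponents = coarse_grained_exponents(lambda t: t, 64)
ramp_value = spectrum_at_resolution(ramp_exponents, 1.0, 0.01)
```

In practice the double limit in the definition has to be replaced by a choice of resolution and of $\varepsilon$, which is where most of the estimation difficulty lies.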
11.2.2.3. Legendre multifractal spectrum
It is natural to interpret the spectrum $f_g$ as a rate function in a large deviation principle (LDP). Large deviation theorems provide conditions under which such rate functions may be calculated as the Legendre transform of a limiting moment generating function. When applicable, this procedure provides a more robust estimation of $f_g$ than a direct calculation. Define, for $q \in \mathbb{R}$:

$$S_n(q) = \sum_{k=0}^{n-1} |Y_n^k|^q$$

with the convention $0^q := 0$ for all $q \in \mathbb{R}$. Let:

$$\tau(q) = \liminf_{n \to \infty} \frac{\log S_n(q)}{-\log(n)}$$

The Legendre multifractal spectrum of $X$ is defined as ($\tau^*$ denotes the Legendre transform of $\tau$):

$$f_l(\alpha) := \tau^*(\alpha) = \inf_{q \in \mathbb{R}} (q\alpha - \tau(q))$$

Being defined through a Legendre transform, $f_l$ is a concave function. The two spectra $f_g$ and $f_l$ are related as follows. Define the sequence of random variables $Z_n := \log |Y_n^k|$, where the randomness is through a choice of $k$ uniformly in the set $\{0, \dots, n - 1\}$. Consider the corresponding moment generating functions:

$$c_n(q) := -\frac{\log E_n[\exp(qZ_n)]}{\log(n)}$$

where $E_n$ denotes expectation with respect to $P_n$. A version of the Gärtner-Ellis theorem ensures that if $\lim c_n(q)$ exists (in which case it equals $1 + \tau(q)$) and is differentiable, then $c^* = f_g - 1$. We then say that the weak multifractal formalism $f_g = f_l$ holds.

11.3. Hölderian regularity estimation
11.3.1. Oscillations (OSC)
The most natural way to estimate regularity is to use the “oscillation” method. This method is a direct application of the definition of the Hölder exponent (see [TRI 95] for more on this topic).
As seen above, a function $f(t)$ is Hölderian with exponent $\alpha \in (0, 1)$ at $t$ if there exists a constant $c$ such that, for all $t'$ in a neighborhood of $t$,

$$|f(t) - f(t')| \le c\,|t - t'|^{\alpha}$$

In terms of oscillations, this condition may be written as:

$$\exists c,\ \forall \tau : \quad \mathrm{osc}_{\tau}(t) \le c\,\tau^{\alpha}$$

where

$$\mathrm{osc}_{\tau}(t) = \sup_{|t - t'| \le \tau} f(t') - \inf_{|t - t'| \le \tau} f(t') = \sup_{t', t'' \in [t - \tau, t + \tau]} |f(t') - f(t'')|$$

The estimator is then simply defined as the slope in the least-square linear regression of the logarithms of the oscillations versus the logarithms of the size $\tau$ of the balls used to calculate the oscillations.

11.3.2. Wavelet coefficient regression (WCR)
A method using wavelet coefficients is described in this section. It relies on a theorem by S. Jaffard. This theorem shows how we can estimate the regularity at the point $t_0$ using the wavelet coefficients (provided the wavelets verify some regularity properties [JAF 04]).

THEOREM 11.1 (S. Jaffard).– Let $f$ be a uniformly Hölderian function and $\alpha$ the pointwise Hölder exponent of $f$ at $t_0$. Then there exists a constant $c$ such that the wavelet coefficients verify:

$$|c_{j,k}| \le c\,2^{-j(\alpha + \frac{1}{2})} \left(1 + |2^j t_0 - k|\right)^{\alpha} \qquad \forall (j, k) \in \mathbb{Z}^2$$

Conversely, if for all $(j, k) \in \mathbb{Z}^2$ we have $|c_{j,k}| \le c\,2^{-j(\alpha + \frac{1}{2})} \left(1 + |2^j t_0 - k|\right)^{\alpha'}$ for some $\alpha' < \alpha$, then the Hölder exponent of $f$ at $t_0$ is $\alpha$.

From this theorem, a traditional local regularity estimator is obtained if we consider only the indexes $j, k$ such that $|k - 2^j t_0| < \mathrm{const}$. This amounts to making the assumption that the local and pointwise Hölder exponents of $f$ at $t_0$ coincide [LEV 04b]. Under this hypothesis, an estimator is obtained through the regression slope $p$ of $\log_2 |c_{j,k}|$ versus $j$. More precisely, at each point $t_0$ of a signal decomposed on $n$ scales, we estimate the regularity by:

$$\tilde{\alpha}(n, t_0) = -\frac{1}{2} - K_n \sum_{j=1}^{n} s_j \log_2 |c_{j,k}| \qquad (11.2)$$

with $K_n = \frac{12}{n(n-1)(n+1)}$ and $s_j = j - \frac{n+1}{2}$. The $c_{j,k}$ are the wavelet coefficients “above” $t_0$, i.e. $k = \frac{t_0 + 1}{2^{n-j+1}}$.
11.3.3. Wavelet leaders regression (WL)
This method is very similar to the previous one, but often provides better results. For more information on wavelet leaders see [JAF 04].
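The closed form in (11.2) is an ordinary least-squares slope written out explicitly, and the leader-based estimator of this section reuses the same weights. The sketch below only illustrates that arithmetic: it assumes the per-scale magnitudes (coefficients above $t_0$, or leaders) have already been computed by some wavelet transform, and the function name is hypothetical.

```python
def regression_exponent(log2_magnitudes):
    """Regularity estimate -1/2 - K_n * sum_j s_j * log2|c_j| for scales j = 1..n.

    `log2_magnitudes[j - 1]` is assumed to hold log2 of the magnitude retained
    at scale j; at least two scales are required.
    """
    n = len(log2_magnitudes)
    if n < 2:
        raise ValueError("at least two scales are needed for a regression")
    k_n = 12.0 / (n * (n - 1) * (n + 1))   # inverse of sum_j s_j^2
    center = (n + 1) / 2.0                 # mean of the scale indices 1..n
    weighted_sum = sum(
        (j - center) * value for j, value in enumerate(log2_magnitudes, start=1)
    )
    return -0.5 - k_n * weighted_sum


# Synthetic check: magnitudes decaying exactly like 2^{-j(alpha + 1/2)}.
true_alpha = 0.7
synthetic = [-j * (true_alpha + 0.5) for j in range(1, 9)]
estimated_alpha = regression_exponent(synthetic)
```

Since $\sum_j s_j = 0$ and $\sum_j s_j^2 = n(n-1)(n+1)/12$, the weighted sum times $K_n$ is exactly the regression slope $p$, and the estimate is $-\frac{1}{2} - p$.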
A dyadic cube at the scale $j$ is given by $\lambda = \left[\frac{k}{2^j}, \frac{k+1}{2^j}\right)$. In $d$ dimensions, this interval becomes a cube in $\mathbb{R}^d$.
DEFINITION 11.3.– The wavelet leaders are $d_{\lambda} = \sup_{\lambda' \subset \lambda} |c_{\lambda'}|$. $\lambda_j(t_0)$ is the dyadic cube of size $2^{-j}$ at scale $j$ containing the point $t_0$. Let $d_j(t_0) = \sup_{\lambda' \in \mathrm{adj}(\lambda_j(t_0))} |c_{\lambda'}|$, with $\mathrm{adj}(\lambda_j(t_0))$ the set of dyadic cubes adjacent to $\lambda_j(t_0)$.

PROPOSITION 11.1 (S. Jaffard).– If $f \in C^{\alpha}(t_0)$, then

$$\exists c > 0,\ \forall j \ge 0 : \quad d_j(t_0) \le c\,2^{-(\alpha + \frac{1}{2})j} \qquad (11.3)$$

Conversely, if equation (11.3) holds, and if $f$ is uniformly Hölderian, then $f \in C^{\alpha}(t_0)$.

From this proposition, and with the simplification $\mathrm{adj}(\lambda_j(t_0)) = \lambda_j(t_0)$ only, the new estimator is determined. At each point $t_0$ of the signal $X$ decomposed on $n$ scales, the regularity is estimated by the following formula:

$$\tilde{\alpha}_{WL}(n, t_0) = -\frac{1}{2} - K_n \sum_{j=1}^{n} s_j \log_2 \left( \max_{\lambda' \subset \lambda_j(t_0)} |x_{\lambda'}| \right)$$

11.3.4. Limit inf and limit sup regressions
The definition of the Hölder exponent makes use of a lower limit, which allows the exponent to exist without conditions. The three estimators presented above, however, calculate the exponent through a linear regression. Typically, they will not converge to the true exponent when the resolution tends to infinity if the expression defining the exponent does not converge, i.e. when the lower limit is not a plain limit. Indeed, if the upper and lower limits are different, the slope given by a linear regression has no relevance. However, as shown in [LEG 04b, LEV 04a], it is still possible to use regressions to obtain the exponent. The method is general, as it applies to the estimation of upper and lower limits through a modified regression scheme, which we now explain. The use of these liminf and limsup regression methods is of great practical importance, as it allows us to estimate, on arbitrary signals, various fractal quantities such as dimensions, exponents, and multifractal spectra. Note that, even for fractal signals, the Hölder exponents are often obtained as genuine lower limits (i.e. the limit does not exist).
Let $(l_j)_{j \ge 1}$ be an arbitrary sequence of real numbers, and denote $u_j = \frac{l_j}{j}$. Let $a = \liminf_{j \to \infty} u_j$. In our framework, think for instance of $l_j$ as the logarithm of the sizes of balls used in the oscillation calculations. Define, for all $n \ge 1$:

$$V_n^0 = \{1, \dots, n\}, \qquad L_n^0 = \{l_1, \dots, l_n\}$$

Let $(a_n^0, b_n^0)$ be the parameters of the least square linear regression of $L_n^0$ with respect to $V_n^0$, i.e. the real numbers that minimize $\sum_{j=1}^{n} (l_j - aj - b)^2$ over all couples $(a, b)$. We write:

$$(a_n^0, b_n^0) = \mathrm{Reg}(V_n^0, L_n^0)$$

Now let:

$$V_n^1 = \{j \in V_n^0,\ l_j \le a_n^0 j + b_n^0\}, \qquad L_n^1 = \{l_j,\ j \in V_n^1\}, \qquad (a_n^1, b_n^1) = \mathrm{Reg}(V_n^1, L_n^1)$$

and define recursively:

$$V_n^i = \{j \in V_n^{i-1},\ l_j \le a_n^{i-1} j + b_n^{i-1}\}, \qquad L_n^i = \{l_j,\ j \in V_n^i\}, \qquad (a_n^i, b_n^i) = \mathrm{Reg}(V_n^i, L_n^i)$$

for all $i = 2, \dots, N_n$, where $N_n$ is defined as the first index such that $\#V_n^{N_n + 1} < 2$.

The geometric interpretation of the sequence $(a_n^i, b_n^i)$ is simple: in the first step, we keep in $(V_n^1, L_n^1)$ those points that are “below” the regression line of $L_n^0$ with respect to $V_n^0$. We then calculate the regression line of $L_n^1$ with respect to $V_n^1$ to obtain $(a_n^1, b_n^1)$, and iterate the process until at most one point remains below the regression line. The slope of the liminf regression is then defined as $a_n^{N_n}$ (the method is similar for the limsup: just keep the points above the regression line). In many cases of interest, $a_n^{N_n}$ will tend to $a$ when $n$ tends to infinity.
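The iteration above can be sketched in a few lines. This is a minimal reading of the procedure, not the implementation of [LEG 04b, LEV 04a]: the function names are hypothetical, and the extra stop when no point is discarded is a numerical safeguard added for this sketch (it prevents an endless loop when all remaining points lie on the fitted line).

```python
def fit_line(points):
    """Least-squares line through (j, l_j) pairs; returns (slope, intercept)."""
    count = len(points)
    mean_j = sum(j for j, _ in points) / count
    mean_l = sum(l for _, l in points) / count
    spread = sum((j - mean_j) ** 2 for j, _ in points)
    slope = sum((j - mean_j) * (l - mean_l) for j, l in points) / spread
    return slope, mean_l - slope * mean_j


def extremal_regression_slope(values, upper=False):
    """Slope of the liminf regression (or limsup regression if `upper` is True).

    `values[j - 1]` plays the role of l_j for j = 1..n, with n >= 2.
    """
    points = list(enumerate(values, start=1))
    slope, intercept = fit_line(points)
    while True:
        if upper:
            kept = [(j, l) for j, l in points if l >= slope * j + intercept]
        else:
            kept = [(j, l) for j, l in points if l <= slope * j + intercept]
        # Stop when fewer than two points remain on the chosen side
        # (the rule in the text) or when nothing was discarded (safeguard).
        if len(kept) < 2 or len(kept) == len(points):
            return slope
        points = kept
        slope, intercept = fit_line(points)


# Sequence whose ratio l_j / j alternates between 0.3 (odd j) and 0.8 (even j).
alternating = [0.3 * j if j % 2 else 0.8 * j for j in range(1, 21)]
lower_slope = extremal_regression_slope(alternating)
upper_slope = extremal_regression_slope(alternating, upper=True)
plain_slope, _ = fit_line(list(enumerate(alternating, start=1)))
```

On this alternating sequence, the plain regression slope falls between the two ratios, while the two modified regressions settle on the lower and upper ratios respectively.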
11.3.5. Numerical experiments
In this section we compare the three methods WCR, WL and OSC on different kinds of signals. For more experiments, see [LEG 04b]. Figure 11.1 represents a generalized Weierstrass function with a regularity $H(t) = t$ and the estimations of the Hölder function by regression of the wavelet coefficients (WCR), by wavelet leaders (WL) and by oscillation (OSC). This experiment shows that the best results are obtained by the WL method in this case.
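For readers who wish to reproduce this kind of comparison, the oscillation estimator of section 11.3.1 is the simplest of the three to sketch. The code below is an illustration under stated assumptions rather than the implementation used for the figures: the signal is a list of uniformly spaced samples, the ball radii are given in numbers of samples, and the function names are hypothetical.

```python
import math


def oscillation(signal, index, radius):
    """Max minus min of the samples within `radius` samples of `index`."""
    window = signal[max(0, index - radius): index + radius + 1]
    return max(window) - min(window)


def least_squares_slope(xs, ys):
    """Slope of the least-squares line of ys against xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    spread = sum((x - mean_x) ** 2 for x in xs)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / spread


def oscillation_exponent(signal, index, radii, step):
    """Regress log(oscillation) on log(tau), with tau = radius * step.

    Radii giving a zero oscillation are skipped, since their logarithm
    is undefined; at least two usable radii are required.
    """
    log_tau, log_osc = [], []
    for radius in radii:
        osc = oscillation(signal, index, radius)
        if osc > 0:
            log_tau.append(math.log(radius * step))
            log_osc.append(math.log(osc))
    if len(log_tau) < 2:
        raise ValueError("need at least two radii with non-zero oscillation")
    return least_squares_slope(log_tau, log_osc)


# Cusp |t - 1/2|^(1/2) sampled on 4097 points of [0, 1]; exponent 1/2 at t = 1/2.
step = 1.0 / 4096
cusp = [abs(i * step - 0.5) ** 0.5 for i in range(4097)]
cusp_exponent = oscillation_exponent(cusp, 2048, [1, 2, 4, 8, 16, 32, 64], step)
```

The choice of radii matters in practice: very small balls are dominated by sampling effects and very large ones mix in the behavior of distant points.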
[Four-panel plot, panels (a) to (d); horizontal axes: sample index, 0 to 4500]
Figure 11.1. (a) Generalized Weierstrass function (4096 points), regularity H(t) = t; (b) regularity estimation by WCR; (c) regularity estimation by WL; (d) regularity estimation by OSC
The second comparison deals with multifractional Brownian motion (MBM). MBM is an extension of fractional Brownian motion in which the local regularity may be controlled. See Chapter 6 and [PEL 95, AYA 99, AYA 00b, AYA 00a]. For the experiment, 10 MBM paths with a regularity evolving like a sine function are built. The three estimation methods are applied to each signal and the results are displayed in Figure 11.2 (mean and variance). We see that, in this case, the oscillation-based method provides the best results both in terms of bias and variance. In conclusion, the methods described in this section generally provide decent estimates of the Hölderian regularity. Nevertheless, there is no “best” estimator among them: the estimation quality depends on the signal class.
[Three-panel plot, panels (a) to (c); horizontal axes: time, 0 to 4000; vertical axes: estimated regularity, −0.5 to 2]
Figure 11.2. Estimation of the regularity of a set of ten MBM with a regularity evolving like a sine function. The three estimation methods are tested. For each method, 10 Hölder functions are thus obtained. The empirical mean and the variance on these 10 functions are then calculated. Abscissa: time. Ordinates: mean estimated regularity (white), and error bars corresponding to two times the standard deviation on each side (gray). The theoretical regularity is displayed in black. (a) WCR method; (b) WL method; (c) OSC method
11.4. Denoising

11.4.1. Introduction

Signal/image denoising is an important task in many areas, including biology, medicine, astronomy and geophysics. For such applications, it is important to denoise the observed data in such a way that the features of interest to the practitioner are preserved. The basic framework is as follows. We observe a signal (or an image) $Y$ which is a combination $F(X, B)$ of the signal of interest $X$ and a “noise” $B$. Making various assumptions on the noise, the structure of $X$ and the function $F$, we then try to derive a method to obtain an estimate $\hat{X}$ of the original image which is in some sense optimal. $F$ usually amounts to convolving $X$ with a low-pass filter and adding noise. Assumptions on $X$ are almost always related to its regularity, e.g. $X$ is supposed to be piecewise $C^n$ for some $n \geq 1$. In this section, $B$ is assumed to be independent of $X$, white, Gaussian and centered.

11.4.2. Minimax risk, optimal convergence rate and adaptivity

A useful way to compare denoising methods is to analyze their convergence properties. In this section, we recall some basic facts (see [HAR 98] for more details).

DEFINITION 11.4.– The minimax risk in $L^p$ is given by
$$R_n(V, p) = \inf_{\hat{X}_n \in \mathcal{E}} \sup_{X \in V} E\|\hat{X}_n - X\|_p^p$$
where $\mathcal{E}$ is the set of measurable estimators and $V$ a ball in a functional space.

DEFINITION 11.5.– $r_n \asymp R_n(V, p)^{1/p}$ is called the optimal convergence rate or minimax convergence rate on the class $V$ for the risk $L^p$. We say that the estimator $\hat{X}_n$ of $X$ reaches the optimal convergence rate if $\sup_{X \in V} E\|\hat{X}_n - X\|_p^p \asymp R_n(V, p)$.

Typical function spaces that are considered in this framework are the so-called Besov spaces (for a complete description of Besov spaces see [PEE 76, POP 88]). Suppose for instance that $X$ belongs to a ball in $B^s_{r,q}$ and that the $L^p$ loss is used. Then, we can show that:
• If $r \geq p$ (homogeneous area), the optimal rate is $2^{-\frac{sn}{2s+1}}$.
• If $\frac{p}{2s+1} \leq r \leq p$ (intermediate area), the optimal rate is $2^{-\frac{sn}{2s+1}}$ and $2^{-\frac{(s - \frac{1}{r} + \frac{1}{p})n}{2(s - \frac{1}{r} + \frac{1}{p}) + 1}}$ for linear estimators.
• If $r \leq \frac{p}{2s+1}$ (sparse area), the optimal rate is $(n 2^{-n})^{\frac{s - \frac{1}{r} + \frac{1}{p}}{2(s - \frac{1}{r} + \frac{1}{p}) + 1}}$ for non-linear estimators.

In the following sections, the $L^2$ loss is used, and as a consequence, there is no sparse zone. The corresponding optimal convergence rates are as follows.

           | Non-linear estimator | Linear estimator
$r \geq 2$ | $R_n(V, 2)^{1/2} = 2^{-\frac{sn}{2s+1}}$ | $R_n^{lin}(V, 2)^{1/2} = 2^{-\frac{sn}{2s+1}}$
$r < 2$    | $R_n(V, 2)^{1/2} = 2^{-\frac{sn}{2s+1}}$ | $R_n^{lin}(V, 2)^{1/2} = 2^{-\frac{(s - \frac{1}{r} + \frac{1}{2})n}{2(s - \frac{1}{r} + \frac{1}{2}) + 1}}$

Table 11.1. Convergence rates
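The entries of Table 11.1 can be evaluated numerically. The following is only a sketch: the function name `l2_rate_exponent` is hypothetical, and it returns the exponent $e$ such that the rate reads $2^{-en}$ for the $L^2$ loss.

```python
def l2_rate_exponent(s, r, linear=False):
    """Exponent e such that the L2 rate of Table 11.1 is 2**(-e * n).

    s is the Besov smoothness, r the Besov integrability index.
    The function name and signature are illustrative assumptions.
    """
    if r >= 2 or not linear:
        # r >= 2, or non-linear estimator with r < 2: rate 2^{-sn/(2s+1)}
        return s / (2 * s + 1)
    # linear estimator with r < 2: smoothness is reduced to s - 1/r + 1/2
    s_eff = s - 1 / r + 1 / 2
    return s_eff / (2 * s_eff + 1)


if __name__ == "__main__":
    for r in (1.0, 2.0):
        print(r, l2_rate_exponent(1.0, r), l2_rate_exponent(1.0, r, linear=True))
```

For $r < 2$ the linear exponent is smaller than the non-linear one, which is the gap the table expresses.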
Some estimators reach the optimal rate of convergence only when information about the signal, such as its regularity, is known. This constraint is a drawback in applications. In this context we try to develop adaptive estimators.

DEFINITION 11.6.– $X^*$ is an adaptive estimator for the loss $L^p$ and the set $\{F_\alpha, \alpha \in A\}$ if for all $\alpha \in A$, there exists a constant $c_\alpha > 0$ such that:
$$\sup_{X \in F_\alpha} E\|X^* - X\|_p^p \leq c_\alpha R_n(\alpha, p)$$

For general results about adaptivity, see [LEP 90, LEP 91, LEP 92, BIR 97].

11.4.3. Wavelet based denoising

A popular set of denoising methods is based on decomposing the corrupted signal in a wavelet basis, processing the wavelet coefficients, and then going back to the time domain. In the case of additive white noise, this is justified by two fundamental facts: first, many real-world signals have a sparse structure in the wavelet domain, i.e. a few coefficients are significant, and most are small or zero. Second, for an orthonormal wavelet transform, all wavelet coefficients of a white noise are iid random variables. Denoising in the wavelet domain thus allows us to separate in an easy way “large”, significant, coefficients, from “small” coefficients due mainly to noise.

Throughout this section, the wavelet coefficients of a signal $X$ are denoted by $x_{j,k}$ where $j$ is scale and $k$ is location. $X$ is the original signal, $Y$ the observed noisy signal and $\hat{X}$ an estimator of $X$. We assume that $Y = X + B$, where $B$ is a centered Gaussian white noise with variance $\sigma^2$, independent from the original signal $X$. Thus, we have $y_{j,k} = x_{j,k} + b_{j,k}$. Since the wavelet basis is supposed to be orthonormal, the $b_{j,k}$ are also Gaussian and iid.

The first and simplest methods for denoising based on the above principles are the so-called hard and soft thresholding [DEV 92, DON 94]. Since these methods were introduced, a huge number of improvements have been proposed, ranging from block thresholding [HAR 98] to Bayesian approaches [VID 99] and many more. We briefly recall the basics of hard thresholding, and show why a different method is needed for the processing of irregular signals.

DEFINITION 11.7.– Let $Y_n$ be a sample of $Y$ on $2^n$ points. The estimator of $X$ by hard thresholding is $\hat{X}_{HT}$, a signal with the following wavelet coefficients:
$$\{\hat{x}^{HT}_{j,k}\}_{j,k} = \{y_{j,k} \cdot 1_{|y_{j,k}| \geq \lambda_n}\}_{j,k}$$
where $\lambda_n$ is a given threshold.
Traditional choices for $\lambda_n$ include the so-called universal, SURE and Bayesian thresholds [VID 99]. In this section, the universal threshold $\lambda_n = \sigma 2^{-\frac{n}{2}} \sqrt{2n}$ will be used throughout.

THEOREM 11.2 (Risk for hard thresholding (D. Donoho)).– Let $X \in B^s_{p,q}$ and $X_n$ its sampled version on $2^n$ points. Then
$$R_{HT} := E[(X_n - \hat{X}_{HT})^2] \leq C \, n \, 2^{-\frac{2sn}{2s+1}}$$

Thus, hard thresholding is near-minimax. A limitation of hard thresholding, as well as of most wavelet-based methods, is that they are not well adapted to denoise highly textured or everywhere irregular signals, in particular (multi)fractal or multifractional signals, with potentially rapid variations in local regularity. It is well known that, when the original signal is itself irregular, most wavelet-based denoising methods will typically produce an oversmoothed signal and/or so-called “ringing” effects. Indeed, as recalled above, the basic idea behind wavelet thresholding is that many real-world signals have a sparse wavelet representation, with few large wavelet coefficients. Putting small coefficients to 0 in the noisy signal will then in general do no harm, since these are mainly due to noise. Everywhere irregular signals, on the other hand, have significant coefficients scattered all over the time-frequency plane. At high frequencies, these significant but relatively small coefficients in the signal crucially determine the local irregularity. Zeroing small coefficients will thus typically destroy the regularity information. As a consequence, it is no surprise that a specific method has to be designed for such signals. Figure 11.4 illustrates some of the drawbacks just mentioned.

A theoretical result on a particular class of signals also allows us to measure precisely the over-smoothing effect of hard thresholding. Define the set $PART(\alpha)$ as follows:
$$PART(\alpha) := \left\{ X, \ \{x_{j,k}\}_{j,k} = \{\varepsilon_{j,k} \cdot 2^{-j(\alpha + \frac{1}{2})}\}_{j,k}, \ \varepsilon_{j,k} \text{ iid in } \{-1, 1\} \right\} \qquad (11.4)$$

PROPOSITION 11.2.– Let $\tilde{\alpha}_{X_{HT}}(n, t)$ denote the regularity of the signal after hard thresholding, estimated by the wcr method. Then, for a signal $X \in PART(\alpha)$, at each point $t$,
$$\lim_{n \to \infty} E[\tilde{\alpha}_{X_{HT}}(n, t)] = -\frac{1}{2} - \frac{6\alpha + 1}{2(2\alpha + 1)^2} + p \, \frac{8\alpha^3 + 12\alpha^2 + 12\alpha}{(2\alpha + 1)^3}$$
where $p$ is the number of vanishing moments of the wavelet.

This result means that the regularity of the denoised signal is essentially controlled by $p$. In particular, if we use a wavelet with an infinite number of vanishing moments, the estimated regularity of the hard thresholded signal will be equal to infinity.
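As an illustration of Definition 11.7 with the universal threshold, here is a minimal sketch. It assumes the wavelet coefficients have already been computed by some orthonormal transform and are passed as a flat list; the function names are hypothetical.

```python
import math


def universal_threshold(sigma, n):
    """Universal threshold lambda_n = sigma * 2^(-n/2) * sqrt(2n) for 2^n samples."""
    return sigma * 2 ** (-n / 2) * math.sqrt(2 * n)


def hard_threshold(coeffs, lam):
    """Keep a noisy coefficient y when |y| >= lam, set it to zero otherwise."""
    return [y if abs(y) >= lam else 0.0 for y in coeffs]


if __name__ == "__main__":
    lam = universal_threshold(sigma=1.0, n=10)
    print(lam, hard_threshold([0.05, -0.4, 0.2], lam))
```

Every coefficient below the threshold is lost, whatever its scale, which is why the small but significant high-frequency coefficients of an irregular signal do not survive.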
The following sections describe three denoising methods that are well fitted to the processing of extremely irregular signals such as (multi)fractal ones. The first and second methods both make it possible to control the local Hölder regularity. They differ in the way this regularity is estimated. In the first method, this is done through linear regression over all available scales. In the second one, an “exponent between scales” is used. The third method allows a control of the multifractal spectrum.

11.4.4. Non-linear wavelet coefficients pumping

In this section, a refinement of hard thresholding is presented. It is called non-linear wavelet pumping (NLP). It is near-minimax and adaptive, and it allows us to control the regularity of the denoised signal through a parameter $\delta \in \mathbb{R}^+$. For a theoretical study, proofs and numerical experiments, see [LEG 04b].

DEFINITION 11.8.– Let $Y_n$ be a sample of $Y$ on $2^n$ points. The estimator of $X$ by the NLP method is given by $\hat{X}_{NLP}$, a signal with the following wavelet coefficients:
$$\{\hat{x}^{NLP}_{j,k}\}_{j,k} = \{y_{j,k} \cdot 1_{|y_{j,k}| \geq \lambda_n} + 2^{-j\delta} y_{j,k} \cdot 1_{|y_{j,k}| < \lambda_n}\}_{j,k}$$
The decision law of this method is displayed in Figure 11.3 for a given scale j and compared to the decision law of hard thresholding (which is the same for every scale).
Figure 11.3. Decision law for hard thresholding (left) and for NLP (right). Abscissa: value of the noisy wavelet coefficient. Ordinate: value of the estimator of the original wavelet coefficient
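The NLP decision law of Definition 11.8 can be sketched for a single coefficient as follows. This is an illustrative sketch only; the function name is hypothetical and the wavelet transform itself is assumed to be done elsewhere.

```python
def nlp_estimate(y, j, lam, delta):
    """NLP decision law for one noisy coefficient y at scale j.

    Coefficients at or above the threshold lam are kept as they are;
    smaller ones are attenuated by 2^(-j * delta) instead of being zeroed.
    """
    if abs(y) >= lam:
        return y
    return 2 ** (-j * delta) * y


if __name__ == "__main__":
    # The attenuation of a small coefficient grows with the scale index j.
    print([nlp_estimate(0.1, j, lam=0.5, delta=0.3) for j in range(1, 6)])
```

With $\delta = 0$ the noisy signal is returned unchanged, and letting $\delta$ grow brings the rule closer to hard thresholding.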
11.4.4.1. Minimax properties

The NLP method and hard thresholding have similar convergence properties.

THEOREM 11.3.– Let $X \in B^s_{p,q}$. If $\delta > \frac{s}{2s+1}$, then $R_{NLP} \leq R_{HT} + O(R_{HT})$.

Thus, NLP is near-minimax. Additionally, if $\delta > \frac{1}{2}$, the estimator is adaptive.
11.4.4.2. Regularity control

The advantage of NLP is that it allows a control over the local regularity through the parameter $\delta$.

PROPOSITION 11.3 (Increase of regularity).– Let $\alpha_Y(n, t)$ and $\alpha_{\hat{X}_{NLP}}(n, t)$ denote respectively the regularity of the noisy signal $Y$ and of the estimator $\hat{X}_{NLP}$ at the point $t$, estimated by wcr. Then at each point $t$:
$$\alpha_{\hat{X}_{NLP}}(n, t) = \alpha_Y(n, t) + K_n \delta \sum_{\substack{j=1 \\ |y^{(n)}_{j,k}(t)| < \lambda_n}}^{n} j s_j$$

In other words, NLP increases the Hölder regularity proportionally to the $\delta$ parameter. This result sometimes allows us to find an optimal value for $\delta$. This is particularly the case for the set of functions $PART(\alpha)$ (defined in (11.4)).

PROPOSITION 11.4.– For a signal $X \in PART(\alpha)$, at each $t$:
$$\lim_{n \to \infty} \left( E[\alpha_{\hat{X}_{NLP}}(n, t)] - E[\alpha_Y(n, t)] \right) = \delta \left( 1 + \frac{6\alpha - 1}{(2\alpha + 1)^3} \right)$$
i.e.
$$\lim_{n \to \infty} E[\alpha_{\hat{X}_{NLP}}(n, t)] = \frac{\alpha - 2\alpha^2}{(1 + 2\alpha)^2} + \delta \left( 1 + \frac{6\alpha - 1}{(2\alpha + 1)^3} \right)$$

Using this proposition, we may calculate the value $\delta_{ideal}$ that ensures that the denoised signal will have the same average regularity as the original signal.

PROPOSITION 11.5.– For a signal $X \in PART(\alpha)$, the “optimal” parameter $\delta$ is
$$\delta_{ideal} = \frac{\alpha (1 + 2\alpha)(2\alpha + 3)}{2(2\alpha^2 + 3\alpha + 3)}$$
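Propositions 11.4 and 11.5 can be cross-checked numerically: substituting $\delta_{ideal}$ into the limit of Proposition 11.4 should give back $\alpha$. The sketch below does this; the function names are hypothetical.

```python
def limit_regularity_nlp(alpha, delta):
    """Limit of E[alpha_NLP(n, t)] for X in PART(alpha), from Proposition 11.4."""
    base = (alpha - 2 * alpha ** 2) / (1 + 2 * alpha) ** 2
    gain = 1 + (6 * alpha - 1) / (2 * alpha + 1) ** 3
    return base + delta * gain


def delta_ideal(alpha):
    """Value of delta given in Proposition 11.5."""
    return alpha * (1 + 2 * alpha) * (2 * alpha + 3) / (2 * (2 * alpha ** 2 + 3 * alpha + 3))


if __name__ == "__main__":
    for alpha in (0.2, 0.5, 0.8):
        d = delta_ideal(alpha)
        print(alpha, d, limit_regularity_nlp(alpha, d))
```

For instance, $\alpha = 0.5$ gives $\delta_{ideal} = 0.4$, and the limit regularity is then $0 + 0.4 \times 1.25 = 0.5$.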
Figure 11.4. Denoising of a lacunary wavelet series: (a) original signal (regularity: 0.2, lacunarity: 0.7); (b) noisy version; (c) denoising by the NLP method; (d) denoising by hard thresholding
11.4.4.3. Numerical experiments

Lacunary wavelet series

We present an example of denoising with NLP, along with a comparison with hard thresholding, on a lacunary wavelet series [JAF 00]. The regularity is equal to 0.2, and the lacunarity parameter is 0.7. Figure 11.4 represents the original signal, the noisy signal and the two denoisings. The NLP method provides a reasonable result, while the hard thresholding clearly oversmooths the signal.

SAR images

As a second illustration, we display an original synthetic aperture radar (SAR) image along with its hard thresholding and NLP denoisings in Figure 11.5. As we can see, the original image appears very noisy, and does not seem to hold any useful information. The hard thresholded image is not very readable either. However, on the image processed with NLP, we can clearly see a river flowing from the top of the image and assuming roughly an inverted “Y” shape. Denoising is used in this application as a pre-processing step that enhances the image so that it will be possible to automatically detect the river. Such a procedure is used by IRD, a French agency, which, in this particular application, is interested in monitoring water resources in a region of Africa.
Figure 11.5. Left: original SAR image. Middle: denoising by HT. Right: denoising by NLP
11.4.5. Denoising using exponent between scales

11.4.5.1. Introduction

In [ECH 07], Echelard presents a denoising method that is similar in spirit to that just described, and is thus also well fitted to the processing of irregular signals. The proposed approach consists of extrapolating the unknown, small, coefficients by imposing a local regularity constraint. More precisely, the small coefficients are reconstructed in such a way that the local regularity at each point of the denoised signal matches the regularity of the original signal. Of course, since the original signal is unknown, so is its regularity. Thus, we first need to estimate the local regularity of the original signal from the noisy observations.

As in the previous section, a difficulty arises from working on discrete signals. Indeed, the very definition of Hölder exponents requires us to let the resolution tend to $\infty$, which cannot be done here. We require an adapted definition of $\alpha$ that both makes sense at finite resolution and allows us to capture the visual impression of regularity on sampled signals. In the previous section, a regression of the wavelet coefficients was used for this purpose. Here, a different path is taken. In view of the fact that the perceived regularity depends on the considered range of scales, an “exponent between two scales” is defined as follows:
$$\alpha_g(j_1, j_2, X) = \min_{j \in [j_1, j_2]} \min_{k \in \mathbb{Z}} \left( \frac{\log |x_{j,k}|}{-j} - \frac{1}{2} \right)$$

In order to maintain some information at small scale, the proposed method follows the steps below:
• Estimate the critical scale $c_n$, defined as the scale where the coefficients of the white noise become predominant as compared to the ones of the signal.
• Estimate the regularity $s_n$ of the original signal at the considered point, using coefficients at scales larger than $c_n$.
• Assign to the small scale coefficients a value that is “coherent” with the ones of the coefficients at larger scales. More precisely, the wavelet coefficients of the denoised signal $\tilde{X}_n$ are set as follows:
$$\{\tilde{x}_{j,k}\}_{j,k} = \left\{ \min\left( |y_{j,k}|, \ 2^{K_n - j(s_n + 1/2)} \right) \operatorname{sgn}(y_{j,k}) \right\}_{j,k} \qquad (11.5)$$
for $j > c_n$, and where $K_n$ and $s_n$ are estimated from the noisy wavelet coefficients $(y_{j,k})$ at scales $j < c_n$.

This means that, at small scales, we do not accept overly large coefficients, that is, coefficients which would not be compatible with the estimated Hölder regularity of the signals (statistically, there will always be such coefficients, since noise has no regularity given that its coefficients do not decrease with scale). On the other hand, “small” coefficients (those not exceeding $2^{K_n - j(s_n + 1/2)}$) are left unchanged. Note that both the estimated regularity $s_n$ and the critical scale $c_n$ depend on the considered point. Note also that this procedure may be seen as a location-dependent shrinkage of the coefficients.

We can prove the following property, which essentially says that the above method does a good job in recovering the regularity of the original signal, as measured by the exponent between two scales, provided that we are able to estimate with good accuracy its Hölder exponent at any given point $t$:

PROPOSITION 11.6.– Let $X$ belong to $C^{\epsilon_0}(\mathbb{R})$ for some $\epsilon_0 > 0$, and let $\alpha$ denote its Hölder exponent at point $t$. Let $(s_n)_n$ be a sequence of real numbers tending almost surely (resp. in probability) to $\alpha$. Let $c(n) = \frac{n}{1 + 2 s_n}$. Let $\tilde{X}_n$ be defined as above. Then, for any function $h$ tending to infinity with $h(n) \leq n$, $\alpha_g(h(n), n, \tilde{X}_n)$ tends almost surely (resp. in probability) to $\alpha$.

For this method to be put to practical use, it thus remains to estimate the critical scale and Hölder exponent from the noisy observations. This is the topic of the next section.

11.4.5.2. Estimating the local regularity of a signal from noisy observations

The main result in [ECH 07] concerning the estimation of the critical scale is the following.

THEOREM 11.4.– Let $(x_i)_{i \in \mathbb{N}}$ denote the wavelet coefficients of $X \in C^{\epsilon_0}(\mathbb{R})$ “above” a point $t$ where the local and pointwise Hölder exponents of $X$ coincide. Let $\beta = \liminf_{i \to \infty} -\frac{\log |x_i|}{i}$.
Assume that there exists a decreasing sequence $(\varepsilon_n)$ such that $\varepsilon_n = o\left(\frac{1}{n}\right)$ when $n \to \infty$ and $-\frac{\log |x_i|}{i} \geq \beta - \varepsilon_i$, for all $i$. Let $(y_i)$ denote the noisy coefficients corresponding to the $x_i$. Let:
$$L_n(p) = \frac{1}{(n - p + 1)^2} \sum_{i=p}^{n} y_i^2,$$
and denote $p^* = p^*(n)$ an integer such that:
$$L_n(p^*) = \min_{p : 1 \leq p \leq n - b \log(n)} L_n(p),$$
where $b > 1$ is a fixed number. Finally let $q(n) = \frac{n}{2(\beta - \frac{1}{n})}$. Then, almost surely:
$$\forall a > 1, \quad p^*(n) \leq q(n) + a \log(n), \quad n \to \infty$$

In addition, if the sequence $(x_i)$ verifies the following condition: there exists a sequence of positive integers $(\theta_n)$ such that, for all $n$ large enough and all $\theta \geq \theta_n$:
$$\frac{1}{\theta} \sum_{i=q-\theta}^{q-1} x_i^2 > b \sigma_n^2 \, \frac{1 - \frac{\delta_*}{\beta}}{\left(1 - \frac{\delta^*}{\beta}\right)^2},$$
where $\delta_* \in (0, \frac{1}{2})$ and $\delta^* \in (\frac{1}{2}, \beta)$, then, almost surely:
$$\forall a > 1, \quad p^*(n) \geq q(n) - \max(a \log(n), \theta_n), \quad n \to \infty$$

In other words, when the conditions of the theorem are met, any minimizer of $L$ is, within an error of $O(\log(n))$, approximately equal to the sought critical scale. This in turn allows us to estimate the Hölder exponent using the next corollary:

COROLLARY 11.1.– With the same notations and assumptions as in the theorem above, with the additional condition that $\theta_n$ is not larger than $b \log(n)$ for all sufficiently large $n$, define: $\hat{\beta}(n) = \frac{n}{2 p^*(n)} + \frac{1}{n}$. Then the following inequality holds almost surely for all large enough $n$:
$$|\hat{\beta}(n) - \beta| \leq 2 b \beta^2 \, \frac{\log(n)}{n}.$$

The value $s_n = \hat{\beta}(n) + \frac{1}{2}$ is used in (11.5). $K_n$ is estimated as the offset in the linear least squares regression of the logarithm of the absolute value of the wavelet coefficients with respect to scale, at scales larger than $p^*(n)$.
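The estimation chain of Theorem 11.4 and Corollary 11.1, followed by the clipping rule (11.5), can be sketched as below. This is a rough illustration under stated assumptions, not the implementation of [ECH 07]: the coefficients “above” the point are given as a list standing for $y_1, \dots, y_n$, the logarithm in the constraint $p \leq n - b \log(n)$ is taken as the natural logarithm, and all function names are hypothetical.

```python
import math


def criterion(y, p):
    """L_n(p) for y = [y_1, ..., y_n]; p is 1-based as in the theorem."""
    n = len(y)
    return sum(v * v for v in y[p - 1:]) / (n - p + 1) ** 2


def critical_scale(y, b=1.5):
    """A minimizer p* of L_n(p) over 1 <= p <= n - b*log(n) (natural log assumed)."""
    n = len(y)
    p_max = int(n - b * math.log(n))
    return min(range(1, p_max + 1), key=lambda p: criterion(y, p))


def beta_hat(y, b=1.5):
    """Estimate of beta from Corollary 11.1: n / (2 p*) + 1 / n."""
    n = len(y)
    return n / (2 * critical_scale(y, b)) + 1 / n


def clip_coefficient(y, j, K_n, s_n):
    """Rule (11.5), meant for scales j beyond the critical scale:
    cap |y| at 2^(K_n - j (s_n + 1/2)) and keep the sign of y."""
    bound = 2 ** (K_n - j * (s_n + 0.5))
    return math.copysign(min(abs(y), bound), y)


if __name__ == "__main__":
    # Toy data: decay 2^(-i) down to a noise floor of 2^(-20), with n = 40.
    y = [math.sqrt(4.0 ** (-i) + 4.0 ** (-20)) for i in range(1, 41)]
    print(critical_scale(y), beta_hat(y))
```

The toy data only mimic the situation described in the theorem (a decaying signal part and a flat noise floor); they are not taken from the experiments of the chapter.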
Figure 11.6. Top: original Weierstrass function. Middle: noisy version. Bottom: signal obtained with the regularity preserving method
11.4.5.3. Numerical experiments

Figure 11.6 shows the original, noisy and denoised versions of a Weierstrass function.

11.4.6. Bayesian multifractal denoising

11.4.6.1. Introduction

In [LEG 04b, LEV 03], a denoising method is presented that assumes a minimal local regularity. This assumption translates into constraints on the multifractal spectrum of the signals. Such constraints are used in turn in a Bayesian framework to estimate the wavelet coefficients of the original signal from the noisy ones. An assumption is made that the original signal belongs to a certain set of parameterized classes $S$ described below. Functions belonging to such classes have a minimal local regularity, but may have wildly varying pointwise Hölder exponent. Along with possible additional conditions, this yields a parametric form for the prior distribution of the wavelet coefficients of $X$. These coefficients are estimated using a traditional maximum a posteriori technique. As a consequence, the estimate is defined as the signal “closest” to the observation which has the desired multifractal spectrum (or a degenerate version of it, see below). Because the multifractal spectrum subsumes information about the pointwise Hölder regularity, this procedure is naturally adapted for signals which have sudden changes in regularity.
11.4.6.2. The set of parameterized classes S(g, ψ)

The denoising technique described below is based on the multifractal spectrum rather than on the Hölder exponent alone. This will in general allow for more robust estimates, since we use a higher level description subsuming information on the whole signal. For such an approach to be practical, however, we need to make the assumption that the considered signals belong to a given set of parameterized classes, as we will now describe.

Let $\mathcal{F}$ be the set of lower semi-continuous functions from $\mathbb{R}^+$ to $\mathbb{R} \cup \{-\infty\}$. We consider classes of random functions $X(t)$, $t \in [0, 1]$, defined on $(\Omega, \mathcal{F}, P)$ by (11.6) below1. Each class $S(g, \psi)$ is characterized by the functional parameter $g \in \mathcal{F}$ and a wavelet $\psi$ such that the set $\{\psi_{j,k}\}_{j,k}$ forms a basis of $L^2$. Let:
– $K$ be a positive constant
– $P_j^\varepsilon(\alpha, K) = P \times P_j\left( \alpha - \varepsilon < \frac{\log_2(K|x_{j,k}|)}{-j} < \alpha + \varepsilon \right)$

Define:
$$S(g, \psi) = \Big\{ X : \exists K > 0, \ j_0 \in \mathbb{Z} : \forall j > j_0, \ x_{j,k} \text{ and } x_{j,k'} \text{ are identically distributed for } (k, k') \in \{0, 1, \dots, 2^j - 1\} \text{ and } \frac{\log_2 P_j^\varepsilon(\alpha, K)}{j} = g(\alpha) + R_{n,\varepsilon}(\alpha) \Big\} \qquad (11.6)$$
where $R_{n,\varepsilon}(\alpha)$ is such that $\lim_{\varepsilon \to 0} \lim_{n \to \infty} R_{n,\varepsilon}(\alpha) = 0$ uniformly in $\alpha$.

The assumption that, for large enough $j$, the wavelet coefficients $(x_{j,k})_k$ at scale $j$ are identically distributed entails that:
$$P_j^\varepsilon(\alpha, K) := P \times P_j\left( \alpha - \varepsilon < \frac{\log_2(K|x_{j,k}|)}{-j} < \alpha + \varepsilon \right) = 2^{-n} \sum_{k=0}^{2^n} P\left( \alpha - \varepsilon < \frac{\log_2(K|x_{j,k}|)}{-j} < \alpha + \varepsilon \right) = P\left( \alpha - \varepsilon < \frac{\log_2(K|x_{j,k}|)}{-j} < \alpha + \varepsilon \right)$$

Consequently, definition (11.6) has a simple interpretation in terms of multifractal analysis: For a given wavelet $\psi$, we consider the set of random signals $X$ such that the
1. Extension to functions defined on Rn requires only minor adaptations.
normalized signal $KX$ has a deterministic multifractal spectrum $F_g(\alpha)$ (with respect to $\psi$) equal to $1 + g$, with the following additional condition: $F_g$ is obtained as a limit in $j$ rather than a lim inf, this limit being attained uniformly with respect to $\alpha$. This condition ensures that, for sufficiently large $j$, the rescaled statistics of the wavelet based coarse-grained exponents $\alpha_{j,k}$ are close enough to their limit, allowing meaningful inference.

The classes $S(g, \psi)$ encompass a fairly wide variety of signals. Most models of (multi)fractal processes and certain other “traditional” processes belong to such classes. These include iterated function systems (IFS), multiplicative cascades, fractional Brownian motion and stable processes. Such processes have been used in the modeling of Internet traffic, financial records, speech signals, medical images and more.

11.4.6.3. Bayesian denoising in S(g, ψ)

The main steps in the traditional maximum a posteriori (MAP) approach in a Bayesian framework are given in this section. The MAP estimate $\hat{x}_{j,k}$ of $x_{j,k}$ from the observation $y_{j,k}$ is defined to be an argument that maximizes $P(x_{j,k}/y_{j,k})$. Using Bayes' rule, and since $P(y_{j,k})$ does not depend on $x_{j,k}$, maximizing $P(x_{j,k}/y_{j,k})$ amounts to maximizing the product $P(y_{j,k}/x_{j,k}) P(x_{j,k})$. The MAP estimate is thus obtained as follows:
$$\hat{x}_{j,k} = \operatorname{argmax}_x \left[ P(y_{j,k}/x) P(x) \right]$$

The term $P(y_{j,k}/x)$ is easily computed from the law of $B$ if one assumes that $B$ is white, since the $b_{j,k}$ then have the same law as $B$ (recall that orthonormal wavelets are used). The prior $P(x_{j,k})$ is deduced from our assumption that $X$ belongs to $S(g, \psi)$ in the following way. For $x > 0$, set $\alpha_j(x) = \frac{\log_2(Kx)}{-j}$. Then
$$P(|x_{j,k}| = x) = P(K|x_{j,k}| = Kx) = P\left( \frac{\log_2(K|x_{j,k}|)}{-j} = \alpha_j(x) \right) \simeq 2^{j(g(\alpha_j(x)) - 1)}$$

This leads us to define the approximate Bayesian MAP estimate as:
$$\hat{x}_{j,k} = \operatorname{argmax}_{x > 0} \left[ j \, g\left( \frac{\log_2(\hat{K} x)}{-j} \right) + \log_2(P(y_{j,k}/x)) \right] \times \operatorname{sgn}(y_{j,k}) \qquad (11.7)$$
where $\operatorname{sgn}(y)$ denotes the sign of $y$ and $\hat{K}$ is given by:
$$\hat{K} = \left( \sup_{j > j_0} \sup_k (x_{j,k}) \right)^{-1}$$

The estimate for $K$ can be heuristically justified as follows: writing that $\frac{\log_2(K|x_{j,k}|)}{-j}$ is $\alpha$ with $\alpha > 0$ implies that $K|x_{j,k}| < 1$ for all couples $(j, k)$. $\hat{K}$ is chosen as the smallest normalizing factor that entails the latter inequality.
In the numerical experiments, we shall deal with the case where the noise is centered, Gaussian, with variance $\sigma^2$. The MAP estimate then reads:
$$\hat{x}_{j,k} = \operatorname{argmax}_{x > 0} \left[ j \, g\left( \frac{\log_2(\hat{K} x)}{-j} \right) - \frac{(y_{j,k} - x)^2}{2\sigma^2} \right] \times \operatorname{sgn}(y_{j,k})$$

While (11.7) gives an explicit formula for denoising $Y$, it is often of little practical use. Indeed, in most applications, we do not know the multifractal spectrum of $X$, so that without an evaluation of $g$, it is not possible to use (11.7) to obtain $\hat{x}_{j,k}$. In addition, we should recall that $F_g$ depends in general on the analyzing wavelet. We would thus need to know the spectrum shape for the specific wavelet in use. Furthermore, a major aim of this approach is to be able to extract the multifractal features of $X$ from the denoised signal $\hat{X}$. A strong justification of the multifractal Bayesian approach is to estimate $F_g^X$ as follows: a) denoise $Y$, b) evaluate numerically the spectrum $F_g^{\hat{X}}$ of $\hat{X}$, c) set $\hat{F}_g^X = F_g^{\hat{X}}$. Obviously, from this point of view, it does not make sense to require the prior knowledge of $F_g^X$ in the Bayesian approach.

Thus, a “degenerate” version of (11.7) is presented which uses a single real parameter as input instead of the whole spectrum. The heuristic is as follows: from a regularity point of view, important information contained in the spectrum is its support, i.e. the set of all possible “Hölder exponents”. More precisely, let $\alpha_0 = \inf\{\alpha, F_g(\alpha) > -\infty\}$. While the spectra shapes obtained with different analyzing wavelets depend on the wavelet, their supports are always included in $[\alpha_0, \infty)$. The “flat” spectrum $1_{[\alpha_0, \infty)}$ thus contains intrinsic information. Furthermore, it only depends on the positive real $\alpha_0$. Rewriting (11.7) with a flat spectrum yields the following explicit simple expression for $\hat{x}_{j,k}$:
$$\hat{x}_{j,k} = \begin{cases} y_{j,k} & \text{if } K|y_{j,k}| < 2^{-j\alpha_0} \\ \operatorname{sgn}(y_{j,k}) \, 2^{-j\alpha_0} & \text{otherwise} \end{cases} \qquad (11.8)$$

Although $\alpha_0$ is really prior information, it can be estimated from the noisy observations [LEG 04b]. In this respect, it is comparable to the threshold used in the traditional hard or soft wavelet thresholding scheme. Furthermore, in applications, it is useful to think of $\alpha_0$ rather as a tuning parameter, whereby increasing $\alpha_0$ yields a smoother estimate (since the original signal is assumed to have a larger minimal exponent). Note that (11.8) has a flavor reminiscent of the method described in section 11.4.5.
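The flat-spectrum rule (11.8) is simple enough to be written down directly. The sketch below transcribes it as printed, for one coefficient; the function name is hypothetical and the normalizing factor `K` defaults to 1 as an assumption made for the example.

```python
import math


def flat_spectrum_estimate(y, j, alpha0, K=1.0):
    """Rule (11.8): keep y when K|y| < 2^(-j * alpha0),
    otherwise replace it by sgn(y) * 2^(-j * alpha0)."""
    bound = 2 ** (-j * alpha0)
    if K * abs(y) < bound:
        return y
    return math.copysign(bound, y)


if __name__ == "__main__":
    # A larger alpha0 lowers the cap at every scale, hence a smoother estimate.
    for alpha0 in (0.3, 0.6, 0.9):
        print(alpha0, [flat_spectrum_estimate(0.4, j, alpha0) for j in range(1, 6)])
```

Unlike hard thresholding, large coefficients are capped rather than kept and small ones are kept rather than zeroed, so that the decay across scales of the retained coefficients is at least $2^{-j\alpha_0}$.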
11.4.6.4. Numerical experiments

As a test signal for numerical experiments, we shall consider fractional Brownian motion (FBM). As is well known, FBM is the zero-mean Gaussian process $X(t)$ with covariance function:
$$R(t, s) = E(X(t)X(s)) = \frac{\sigma^2}{2} \left( |t|^{2H} + |s|^{2H} - |t - s|^{2H} \right)$$
where $H$ is a real number in $(0, 1)$ and $\sigma$ is a real number. FBM reduces to Brownian motion when $H = 1/2$. In all other cases, it has stationary correlated increments. At all points, the local and pointwise Hölder exponents of FBM equal $H$ almost surely. The Hausdorff multifractal spectrum of FBM is degenerate, as we have $f_h(\alpha) = -\infty$ almost surely for $\alpha \neq H$, $f_h(H) = 1$. The large deviation spectrum, however, depends on the definition of $Y_n^k$: if we consider oscillations, then $f_g = f_h$. Taking increments, we get that, almost surely, for all $\alpha$:
$$f_g(\alpha) = f_l(\alpha) = \begin{cases} -\infty & \text{if } \alpha < H \\ H + 1 - \alpha & \text{if } H \leq \alpha \leq H + 1 \\ -\infty & \text{if } \alpha > H + 1 \end{cases}$$

Moreover, in both cases (oscillations and increments), $f_g(\alpha)$ is given by
$$\lim_{\varepsilon \to 0} \lim_{n \to \infty} \frac{\log N_n^\varepsilon(\alpha)}{\log n}$$
(i.e. the lim inf in $n$ is really a plain limit). Together with the stationarity property of the increments (or the wavelet coefficients), this entails that FBM belongs to a class $S(g, \psi)$. If we define the $Y_n^k$ to be wavelet coefficients, the spectrum will depend on the analyzing wavelet $\psi$. All spectra with upper envelope equal to the characteristic function of $[H, \infty)$ may be obtained with adequate choice of $\psi$. The result of the denoising procedure will thus in principle be wavelet-dependent. The influence of the wavelet is controlled through the prior choice, i.e. the multifractal spectrum among all admissible ones. In practice, few variations are observed if we use a Daubechies wavelet with length between 2 and 20, and a non-increasing spectrum supported on $[H, H + 1]$ with $f_g(H) = 1$.

A graphical comparison of results obtained through Bayesian multifractal denoising and traditional hard and soft thresholding is displayed in Figure 11.7. For each method, the parameters were manually set so as to obtain the best fit to the known original signal. By and large, the following conclusions may be drawn from these experiments. First, it is seen that, for irregular signals such as FBM, which belong to $S(g, \psi)$, the Bayesian method yields more satisfactory results
Figure 11.7. First line: FBM with H = 0.6 (left) and noisy version with Gaussian white noise (right). Second line: Denoised versions with a traditional wavelet thresholding; hard thresholding (left), soft thresholding (right). Third line: Bayesian denoising with increments’ spectrum (left), Bayesian denoising with flat spectrum (right)
than traditional wavelet thresholding (we should however recall that hard and soft thresholding were not designed for stochastic signals). In particular, this method preserves a roughly correct regularity along the path, while wavelet thresholding yields a signal with both too smooth and too irregular regions. Second, it appears that using the degenerate information provided by the “flat” spectrum does not significantly decrease the denoising quality.

11.4.6.5. Denoising of road profiles

An important problem in road engineering is to understand the mechanisms of friction between rubber and the road. This is a difficult problem, since friction depends on many parameters: the type of rubber, the type of road, the speed, etc. Several authors have shown that most road profiles are fractal [RAD 94, HEI 97, GUG 98] on given ranges of scales. Such a property has obvious consequences on friction, some of which have been investigated for instance in [RAD 94, KLU 00]. The main idea is that, in the presence of fractal roads, all irregularity scales contribute to friction [DO 01].
In [LEG 04a], it is verified that road profiles finely sampled using tactile and laser sensors are indeed fractals. More precisely, it is shown that they have well-defined correlation exponents and regularization dimensions over a wide range of scales. However, various classes of profiles which have different friction coefficients are not discriminated by these global fractal parameters. This means that friction may have relatively low correlation with fractional dimensions or correlation exponents. In contrast, experiments show that the pointwise Hölder exponent allows us to separate road profiles which have different friction coefficients. The laser acquisition system developed at LCPC (Laboratoire central des ponts et chaussées), based on an Imagine optics sensor, allows us to obtain (at the finest resolution) road profiles with a sampling step of 2.5 microns. These signals are very noisy. As a consequence, when they are used for computing a theoretical friction ([DO 01]), a very low correlation with the real friction (0.48) is obtained. Since local regularity is related to friction, it seems natural to use one of the regularity-based denoising methods presented above in this case. It appears that a Bayesian denoising is well fitted: after this denoising, the correlation between theoretical and real friction increases to 0.86 (see Figure 11.8). For more on this topic, see [LEG 04b].
Figure 11.8. Theoretical friction versus measured friction. Each star represents a given class of road profiles. The friction is calculated for each profile in a class and then averaged: (a) original profiles (correlation 0.4760); (b) denoised profiles (correlation 0.8565)
11.5. Hölderian regularity based interpolation

11.5.1. Introduction

A ubiquitous problem in signal and image processing is to obtain data sampled with the best possible resolution. At the acquisition step, the resolution is limited by various factors such as the physical properties of the sensors or the cost. It is therefore desirable to seek methods which would allow us to increase the resolution after acquisition. This is useful, for instance, in medical imaging or target recognition. At first sight, this might appear hopeless, since we cannot “invent” information which has not been recorded. Nevertheless, by making reasonable assumptions on the underlying signal (typically, a priori smoothness information), we may design various methods (note that we do not consider here the situation where several low resolution overlapping signals are available). However, most techniques developed so far suffer from a number of problems. While the interpolated image is usually too smooth, it sometimes happens that, on the contrary, too many details are added, in particular in smooth regions. In addition, the creation of details is not well controlled, so that we can neither predict what the high resolution image will look like, nor describe the theoretical properties of the interpolation scheme.

The main idea of [LEV 06] is to perform interpolation in such a way that smooth regions as well as irregular regions (i.e. sharp edges or textures) remain so after zooming. This can be interpreted as a constraint on the local regularity: the interpolation method should preserve local regularity.

11.5.2. The method

Let $X_n$ be the signal obtained by sampling the signal $X$ on $2^n$ points.
The proposed method is strongly related to the estimator WCR described in section 11.3.2 and follows the steps below:
– estimate the regularity by the WCR method: the regression of the logarithm of wavelet coefficients vs scale is calculated above each point $t$;
– the wavelet coefficients at the scale $n+1$ are obtained by the following formula:
$$\log_2 |\tilde{x}_{n+1,k}| = \frac{2}{n(n-1)} \sum_{j=1}^{n} \log_2 |x_{j,k}| \, (3j - n - 2)$$
with $x_{j,k}$, $j = 1 \dots n$ the wavelet coefficients “above” $t$. This means that the regression slope (i.e. the estimated regularity by the WCR method) remains the same after interpolation (see Figure 11.9 left, second row);
– perform an inverse wavelet transform.
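The extrapolation formula in the second step is the value taken at $j = n + 1$ by the least squares line fitted to the points $(j, \log_2 |x_{j,k}|)$, $j = 1, \dots, n$. The sketch below implements the formula as printed; the function name is hypothetical.

```python
def extrapolate_log2_coefficient(log2_abs):
    """Predict log2|x_{n+1,k}| from [log2|x_{1,k}|, ..., log2|x_{n,k}|]
    with the weights 2 (3j - n - 2) / (n (n - 1)); needs n >= 2."""
    n = len(log2_abs)
    if n < 2:
        raise ValueError("at least two scales are needed")
    weighted = sum(v * (3 * j - n - 2) for j, v in enumerate(log2_abs, start=1))
    return 2 * weighted / (n * (n - 1))


if __name__ == "__main__":
    # Coefficients decaying exactly like 2^(1 - 0.7 j): the prediction at
    # scale n + 1 continues the same line, so the slope is unchanged.
    values = [1 - 0.7 * j for j in range(1, 6)]
    print(extrapolate_log2_coefficient(values))
```

When the log-coefficients are exactly aligned, the new coefficient lies on the same line, which is the sense in which the regression slope is preserved.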
With this method, the local estimated regularity of the signal/image will remain unchanged because the high-frequency content is added in a manner coherent with lower scales. From an algorithmic point of view, we note that only one computation is needed, whatever the number of added scales.

11.5.3. Regularity and asymptotic properties

Let $\tilde{X}_{n+m}$ be the signal after m interpolations, $\tilde{\alpha}(n, t)$ be the estimated regularity at t given by (11.2) and $\log_2 \tilde{K}_{n,k}$ be the ordinate at zero of the WCR regression.

PROPOSITION 11.7.– If $X \in C^{\alpha}$ then, whatever the number m of added scales:
$$\|X - \tilde{X}_{n+m}\|_2^2 \le \frac{c^2}{2^{2\alpha} - 1}\, 2^{-2\alpha n} + \frac{\hat{K}_n}{2^{2\hat{\alpha}_n} - 1}\, 2^{-2\hat{\alpha}_n n}$$
with $(\hat{K}_n, \hat{\alpha}_n)$ such that:
$$\hat{K}_n 2^{-2j\hat{\alpha}_n} = \max_{(\tilde{K}_{n,k},\, \tilde{\alpha}(n,t))} \left( \tilde{K}_{n,k}\, 2^{-2j\tilde{\alpha}(n,t)} \right)$$
PROPOSITION 11.8.– Assume that $X \in C^{\alpha}$ and that at each point the local and the pointwise Hölder exponents of X coincide. Then, ∀ε > 0, ∃N:
$$n > N \Rightarrow \|X - \tilde{X}_{n+m}\|_2 = O\left(2^{-(n+m)(\alpha - \varepsilon)}\right)$$
In addition, $\|X - \tilde{X}_{n+m}\|_{B^s_{p,q}} = O\left(2^{-(n+m)(\alpha - s - \varepsilon)}\right)$ for all s < α − ε.

See [LEG 04b] for more results.

11.5.4. Numerical experiments

We show a comparison between the regularity-based interpolation method and a traditional bicubic method on a scene containing a Japanese door (torii). Figure 11.9 displays the original 128×128 pixel image, and eight-times bicubic and regularity-based interpolations on a detail of the door image.

11.6. Biomedical signal analysis

Fractal analysis has long been applied with success in the biomedical field. A particularly interesting example is provided by the study of ECG. ECG and signals derived from them, such as RR intervals, are an important source of information in the detection of various pathologies, e.g. congestive heart failure and sleep apnea, among others. The fractality of such data has been reported in numerous works over the years. Several fractal parameters, such as the box dimension, have been found to correlate well with the heart condition in certain situations ([PET 99, TU 04]).
Figure 11.9. Left. First row: original door image, second row: Regression of the logarithm of the wavelet coefficients vs scales above the point t. The added wavelet coefficient is the one on the right. Right: 8 times bicubic (up) and regularity-based (bottom) interpolations on door image (detail)
More precise information on ECG is provided by multifractal analysis, because their local regularity varies wildly from point to point. In the specific case of RR intervals, several studies have shown that the multifractal spectrum correlates well with the heart condition ([IVA 99, MEY 03, GOL 02]). Roughly speaking, we observe two notable phenomena:
– On average, the local regularity of healthy RR is smaller than that in presence of, e.g., congestive heart failure. In other words, pathologies increase the local regularity.
– Healthy RR have much more variability in terms of local regularity; congestive heart failure reduces the range of observed regularities.

These results may be traced back to the fact that congestive heart failure is associated with profound abnormalities in both the sympathetic and parasympathetic control mechanisms that regulate beat-to-beat variability [GOL 02].

A precise view on the mechanisms leading to multifractality is important if we want to understand the purposes it serves and how it will be modified in response to external changes or in case of abnormal behavior. As of today, there is no satisfactory multifractal model for RR intervals ([AMA 99]). As a preliminary step toward this goal, we shall describe in this section a remarkable feature of the time-evolution of the local regularity.

Obviously, calculating the time evolution of the local regularity gives far more information than the sole multifractal spectrum. Indeed, the latter may be calculated from the former, while the reverse is not true. In addition, inspecting the variations of local regularity yields new insights which cannot be deduced from a multifractal spectrum, since all time-dependent information is lost on a spectrum. This is crucial for RR intervals, since, as we will see, the evolution of local regularity is strongly (negatively) correlated with the RR signals.
This fact prompts the development of new models that would account for the fact that, when the RR intervals are larger, the RR signal is more irregular, and vice versa. In that view, we shall briefly describe a new mathematical model that goes beyond the usual multifractional Brownian motion (MBM). Recall that the MBM is the following random process, that depends on the functional parameter H(t), where $H : [0, \infty) \to [a, b] \subset (0, 1)$ is a $C^1$ function:
$$W_{H(t)}(t) = \int_{-\infty}^{0} \left[ (t - s)^{H(t) - 1/2} - (-s)^{H(t) - 1/2} \right] dW(s) + \int_{0}^{t} (t - s)^{H(t) - 1/2} \, dW(s).$$
The main feature of MBM is that its Hölder exponent may be easily prescribed; at each point t0 , it is equal to H(t0 ) with a probability of one. Thus, MBM allows us to describe phenomena whose regularity evolves in time/space. For more details on MBM, see Chapter 6.
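To make the moving-average definition of MBM more concrete, here is a rough discretization of the two stochastic integrals. It is a sketch under stated assumptions rather than a recommended synthesis method: the integral over (−∞, 0] is truncated to [−past_length, 0], dW(s) is replaced by independent Gaussian increments on a regular grid, the kernel is evaluated at cell midpoints to avoid its singularity at s = t, and no normalizing constant is applied. The function name and parameters are hypothetical.

```python
import math
import random


def simulate_mbm(hurst, n_points=128, past_length=4.0, seed=0):
    """Approximate an MBM path on [0, 1] at times t_k = k / n_points.

    hurst: function t -> H(t), with values in (0, 1).
    past_length: truncation of the integral over (-inf, 0] (an assumption).
    """
    rng = random.Random(seed)
    ds = 1.0 / n_points
    n_past = int(round(past_length / ds))
    # Midpoints of the integration cells covering [-past_length, 1]
    mids = [(-n_past + i + 0.5) * ds for i in range(n_past + n_points)]
    # Independent Gaussian increments of variance ds, standing in for dW(s)
    dw = [rng.gauss(0.0, math.sqrt(ds)) for _ in mids]

    path = []
    for k in range(n_points + 1):
        t = k * ds
        exponent = hurst(t) - 0.5
        value = 0.0
        for s, w in zip(mids, dw):
            if s < 0.0:
                # First integral: (t - s)^(H - 1/2) - (-s)^(H - 1/2)
                value += ((t - s) ** exponent - (-s) ** exponent) * w
            elif s < t:
                # Second integral: (t - s)^(H - 1/2) on [0, t]
                value += (t - s) ** exponent * w
        path.append(value)
    return path


if __name__ == "__main__":
    # A linear regularity function, increasing from 0.2 to 0.8 on [0, 1]
    path = simulate_mbm(lambda t: 0.2 + 0.6 * t)
    print(path[:5])
```

With an increasing H(t), the simulated path should look rougher near t = 0 than near t = 1. The cost is quadratic in the number of points, which is acceptable for a short illustration only.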
Figure 11.10 shows two paths of MBM with a linear function H(t) = 0.2 + 0.6t and a periodic H(t) = 0.5 + 0.3 sin(4πt). We clearly see how regularity evolves over time.
Figure 11.10. MBM paths with linear and periodic H functions
Estimates of the H functions from the traces in Figure 11.10, obtained using the so-called generalized quadratic variations, are shown in Figure 11.11 (the theoretical regularity is in gray and the estimated regularity is in black).
Figure 11.11. Estimation of the local regularity of MBM paths. Left: linear H function. Right: periodic H function
Figure 11.12. RR interval time series (upper curves) and estimation of the local regularity (lower curves)

24-hour interbeat (RR) interval time series obtained from the PhysioNet database [PHY] along with their estimated local regularity (assuming that the processes may be modeled as MBM) are shown in Figure 11.12. These signals were derived from long-term ECG recordings of adults between the ages of 20 and 50 who have no known cardiac abnormalities and typically begin and end in the early morning (within an hour or two of the subject waking). They are composed of around 100,000 points. As we can see from Figure 11.12, there is a clear negative correlation between the value of the RR interval and its local regularity: when the black curve moves up, the
gray tends to move down. In other words, slower heartbeats (higher RR values) are typically more irregular (smaller Hölder exponents) than faster ones.

In order to account for this striking feature, the modeling based on MBM must be refined. Indeed, while MBM allows us to tune regularity at each time, it does so in an “exogenous” manner. This means that the values of H and of $W_H$ are independent. A better model for RR time series requires us to define a modified MBM where the regularity would be a function of $W_H$ at each time. Such a process is called a self-regulating multifractional process (SRMP). It is defined as follows. We give ourselves a deterministic, smooth, one-to-one function g taking values in (0, 1), and we seek a process X such that, at each t, $\alpha_X(t) = g(X(t))$ almost surely.

It is not possible to write an explicit expression for such a process. Rather, a fixed point approach is used, which we now briefly describe (see [BAR 07] for more details). Start from an MBM $W_H$ with an arbitrary function H (for instance a constant). At the second step, set $H = g(W_H)$. Then iterate this process, i.e. calculate a new $W_H$ with this updated H function, and so on. We may prove that these iterations will almost surely converge to a well-defined SRMP X with the desired property, namely that the regularity of X at any given time t is equal to g(X(t)).

For such a process, there is a functional relation between the amplitude and the regularity. However, this does not make precise control of the Hölder exponent possible. Let us explain this through an example. For definiteness, take g(t) = t for all t. Then, a given realization might result in a low value of X at, say, t = 0.5 and thus high irregularity at this point, while another realization might give a large X(0.5), resulting in a path that is smooth at 0.5. See Figure 11.13 for an example of this fact. In order to gain more control, the definition of an SRMP is modified as follows.
Figure 11.13. Paths of SRMPs with g(Z) = Z

First define a “shape function” s, which is a deterministic smooth function. Then, at each step, calculate $W_H$, and set $H = g(W_H + ms)$, where m is a positive number. The function s thus serves two purposes. First, it allows us to tune the shape of X: when m is large, X and s will essentially have the same form. Second, because of the first property, it allows us to decide where the process will be irregular and where it will be smooth. Figure 11.14 displays an example of SRMP with controlled shapes.
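The fixed-point construction with a shape function can be sketched as follows. This is an illustration under several assumptions that are not taken from the text: the MBM is approximated by a truncated, discretized moving average without normalization; the same noise realization is reused at every iteration; the process is taken to be X = W_H + m s; the number of iterations is fixed in advance instead of being driven by a convergence test; and the example g is the identity clamped to [0.1, 0.9] so that it stays inside (0, 1). All names are hypothetical.

```python
import math
import random


def mbm_from_noise(h_values, dw, mids, ds):
    """Discretized MBM at t_k = k * ds for a fixed noise realization dw."""
    path = []
    for k, h in enumerate(h_values):
        t = k * ds
        exponent = h - 0.5
        value = 0.0
        for s, w in zip(mids, dw):
            if s < 0.0:
                value += ((t - s) ** exponent - (-s) ** exponent) * w
            elif s < t:
                value += (t - s) ** exponent * w
        path.append(value)
    return path


def simulate_srmp(g, shape, m=1.0, n_points=64, past_length=2.0,
                  n_iter=10, h0=0.5, seed=0):
    """Iterate H <- g(W_H + m * s), starting from the constant H = h0.

    Returns (times, x, h_values) with x = W_H + m * s from the last
    iteration and h_values = g(x).
    """
    if n_iter < 1:
        raise ValueError("n_iter must be at least 1")
    rng = random.Random(seed)
    ds = 1.0 / n_points
    n_past = int(round(past_length / ds))
    mids = [(-n_past + i + 0.5) * ds for i in range(n_past + n_points)]
    dw = [rng.gauss(0.0, math.sqrt(ds)) for _ in mids]
    times = [k * ds for k in range(n_points + 1)]

    h_values = [h0] * (n_points + 1)
    x = []
    for _ in range(n_iter):
        w_h = mbm_from_noise(h_values, dw, mids, ds)
        x = [w + m * shape(t) for w, t in zip(w_h, times)]
        h_values = [g(v) for v in x]
    return times, x, h_values


if __name__ == "__main__":
    def clamp_identity(z):
        # g(Z) = Z, clamped so that the exponent stays in [0.1, 0.9]
        return min(max(z, 0.1), 0.9)

    def shape(t):
        return 0.5 + 0.3 * math.sin(2.0 * math.pi * t)

    times, x, h = simulate_srmp(clamp_identity, shape, m=1.0)
    print(min(h), max(h))
```

With g increasing, the path is expected to be smoother where the shape function is high and rougher where it is low, which is the kind of control the shape function is meant to provide.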
Figure 11.14. (a) SRMP with g(Z) = Z (black), and controlling shape function (gray); (b) the same SRMP (black) with estimated Hölder exponent (gray)
It is then possible to obtain a fine model for RR traces based on the following ingredients:
– an “s” function, that describes the overall shape of the trace, and in particular the nycthemeral cycle;
– a g function whose role is to ensure that the correct relation between the heart rate and its regularity is maintained at all times.

The shape s is estimated from the data in the following way: for each RRi time series, histograms of both the signal and its exponent are plotted, and modeled as a sum of two Gaussians, as represented in Figure 11.15.
Figure 11.15. Histogram of RRi time series, modeled as a sum of two Gaussians
From these signals the shape functions are inferred. They are based on splines and parameterized by:
– $D_n$, duration of the night: $D_n \in [6, 10]$;
– $D_m$, duration of the beginning of the measure: $D_m \in [2, 4]$;
– $D_s$, duration of the sleeping phase: $D_s \in [0.5, 1.5]$;
– $D_a$, duration of the awakening phase: $D_a \in [0.5, 1.5]$;
– $RRi_d$, mean interbeat interval during the day: $RRi_d \in [0.6018, 0.7944]$;
– $RRi_n$, mean interbeat interval during the night: $RRi_n \in [0.7739, 1.0531]$.
These parameters are randomly chosen, in each case, in their respective intervals, with uniform probability (see Figure 11.16 for a representation).
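Drawing one set of shape parameters amounts to six independent uniform draws. The snippet below is a minimal sketch: the dictionary keys are hypothetical names for the six parameters, the intervals are those listed above, and units are left out because the text does not state them. Building the spline itself is not attempted, since its exact construction is not described here.

```python
import random

# Intervals as listed in the text; the key names are illustrative only
PARAMETER_RANGES = {
    "D_n": (6.0, 10.0),          # duration of the night
    "D_m": (2.0, 4.0),           # duration of the beginning of the measure
    "D_s": (0.5, 1.5),           # duration of the sleeping phase
    "D_a": (0.5, 1.5),           # duration of the awakening phase
    "RRi_d": (0.6018, 0.7944),   # mean interbeat interval during the day
    "RRi_n": (0.7739, 1.0531),   # mean interbeat interval during the night
}


def draw_shape_parameters(rng):
    """Draw each parameter uniformly in its interval, independently."""
    return {name: rng.uniform(low, high)
            for name, (low, high) in PARAMETER_RANGES.items()}


if __name__ == "__main__":
    print(draw_shape_parameters(random.Random(0)))
```

Passing an explicit `random.Random` instance keeps the draws reproducible, which is convenient when several synthetic traces have to be compared.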
Figure 11.16. Shape function of RR intervals
The g function is estimated in the phase space. More precisely, for each trace, the value of H as a function of the RR interval is plotted. Representing all these graphs on a single plot, a histogram is obtained, as in Figure 11.17.
Figure 11.17. Histogram in the phase space
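The estimation of g in the phase space, that is, building the histogram, extracting its ridge line and fitting a straight line to it by least squares, can be sketched as follows. The ridge definition used here (for each RR bin, the exponent bin with the highest count) is an assumption chosen for simplicity, and all function names are hypothetical.

```python
def bin_index(value, low, high, n_bins):
    """Index of the bin containing value, or None if value is outside [low, high]."""
    if not low <= value <= high:
        return None
    return min(int((value - low) / (high - low) * n_bins), n_bins - 1)


def phase_space_histogram(rr, alpha, rr_range, alpha_range, n_rr, n_alpha):
    """counts[i][j] = number of samples in RR bin i and exponent bin j."""
    counts = [[0] * n_alpha for _ in range(n_rr)]
    for r, a in zip(rr, alpha):
        i = bin_index(r, rr_range[0], rr_range[1], n_rr)
        j = bin_index(a, alpha_range[0], alpha_range[1], n_alpha)
        if i is not None and j is not None:
            counts[i][j] += 1
    return counts


def ridge_points(counts, rr_range, alpha_range):
    """For each non-empty RR bin, the centre of the most populated exponent bin."""
    n_rr, n_alpha = len(counts), len(counts[0])
    d_rr = (rr_range[1] - rr_range[0]) / n_rr
    d_alpha = (alpha_range[1] - alpha_range[0]) / n_alpha
    points = []
    for i, row in enumerate(counts):
        best = max(row)
        if best == 0:
            continue
        j = row.index(best)
        points.append((rr_range[0] + (i + 0.5) * d_rr,
                       alpha_range[0] + (j + 0.5) * d_alpha))
    return points


def fit_line(points):
    """Least-squares fit of alpha = a * RR + b; returns (a, b)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx
    return a, mean_y - a * mean_x
```

Given RR values and estimated exponents pooled over all traces, `fit_line(ridge_points(phase_space_histogram(...), ...))` would return the coefficients a and b of the affine regularity function.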
The ridge line of this histogram, seen as a surface in the (RR, α) plane, is then extracted (see Figure 11.17). This ridge line is roughly a straight line, which is fitted using least squares minimization in order to obtain an equation of the form α = g(RR) = a RR + b.

The last step is to synthesize an SRMP with shape function s and regularity function g, as explained in the previous section. Paths obtained in this way are shown in Figure 11.18. Compare this with the graphs shown in Figure 11.12, displaying true RR traces.

11.7. Texture segmentation

We will now briefly explain how multifractal analysis may be used for texture segmentation. We present an application to 1D signals, namely well logs. For an application to images, see [MUL 94, SAU 99].
Figure 11.18. Two forged RR intervals based on SRMP (upper curves) and estimated regularity (lower curves)
The characterization of geological strata with the help of well logs can be used for the interpretation of the sedimentary environment of an area of interest, such as a reservoir. Recent progress in electrofacies analysis [SER 87] has enabled us to relate well logs to the sedimentary environment and to extrapolate information coming from the core to any vertical well span. Electrofacies predictions are based on multivariate correlations and cluster analysis for which the input data are conventional well logs (including sonic, density and gamma logs) as well as the information extracted from the core analysis.

The microresistivity log (ML) measures the local resistivity of the rock wall of the well. The measurement is obtained by passing an electric current into the rock, to a lateral depth of approximately 1 cm. The resistivity varies as a function of the local porosity and of the connectivity of the pores (normally, the rock is a non-conductor and thus the current passes through the fluid contained in the pores). ML contain information not only on the inclination of geological strata but also on the texture of these strata.

To analyze the irregular variations of ML, we may calculate texture parameters locally and at different depths in the well. These can be used to obtain a segmentation of the well [SAU 97]. Let $r(x_i)$ denote the resistivity signal, where the $x_i$ are equidistant coordinates which measure the depth in the well. In order to emphasize the vertical variations of $r(x_i)$, a transformed signal $s_\omega(x_i)$ is first defined:
$$s_\omega(x_i) = |r(x_{i+1}) - r(x_i)|^{\omega}$$
where ω > 0. This transformation amplifies the small scales and eliminates any constant component of the signal $r(x_i)$.

Analysis of well logs from the Oseberg reservoir in the North Sea shows that, typically, a fractal behavior is observed for lengths in the range of about [2 cm, 20 cm]. It has been
found that relevant textural indices are given by the information dimension D(1) and the curvature parameter $\alpha_c = 2|D'(1)|/(D(0) - D(1))$, where D(q) is defined by $D(q) = \tau(q)/(q - 1)$, with an obvious modification for q = 1. In particular, these indices allow us to separate the three strata present in these logs. For instance, D(1) is correlated with the degree of heterogeneity: a more heterogeneous formation translates into a smaller D(1) [SAU 97].

11.8. Edge detection

11.8.1. Introduction

In the multifractal approach to edge detection, an image is modeled by a measure μ, or, more precisely, a Choquet capacity [LEV 98]. A Choquet capacity is roughly a measure which does not need to satisfy the additivity requirement. This distinction will not be essential for our discussion below, and the reader may safely assume that we are dealing with measures. See Chapter 1 for more precise information on this.

The basic assumptions underlying the multifractal approach to image segmentation are as follows:
• The relevant information for the analysis can be extracted from the Hölder regularity of μ.
• Three levels contribute to the perception of the image: the pointwise Hölder regularity of μ at each point, the variation of the Hölder regularity of μ in local neighborhoods and the global distribution of the regularity in the whole scene.
• The analysis should be translation and scale invariant.

Let us briefly compare the multifractal approach to traditional methods such as mathematical morphology (MM), and gradient based methods, or more generally image multiscale analysis (IMA):
• As in MM and IMA, translation and scale invariance principles are fulfilled.
• There is no so-called “local comparison principle” or “local knowledge principle”, i.e. the decision of classifying a point as an edge point is not based only on local information. On the contrary, it is considered useful to analyze information about whole parts of the image at each point.
• The most important difference between “traditional” and multifractal methods lies in the way they deal with regularity. While the former aim at obtaining smoother versions of the image (possibly at different scales) in order to remove irregularities, the latter try to extract information directly from the singularities. Edges, for instance, are not considered as points where large variations of the signal still exist after smoothing, but as points whose regularity is different from the “background” regularity in the raw data. Such an approach makes sense for “complex” images, in which the relevant structures are themselves irregular. Indeed, an implicit assumption of MM and IMA is that the useful information lies at boundaries between originally smooth regions, so that it is natural to filter the image. However, there are cases (e.g.
in medical imaging, satellite or radar imaging) where the meaningful features are essentially singular.
• As in MM, and contrary to IMA, the multifractal approach does not assume that there is a universal scheme for image analysis. Rather, depending on what we are looking for, different measures μ may be used to describe the image.
• Both MM and IMA consider the relative values of the gray levels as the basic information. Here the Hölder regularity is considered instead. This again is justified in situations where the important information lies in the singularity structure of the image.

Throughout the rest of this section, we make the following assumption:
$$f := f_h = f_g$$
The simplest approach then consists of defining a measure (or, often, a sequence of Choquet capacities on the image), calculating its multifractal spectrum, and classifying each point according to the corresponding value of (α, f(α)), both in a geometric and a probabilistic fashion. The value of α gives local information about the pointwise regularity: for a fixed capacity, an ideal step edge point in an image without noise is characterized by a given value. The value of f(α) yields global information: a point on a smooth contour belongs to a set $T_\alpha$ whose dimension is 1, a point contained in a homogeneous region has f(α) = 2, etc. The probabilistic interpretation of f(α) corresponds to the fact that a point in a homogeneous region is a frequent event, an edge point is a rare event, and, for instance, a corner an even rarer event (see Figures 11.19 and 11.20). Indeed, if too many “edge points” are detected, it is in general more appropriate to describe these points as belonging to a homogeneous (textured) zone.
Figure 11.19. Three edges, a texture
Figure 11.20. Three corners, a texture
In other words, the assumption that $f_g = f_h$ allows us to link the geometric and probabilistic interpretations of the spectrum. Points on a smooth contour have an α such that:
• $f_h(\alpha) = 1$ because a smooth contour fills the space as a line.
• $f_g(\alpha) = 1$ because a smooth contour has a given probability to appear.

In fact, we may define the point type (i.e. edge, corner, smooth region, etc.) through its associated f(α) value; a point x with f(α(x)) = 1 is called an edge point, a point x with f(α(x)) = 2 is called a smooth point, and, more generally, for t ∈ [0, 2], x is called a point of type t if f(α(x)) = t. A benefit of the multifractal approach is thus that it allows us to define not only edge points, but a continuum of various types of points.

An important issue lies in the choice of a relevant sequence of capacities for describing the scene. The problem of finding an optimal c in a general setting is unsolved. In practice, we often use the following. Assume the image is defined on [0, 1] × [0, 1]. Let $P := ((I_k^n)_{0 \le k < \nu_n})_{1 \le n \le N}$ be a sequence of partitions of [0, 1] × [0, 1] and $(x_k^n, y_k^n)$ be any point in $I_k^n$. Each $I_k^n$ is made of an integer number of pixels. Let $L(I_k^n)$ denote the sum of the gray levels in $I_k^n$. Let (x, y) denote a generic pixel in the image and L(x, y) denote the gray level at (x, y). Let $(p_n)_{1 \le n \le N}$ be a fixed sequence of positive integers and Ω be a region in the image.

sum measure:
$$c^s(\Omega) = \sum_{(x,y) \in \Omega} L(x, y)$$
max capacities:
$$c_n^m(\Omega) = \max_{I_k^{n+p_n} \,/\, (x_k^{n+p_n},\, y_k^{n+p_n}) \in \Omega} L(I_k^{n+p_n})$$
$$c^M(\Omega) = \max_{(x,y) \in \Omega} L(x, y)$$
iso capacities:
$$c_n^i(\Omega) = \max_{l} \#\{k \,/\, L(I_k^{n+p_n}) = l,\ (x_k^{n+p_n}, y_k^{n+p_n}) \in \Omega\}$$
$$c^I(\Omega) = \max_{l} \#\{(x, y) \,/\, L(x, y) = l,\ (x, y) \in \Omega\}$$

It is easy to see that:
• $c^s(\Omega)$ depends on both the gray level values and their distribution in Ω,
• $c_n^m(\Omega)$ and $c^M(\Omega)$ only depend on the gray level values,
• $c_n^i(\Omega)$ and $c^I(\Omega)$ only depend on the gray level distribution.

Thus, $(c_n^m, c^M)$ and $(c_n^i, c^I)$ give in some loose sense “orthogonal” information about the image. Furthermore, it can be shown that they are more robust to noise than $c^s$.
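The pixel-level versions of these three set functions are short enough to write out directly. In the sketch below, which is only an illustration, an image is assumed to be a list of rows of integer gray levels and a region Ω a collection of (x, y) pixel coordinates; the partition-based variants are not covered.

```python
from collections import Counter


def sum_measure(image, region):
    """Sum of the gray levels L(x, y) over the pixels of the region."""
    return sum(image[y][x] for x, y in region)


def max_capacity(image, region):
    """Largest gray level found in the region."""
    return max(image[y][x] for x, y in region)


def iso_capacity(image, region):
    """Size of the largest group of pixels in the region sharing one gray level."""
    counts = Counter(image[y][x] for x, y in region)
    return max(counts.values())


if __name__ == "__main__":
    image = [
        [1, 1, 2],
        [3, 1, 2],
        [0, 0, 9],
    ]
    # Top-left 2 x 2 block of pixels, as (x, y) coordinates
    region = [(0, 0), (1, 0), (0, 1), (1, 1)]
    print(sum_measure(image, region))   # 1 + 1 + 3 + 1 = 6
    print(max_capacity(image, region))  # 3
    print(iso_capacity(image, region))  # gray level 1 occurs three times: 3
```

Relabeling the gray levels of the block (for instance swapping the values 1 and 3) changes the sum but leaves the iso capacity at 3, which illustrates why the max and iso capacities carry complementary information.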
11.8.1.1. Edge detection

The simplest procedure for extracting edges using multifractal analysis is as follows:
• Choose a sequence c of capacities.
• Calculate the Hölder exponent of c at each point of the image.
• Calculate the multifractal spectrum of c.
• Declare as “smooth” edge points those belonging to the set(s) $T_\alpha$ whose dimension is 1.
• Declare as “irregular” edge points those belonging to the set(s) $T_\alpha$ whose dimension is between two fixed values, typically 1.1 and 1.5.

A result of segmentation using this approach is shown in Figure 11.21.
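Two of the steps above lend themselves to a compact sketch: the estimation of the Hölder exponent at a pixel, and the final labeling of a point from its spectrum value. The estimator used here (least-squares slope of the logarithm of the capacity of centred square windows against the logarithm of the window side) is a common choice but an assumption with respect to the text, as are the window radii, the tolerance around dimension 1 and all names. Gray levels are assumed strictly positive so that logarithms are defined, and the spectrum estimation itself is not shown.

```python
import math


def window_values(image, x, y, r):
    """Gray levels in the (2r+1) x (2r+1) window centred at (x, y), clipped to the image."""
    height, width = len(image), len(image[0])
    return [image[j][i]
            for j in range(max(0, y - r), min(height, y + r + 1))
            for i in range(max(0, x - r), min(width, x + r + 1))]


def holder_exponent(image, x, y, capacity, radii=(1, 2, 3)):
    """Slope of log capacity(window) versus log window side, by least squares.

    capacity: function mapping a list of gray levels to a positive number,
    for instance the built-in sum (sum measure) or max (max capacity).
    """
    log_sizes = [math.log(2 * r + 1) for r in radii]
    log_caps = [math.log(capacity(window_values(image, x, y, r))) for r in radii]
    mean_s = sum(log_sizes) / len(radii)
    mean_c = sum(log_caps) / len(radii)
    num = sum((s - mean_s) * (c - mean_c) for s, c in zip(log_sizes, log_caps))
    den = sum((s - mean_s) ** 2 for s in log_sizes)
    return num / den


def classify(f_value, tol=0.05, irregular_range=(1.1, 1.5)):
    """Label a point from the dimension f(alpha) of the set it belongs to."""
    if abs(f_value - 1.0) <= tol:
        return "smooth edge"
    if irregular_range[0] <= f_value <= irregular_range[1]:
        return "irregular edge"
    return "other"


if __name__ == "__main__":
    flat = [[5] * 9 for _ in range(9)]
    print(holder_exponent(flat, 4, 4, sum))  # close to 2 in a flat region
    print(holder_exponent(flat, 4, 4, max))  # close to 0 in a flat region
```

On a flat region the sum measure grows like the window area, hence an exponent close to 2, while the max capacity does not grow at all; deviations from these background values are what single out singular points.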
Figure 11.21. (a) Original image; (b) Hölder exponents with max capacity; (c) smooth edges; (d) irregular edges
The sum measure, max and iso capacities are the basic tools used for analyzing images. There are cases where specific capacities, and/or more elaborate schemes than the one just described, must be designed in order to get robust results. See [LEV 98] for more details.

11.9. Change detection in image sequences using multifractal analysis

In [CAN 96], the authors propose a multifractal approach to the problem of change detection in image sequences. Multifractal analysis proves to be useful for detecting changes without any a priori knowledge of the object to be extracted. If a change occurs in an incoming image, it is reflected in the global description provided by the multifractal spectrum graph. The abscissa α of the part of the spectrum $(\alpha, f_h(\alpha))$ that has changed makes it possible to extract binary images corresponding to the changes. As can be seen in Figure 11.22c and Figure 11.23b, the changes extracted using multifractal analysis are much more relevant than the absolute difference between the two images.
Figure 11.22. (a) First image, (b) second image and (c) absolute difference (pixel to pixel) between the two images
Figure 11.23. (a) Local Hölder exponents image and (b) changes detected using multifractal analysis
11.10. Image reconstruction

In [TUR 02], the authors describe an interesting method for image reconstruction that mixes multifractal analysis and more traditional diffusion-like techniques. The image I (or, more precisely, the modulus of its gradient) is first modeled as a so-called log-Poisson multifractal measure. This means that its spectrum f admits the following parametric form:
$$f(\alpha) = f_\infty + \frac{\alpha - \alpha_\infty}{\gamma} \left[ 1 - \log \left( \frac{\alpha - \alpha_\infty}{\gamma (2 - f_\infty)} \right) \right],$$
where $\alpha_\infty$ is the lowest observed exponent, $f_\infty$ the associated spectrum value and $\gamma := -\log[1 + \alpha_\infty/(2 - f_\infty)]$. The justification for considering such a model is that it seems to describe many natural images with reasonable accuracy. For most images, it is observed that $f_\infty \approx 1$, and this is the value chosen for the rest of the method.

The points in the image having exponent $\alpha_\infty$ are the most singular ones. In many cases, the set comprising all these points, which is called the most singular manifold (MSM), is strongly related to the edges of the images. This observation and the fact that the whole spectrum may be computed once $\alpha_\infty$ is known suggest that the MSM contains the most relevant information, and that the whole image may be reconstructed from it.

In order to implement this idea, a linear operator is applied to the MSM. This operator must satisfy the following natural constraints: it should be translation-invariant, isotropic, and allow us to recover the original power (Fourier) spectrum of the image. Under these constraints, we may show that there is essentially only one possible operator, and that the image I may be reconstructed through the following formula:
$$\hat{c}(\mathbf{f}) = \frac{i\, \mathbf{f} \cdot \hat{\mathbf{v}}_0(\mathbf{f})}{f^2},$$
where
– $\hat{c}$ is the Fourier transform of $c := I - I_0$,
– $I_0$ is the mean of I,
– $\mathbf{f}$ denotes frequency,
– $\mathbf{v}_0(x) = \nabla c(x)$ if x belongs to the MSM, and $\mathbf{v}_0(x) = 0$ otherwise.

Reconstruction of real-world images, such as outdoor scenes or natural textures, is surprisingly good considering the simplicity of the method.
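The parametric form of the log-Poisson spectrum is easy to evaluate. The sketch below is an illustration with hypothetical names; it uses the natural logarithm and assumes −(2 − f∞) < α∞ < 0, so that γ is well defined and positive (this restriction on α∞ is an assumption made here, not a statement from the text).

```python
import math


def log_poisson_spectrum(alpha, alpha_inf, f_inf=1.0):
    """Evaluate f(alpha) for the log-Poisson model of a 2D image."""
    ratio = 1.0 + alpha_inf / (2.0 - f_inf)
    if not 0.0 < ratio < 1.0:
        raise ValueError("alpha_inf must lie in (-(2 - f_inf), 0)")
    gamma = -math.log(ratio)
    u = (alpha - alpha_inf) / gamma
    if u < 0.0:
        raise ValueError("alpha must not be smaller than alpha_inf")
    if u == 0.0:
        # Limit of u * log(u) as u -> 0 is 0, so f(alpha_inf) = f_inf
        return f_inf
    return f_inf + u * (1.0 - math.log(u / (2.0 - f_inf)))


if __name__ == "__main__":
    alpha_inf = -0.5  # illustrative value only
    gamma = -math.log(1.0 + alpha_inf)
    for alpha in (alpha_inf, alpha_inf + gamma, alpha_inf + 2.0 * gamma):
        print(alpha, log_poisson_spectrum(alpha, alpha_inf))
```

Setting the derivative of f to zero shows that the spectrum peaks at α = α∞ + γ(2 − f∞), where it equals 2, the dimension of the image support; the middle value printed above corresponds to that point.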
11.11. Bibliography

[AMA 99] AMARAL L.A.N., GOLDBERGER A.L., IVANOV P.C., STANLEY H.E., “Modeling heart rate variability by stochastic feedback”, Computer Phys. Comm., vol. 121-122, p. 126–128, 1999.
[AYA 99] AYACHE A., LÉVY VÉHEL J., “Generalized multifractional Brownian motion: definition and preliminary results”, in DEKKING M., LÉVY VÉHEL J., LUTTON E. and TRICOT C. (Eds.), Fractals: Theory and Applications in Engineering, Springer, 1999.
[AYA 00a] AYACHE A., “The generalized multifractional field: a nice tool for the study of the generalized multifractional Brownian motion”, Journal of Fourier Analysis and Applications, vol. 8, p. 581–601, 2000.
[AYA 00b] AYACHE A., LÉVY VÉHEL J., “The generalized multifractional Brownian motion”, Statistical Inference for Stochastic Processes, vol. 3, p. 7–18, 2000.
[BAR 07] BARRIÈRE O., Synthèse et estimation de mouvements Browniens multifractionnaires et autres processus à régularité prescrite. Definition du processus auto-regulé multifractionnaire et applications, PhD Thesis, University of Nantes, 2007.
[BIR 97] BIRGÉ L., MASSART P., “From model selection to adaptive estimation”, in TORGERSEN E., POLLARD D., YANG G. (Eds.), Festschrift for Lucien Le Cam, Springer, New York, p. 55–87, 1997.
[CAN 96] CANUS C., LÉVY VÉHEL J., “Change detection in sequences of images by multifractal analysis”, in Proc. ICASSP-96, May 7-10, Atlanta, 1996.
[DEV 92] DEVORE R.A., LUCIER B., “Fast wavelet techniques for near optimal image processing”, IEEE Military Communications Conference, vol. 2, no. 12, 1992.
[POP 88] DEVORE R.A., POPOV V.A., “Interpolation of Besov spaces”, Transactions of the American Mathematical Society, vol. 305, no. 1, p. 397–414, 1988.
[DO 01] DO M.T., ZAHOUANI H., Frottement pneumatique/chaussée, influence de la microtexture des surfaces de chaussée, JIFT, 2001.
[DON 94] DONOHO D.L., “De-noising by soft-thresholding”, IEEE Trans. Inf. Theory, vol. 41, no. 3, p. 613–627, 1994.
[ECH 07] ECHELARD A., “Analyse 2-microlocale et application au débruitage”, PhD Thesis, University of Nantes, December 2007.
[FracLab] FracLab: a software toolbox for fractal processing of signals, http://apis.saclay.inria.fr/FracLab/.
[GOL 02] GOLDBERGER A.L., AMARAL L.A.N., HAUSDORFF J.M., IVANOV P.C., PENG C.K., STANLEY H.E., “Fractal dynamics in physiology: alterations with disease and aging”, PNAS, vol. 99, p. 2466–2472, 2002.
[GUG 98] GUGLIELMI M., LÉVY VÉHEL J., Analysis and simulation of road profile by means of fractal model, Conference on Advances in Vehicle Control and Safety (AVCS 98), Amiens, 1998.
[HAR 98] HÄRDLE W., KERKYACHARIAN G., PICARD D., TSYBAKOV A., Wavelets, Approximation and Statistical Applications, Lecture Notes in Statistics, Springer, 1998.
[HEI 97] HEINRICH G., “Hysteresis friction of sliding rubbers on rough and fractal surfaces”, Rubber Chemistry and Technology, vol. 70, no. 1, p. 1–14, 1997.
[IVA 99] IVANOV C., AMARAL L.A.N., GOLDBERGER A.L., HAVLIN S., ROSENBLUM M.G., STRUZIK Z.R., STANLEY H.E., “Multifractality in human heartbeat dynamics”, Nature, vol. 399, June 1999.
[JAF 00] JAFFARD S., “On lacunary wavelet series”, The An. of Appl. Prob., vol. 10, no. 1, p. 313–319, 2000.
[JAF 04] JAFFARD S., “Wavelet techniques in multifractal analysis, fractal geometry and applications: a jubilee of Benoit Mandelbrot”, in LAPIDUS M., VAN FRANKENHUIJSEN M. (Eds.), Proceedings of Symposia in Pure Mathematics (AMS), vol. 72, part 2, p. 91–151, 2004.
[KLU 00] KLÜPPEL M., HEINRICH G., “Rubber friction on self-affine road tracks”, Rubber Chemistry Technology, vol. 73, no. 4, p. 578–606, 2000.
[LEG 04a] LEGRAND P., LÉVY VÉHEL J., DO M.-T., “Fractal properties and characterization of road profiles”, in NOVAK M. (Ed.), Thinking in Patterns: Fractals and Related Phenomena in Nature, World Scientific, New Jersey, p. 189–198, 2004.
[LEG 04b] LEGRAND P., Débruitage et interpolation par analyse de la régularité Hölderienne. Application à la modélisation du frottement pneumatique/chaussée, PhD Thesis, University of Nantes, Ecole Centrale de Nantes, December 2004.
[LEP 90] LEPSKII O., “On a problem of adaptative estimation in Gaussian white noise”, Theory Prob. Appl., vol. 35, p. 454–466, 1990.
[LEP 91] LEPSKII O., “Asymptotically minimax adaptative estimation I: upper bounds. Optimal adaptive estimates”, Theory Prob. Appl., vol. 36, p. 682–697, 1991.
[LEP 92] LEPSKII O., “Asymptotically minimax adaptative estimation II: statistical models without optimal adaptation. Adaptive estimates”, Theory Prob. Appl., vol. 37, p. 433–468, 1992.
[LEV 95] LÉVY VÉHEL J., “Fractal approaches in signal processing”, Fractals, vol. 3, no. 4, p. 755–775, 1995.
[LEV 98] LÉVY VÉHEL J., “Introduction to the multifractal analysis of images”, in FISHER Y. (Ed.), Fractal Image Encoding and Analysis, Springer Verlag, 1998.
[LEV 03] LÉVY VÉHEL J., LEGRAND P., “Bayesian multifractal signal denoising”, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP03), Hong Kong, April 6-10, 2003.
[LEV 04a] LÉVY VÉHEL J., LEGRAND P., “Signal and image processing with FracLab, FRACTAL04”, Complexity and Fractals in Nature, 8th International Multidisciplinary Conference, Vancouver, Canada, April 4-7, 2004.
[LEV 04b] LÉVY VÉHEL J., SEURET S., “The 2-microlocal formalism”, Fractal Geometry and Applications: A Jubilee of Benoit Mandelbrot, Proc. Sympos. Pure Math, 2004.
[LEV 06] LÉVY VÉHEL J., LEGRAND P., “Hölderian regularity-based image interpolation”, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP06), May 14-19, 2006, Toulouse, France.
[MEY 03] MEYERS M., STIEDL O., KERMAN B., “Discrimination by multifractal spectrum estimation of human heartbeat interval dynamics”, Fractals, vol. 11, part 2, p. 195–204, 2003.
[MUL 94] MULLER J., “Characterization of the North Sea chalk by multifractal analysis”, Journal of Geophysical Research, vol. 99, p. 7275–7280, 1994.
[PEL 95] PELTIER R.F., LÉVY VÉHEL J., Multifractional Brownian Motion: definition and preliminary results, tech. report INRIA, No. 2645, 1995.
[PEE 76] PEETRE J., New Thoughts on Besov Spaces, Duke University Mathematics Series, Duke University Press, Durham, 1976.
[PET 99] PETERS R., “The fractal dimension of atrial fibrillation: a new method to predict left atrial dimension from the surface electrocardiogram”, Cardiology, vol. 92, no. 1, p. 17–20, 1999.
[PHY] www.physionet.org.
[RAD 94] RADO Z., A study of road texture and its relationship to friction, PhD Thesis, Pennsylvania State University, 1994.
[SAU 97] SAUCIER A., HUSEBY O., MULLER J., “Electrical texture characterization of dipmeter microresistivity signals using multifractal analysis”, Journal of Geophysical Research, vol. 102, p. 10327–10337, 1997.
[SAU 99] SAUCIER A., MULLER J., “A generalization of multifractal analysis based on polynomial expansions of the generating function”, in DEKKING M., LÉVY VÉHEL J., LUTTON E., TRICOT C. (Eds.), Fractals: Theory and Applications in Engineering, Springer-Verlag, London, p. 81–91, 1999.
[SER 87] SERRA O., “Sedimentological applications of wireline logs to reservoir studies”, in GRAHAM T. (Ed.), Proceedings of the North Sea Oil and Gas Reservoirs. Norwell, Massachusetts, p. 277–299, 1987.
[TRI 95] TRICOT C., Curves and Fractal Dimension, Springer-Verlag, 1995.
[TU 04] TU C., ZENG Y., YANG X., “Nonlinear processing and analysis of ECG data”, Technology and Health Care, vol. 12, no. 1, p. 1–9, 2004.
[TUR 02] TURIEL A., DEL POZO A., “Reconstructing images from their most singular manifold”, IEEE Trans. Image Processing, vol. 11, p. 345–350, 2002.
[VID 99] VIDAKOVIC B., Statistical Modeling by Wavelets, Wiley, 1999.
Chapter 12
Scale Invariance in Computer Network Traffic
12.1. Teletraffic – a new natural phenomenon
12.1.1. A phenomenon of scales
From the appearance of the first telephone exchanges, before the beginning of the 20th century, we have spoken of a kind of current which, whether it be realized through electrons or light, consists fundamentally of a flow of information that we call teletraffic. Like its cousin, vehicular traffic, this current is a phenomenon encompassing calm and fluctuating episodes, predictable rhythms and unpleasant surprises, with periods of free circulation, but also of monumental traffic jams that bear witness to the limited size of both carriageways and intersections. What is more remarkable, however, is the fact that, in common with numerous natural phenomena such as those of thermodynamics, and despite its man-made nature, teletraffic is in practice too complicated to be fully understood, even though such an understanding is theoretically possible. Instead, it is necessary to observe it, and to know how to neglect peripheral details in order to better understand its core nature. Moreover, it is in most cases necessary to impose a statistical framework on what is, at a more fundamental level, a deterministic entity. We are led, finally, to treat this object (which is engineered, highly structured and ordered, even managed) as if it were a true natural phenomenon, and to seek to understand it as if it were created by unknown processes, and equipped with an obscure and unfamiliar nature. The complexity of the phenomenon is not due solely to the size of contemporary networks, though this is clearly an important factor. Already in traditional telephone
Chapter written by Darryl VEITCH.
networks, highly developed though dedicated to the transmission of voice, there were significant heterogeneities in the circulation of traffic. In terms of temporal irregularities, we can include the diurnal nature of human activity, and other factors which, while equally prosaic, are not without importance, like the end of the working week, lunch periods and even coffee breaks. Traffic also displays spatial heterogeneity, an effect measured not in kilometers but in the degree of multiplexing, by which we understand the degree of call mixing, which increases towards the heart of the network where links grow thicker, like tributaries merging downstream. The higher the degree of multiplexing, the further the character of the traffic is removed from that of individual calls, showing that heterogeneity is found not only in the topology of links but also in the distribution of their size or capacity. Multiplexing is closely associated with the hierarchical nature of the network, which can itself take different forms. To these fundamental effects we can add, ever since the arrival of the facsimile, the complications resulting from different services provided over the same network. However, fax traffic is quite simple. It differs from telephone calls only in the specific geographical distribution of the fax machines, and in the characteristic durations of fax transmissions. In contrast, other services provided over more modern networks, which freely combine images, computer files or data in the broadest sense, and which, moreover, increasingly allow user mobility, produce a very rich traffic mixture, the individual characteristics of which vary widely. When we add to this the growth in use of all kinds of computing devices, computers and telephones being the obvious examples today, coupled with the astonishing prospect of their quasi-universal connection to a single global network, the full complexity of the teletraffic phenomenon is strikingly revealed.
Apart from the factors of a “structural” nature mentioned above, there is another central point of particular interest here: the presence in traffic of a very wide range of time scales, spanning many orders of magnitude. This property constitutes the necessary basis of any scaling laws which may possibly govern traffic. In the next section, we will see that such laws in fact exist. Even if we ignore very large scale effects, such as seasonal cycles and the evolution of traffic itself, the number of scales involved is very impressive. If we go from the traditional time scale of a day to the millisecond, being the transmission time of a thousand bytes across the world, then to the microsecond necessary to transfer them over a high speed local area network (say at 1 Gbit per second), we cover 23 octaves¹, or 23 orders of magnitude (in base 2): teletraffic is indeed a phenomenon of scales. Rarely do we find a quantity, of any kind, spread across such a broad range. What is even more significant is that this range will continue to grow, with the inevitable improvements, closely following those of computing, in the bandwidth of connections and the capacity of switches, and all this without any limit known to science.
1. It is convenient to take logarithms to the base 2, hence octave = $\log_2(\text{scale})$.
12.1.2. An experimental science of “man-made atoms”
If the properties of teletraffic generally need to be studied and discovered, those of its underlying components, the sources, the atoms of the network gas, are available to us. These components can be studied closely, modified or even replaced. The three principal parameters of a source are its duration, its average rate and their product, the total quantity of data sent during a “transfer”. Instead of “transfer”, we can talk about a “session” or “call”, flexible terms which vary according to the context. Often it is necessary to go beyond these three parameters, which give only a rough idea of the nature of traffic, in order to consider the internal structure of sources, which defines their character. Taking an example from telephony, audio, we notice that in conversations there are many silences where there is no useful information to be transmitted, which immediately gives an idea of the jerky or bursty nature of this traffic. The details of these periods of silence and activity constitute the particular nature of burstiness in telephony. On the other hand, we imagine that the transmission of a computerized text will be accomplished in a uniform way, at as high a speed as possible given the capacity of the available connection. In this latter case, traffic does not have a clear structure. It is the capacity of the network itself, which is not homogeneous in time or transmission resources, which leads to its irregular behavior. Thus, the concept of source cannot be defined so easily. To emphasize this point, let us consider, for example, a certain number of sources of the same type carried over a local area network, whose traffic passes through a multiplexer, that is, a traffic concentrator, which alleviates traffic congestion in the network.
The output of this multiplexer, which is, from the point of view of the original sources, a superposition, constitutes a source for a multiplexer or a switch located downstream, at a “regional” level in the hierarchy of the total network. Sources can be divided into two main categories based on their sensitivity to transmission delays. Among those which are not sensitive, that is “data” sources, we include computer files, images and sound recordings. The complementary class, of “real-time” type, is sensitive to delays and includes video transmission, distributed games, and real-time systems in general, including audio (telephony). It should be clarified that by “delay” we mean the variable delays suffered during transmission and not the total duration of transfer. It is essential, for example, in the case of audio that the pieces into which words are divided arrive in time to be rebuilt without any audible gap, whereas for a simple data transfer, only the total transfer duration matters, which amounts to the average data rate. On the other hand, as far as the reliability of reception is concerned, the sensitivities are reversed. It is the flow corresponding to audio (for example) which can tolerate the loss of a certain percentage of data without sacrificing adequate quality, whereas for sources of “data” type, we require a loss rate equal to zero – corrupted files are not acceptable!
12.1.3. A random current
Even if sources can be individually described in a detailed, or even exact way (in principle), we generally take a stochastic point of view, and thus models proposed for traffic and traffic superpositions are random processes. The main reasons for this choice are the impossibility of managing within a deterministic framework the innumerable factors and details which intervene in practice, and the need to take into account the unpredictability of traffic. For example, the fact that the length of calls and the precise arrival times of new connections are not known in advance demands the adoption of a probabilistic approach. Such models can be defined in discrete or continuous time, according to the interpretation which is given to the flow, the time scale considered, and the questions that are of interest. The choice also depends on the analytic tools available, which can favor one approach or the other for technical reasons. For example, let us take a process X(k) that models audio traffic in discrete time, giving the number of bytes to be transmitted in interval k of width 40 ms. This time series corresponds to a traffic rate on a discrete time grid whose resolution is close to an established mechanism of audio transmission, namely the packet based TCP/IP (we revisit this shortly). It also constitutes a traffic representation based on a fairly fine time scale. Beginning from a rate series X(k), the nature of traffic is described by statistical quantities, starting with the mean $\mu_X(k) = E[X(k)]$ and the variance $\sigma_X^2(k) = E[(X(k) - \mu_X(k))^2]$ (assuming it exists, which is not always the case), or even the entire distribution function of X(k) for every k = 1, 2, 3, etc. Regarding its temporal structure, the most important quantity is the auto-correlation function, defined by $c_X(k, l) = E[(X(k) - \mu_X(k))(X(l) - \mu_X(l))] / (\sigma_X(k)\,\sigma_X(l))$.
Typically, we assume that a flow is stationary, that is, its statistics do not depend on the time origin; we therefore obtain $\mu_X(k) = \mu_X$, $\sigma_X^2(k) = \sigma_X^2$ and $c_X(k, l) = c_X(\tau = k - l) = E[(X(k) - \mu_X)(X(l) - \mu_X)] / \sigma_X^2$. In this context, let us now examine the nature of a traffic superposition as mentioned earlier. Let two processes $X_1$ and $X_2$ have the same (statistical) characteristics as X, in other words, they are identical but independent copies of X. Assuming that their superposition reduces to a simple addition, from the superposition $Y(k) = X_1(k) + X_2(k)$ we can easily show that the mean $\mu_Y = \mu_{X_1} + \mu_{X_2} = 2\mu_X$ and variance $\sigma_Y^2(k) = \sigma_{X_1}^2(k) + \sigma_{X_2}^2(k) = 2\sigma_X^2$ behave in a linear fashion, but on the other hand that the correlation structure remains unchanged: $c_Y(\tau) = c_X(\tau)$. Although the covariance amplitude (the variance) has increased, the ratio of the size of variations to the mean, $\sigma/\mu$, is reduced by a factor of $\sqrt{2}$. The simple fact that multiplexing two independent flows results in another flow of reduced variability explains why larger links are more effective; for the same probability of traffic congestion, they can transport traffic with a mean rate which is closer to their capacity: this is the well known multiplexing gain. However, when it comes to peak rates, if $X_1$ and $X_2$ are transported by links of capacity C which supply a concentrator, the latter’s output must have a capacity of at least 2C to ensure a zero loss rate.
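The reduction of $\sigma/\mu$ by a factor of $\sqrt{2}$ can be checked numerically. The following is a minimal sketch, not a traffic model from the text: it assumes a hypothetical source whose rates are independent exponential samples with mean 100 bytes per interval, and the function names are illustrative only.

```python
import random
import statistics


def coefficient_of_variation(series):
    """Return sigma/mu, the ratio of the size of variations to the mean."""
    return statistics.pstdev(series) / statistics.fmean(series)


def simulate_superposition(n=100_000, seed=1):
    """Compare sigma/mu of one flow with that of two multiplexed flows."""
    rng = random.Random(seed)
    # Assumption for illustration: i.i.d. exponential rates, mean 100 bytes.
    x1 = [rng.expovariate(1 / 100) for _ in range(n)]
    x2 = [rng.expovariate(1 / 100) for _ in range(n)]
    # The superposition is taken to be a simple addition, as in the text.
    y = [a + b for a, b in zip(x1, x2)]
    return coefficient_of_variation(x1), coefficient_of_variation(y)


cv_single, cv_multiplexed = simulate_superposition()
ratio = cv_single / cv_multiplexed  # should be close to sqrt(2)
```

The independence of the two copies is what produces the gain; adding a flow to itself would double both the mean and the standard deviation and leave the ratio unchanged.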
In teletraffic, we are often concerned with the irregular character of traffic, as it has been known for a long time, thanks to queueing theory (a major tool for the modeling of multiplexers), that the burstier a flow is, the larger the queue content. In this probabilistic framework, the question is how to re-examine what “bursty” traffic means. We will answer this key question later, but it is appropriate to note here that both the time structure, described by $c_X(k)$, and the probability distribution play a part. Often, the distribution is taken as Gaussian, in which case only the mean and variance count, but reality frequently forces us far from this convenient choice.
12.1.4. Two fundamental approaches
Until now, we have not dealt in detail with data transmission, which is a complex subject in itself, incorporating numerous logical and physical layers built on top of each other. However, it is hardly possible to have a clear vision of what teletraffic is without talking about certain essential aspects, such as the paradigms of circuit switching and packet switching and the idea of protocols and their hierarchical organization. Of course, the physical level, that is, the electronic or optical functioning of systems, does not concern us, though it is an integral part of this system hierarchy. To know more about these core concepts of telecommunications engineering, consult, for example, [TAN 88]. In order to introduce circuit switching, let us examine a telephone network during the age of switchboards. When a call was accepted and connected, an actual line of copper wire, a circuit, connected the parties. Throughout the duration of the call, they enjoyed full use of the capacity of this circuit, even during silent periods when they had no need of it. Furthermore, the circuit bandwidth was at their disposal regardless of the state of other circuits.
In more modern networks, we can no longer trace a physical circuit so explicitly, due to the diversity of technologies developed for the physical layer, although the essential characteristics of circuit switching remain the same. Let us carry this a little further. The idea of dividing the total bandwidth into equal portions, or circuits, leads to link characteristics which are predictable and constant. Once a call is accepted, there is no doubt about the infrastructure provided: a fixed quantity of bandwidth capable of being used by a service designed to function under such conditions. If, on the contrary, the system is overloaded then there is no available circuit, a situation which is easily and unambiguously verified. Therefore, this system is simple in principle, but suffers from the disadvantage of wasting resources; it is entirely possible that the bandwidth is highly under-used even though most or even all of the circuits are occupied. To address this disadvantage, packet switching proposes a sharing of resources whereby all flows are subdivided and the pieces sent independently over the same connection, without any detailed reservation. Each packet is separately transported without any direct link to its siblings from the same connection. We rely on the principle of statistical multiplexing described earlier to reduce the variability of
this unmanaged superposition, thereby allowing a higher number of simultaneous calls. The chief disadvantage is the possibility that the capacity of the link might be reached, leading to a loss of packets. Furthermore, the decision to admit a new flow, to accept a new connection, is not a simple matter but depends on a statistical judgment of the probability of a future link overload. It is also necessary to plan how to manage packet losses, perhaps by retransmitting them, which would lead to more complicated systems of sending and receiving. Nevertheless, packet switching offers the advantage of high efficiency and also a vast range of services, such as telemetry, whose very low rate would not justify the allocation of an entire circuit, and the possibility, if a link is weakly loaded, of using a high proportion of the capacity to quickly transfer a huge file. Evidently, there are hybrid systems existing between these two extremes. For example, we can enrich circuit switching by means of priority classes to ensure, or at least to increase, the probability that certain traffic types (very costly ones, for example) always find circuits available. These two paradigms are the traditional conceptual extremes, which do not depend on the technology of the day but, on the contrary, will always play a role even if, as systems, technologies, services and profitability criteria vary, one of the two will still dominate. They can even exist simultaneously, at different levels in the protocol hierarchy. A protocol is a language and a system by which agents can communicate. In the case of packet switching, on the one hand it relates to the fact that each packet consists of two parts, the header containing, among other things, the destination and origin addresses, and the payload where the actual data resides. On the other hand it relates to the computing infrastructure (software or hardware) that understands this structure and language to the bit level. 
However, it is often true that a single language is not sufficient; this is certainly the case when a connection traverses different networks and therefore different technologies and protocols, as for example with the Internet. Moreover, a system in which a single language is expected to manage the demands of all the functions and constraints of each level in the hierarchy would be unworkable; the task is therefore broken up. An example of some importance is the relationship between the IP (Internet Protocol) and TCP (Transmission Control Protocol) in the functioning of today’s Internet. The former allows the transfer across different networks. The IP headers contain globally meaningful addresses, sufficient information to steer each packet independently, without any knowledge of their fellow packets or their purpose. At this level, a connection, that is, a flow established between two end points having certain properties such as reliable communication, does not exist. Since it is often necessary to guarantee the arrival of all packets, it is essential to have another mechanism that carries the connection concept, capable of verifying the safe arrival of packets and of managing any necessary retransmissions. This is the role of TCP, in which packets containing their own headers and data payloads are transported inside the IP packet payload. Generally, there are several such layers which provide the link between “high level” applications and the technological, and then physical, layers. To know more about the TCP/IP tag-team, the reader can refer to [STE 94].
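The layering described above, in which a transport unit travels inside the payload of a network unit, can be sketched as follows. This is only an illustration under strong simplifying assumptions: the `Segment` and `Packet` types, their fields and the `send` and `receive` helpers are hypothetical stand-ins, and do not reproduce the real TCP or IP header formats.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Hypothetical, heavily simplified transport unit (stand-in for TCP)."""
    seq: int      # position of this piece within the connection
    data: bytes   # application data carried by the segment


@dataclass(frozen=True)
class Packet:
    """Hypothetical, heavily simplified network unit (stand-in for IP)."""
    src: str          # origin address, part of the header
    dst: str          # destination address, part of the header
    payload: Segment  # the transport segment travels inside the payload


def send(src, dst, message, size):
    """Cut a message into segments and wrap each one in its own packet."""
    return [Packet(src, dst, Segment(seq, message[i:i + size]))
            for seq, i in enumerate(range(0, len(message), size))]


def receive(packets, expected):
    """Rebuild the message and list the segments needing retransmission."""
    got = {p.payload.seq: p.payload.data for p in packets}
    missing = [s for s in range(expected) if s not in got]
    message = None if missing else b"".join(got[s] for s in sorted(got))
    return message, missing


packets = send("host-a", "host-b", b"scale invariance", size=5)
# Packets are steered independently, so they may arrive in any order.
rebuilt, none_missing = receive(list(reversed(packets)), len(packets))
# Dropping one packet shows what the connection layer must detect.
_, lost = receive(packets[:1] + packets[2:], len(packets))
```

The network-level type knows nothing about ordering or loss; only the sequence number carried in the inner segment lets the receiver restore the connection concept.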
12.2. From a wealth of scales arise scaling laws
12.2.1. First discoveries
During the first decades of its evolution, traffic modeling grew considerably and came to constitute, towards the end of the 1980s, a diverse and abundant literature. In parallel, the field of network performance analysis, essentially a branch of queueing theory, grew in conjunction with this work, unveiling the positive and/or negative effects of each new model. The advent of packet switching stimulated new developments and founded new classes of models, leading to a significant broadening in the class of systems capable of being treated. However, this corpus of knowledge remained deeply rooted in the intuitive and technical foundation developed for circuit switching, a thoroughly mastered field where models enjoyed a real and acknowledged success. In fact, there was even a sense that we already knew this new concept of packet traffic, so much so that despite major changes in network structure, we hardly felt the need to compare model predictions with reality. It was noted [PAW 88] that among several thousand papers published between 1966 and 1987 dealing with performance analysis, around only 50 dealt with actual network measurements. In the defence of the research community of the time, it is important to note that it was far from easy to obtain measurements, particularly high quality measurements. In general, collection systems lost packets and could only operate at low time resolution. On the other hand, it must be recognized that it was, above all, tractability concerns which guided researchers towards models which did not necessarily have any solid link with the actual properties of the data they were supposed to describe. It has long been taken as a given, and no less so today, that from the statistical point of view the fundamental characteristic of packet traffic is its bursty nature.
By this we refer implicitly to a comparison with traditional telephone traffic, and more precisely, to a Poisson process, the keystone model of the circuit switching world. Let us recall some of the basic properties of this point process. Let N(t) be the number of points falling within the interval [0, t]. The fact that the variance to mean ratio of this quantity is equal to 1 points to the mild variability of this process. As far as its time dependent structure is concerned, recall that the Poisson process is the traditional example of a memoryless process: the conditional probability density of the number of points in (t1, t2], given the past history, is the same as the unconditional density, which takes account of neither the number of points N(t1) in [0, t1] nor of their positions. Compared with this timid standard, the nature of a more boisterous traffic seemed clear in the minds of traffic engineers: it is sufficient to expect high variance to mean ratios, maybe equal to 2 or even 3, and to replace exponentially decreasing covariance functions by others whose correlations are larger, but still short range, naturally! Such generalizations, often in the form of Markov chains, were developed and were taken to constitute an adequate response to the question: “what is bursty traffic?” At the same time, however, the signs of a forthcoming revolution were already on the horizon.
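The variance to mean ratio of 1 quoted for the Poisson process can be observed in a small simulation. The sketch below is illustrative only: the rate of 5 points per unit window and the function name `poisson_counts` are assumptions chosen for the example.

```python
import random
import statistics


def poisson_counts(rate, n_windows, seed=0):
    """Count the points of a simulated Poisson process in unit windows."""
    rng = random.Random(seed)
    counts = [0] * n_windows
    # Exponential gaps between points give the memoryless property.
    t = rng.expovariate(rate)
    while t < n_windows:
        counts[int(t)] += 1
        t += rng.expovariate(rate)
    return counts


counts = poisson_counts(rate=5.0, n_windows=20_000)
mean_count = statistics.fmean(counts)
dispersion = statistics.pvariance(counts) / mean_count  # close to 1
```

A bursty flow in the traditional sense would give a `dispersion` of 2 or 3 under the same measurement, which is the kind of generalization the paragraph above describes.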
In a 1988 survey paper, Ramaswami [RAM 88] spoke about the future of traffic modeling. He emphasized the inadequacy of traditional approaches, particularly the real danger of neglecting certain deterministic traffic features which are poorly modeled by accepted approaches, yet influential on performance. An example of such an effect is that one loss can provoke a series of losses. In the context of asymptotically stationary models, he also mentioned very long transient periods. Through these observations we can glimpse an emerging appreciation of the fact that the high variability of real traffic, as well as the far-reaching effects of “deterministic” events, must be modeled and can have a serious impact, and we even touch on the idea of non-stationary modeling. The author also spoke of the necessity of defining both Quality of Service metrics and flow statistics that were no longer universal but relative to time scale. Here we recognize an understanding of the fact that simple mean measures are too coarse for traffic which has access to, and uses, a wide range of scales. In 1991, Meier-Hellstern, Wirth, Yan and Hoeflin [MEI 91] analyzed ISDN (Integrated Services Digital Network) data from a low speed packet network. They observed strong variability which was difficult to explain using traditional models. On the other hand, they successfully used a model with three components, of which two contained statistical ingredients that were unusual, if not unheard of, in the field, namely infinite variances and even infinite means. Such discoveries clearly showed that reality had many things to say regarding the nature of bursty traffic. Also in 1991, Leland and Wilson [LEL 91], in a study of Ethernet traffic, the well-known local network technology (at that time operating at 10 Mbit per second), reported astonishing discoveries. They were obtained from the analysis of a dataset which was exceptional from the point of view of its high resolution as well as its length and reliability.
Essentially, instead of finding a preferential time scale that could serve as a basis for a definition of burstiness, as in the Poisson case, they observed a chaotic behavior over 20 orders of magnitude; in other words, burstiness at all time scales! We could not find a result more in contradiction with the spirit, and even the hopes, of traditional modeling, without abandoning stationarity itself.
12.2.2. Laws reign
With the appearance of the celebrated article by Leland, Taqqu, Willinger and Wilson [LEL 93] in 1993 (see also the review article [LEL 94] and also [ERR 93]), this new, poorly understood behavior finally made sense and at the same time was given a name: scale invariance. It happened that these “bizarre” properties had already been met elsewhere, and even constituted a canonical mode of behavior in nature, and a modeling paradigm well known to science in physics, biology, chemistry, geology and, under the name of fractal geometry, in mathematics. To this apparently complex phenomenon, described above, corresponded a corpus of knowledge capable of describing and characterizing it: fractal traffic was born. Using the same system of Ethernet data collection as that of Leland and Wilson [LEL 91], large measurement traces were taken, some of which subsequently
became freely available and formed unofficial reference datasets. Part of one of these public series, “pAug”, appears in Figure 12.1. From the data a time series X(k) was extracted, corresponding to the number of bytes crossing the network during intervals of width δ = 12 ms. We denote by $X^{(m)}$ the series X averaged over blocks of length m, a procedure called aggregation of level m; for example, $X^{(3)}(1) = (X(1) + X(2) + X(3))/3$. Using this smoothing operator, an illustration as simple as it is fundamental to scale invariance is presented in Figure 12.1. From top to bottom, we trace the first 512 points of four series $X(k) = X^{(1)}(k)$, $X^{(8)}(k)$, $X^{(64)}(k)$ and $X^{(512)}(k)$, with δ varying from δ = 12 ms to δ = 12 × 8 × 8 × 8 ms, or 6.144 s. Apart from a reduction of variance, fortuitously compensated for by the automated setting of the graph scale, these four series present a quasi-uniform statistical face. A smoothing of this kind, across nine octaves, if carried out on short memory data, would have revealed a strong evolution towards a constant traffic rate.
[Figure 12.1: four stacked panels of bytes per bin against bin index (0 to 500), for δ = 12 ms, δ = 12 × 8 ms, δ = 12 × 8 × 8 ms and δ = 12 × 8 × 8 × 8 ms]
Figure 12.1. Visual demonstration of scale invariance. Each plot shows the first 512 points, in bytes per bin, of “pAug” Ethernet data. From top to bottom, the resolution passes from δ = 12 ms to δ = 6.144 s. The visual appearance of variability remains the same; no smoothing effects are seen
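The aggregation of level m used to build the four series of Figure 12.1 is simple to write down. The sketch below is a minimal illustration; the function name `aggregate` and the choice to discard an incomplete final block are assumptions, and the input is a toy series rather than the “pAug” data.

```python
def aggregate(x, m):
    """Return X^(m): means over consecutive non-overlapping blocks of m."""
    n_blocks = len(x) // m  # an incomplete final block is discarded
    return [sum(x[i * m:(i + 1) * m]) / m for i in range(n_blocks)]


# The example from the text: X^(3)(1) = (X(1) + X(2) + X(3)) / 3.
first_blocks = aggregate([1, 2, 3, 4, 5, 6, 7], 3)

# Aggregating by 8 twice is the same as aggregating by 64 once, which is
# how the successive panels of the figure relate to one another.
x = list(range(128))
twice = aggregate(aggregate(x, 8), 8)
once = aggregate(x, 64)
```

Applied to a measured byte-count series, `aggregate(x, 8)`, `aggregate(x, 64)` and `aggregate(x, 512)` would give the lower three panels.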
This kind of scale invariance takes its pure form in the canonical process called fractional Gaussian noise (FGN) Z(t). This discrete time stationary process, which satisfies E[Z(t)] = 0, has a parametric correlation function with a single parameter,
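the exponent β, 0 < β < 1:
$$c^*_Z(\tau) = \frac{1}{2}\left[(\tau + 1)^{2-\beta} - 2\tau^{2-\beta} + (\tau - 1)^{2-\beta}\right] \sim \frac{1}{2}(1 - \beta)(2 - \beta)\,\tau^{-\beta} \equiv c^*_\gamma\,\tau^{-\beta} \tag{12.1}$$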
where τ = 1, 2, 3, …, and the asymptotic relation is valid for large τ. The function $c^*_Z(\tau)$ is a fixed point of the aggregation operator, that is, each of the $Z^{(m)}$ has this same correlation function. This perfect second order invariance is closely related to the power law decrease of $c^*_Z(\tau)$, a decrease so slow that the correlation sum $\sum_{\tau=1}^{\infty} c^*_Z(\tau)$ is infinite. Moreover, if a process is not a fractional Gaussian noise, but has an asymptotic decrease of the same form:
$$c_X(k) \sim c_\gamma\, k^{-\beta}, \qquad 0 < \beta < 1 \tag{12.2}$$
then under aggregation the correlation functions of $X^{(m)}$ tend towards $c^*_Z(\tau)$. Thus, such processes form a second order asymptotically self-similar class. We say that they exhibit long-range dependence, giving, at second order, a precise meaning to the long memory concept. If, on the other hand, we were to examine the effect of successive aggregations on exponentially decreasing correlations, for which the sum of the correlations is finite, we would converge towards the trivial fixed point, white noise. A property equivalent to definition (12.2) [COX 84], sometimes named slowly decreasing variance, arises from the fact that
$$v^{(m)} = \operatorname{var}[X^{(m)}] \sim \frac{2 c_\gamma\, m^{-\beta}}{(1 - \beta)(2 - \beta)} \tag{12.3}$$
which should be compared against the case of an exponential decrease, for which $v^{(m)}$ goes as O(1/m). Such a slow decrease explains the visible difference between the four variance to mean ratios of Figure 12.1. Aggregation implies a normalization of 1/m as it is based on taking means, whereas $m^{-\beta}$ is the factor necessary to exactly compensate for the (asymptotic) invariance present. For the specific case of FGN, we have $v^{(m)} = m^{-\beta}$ for each m ≥ 1. Note that we can also interpret $v^{(m)}$ as the asymptotic variance of an estimate of the mean of X, which is larger for a process with long-range dependence and depends on both β and $c_\gamma$, but not on $\sigma_X$. These two fractal properties, long-range dependence and the slow decrease of variance, were rapidly verified in Ethernet networks across the world. In 1994, Duffy et al. [DUF 94] also observed them in SS7 traffic (a signaling protocol), collected from a packet switched network, CCSN (Common Channel Signaling Network), which is used to control other networks. In 1994, Paxson and Floyd, in the aptly named “The failure of Poisson modeling” [PAX 94b] (see also [PAX 94a]) showed that in
traffic that circulates between regional networks, in other words on the Internet, we often find scale invariance, and almost never find traffic obeying the traditional rules. Certain types of digitized video traffic also exhibited the phenomenon (see Beran et al. [BER 95]), though over a less spectacular range than for Ethernet. In 1995, Willinger et al. [WIL 95] returned to Ethernet data for a more refined analysis, examining not only the traffic itself but also its components, and once again found scale invariance. We will consider this example and its theoretical offshoots in section 12.3. Numerous discoveries followed (for a more complete list, see the bibliography in [WIL 96]) and almost always scale invariance was found, as well as heavy tails for many quantities such as the length of bursts. By the qualifier “heavy”, we understand a slow decrease in probability density at infinity, which gives rise to infinite variances or even means. Despite the weight of abundant empirical evidence, resistance against this new wave in traffic was not slow in showing itself. Essentially, this resistance was divided between those who preferred to believe that the evidence was in fact merely the artifact of corruptive non-stationarities, and others who thought that a continuum of scales could be effectively approximated by a sufficiently large number of discrete scales, obviating any need to talk about fractals. Although it is true that, in many cases, such objections were merely a reflex against the shock of new ideas, it is nevertheless true that non-stationarities can very well be confused with the increased variability of scale invariant processes. One of the first methods used to detect scale invariance and to estimate its exponent β was based on equation (12.3). From $X^{(m)}$, we estimate $v^{(m)}$ using the standard variance estimator, and then plot its logarithm against log(m), the variance-time plot. By (12.3), the slope of this plot is an estimate of −β.
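The link between (12.1), (12.3) and the variance-time plot can be made concrete. The sketch below is an illustration under stated assumptions: rather than estimating $v^{(m)}$ from data, it computes the exact variance of a block mean of unit-variance FGN from the correlation function (12.1), then fits the slope of log $v^{(m)}$ against log m by ordinary least squares. The function names and the value β = 0.6 are chosen for the example only.

```python
import math


def fgn_correlation(tau, beta):
    """Correlation (12.1) of fractional Gaussian noise at integer lag tau."""
    if tau == 0:
        return 1.0
    p = 2.0 - beta
    return 0.5 * ((tau + 1) ** p - 2 * tau ** p + (tau - 1) ** p)


def aggregated_variance(m, beta):
    """Exact var[Z^(m)]: variance of the mean of m unit-variance samples."""
    weighted = sum((m - tau) * fgn_correlation(tau, beta)
                   for tau in range(1, m))
    return (m + 2.0 * weighted) / m ** 2


def loglog_slope(ms, vs):
    """Ordinary least-squares slope of log v against log m."""
    xs = [math.log(m) for m in ms]
    ys = [math.log(v) for v in vs]
    xbar, ybar = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    return num / sum((x - xbar) ** 2 for x in xs)


beta = 0.6
levels = [2 ** i for i in range(9)]  # m = 1, 2, 4, ..., 256
variances = [aggregated_variance(m, beta) for m in levels]
beta_hat = -loglog_slope(levels, variances)  # the slope itself is -beta
```

With measured data, `variances` would instead hold sample variances of the aggregated series, and the fitted slope would inherit the statistical defects of the variance estimator.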
This method, although simple and with a low calculation cost, suffers from many statistical defects: a notable bias, a quite large variance, and in particular a poor robustness with respect to non-stationarity. This last defect is shared, at least partly, by many other methods (see [TAQ 95] for a comparison of several methods). The need to measure β in a reliable way stimulated further contributions to the already sizable statistical literature. In fact, the high volume of “teletraffic” data excluded the use of the estimators typically in use, which had good statistical performance, but very high computational costs. A semi-parametric method based on wavelets, introduced into the field by Abry and Veitch [ABR 98], solved these problems: its complexity is low, only O(n), yet it sacrifices neither robustness nor statistical performance, which is excellent owing to the natural match between wavelet bases and scale invariant processes. Wavelet analysis operates jointly in time and scale. It replaces the time signal X(t) with a set of coefficients, the details $d_X(j, k)$, $j, k \in \mathbb{Z}$, where $2^j$ denotes the scale, and $2^j k$ the instant, around which the analysis is carried out. In the wavelet domain, equation (12.3) is replaced with $\operatorname{var}[d_X(j, k)] = c_f\, C\, 2^{(1-\beta)j}$, where the role of m is played by scale, of which j is the logarithm, $c_f$ is the frequency analog of $c_\gamma$ and is proportional to it, and where C is independent of j. The analog of the variance-time plot, the graph of the estimates of $\log(\operatorname{var}[d_X(j, \cdot)])$ against j, is called the logscale diagram and constitutes a spectrum estimate of the process, where low frequencies correspond to large scales (on the
[Figure 12.2: two panels, each titled "Logscale Diagram, N=2" and annotated "(j1, j2) = (3, 15), Q = 0.011384"; horizontal axis: octave j; vertical axis: y_j]
Figure 12.2. Wavelet analysis of scale invariance. The Logscale diagram, an estimated “wavelet spectrum” (in log-log coordinates), is shown for the data. Left: Ethernet “pAug”. The slope gives an estimate of 1 − β with good properties, which is reliable from j = 3. We obtain 1 − β = 0.59 ± 0.01. Right: the number of new TCP connections in 10 ms bins on an Internet link. We see two scale ranges, from j = 1 to 8 and j = 8 to 19. Daubechies 2 wavelets were used
right side in Figure 12.2). For more details on this method, its use and robustness, see [ABR 00, VEI 99]. In the left plot of Figure 12.2, the wavelet method is applied to “pAug”. We see a general alignment across all scales in this time series of length $n = 2^{18}$ which, starting from j = 3, justifies the estimate of the slope (a weighted regression is used which gives more weight to small scales where there is more data, as indicated by the confidence intervals displayed). Thus, the self-similarity visible in Figure 12.1 can be objectively revealed and quantitatively measured.

12.2.3. Beyond the revolution

Despite considerable resistance, the concept of fractality spread quickly. In fact, after a short span of time, the opposite attitude was encountered as frequently as resistance: the idea that we had only to measure the value of β to capture the essence of fractal traffic. In reality, a model based on relation (12.3), comprising only three parameters, $\mu_X$, $\sigma_X$ and β, cannot claim, except in particular cases, to describe the essence of an object as rich as traffic. Even if we are ready to accept a Gaussian hypothesis, which eliminates the need for dealing with statistics of orders higher than two, models that allow more flexibility are required. For example, it is necessary to think about the constant $c_\gamma$ of equation (12.2), which gives the “size” of the law of which β expresses the nature. There is no reason, a priori, to assume that its value is equal to that of standard fractional Gaussian noise, where it takes
the parametric form $c_\gamma^* = \frac{1}{2}(1 - \beta)(2 - \beta)$. The same comments are valid for short range correlations, which are very important for sources such as video. In order for a traffic which presents long-range dependence but which also notably deviates from fractional Gaussian noise to be well-modeled by the latter, it is necessary to have a high value of m; however, the time scales thereby neglected can have significant impacts on performance.

In 1997, Lévy Véhel and Riedi [LEV 97] observed that in TCP data, not only can the behavior over small scales be far from that of a fractional Gaussian noise, but we can even find another form of scale invariance, that of multifractality. These observations were confirmed on other TCP data by Feldmann et al. [FEL 98]. In Figure 12.2, the plot on the right side shows a second order analysis of a time series which corresponds to the number of new TCP connections arriving in bins of width 10 ms. In addition to the slope, on the right, corresponding to long-range dependence, we observe a second slope at small scales, a second zone in which an invariance lies. However, multifractality means much more than the simple fact that there are two different regimes, each with its own invariance. To understand this difference, let us imagine that we defined, not a second order logscale diagram, on the basis of quantities $|d_X(j, k)|^2$, but, in a similar manner, a logscale diagram at q-th order based on $|d_X(j, k)|^q$, from which we estimated, over the same range of scales, an exponent $\beta_q$. In the case of fractional Gaussian noise, though each of the $\{\beta_q\}$ (here positive real q are taken) is different, they are connected in a simple, linear way; essentially there is only one “true” exponent. On the other hand, in the multifractal case the $\{\beta_q\}$ enjoy considerable freedom.
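The idea of a q-th order logscale diagram can be made concrete with a small sketch. Several simplifications are ours and should be read as assumptions: we use the Haar wavelet rather than the Daubechies 2 wavelet of Figure 12.2, an unweighted regression rather than the weighted one mentioned earlier, and we report the raw slope of $\log_2$ of the mean of $|d_X(j, k)|^q$ against j rather than converting it into an exponent $\beta_q$. All function names are hypothetical.

```python
import math
import random


def haar_details(x, n_octaves):
    """Return {j: [d(j, k)]} from an orthonormal Haar pyramid (Haar is our simplification)."""
    approx = list(x)
    details = {}
    for j in range(1, n_octaves + 1):
        pairs = len(approx) // 2
        details[j] = [(approx[2 * k] - approx[2 * k + 1]) / math.sqrt(2) for k in range(pairs)]
        approx = [(approx[2 * k] + approx[2 * k + 1]) / math.sqrt(2) for k in range(pairs)]
    return details


def logscale_slope(details, q):
    """Unweighted least-squares slope of log2(mean |d(j, k)|^q) against octave j."""
    js = sorted(details)
    ys = [math.log2(sum(abs(d) ** q for d in details[j]) / len(details[j])) for j in js]
    mj = sum(js) / len(js)
    my = sum(ys) / len(ys)
    num = sum((j - mj) * (y - my) for j, y in zip(js, ys))
    den = sum((j - mj) ** 2 for j in js)
    return num / den


# Illustrative input: white Gaussian noise. Its detail coefficients have the
# same law at every octave, so the slope should be close to zero for every q.
rng = random.Random(1)
noise = [rng.gauss(0.0, 1.0) for _ in range(2 ** 13)]
details = haar_details(noise, n_octaves=6)
slopes = {q: logscale_slope(details, q) for q in (1, 2, 3)}
for q, s in slopes.items():
    print(f"q = {q}: slope = {s:.3f}")
```

For a Gaussian self-similar input the slopes obtained in this way would be expected to grow linearly with q, which is the "simple, linear" connection mentioned above; a departure from linearity over a given range of scales is what a multifractal analysis looks for.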
The invariance is fundamentally different for each different moment, and therefore it is necessary to know the entire spectrum of exponents, instead of just one, to fully understand the nature of the invariance present. A detailed treatment of multifractals is beyond the scope of this chapter (see [RIE 95, RIE 99], Chapter 1, Chapter 3 and Chapter 4 for a more in-depth analysis) but it is nevertheless relevant to describe a simple example: a deterministic multiplicative cascade. We begin with a unit mass, uniformly spread out over [0, 1]. We then distribute a fraction p ∈ (0, 1) of the mass on the first half $[0, \frac{1}{2}]$ and the remainder on $(\frac{1}{2}, 1]$. The total mass is thus preserved and we say that the cascade is conservative. Now, let us repeat this procedure to obtain masses $\{p^2,\ p(1-p),\ p(1-p),\ (1-p)^2\}$ on the quarters of the interval, then $\{p^3,\ p^2(1-p),\ p^2(1-p),\ p(1-p)^2,\ p^2(1-p),\ p(1-p)^2,\ p(1-p)^2,\ (1-p)^3\}$ on the eighths, and so on. This procedure, repeated indefinitely, creates a mass distribution, that is a measure, which is singular: it is multifractal. Less rigid and random variants can be easily defined and taken to be traffic models by normalizing [0, 1] to the duration [0, t], and by regulating the number of iterations which controls the time resolution δ. The singularity and positivity of these measures are apt to describe the astonishing variability observed in small scale traffic, where a Gaussian hypothesis is far from reasonable. Thus, multifractal modeling offers the hope of venturing into the difficult
terrain of non-Gaussianity, with the strong structured support of scale invariance as a conceptual and mathematical tool.

12.3. Sources as the source of the laws

12.3.1. The sum or its parts

Following the surprise generated by the discovery of fractal traffic, the underlying cause of the phenomenon soon became an issue. A relevant though elementary observation, already discussed in section 12.1, is that there is no shortage of the raw material, namely scales. A second observation is that there are certainly a great number of characteristic scales present in traffic, due either to perturbations of the network or inherent in the nature of the sources themselves. On the network side we have the queues inside switches, which control the finest scales, measured at the beginning of the 21st century in fractions of a microsecond. From there we move on to millisecond scales, strongly influenced by queues at the input and output of large switches, which average out the high frequency fluctuations of flows. Next, we find the scales of control mechanisms for admission and congestion, which have their own internal structures and associated time scales. On a much larger scale, say of 1 minute, we can cite the regular updating of address tables in switches, and finally, major changes of packet routes and link capacities.

As for the sources, the picture is even richer. Each protocol in the hierarchy imposes its own characteristic scales, and then the nature of the traffic itself enters in: audio, video, files, and others. Finally, we must include scales that are associated with human activities, such as the frequency of telephone calls, the dynamics of hyperlink navigation in web pages, and patterns of working hours, to mention only a few. Thus, there is no lack of characteristic scales. We could even imagine that every scale is a characteristic scale for at least one traffic component.
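Before going further, and returning briefly to the deterministic multiplicative cascade of section 12.2.3, the construction is short enough to be written out explicitly. The sketch below is only an illustration; the function name and the value p = 0.3 are our own choices.

```python
import math


def binomial_cascade(p, levels):
    """Masses of a conservative deterministic cascade on 2**levels dyadic intervals.

    At each step every interval passes a fraction p of its mass to its left
    half and the remaining fraction 1 - p to its right half.
    """
    masses = [1.0]
    for _ in range(levels):
        masses = [m * w for m in masses for w in (p, 1.0 - p)]
    return masses


p = 0.3  # illustrative value, any p in (0, 1) is allowed
eighths = binomial_cascade(p, 3)
total = math.fsum(eighths)          # stays equal to 1: the cascade is conservative
print([round(m, 4) for m in eighths], round(total, 12))
```

The masses on the eighths come out in the order listed in section 12.2.3, from $p^3$ on the first interval to $(1-p)^3$ on the last. Increasing `levels` refines the time resolution, at the cost of doubling the number of intervals at each step.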
Unfortunately, even this does not in any way resolve the question of scale invariance we are concerned with, which deals with the relative structure of behavior at different scales, and not simply with the fact that effects exist at each scale. It must not be forgotten that traditional traffic also contains correlations at all scales even though their relative sizes, summarized in the form of cX , were such that a single one dominated. In contrast, with scale invariance they are connected to each other in a very particular way. To answer the question concerning the origin of this mysterious connection, two logical extremes come naturally to mind: scale invariance is found in every source and the traffic superposition is simply the inheritor of this fact, or it is an emergent property, either linearly – the whole is the sum of its parts – or non-linearly – the whole is more than the sum of its parts. Of course, the sources are combined and controlled by the network, the origin of the non-linear effects in question. However, this does not make the situation simple. Although it is the network which combines different flows, it is also the nature of other flows which, through the mechanisms provided by the network, largely controls the statistical evolution of a given flow. The network
is thus as much an agent as the cause of non-linear effects. As for the source, its characteristics are defined only in its broadest outlines by its underlying fundamental nature, real-time or not, etc. The majority of its characteristics are determined by the protocol (or protocols) which interpret it, and hence it is necessary to understand “source” as meaning the “fundamental data and protocol(s)”. In this section, we explore partial responses to these questions, starting with a description of the on/off source, which is a natural and commonly used model possessed of both theoretical and practical advantages.

12.3.2. The on/off paradigm

The motivation of the idea is as physical as it is simple: a source alternates between periods of silence, where the rate is zero, and activity, where the rate is constant, say h. Such simplicity models a source which is highly spasmodic in that whenever it has anything to transmit, it sends it at its peak rate. This is a model which was originally aimed at modeling the “burst scale” situated at medium scales, a concept which is now rather old-fashioned. However, it remains true that its structure is not useful at very small scales, where the rate varies rapidly.

In an on/off source, periods are generated by (positive) random variables which are mutually independent, distributed as a variable A for silences and B for activity. Thus, the transition points obey an alternating renewal process. Let E[B] = 1/μ and E[A] = 1/ν; it follows that the average rate is λ = hν/(ν + μ). As for the process X(t), its 1D marginal is Bernoulli and, taking h = 1, we obtain E[X(t)] = λ and $\sigma_X^2 = \lambda(1 - \lambda)$. Thanks to the structure of renewals, that of the correlations of X(t) is not difficult to understand: a link between instants $t_1$ and $t_2$ exists only if they fall within the same period (silence or burst); as such correlations are of short range, unless the probability of a long period is in itself large.
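The mean rate formula can be checked with a short simulation of the alternating renewal structure. This is a minimal sketch under our own assumptions: silences are taken to be exponential, bursts are drawn either from an exponential law or from a Pareto law (one convenient example of a heavy tail), and all names and parameter values are illustrative.

```python
import random


def mean_rate(h, nu, mu):
    """Average rate lambda = h*nu/(nu + mu), with E[A] = 1/nu (silence), E[B] = 1/mu (activity)."""
    return h * nu / (nu + mu)


def simulate_on_off(h, draw_silence, draw_activity, n_cycles):
    """Time-averaged rate over n_cycles independent silence/activity pairs."""
    total_time = 0.0
    total_work = 0.0
    for _ in range(n_cycles):
        a = draw_silence()
        b = draw_activity()
        total_time += a + b
        total_work += h * b          # the source emits at peak rate h while active
    return total_work / total_time


def pareto_activity(rng, alpha, mean):
    """Draw B with P(B > t) = (b_min / t)**alpha for t >= b_min, scaled so that E[B] = mean."""
    b_min = mean * (alpha - 1.0) / alpha
    return b_min * rng.paretovariate(alpha)


rng = random.Random(2)
h, nu, mu = 1.0, 1.0, 2.0
expected = mean_rate(h, nu, mu)

# Light-tailed reference case: exponential silences and bursts.
exp_rate = simulate_on_off(h, lambda: rng.expovariate(nu),
                           lambda: rng.expovariate(mu), 100_000)

# Heavy-tailed bursts with alpha = 1.5: same mean 1/mu, but infinite variance.
heavy_rate = simulate_on_off(h, lambda: rng.expovariate(nu),
                             lambda: pareto_activity(rng, 1.5, 1.0 / mu), 100_000)

print(f"lambda = {expected:.4f}, exponential = {exp_rate:.4f}, heavy tail = {heavy_rate:.4f}")
```

With exponential periods the time average settles close to λ quickly. With infinite-variance bursts the empirical rate may still be noticeably off after the same number of cycles, since the sample mean of B converges slowly.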
Let A be a law of finite variance, but let B be one with a heavy tail:

$$\bar{F}_B(t) \sim h_B\, t^{-\alpha}, \qquad t \gg 1, \qquad 1 < \alpha < 2 \tag{12.4}$$

where $\bar{F}_B$ is the complementary distribution function of B, that is $\bar{F}_B(t) = P(B > t)$. The range of α corresponds to infinite variance for B and it is not difficult to verify [BRI 96] that it leads to long-range dependence for X(t), with parameters

$$\beta = \alpha - 1, \qquad c_\gamma = \frac{h_B\, \nu\, (1 - \lambda)^3}{\alpha - 1} \tag{12.5}$$
One of the principal reasons why the on/off approach, as well as others based on renewal processes [RYU 96], have been so often used is the ease with which LRD can be introduced and controlled. This also brings a key advantage in terms of the generation of trajectories: we are not obliged to take the past into account in great detail. On the contrary, it is enough to draw samples from a random variable without variance, in an independent manner. The use of this method in Monte Carlo simulations is
thus widespread, although it suffers from a subtle, but significant, problem of slow convergence which is under-appreciated [ROU 99].

A variance which is infinite may appear unrealistic, and be seen as a weak device for generating long-range dependence, unrelated to empirical observations. How can we claim to observe an infinite variance when in practice one can only measure and handle finite quantities? The answer, like elsewhere in science, lies in the fact that a model does not claim absolute truth, but elegant utility. If when measuring the distribution of values of a quantity, we observe that they follow relation (12.4) across a wide range of t, up to and including the largest available, an infinite variance model becomes entirely relevant as an idealization. It was in this spirit that Cunha, Bestavros and Crovella declared in 1995 [CUN 95] that they observed heavy tails, and hence infinite variances, in many characteristics of web documents, particularly their sizes. They observed this same property in the sizes of UNIX files, thus revealing an orebody rich in power laws, capable of contributing to the existence of on/off type sources with infinite variance. In [WIL 95], Willinger, Taqqu and Sherman went further in their analysis of local Ethernet traffic and also external Ethernet traffic, which consists of traffic offered to (and received from) the Internet. Not only did they observe evidence of on/off behavior in individual flows, defined as traffic flowing between unique emission and reception address pairs, but in most cases the estimated values of α were indeed well within the infinite variance range.

12.3.3. Chemistry

In general, the addition of independent processes induces the addition of their temporal structures. More precisely, if the independent processes $X_i(t)$ have $\gamma_{X_i}(\tau)$ as their covariance functions, then the covariance function of the process $X = \sum_i X_i$ is just $\gamma_X(\tau) = \sum_i \gamma_{X_i}(\tau)$.
It follows that the presence of long-range dependence in at least one of the $X_i$ induces long-range dependence in the superposition, with an exponent equal to the minimum of those of the long-range dependent components. This is reminiscent of rules governing the fractal dimension of a union of fractal sets [FAL 90]. If, for example, the $X_i(t)$ are m independent identically distributed (iid) copies of an LRD process of parameters $(c_\gamma, \beta)$, the parameters of the superposition are simply $(m\, c_\gamma, \beta)$. The persistence of long-range dependence applies in particular to a finite superposition of N identically distributed and independent on/off sources.

As for infinite superpositions, models as interesting as they are significant emerge depending on the precise way in which the normalization is carried out. Let us initially examine a normalization relating to the instantaneous rate h (during the on states), leaving the structure of individual sources untouched (notably ν and μ). By increasing N, we can show [KUR 96, TAQ 97] that there is convergence both in the distributional and weak senses to a Gaussian process. It is not surprising that, to obtain this result, it is necessary to impose a normalization proportional to $\sqrt{N}$ after having first subtracted the mean. This limiting process has long memory and, by aggregating we
can obtain fractional Gaussian noise in a second limit operation, this time operating on time. This bond between on/off sources and the canonical scale invariant process will be explored further in the next section. Here we emphasize that the order of the limit operations, first in rate and then over time, is of central importance. If we try reversing these we obtain a completely different result, the Lévy flight, a stable process with stationary and independent increments which does not, for the moment, correspond to a model applicable to traffic. The interpretation of fractional Gaussian noise to be a combination of on/off sources reveals it to be an example of a process where the source of scale invariance lies in the linear primitive components themselves, to which a linear superposition does not add anything essential. On the other hand, there is another normalization which provides an example of where scale invariance is emergent. In this case, we leave the peak rate h constant, and lengthen the silent periods with N so as to maintain the total arrival rate constant. More precisely, we set ν = ρ/N , with fixed ρ, and obtain in the limit N → ∞ λ = hρ/μ and an arrival process of bursts that obeys a Poisson process of parameter ρ. In this limit, each source contributes only a single active period because silences before and after extend to infinity. The sole burst that remains to every source can be interpreted as the transfer of a single file at constant rate. The number of simultaneously active sources is described by a Poisson law of parameter ρ/μ. For finite N , as well as infinite limit N , the rate process has long memory. In contrast, for the limit process, long memory is no longer ascribable to individual sources but to heavy tailed distribution of the size of the transfers which remain individually constant, without any scale invariance of their own. 
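The statement that the number of simultaneously active sources follows a Poisson law of parameter ρ/μ can be illustrated numerically. The sketch below makes assumptions of its own: bursts start as a Poisson process of rate ρ from an empty system at time 0, and burst durations are exponential with mean 1/μ so that the observation time needed to approach the stationary regime stays short. Names and parameter values are illustrative.

```python
import random


def active_sources_at(t_obs, rho, draw_duration, rng):
    """Number of bursts still in progress at time t_obs.

    Bursts start as a Poisson process of rate rho on [0, t_obs); each lasts
    an independent duration returned by draw_duration().
    """
    active = 0
    t = rng.expovariate(rho)             # start time of the first burst
    while t < t_obs:
        if t + draw_duration() > t_obs:  # burst overlaps the observation instant
            active += 1
        t += rng.expovariate(rho)
    return active


rng = random.Random(3)
rho, mu = 5.0, 1.0
counts = [active_sources_at(20.0, rho, lambda: rng.expovariate(mu), rng)
          for _ in range(2000)]
mean_active = sum(counts) / len(counts)
var_active = sum((c - mean_active) ** 2 for c in counts) / (len(counts) - 1)
print(f"rho/mu = {rho / mu:.2f}, mean = {mean_active:.2f}, variance = {var_active:.2f}")
```

For a Poisson law the mean and the variance coincide, which is what the two empirical values are meant to suggest. Replacing the exponential durations by heavy-tailed ones is possible, but a system started empty then approaches its stationary regime much more slowly, so a far longer observation time would be needed.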
An aggregate of such sources can be regarded as a random and independent model of file transfers across a network.

The question of non-linear mixtures is, not unexpectedly, much more complicated, little studied, and beyond the scope of this chapter. In its broad sense, it implies that sources can influence each other, in other words that there exists feedback between the superposition and its components. We will briefly return to this later.

12.3.4. Mechanisms

A coherent way of explaining scale invariance has already emerged: long-range dependence is generated by the heavy tailed property, is preserved by superposition, and is well idealized by a canonical Gaussian process. Are there network mechanisms capable of carrying out these linear mathematical operations? The answer is yes. Owing to the discrete nature of packets, flows can inter-penetrate in switches and multiplexers (traffic concentrators), thereby effectively adding, in an approximate sense, their instantaneous rates. Moreover, since the packets remain identifiable in the mixture, this quasi-additivity also acts in the inverse direction, in the demultiplexers where flows leave a large link to move away from the core of network, or in switches where flows are extracted from one superposition to be integrated into others. The
same logic is also valid for various methods of multiplexing relevant to circuit switching.

A second question lies in the possibility that multiplicative, rather than additive, mechanisms exist in the network, potentially allowing the realization of one mathematical path (following cascades) for the generation of multifractal properties. It was suggested that the hierarchy of protocols can fulfill this role [FEL 98] by recursive subdivision of source data. However, the true cause (or causes) of the multifractality observed remains to be determined.

While at low load multiplexing, switching and demultiplexing operations are well understood in terms of linear operations, at high load non-linearities, mainly due to buffers (electronic queues), are inevitable. From strong smoothing we expect elimination of scale invariance over a certain scale range; at large scales, however, the influence of heavy tails, a property of great robustness, will persist. Moreover, the non-linear mechanisms potentially involved are richer than a simple truncation of what is otherwise a simple linear superposition. If a control mechanism regulates a flow resulting from a given source, as for example in TCP connections, there is a coupling between the source and the network, a feedback, which modifies the transmission depending on the state of the network, controlled for example by the level of loss detected. Thus, network queues generate an indirect coupling between different sources, producing highly non-linear dynamics capable of very significantly modifying the nature of traffic. This effect grows stronger as the proportion of sources thus regulated increases. Such dynamics, and their potential capacity to generate scale invariance such as self-similarity and multifractality, have begun to generate considerable excitement in the networking research community.
Finally, it is interesting to note a return to dynamic system approaches, which were considered by Erramilli and Singh early in the history of fractal traffic [ERR 90, ERR 95] but which did not evolve thereafter.

12.4. New models, new behaviors

12.4.1. Character of a model

By a “good model” we understand, first of all, that the statistics of the data are well captured by the random structure of the model. It is imperative to insist on the principle of parsimony, in other words, that only the minimal number of parameters necessary to cover the essential degrees of freedom be used. An excess of parameters is the sign of a model which is over-fitted to a specific dataset, which does not therefore capture any generality, or hold structural validity. In this case, the majority of the parameters lack physical significance, and as a result their estimation is likely to be difficult and arbitrary. Finally, to measure the degree of adequacy of a model, good metrics should be chosen. In the context of telecommunications, these will not only refer to the statistics of a flow, but to the system as a whole. Thus, the model
should be judged by its capacity to predict some measure of quality of service, of which there are a number to choose from. Among the metrics which are precisely defined, and yet reasonably close to the concerns of users, we count loss rate and average packet delay, whereas an example which is more focused on engineering questions of network dimensioning is the distribution function of queue contents, which is the marginal of the “waiting process.” However, we also work under the constraint of considering problems for which we can hope to find solutions. Often, we impose simple idealizations, for example queues with infinite waiting rooms. We then commonly use the fact that the probability Q(x), that the level of an infinite queue exceeds x, bounds from above the probability of a loss in a corresponding system where the queue is of finite size x.

From the first studies on the impact of fractal traffic we have seen that the behavior of systems can deviate notably from traditional intuition. In 1993, Veitch [VEI 93] emphasized this fact by presenting a simple system in which a fractal renewal process, with an average incoming rate of zero, could produce non-trivial queue dynamics. In this section, we consider three model classes representing the state of the art and corresponding performance studies, essentially the form of Q(x) for large x. Each class considered exhibits untraditional behavior, though of very different types. Each of the models allows an interpretation in terms of a linear superposition of on/off sources, though they were not necessarily proposed in that light, and in each case other motivations are possible.

12.4.2. The fractional Brownian motion family

Often, instead of studying traffic via its rate X(k), we turn to the series $Y(k) = \sum_{i=1}^{k} X(i)$, measuring the mass of data accumulated over the interval (0, k].
Passing over to continuous time, if X(t) is stationary, Y(t) has stationary increments, that is the distributions of the increments $\{Y(t + \delta) - Y(t),\ t \in \mathbb{R}\}$ do not depend on t. We can decompose this process as

$$Y(t) = \mu_Y\, t + \sigma_Y\, W(t) \tag{12.6}$$

where W(t) has zero-mean. If the rate process exhibits long-range dependence, the natural choice to model W(t) is the fractional Brownian motion (FBM) $B_H(t)$, $t \in \mathbb{R}$, 0 < H < 1. This canonical process is the unique self-similar Gaussian process with stationary increments. Thus, it has a perfect scale invariance simultaneously in all its statistics across all scales, for example its variance obeys $\mathrm{Var}[B_H(t)] = |t|^{2H}$. If we take the increments of fractional Brownian motion with H < 1 and δ = 1, we obtain fractional Gaussian noise with β = 2(1 − H), which has long memory if $H > \frac{1}{2}$.

In 1994, Norros [NOR 94] examined a system called fractional Brownian storage consisting of an infinite reservoir with a constant drainage rate of C, fed by Y(t). This type of system is known as a fluid queue, as the data flows into the reservoir which
is emptied in a continuous manner, and the queue state is given simply by the storage level. In terms of a limit of on/off sources, the idealization $W(t) = B_H(t)$ is valid when we have many of them each with $h \ll C$, which corresponds to many traffic streams flowing in a high capacity link. The main result is that Q(x) is asymptotically close to a Weibull law, namely

$$\log Q(x) \sim \kappa\, x^{2(1-H)} \tag{12.7}$$
where κ denotes a known constant. The slow decrease of this probability with x implies that the loss probability is larger than in the common exponential case. This result was confirmed by Brichet et al. [BRI 96] who began by studying on/off sources themselves and examined the system in a limit leading to fractional Brownian motion. The logarithmic asymptotic equivalence was refined by Narayan [NAR 98] and by Simonian and Massoulie [SIM 99]. The asymptotic form of Q(x) is now known up to a constant.

12.4.3. Greedy sources

In contrast to the fractional Brownian motion model, which idealizes a mixture of a great number of small sources, we can imagine a finite, even a small number of sources, each of which keeps an appreciable rate h. Such a scenario can model a link with a low level of aggregation, far from the center of the network, close to user access links. Indeed, even a single heavy on/off source with h > C, flowing into a reservoir being drained at rate C (with λ < C, naturally), generates remarkable statistics in the queue: the tail Q(x) of the queue is so heavy that its mean does not even exist [CHO 97]! Such a tail decay, $Q(x) = O(x^{-(\alpha-1)})$ with 1 < α < 2, is slower than that of a Weibullian queue, for which all moments exist. There are many generalizations of this system sharing the same fundamental property: systems made up of a mixture of several sources of which some have long-range dependence while others do not, and some are characterized by h < C, and others by h > C. To learn more about these systems, see the survey article [BOX 97]. Intuitively, the property necessary for such behavior is that time intervals during which the total rate of the superposition exceeds C must have a heavy tail, without variance.

12.4.4. Never-ending calls

A large fraction of the queueing theory literature concerns systems for which the arriving work is not fluid, but a point process.
For example, the notation M/G/1 denotes a Poisson process as the arrival process (“M” for Markov), of which each point, upon reaching the server, is allocated a service time distributed according to a random variable of general (“G”) distribution, that is without restriction. Here “1” denotes a single server and, by convention, the waiting room is assumed to be unlimited. Point process models readily adapt themselves to the modeling of circuit switching: points represent call requests, whose durations are determined by independent copies of B.
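A single-server queue of this kind is easy to simulate with the Lindley recursion, a standard tool of queueing theory that is not taken from this chapter: the waiting time of the next customer is the previous waiting time plus the previous service time minus the inter-arrival time, floored at zero. The sketch below assumes first-come first-served service and compares, for illustration only, exponential service times with Pareto service times of the same mean; all names and parameter values are our own.

```python
import random


def lindley_waits(interarrivals, services):
    """Waiting times in a single-server FIFO queue.

    services[n] is the service time of customer n and interarrivals[n] the
    time between the arrivals of customers n and n + 1; the first customer
    finds the system empty.
    """
    waits = [0.0]
    for gap, service in zip(interarrivals, services):
        waits.append(max(0.0, waits[-1] + service - gap))
    return waits


def tail_fraction(values, x):
    """Empirical fraction of values exceeding x."""
    return sum(v > x for v in values) / len(values)


rng = random.Random(4)
n = 50_000
arrival_rate = 0.5                      # Poisson arrivals ("M"), load 0.5
alpha = 1.5                             # Pareto index of the heavy-tailed service law
b_min = (alpha - 1.0) / alpha           # scale chosen so that the mean service time is 1

gaps = [rng.expovariate(arrival_rate) for _ in range(n)]
w_light = lindley_waits(gaps, [rng.expovariate(1.0) for _ in range(n)])
w_heavy = lindley_waits(gaps, [b_min * rng.paretovariate(alpha) for _ in range(n)])

for x in (5.0, 20.0, 80.0):
    print(x, tail_fraction(w_light, x), tail_fraction(w_heavy, x))
```

Both queues carry the same load, so any difference in the empirical tails comes from the shape of the service time distribution alone. A finite run such as this one can only hint at the asymptotic behavior of Q(x), and the heavy-tailed estimates converge slowly.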
Equipping B with a heavy tail is interpreted as modeling long calls, for example, those generated by people connected to the Internet through the telephone network. It is evident that such heavy connections will weigh upon the size of the queue. In fact, thanks to Cohen [COH 73], since 1973 it has been known that if B is characterized by an exponent α > 1, then the exponent of Q(x) is α − 1. Therefore, in particular, if service durations do not have a variance, the average waiting time before receiving service is infinite! This recalls certain fluid queue results, and in fact there are strong connections between the two types of system, the main difference being that, for point arrival, the incoming mass is instantaneously rather than progressively deposited into the queue, therefore further aggravating its load.

A considerable number of results for such systems are now available. For a survey we can consult Boxma and Cohen [BOX 00]. Among the most important generalizations is the replacement of “M” by “GI”, indicating a renewal process whose inter-arrivals are distributed according to a variable A which is not exponential but arbitrary. This law can also have a heavy tail, in which case, depending on the ratio of the exponents of A and B, different behaviors are possible, especially when the system is heavily loaded: λ ≈ C. Another major factor lies in the choice of service discipline of the queue. In [COH 73] the traditional choice is made: arrivals are serviced in the order of arrival. However, there is no shortage of alternatives which are commonly employed in switches. For example, with processor sharing, where the server divides its capacity equally over all the customers present, we recover a finite average waiting time even when B is without variance, essentially because no arrival is forced to wait behind earlier arrivals which may have very long service times.

12.5. Perspectives

Even if the fractal nature of teletraffic is now accepted, and a new understanding of its impact has been, to some extent, reached, the list of open questions remains long. In reality, we are in the early stages of studying this phenomenon, observing its evolution, and appreciating its implications.

As far as long-range dependence is concerned, one category of outstanding questions concerns the details of these effects on various aspects of performance. In some sense it is necessary to “redo everything” in the queueing literature and other fields, to take into account this invariance at large scales. Despite considerable progress, our knowledge falls far short of that necessary to design networks capable of minimizing the harmful effects with confidence, and of efficiently exploiting the beneficial properties of long memory.

A second category of questions that appears essential for the future is to understand the origin (or origins) of the apparent scaling invariance over small scales, that is, multifractal behavior. Understanding these origins will be essential to predict whether this behavior will persist, not only in the sense of not disappearing, but also in the sense of its extension towards ever smaller scales, as they are progressively activated by advances in technology. Even if small scale invariance is influenced, or even entirely determined, by network design, the study of its impact on performance will remain
relevant. Though it appears obvious that, like any variability, its presence will be negative overall, we have yet to evaluate the cost of any impact against the cost of the actions that may be required to suppress it.

The third category of questions concerns protocol dynamics in closed loop control, such as in TCP, which configures the global network as an immense distributed dynamic system, from which the generation of scale invariances may be only one of the important consequences. The richness of non-linear and non-local interactions in this system merits that this new field be studied in full depth. The next phase in the evolution of the fractal teletraffic phenomenon, as unpredictable as it is fascinating, could very well come from determinism rather than randomness.

12.6. Bibliography

[ABR 98] ABRY P., VEITCH D., “Wavelet analysis of long-range dependent traffic”, IEEE Transactions on Information Theory, vol. 44, no. 1, p. 2–15, 1998.
[ABR 00] ABRY P., TAQQU M.S., FLANDRIN P., VEITCH D., “Wavelets for the analysis, estimation, and synthesis of scaling data”, in PARK K., WILLINGER W. (Eds.), Self-similar Network Traffic and Performance Evaluation, John Wiley & Sons, 2000.
[BER 95] BERAN J., SHERMAN R., TAQQU M.S., WILLINGER W., “Variable-bit-rate video traffic and long range dependence”, IEEE Transactions on Communications, vol. 43, p. 1566–1579, 1995.
[BOX 97] BOXMA O.J., DUMAS V., Fluid queues with long-tailed activity period distributions, Technical Report PNA-R9705, CWI, Amsterdam, The Netherlands, April 1997.
[BOX 00] BOXMA O.J., COHEN J.W., “The single server queue: Heavy tails and heavy traffic”, in PARK K., WILLINGER W. (Eds.), Self-Similar Network Traffic and Performance Evaluation, John Wiley & Sons, 2000.
[BRI 96] BRICHET F., ROBERTS J., SIMONIAN A., VEITCH D., “Heavy traffic analysis of a storage model with long range dependent on/off sources”, Queueing Systems, vol. 23, p. 197–225, 1996.
[CHO 97] CHOUDHURY G.L., WHITT W., “Long-tail buffer-content distributions in broadband networks”, Performance Evaluation, vol. 30, p. 177–190, 1997.
[COH 73] COHEN J.W., “Some results on regular variation for the distributions in queueing and fluctuation theory”, Journal of Applied Probability, vol. 10, p. 343–353, 1973.
[COX 84] COX D.R., “Statistics: an appraisal”, in DAVID H.A., DAVID H.T. (Eds.), Long-Range Dependence: A Review, Iowa State University Press, Ames, USA, p. 55–74, 1984.
[CUN 95] CUNHA C., BESTAVROS A., CROVELLA M., Characteristics of WWW client-based traces, Technical Report, Boston University, Boston, Massachusetts, July 1995.
[DUF 94] DUFFY D.E., MCINTOSH A.A., ROSENSTEIN M., WILLINGER W., “Statistical analysis of CCSN/SS7 traffic data from working CCS subnetworks”, IEEE Journal on Selected Areas in Communications, vol. 12, no. 3, 1994.
[ERR 90] ERRAMILLI A., SINGH R.P., “Application of deterministic chaotic maps to characterize broadband traffic”, in Proceedings of the Seventh ITC Specialist Seminar (Livingston, New Jersey), 1990.
[ERR 93] ERRAMILLI A., WILLINGER W., “Fractal properties in packet traffic measurements”, in Proceedings of the ITC Specialist Seminar (Saint Petersburg, Russia), 1993.
[ERR 95] ERRAMILLI A., SINGH R.P., PRUTHI P., “An application of deterministic chaotic maps to model packet traffic”, Queueing Systems, vol. 20, p. 171–206, 1995.
[FAL 90] FALCONER K., Fractal Geometry: Mathematical Foundations and Applications, John Wiley & Sons, 1990.
[FEL 98] FELDMANN A., GILBERT A., WILLINGER W., “Data networks as cascades: Explaining the multifractal nature of internet WAN traffic”, in ACM/Sigcomm’98 (Vancouver, Canada), 1998.
[KUR 96] KURTZ T.G., “Limit theorems for workload input models”, in KELLY F.P., ZACHARY S., ZIEDINS I. (Eds.), Stochastic Networks: Theory and Applications, Clarendon Press, Oxford, p. 119–140, 1996.
[LEL 91] LELAND W.E., WILSON D.V., “High time-resolution measurement and analysis of LAN traffic: Implications for LAN interconnection”, in Proceedings of the IEEE Infocom’91 (Bal Harbour, Florida), p. 1360–1366, 1991.
[LEL 93] LELAND W.E., TAQQU M.S., WILLINGER W., WILSON D.V., “On the self-similar nature of Ethernet traffic”, Computer Communications Review, vol. 23, p. 183–193, 1993.
[LEL 94] LELAND W.E., TAQQU M.S., WILLINGER W., WILSON D.V., “On the self-similar nature of Ethernet traffic (extended version)”, IEEE/ACM Transactions on Networking, vol. 2, no. 1, p. 1–15, 1994.
[LEV 97] LÉVY VÉHEL J., RIEDI R.H., “Fractional Brownian motion and data traffic modeling: The other end of the spectrum”, in LÉVY VÉHEL J., LUTTON E., TRICOT C. (Eds.), Fractals in Engineering’97, Springer, 1997.
[MEI 91] M EIER -H ELLSTERN K., W IRTH P.E., YAN Y.L., H OEFLIN D.A., “Traffic models for ISDN data users: Office automation application”, in Proceedings of the Thirteenth ITC (Copenhagen, Denmark), p. 167–172, 1991. [NAR 98] NARAYAN O., “Exact asymptotic queue length distribution for fractional Brownian traffic”, Adv. Perf. Analysis, vol. 1, no. 39, 1998. [NOR 94] N ORROS I., “A storage model with self-similar input”, Queueing Systems, vol. 16, p. 387–396, 1994. [PAW 88] PAWLITA P.F., “Two decades of data traffic measurements: A survey of published results, experiences, and applicability”, in Proceedings of the Twelfth International Teletraffic Congress (ITC 12, Turin, Italy), 1988. [PAX 94a] PAXSON V., F LOYD S., “Wide-area traffic: The failure of Poisson modeling”, IEEE/ACM Transactions on Networking, vol. 3, no. 3, p. 226–244, 1994. [PAX 94b] PAXSON V., F LOYD S., “Wide-area traffic: The failure of Poisson modeling”, in Proceedings of SIGCOMM’94, 1994.
436
Scaling, Fractals and Wavelets
[RAM 88] R AMASWAMI V., “Traffic performance modeling for packet communication – Whence, where, and whither?”, in Proceedings of the Third Australian Teletraffic Research Seminar, vol. 31, November 1988. [RIE 95] R IEDI R.H., “An improved multifractal formalism and self-similar measures”, J. Math. Anal. Appl., vol. 189, p. 462–490, 1995. [RIE 99] R IEDI R.H., C ROUSE M.S., R IBEIRO V.J., BARANIUK R.G., “A multifractal wavelet model with application to network traffic”, IEEE Transactions on Information Theory (special issue on “Multiscale Statistical Signal Analysis and its Applications”), vol. 45, no. 3, p. 992–1018, 1999. [ROU 99] ROUGHAN M., YATES J., V EITCH D., “The mystery of the missing scales: Pitfalls in the use of fractal renewal processes to simulate LRD processes”, in ASA-IMA Conference on Applications of Heavy Tailed Distributions in Economics, Engineering, and Statistics (American University, Washington, USA), June 1999. [RYU 96] RYU B.K., L OWEN S.B., “Point process approaches to the modeling and analysis of self-similar traffic. Part I: Model construction”, in IEEE INFOCOM’96: The Conference on Computer Communications (San Francisco, California), IEEE Computer Society Press, Los Alamitos, California, vol. 3, p. 1468–1475, March 1996. [SIM 99] S IMONIAN A., M ASSOULIÉ L., “Large buffer asymptotics for the queue with FBM input”, Journal of Applied Probability, vol. 36, no. 3, 1999. [STE 94] S TEVENS W., TCP/IP Illustrated. Volume 1: The Protocols, Addison-Wesley, 1994. [TAN 88] TANNENBAUM A.S., Computer Networks, Prentice Hall, Second Edition, 1988. [TAQ 95] TAQQU M.S., T EVEROVSKY V., W ILLINGER W., “Estimators for long-range dependence: An empirical study”, Fractals, vol. 3, no. 4, p. 785–798, 1995. [TAQ 97] TAQQU M.S., W ILLINGER W., S HERMAN R., “Proof of a fundamental result in self-similar traffic modeling”, Computer Communication Review, vol. 27, p. 5–23, 1997. 
[VEI 93] V EITCH D., “Novel models of broadband traffic”, in IEEE Globecom’93 (Houston, Texas), p. 1057, November 1993. [VEI 99] V EITCH D., A BRY P., “A wavelet based joint estimator of the parameters of long-range dependence”, IEEE Transactions on Information Theory (special issue on “Multiscale Statistical Signal Analysis and its Applications”), vol. 45, no. 3, p. 878–897, 1999. [WIL 95] W ILLINGER W., TAQQU M.S., S HERMAN R., W ILSON D.V., “Self-similarity through high-variability: Statistical analysis of the Ethernet LAN traffic at the source level”, in Proceedings of the ACM/SIGCOMM’95 conference, 1995 (available at the address: http://www.acm.org/sigcomm/sigcomm95/sigcpapers.html). [WIL 96] W ILLINGER W., TAQQU M.S., E RRAMILLI A., “A bibliographical guide to self-similar traffic and performance modeling for modern high-speed networks”, in K ELLY F.P., Z ACHARY S., Z IEDINS I. (Eds.), Stochastic Networks: Theory and Applications, Clarendon Press, Oxford, p. 339–366, 1996.
Chapter 13
Research of Scaling Law on Stock Market Variations
Chapter written by Christian WALTER.

13.1. Introduction: fractals in finance

Stock market charts showing how the prices of securities change over time display irregular shapes that seem to be reproduced and repeated at all scales of analysis: rising periods follow periods of decline. However, the rises are broken up by intermediate falling phases and the falls are interspersed with partial rises, and this continues until the natural limit of the quotation scale is reached. This entanglement of repetitive patterns of rising and falling waves at all scales was discovered in the 1930s by Ralph Elliott, to whom the idea occurred while observing the ebb and flow of tides on the sands of a seashore. From this, he formulated a financial symbolization known as “stock market waves” or “Elliott’s waves”, which he divided into huge tides, normal waves and wavelets, and also “tsunami”, from the name given in Japan to huge waves caused by earthquakes.

The theory called “Elliott’s waves” [ELL 38] presents a deterministic fractal description of the stock market based on self-similar geometric figures found at all scales of observation. It provides a toolbox, in the form of a graphic analysis of stock market fluctuations, that is used by certain market professionals: technical analysts. Elliott’s figures propose a calibration of rises and falls derived from a Pythagorean numerology based on the golden ratio and the Fibonacci sequence. The resulting predictions are strongly tinged with subjectivity, insofar as the detection and positioning of waves depend on the graphic analyst’s view of the market being examined. For lack of an appropriate mathematical tool, this conceptualization of stock market variations remained confined, like alchemy before chemistry, to the pre-scientific field until the emergence of fractals.

The fractals of Benoît Mandelbrot, though developed with a radically different approach, fit this understanding of stock market variations. They share with Elliott’s waves the aim of untangling the inextricable interlacing of stock market fluctuations at all scales. In stock market language: are we in a downward correction of a rising phase, or in a falling period interrupted by a temporary rise? Fractals provide an adequate conceptualization, allowing the intuitions of graphic analysts to be translated into rigorous mathematical representations.

However, the adventure of fractals in finance has not been a smooth one. It is rather the eventful progression of Mandelbrot’s hypotheses through the evolution of finance theory over 40 years, from 1960 until today, which stirred up a vehement controversy about modeling with infinite variance or infinite memory. The connecting thread of Mandelbrot’s works, followed by others, including his opponents, was the search for scaling laws in stock market fluctuations. This search followed either the direction of scaling invariance, the pure fractal approach to markets proposed by Mandelbrot, or that of a multiscale analysis of markets, such as the mixtures of processes and ARCH-type models that emerged in the 1980s, or the regime-switching models of the 1990s.

The starting point of this controversy was the existence of leptokurtic (or non-Gaussian) distributions of stock market variations.
This distributional anomaly with respect to the Brownian hypothesis of traditional financial modeling led Mandelbrot to propose, in 1962, replacing the Gaussian distribution with Paul Lévy’s α-stable distributions of infinite variance for the modeling of periodic returns. However, this new hypothesis very soon provoked a relatively fierce controversy concerning the existence of the variance, and other candidate processes appeared, all the more easily because the scaling invariance of α-stable laws, the cardinal property of Mandelbrot’s fractal hypothesis, was not validated experimentally, or only with difficulty. The attempt to resolve the leptokurtic problem by keeping the iid hypothesis and proposing α-stable distributions did not remove all the anomalies, since a new type of anomaly, scaling anomalies, appeared.

Theoretical research therefore sought to model leptokurticity in other ways and turned to the second pillar of financial modeling: the hypothesis of independence of successive returns, which was challenged in turn. The cause of the observed leptokurticity was thus sought in different forms of dependence between returns (linear, then non-linear). This was the second round of empirical investigations. After the absence of short memory in returns had been established, research turned towards the detection of long memory in returns.
This attempt did not succeed either. The focus then shifted to the volatility process: first with the formalization of short memory in volatilities, an approach that led to ARCH modeling, and then with the identification of long memory (or long-range dependence) in volatilities, a trend that led to the rediscovery of scaling laws in finance. Finally, the fractal hypothesis was validated on the process generating stock market volatilities. Today, the long memory of volatilities (i.e., a hyperbolic law for the correlations between volatilities) has become a recognized fact in financial markets, and financial modeling seeks to reconcile the absence of memory in returns with the presence of long memory in volatilities.

After a first part, which briefly outlines the quantities followed in finance and traditional financial modeling, we present a review of the literature on the search for scaling laws describing stock market variations¹. This review clearly brings out several distinct periods, and we therefore propose a chronology of this search for scaling laws, i.e. a periodization which illustrates the conceptual shifts that finance has undergone over 40 years.
The chronology is as follows:
– during the first period, from 1960 to 1970, Mandelbrot’s proposals and the first promising discoveries of scaling invariance in the markets launched a debate in the academic community by introducing the iid-α-stable and H-correlated models;
– this debate developed during the period 1970–1990 and seemed to end with the experimental rejection of fractals in finance for stock market returns;
– however, in parallel with fractals, developments in time series econometrics on fractionally integrated processes, of ARFIMA type from the 1980s and then of FIGARCH type in the 1990s, led, between 1990 and 2000, to a rediscovery of scaling laws in the process of stock market volatilities, using long memory concepts;
– finally, the measurement of time itself has become an object of research, with the recent developments of modeling in deformed time.

1. Mathematical aspects of fractal modeling in general, developed in many works, are not dealt with here. Specific aspects of fractal modeling in finance and examples of application of the iid-α-stable model are presented in detail in [LEV 02].

13.2. Presence of scales in the study of stock market variations

13.2.1. Modeling of stock market variations

13.2.1.1. Statistical apprehension of stock market fluctuations

When we want to describe statistically the behavior of a stock market between two dates 0 and T, with the aim of proposing a probabilistic model, two “natural” interpretations of the available data (price quotations) are possible. We may consider the prices quoted between 0 and T at a fixed observation frequency, which can be a day, a month or a quarter, but also an hour or five minutes. This subdivides the interval [0, T] into n periods of equal length τ = T/n, this duration τ defining a “characteristic time” of market observation. Alternatively, we may consider the price quoted at every transaction that took place between 0 and T, which means splitting the interval [0, T] into $0 = t_0 < t_1 < \dots < t_n = T$ and working in “deformed time”, or “transaction-time”, $t_j$ being the moment of the j-th transaction.

The first approach appears the most immediate, but because of the discontinuous and irregular nature of stock market quotations, the price recorded on date t may not correspond to a real exchange on the market, i.e., to an equilibrium of supply and demand at the time of quoting: in that case, the economic significance of the statistical analysis may be very weak. Moreover, when the quoting frequency is higher than daily, the variations between the previous day’s closing price and the following day’s opening price are treated as intra-daily variations. Finally, quoting in physical time assumes that market activity is broadly uniform during the observation period, which is generally not the case. Hence the interest of the second approach, which corresponds to the succession of equilibrium prices between supply and demand.

These two approaches coexist in financial modeling, and this alternative raises the issue of the appropriate time in which to measure stock market fluctuations. The issue was first raised by Mandelbrot and Taylor [MAND 67b] and by Clark [CLA 73], who introduced the concept of “information time”, where information was associated with the volume of transactions².
The first analysis (calendar time) is widely used, but the second line of research (deformed time) is attracting renewed interest. This interest results from, and reflects, the change in the computing environment of stock markets, which has brought an increasing abundance of price data: whereas prices had been recorded daily since the 19th century, during the 1980s they were recorded every minute and then, in the 1990s, at each transaction. Sample sizes thus increased by several powers of 10. Statistical tests carried out on markets during the 1960s used approximately $10^3$ data points. Those of the early 1990s processed nearly $10^5$. The most recent investigations examine nearly $10^7$.
2. We can observe that this approach is similar to that of Maurice Allais who introduced the concept of “psychological time” in economics (see for example [ALL 74]).
In calendar time, the basic modeling is as follows. Let S(t) be the price of asset S on date t. The variation of price between 0 and T is:
$$S(T) = S(0) + \sum_{k=1}^{n} \Delta S(k), \qquad n = \frac{T}{\tau} \tag{13.1}$$
The notation ΔS(k) represents the price variation of asset S between the dates t − τ and t, where t is expressed in multiples of the basic time step τ (t = kτ):
$$\Delta S(t, \tau) = S(t) - S(t - \tau) = S(k\tau) - S\big((k-1)\tau\big) = \Delta S(k) \tag{13.2}$$
In transaction-time, prices are recorded at every transaction and the price variations between two successive transactions are considered. Let N(t) be the number of transactions³ made between dates 0 and t. The variation of price between 0 and T is then:
$$S(T) = S(0) + \sum_{j=1}^{N(T)} \Delta S(j) \tag{13.3}$$
The notation ΔS(j) represents the price variation of asset S between transactions j − 1 and j:
$$\Delta S(j) = S(j) - S(j-1)$$
The price process {S(j)} is therefore indexed by a time intrinsic to the functioning of the stock market, or “transaction-time”, denoted by θ(t): S(j) = S(θ(t)).

Finally, market professionals generally consider that the value of a quoted price (and thus the relevance of the measurement) differs depending on whether this price corresponds to a transaction of 500,000 securities or of 5 securities. The concepts of market “depth” and exchange “weight” therefore come into play. We measure the intensity of exchange, or “activity level” of the market, by the quantity of securities exchanged, i.e. the volume of transactions. Financial modeling has taken this element into account, and the role of the volume of transactions in the evolution of prices is analyzed⁴ today by introducing a volume process into financial modeling.
3. The transaction arrival process N(t) was studied by Hasbrouck and Ho [HAS 87] and more recently by Ghysels et al. [GHY 97]. Evertsz [EVE 95b] showed that the distribution of waiting times between two quotations follows a Pareto power law whose exponent implies an infinite mathematical expectation.
4. See, for example, Lamoureux and Lastrapes [LAM 94] or Gouriéroux and Le Fol [GOU 97b], who give an overview of this question. Maillet and Michel [MAI 97] showed that the distribution of volumes follows a Pareto power law.
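The two readings of the same quotations, calendar time (13.1) and transaction-time (13.3), can be sketched as follows. This is only a minimal illustration under stated assumptions: the tick data are invented, the function names are hypothetical, and the price at a grid date is assumed to be the last traded price.

```python
from bisect import bisect_right

def calendar_increments(ticks, s0, horizon, tau):
    """Price increments on a fixed grid of step tau, as in (13.1).

    ticks: list of (time, price) sorted by time, with 0 < time <= horizon.
    Assumption: the price at a grid date is the last traded price.
    """
    times = [t for t, _ in ticks]
    prices = [p for _, p in ticks]
    n = round(horizon / tau)
    grid_prices = [s0]
    for k in range(1, n + 1):
        i = bisect_right(times, k * tau)   # number of trades up to date k*tau
        grid_prices.append(prices[i - 1] if i > 0 else s0)
    return [b - a for a, b in zip(grid_prices, grid_prices[1:])]

def transaction_increments(ticks, s0):
    """Price increments between successive transactions, as in (13.3)."""
    prices = [s0] + [p for _, p in ticks]
    return [b - a for a, b in zip(prices, prices[1:])]

# Hypothetical tick data: (time in minutes, traded price)
ticks = [(0.5, 100.2), (0.7, 100.1), (3.9, 100.6), (4.0, 100.4), (9.5, 101.0)]
s0, horizon, tau = 100.0, 10.0, 2.0

cal = calendar_increments(ticks, s0, horizon, tau)
trx = transaction_increments(ticks, s0)

# Both decompositions add up to the same final price S(T)
print(s0 + sum(cal), s0 + sum(trx))
# Quiet grid periods produce zero increments in calendar time only
print(cal)
print(trx)
```

Both decompositions telescope to the same total variation S(T) − S(0); they differ only in how that variation is split into increments, which is where the choice of clock matters.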
Let V(t) be the total volume of securities exchanged between 0 and t. The total volume of securities exchanged between 0 and T is:
$$V(T) = \sum_{j=1}^{N(T)} \upsilon(j) \tag{13.4}$$
The notation υ(j) represents the volume of securities exchanged during transaction j. The volume process {V(j)} is indexed by transaction-time.

The price quoted on date T is therefore the result of the simultaneous effect, between 0 and T, of three factors or processes: the transaction process N(t), the process of price variations between two transactions ΔS(j), and the volume process υ(j).

13.2.1.2. Profit and stock market return operations in different scales

From basic data such as the prices quoted over the period [0, T], three quantities are of interest in finance:
– the profit realized on the security during the period [0, T], defined by:
$$G(T) = S(T) - S(0) \tag{13.5}$$
– the rate of return of the security over the period [0, T], defined by:
$$R(T) = \frac{S(T) - S(0)}{S(0)} = \frac{G(T)}{S(0)} \tag{13.6}$$
– the continuous rate of return of the security over the period [0, T], defined by:
$$r(T) = \ln\big(1 + R(T)\big) = \ln S(T) - \ln S(0) \tag{13.7}$$
We are interested in the evolution of these quantities over successive sub-periods [t − τ, t]. The periodic gain realized on the security during the sub-period [t − τ, t] is:
$$\Delta G(t,\tau) = G(t) - G(t-\tau) = S(t) - S(t-\tau) = \Delta S(t,\tau) \tag{13.8}$$
The periodic rate of return realized on the security during the sub-period [t − τ, t] is:
$$\Delta R(t,\tau) = \frac{S(t) - S(t-\tau)}{S(t-\tau)} = \frac{\Delta S(t,\tau)}{S(t-\tau)}$$
or:
$$1 + \Delta R(t,\tau) = \frac{S(t)}{S(t-\tau)} = \frac{1 + R(t)}{1 + R(t-\tau)} \tag{13.9}$$
The periodic continuous rate of return realized on the security during the sub-period [t − τ, t] is:
$$\Delta r(t,\tau) = \ln\big(1 + \Delta R(t,\tau)\big) = \ln S(t) - \ln S(t-\tau) = r(t) - r(t-\tau) \tag{13.10}$$
When data are obtained at high frequency, ΔR(t, τ) is “small” and we have ln(1 + ΔR(t, τ)) ≈ ΔR(t, τ). Expressions (13.9) and (13.10) are then very close, and periodic security returns can be measured by either one.

The temporal aggregation of returns is obtained by combining expressions (13.6) and (13.9); we have (with t = kτ):
$$1 + R(T) = \prod_{t=\tau}^{T}\big(1 + \Delta R(t,\tau)\big) = \prod_{k=1}^{n}\big(1 + \Delta R(k)\big) \tag{13.11}$$
In the same way, (13.7) and (13.10) lead to:
$$r(T) = \sum_{t=\tau}^{T}\Delta r(t,\tau) = \sum_{k=1}^{n}\Delta r(k) \tag{13.12}$$
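The definitions (13.6) to (13.12) can be checked numerically. The sketch below is illustrative only: the price series is invented and the helper names are hypothetical. It verifies that periodic rates of return aggregate by product (13.11) and periodic continuous returns by sum (13.12).

```python
import math

def simple_returns(prices):
    """Periodic rates of return: S(k)/S(k-1) - 1, as in (13.9)."""
    return [b / a - 1.0 for a, b in zip(prices, prices[1:])]

def log_returns(prices):
    """Periodic continuous returns: ln S(k) - ln S(k-1), as in (13.10)."""
    return [math.log(b) - math.log(a) for a, b in zip(prices, prices[1:])]

# Hypothetical prices observed at dates 0, tau, 2*tau, ..., n*tau = T
prices = [100.0, 101.0, 100.5, 102.0, 101.2]

dR = simple_returns(prices)
dr = log_returns(prices)

# Returns over the whole period [0, T], as in (13.6) and (13.7)
R_T = prices[-1] / prices[0] - 1.0
r_T = math.log(prices[-1] / prices[0])

# Aggregation: product of (1 + dR) as in (13.11), sum of dr as in (13.12)
R_from_product = math.prod(1.0 + x for x in dR) - 1.0
r_from_sum = sum(dr)

print(abs(R_T - R_from_product) < 1e-12, abs(r_T - r_from_sum) < 1e-12)
# For small moves ln(1 + dR) is close to dR, so the two measures nearly coincide
print(max(abs(a - b) for a, b in zip(dR, dr)))
```

The additivity of continuous returns is what makes them the convenient quantity for the scaling arguments that follow.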
13.2.1.3. Traditional financial modeling: Brownian motion

Modeling stock market variations leads us to assume that S(t) is a random variable: the sequence of values S(1), S(2), S(3), etc. is then regarded as the values, on certain dates t, of a process in continuous time. The analysis of stock market variations thus leads to the study of the stochastic process {S(t), t ≥ 0}, or of the associated processes {R(t), t ≥ 0} or {r(t), t ≥ 0}, and of their increments. The usual hypothesis of financial theory assumes that these random processes have independent and identically distributed (iid) increments of finite variance, which we abbreviate as “iid-Gaussian” modeling.

The iid-Gaussian hypothesis has been the subject of much controversy for the last 50 years. The iid sub-hypothesis has been tested intensively in the literature⁵. Today, it is accepted that, for the usual valuation and hedging models, this assumption is valid as a first approximation when τ is more than 10 minutes (see, for example, [BOU 97]).

5. See [CAM 97] for a complete review of the different ways to test this hypothesis statistically, and of the results obtained.

The hypothesis is also convenient for passing from the distribution of returns at scale τ to that of returns at scale T = nτ because, in this case, if P(Δr(t, τ)) is the probability distribution of the periodic returns Δr(t, τ), then:
$$P\big(\Delta r(t, n\tau)\big) = \big[P\big(\Delta r(t, \tau)\big)\big]^{\otimes n} \tag{13.13}$$
where ⊗ represents the convolution operator. From the probabilistic point of view, the advantage of this hypothesis is purely computational. From the economic point of view, the independence of returns amounts to considering that the information available and relevant for the valuation of financial assets is correctly incorporated into quoted prices, which is the germ of the concept of informational market efficiency; stationarity means that the economic characteristics of the observed phenomenon do not change much over time. The existence of the variance bounds the fluctuations of returns: it leaves no room for stock market crashes or booms.

The first formal model of stock market variations was proposed in 1900 by Louis Bachelier⁶ [BAC 00] and was based on profits (13.8); it was then modified in 1959 by Osborne for returns (13.9) and (13.10):
$$\frac{dS(t)}{S(t)} = dr(t) = \mu\,dt + \sigma\,dW(t) \tag{13.14}$$
with coefficients μ ∈ R and σ > 0, where W(t) is a standard Brownian motion⁷, i.e., E(W(1)) = 0 and E(W(1)²) = 1. The coefficient μ represents the expectation of the instantaneous return on the share purchased. The risk of a financial asset is generally measured by the coefficient σ of the Brownian motion, called “volatility” by market professionals: it is a measure of the potential dispersion of stock market returns. There are other risk measures, all based on this idea of the conditional variability of returns over a given time (see [GOU 97b]).

The solution of (13.14) is obtained by setting X(t) = ln S(t) and applying Itô’s formula to dX(t). We obtain:
$$S(t) = S(0)\,\exp\!\left[\left(\mu - \frac{\sigma^2}{2}\right)t + \sigma W(t)\right], \qquad t \in [0, T] \tag{13.15}$$
which is considered the standard model of stock market variations.
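As an illustration of the standard model, the sketch below samples a path directly from the closed form (13.15). It is a minimal example under stated assumptions: the parameter values (5% drift, 20% volatility, 250 steps per year) are hypothetical, and `simulate_gbm` is a name introduced here for illustration.

```python
import math
import random

def simulate_gbm(s0, mu, sigma, horizon, n_steps, rng):
    """Sample S(t) = S(0) exp((mu - sigma^2/2) t + sigma W(t)) on a regular grid."""
    dt = horizon / n_steps
    w = 0.0                                   # W(0) = 0
    path = [s0]
    for k in range(1, n_steps + 1):
        w += rng.gauss(0.0, math.sqrt(dt))    # independent Gaussian increment of W
        t = k * dt
        path.append(s0 * math.exp((mu - 0.5 * sigma ** 2) * t + sigma * w))
    return path

rng = random.Random(0)                        # fixed seed, for reproducibility only
path = simulate_gbm(s0=100.0, mu=0.05, sigma=0.20, horizon=1.0, n_steps=250, rng=rng)

# Periodic continuous returns of the simulated path, as in (13.10)
log_ret = [math.log(b / a) for a, b in zip(path, path[1:])]
mean = sum(log_ret) / len(log_ret)
var = sum((x - mean) ** 2 for x in log_ret) / (len(log_ret) - 1)

# Volatility estimated at the step scale, then rescaled to the one-year horizon
annual_vol = math.sqrt(var * 250)
print(round(annual_vol, 3))
```

The estimated volatility should fall near the value of σ used in the simulation, up to sampling error; with only 250 observations that error is not negligible.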
6. A biography of Louis Bachelier has been compiled by Courtault et al. [COU 00]. For a description of the financial aspects of Bachelier’s work and their impact on the finance industry, see [WAL 96]. For an understanding of Bachelier’s probabilistic work in the context of his time, see [TAQ 00].
7. Let us note that Bachelier did not know Brownian motion in the strict sense of its definition, because this definition was only given in 1905 by Einstein and then in 1923 by Wiener. However, Bachelier assumes that the successive differences of the form ΔS(t, τ) are independent and Gaussian, with a variance proportional to the time interval τ, which amounts to describing Wiener’s process.
13.2.2. Time scales in financial modeling

13.2.2.1. The existence of characteristic time

If we choose to model in physical time, and thus with a fixed time step, the first question that arises is the choice of the time step τ, i.e., the resolution scale of market analysis: should we examine daily, weekly or monthly variations? Which observation scale is most appropriate for capturing the statistical structure of stock market variations? A question of a financial nature thus appears: should the probability law that governs stock market variations be the same at all scales?

If each time scale is understood as representing the investment horizon of a given category of operators, there is apparently no particular reason why variations over a short trading horizon and variations over the long horizon of a portfolio manager should be modeled by the same probability law. Equation (13.12) shows that, in the iid case, the return at scale T is the sum of the returns at scale τ. In general, when we add iid random variables, the resulting probability law differs from the initial one. A multiscale analysis thus seems, at first sight, unavoidable if we do not wish to lose information on market behavior at each scale, each being the characteristic time of a given economic phenomenon.

The first analyses of market behavior used only one observation frequency, often monthly. In 1962, Mandelbrot became the first to introduce the concept of simultaneous analysis on several scales, in order to compare the distributions of periodic returns Δr(t, τ) across these different scales τ. Mandelbrot sought to establish an invariance of periodic returns under a change of scale (i.e., a fractal structure of the market).
If P(Δr(t, τ)) is the probability distribution of the periodic returns Δr(t, τ), relation (13.13) simplifies to:
$$\big[P\big(\Delta r(t, \tau)\big)\big]^{\otimes n} = n^{H}\, P\big(\Delta r(t, \tau)\big) \tag{13.16}$$
where H is a self-similarity exponent, which means that the process of returns {r(t), t ≥ 0} is self-similar with exponent H:
$$r(T) = r(n\tau) \overset{\mathcal{L}}{=} n^{H}\, r(\tau), \qquad T = n\tau \tag{13.17}$$
where $\overset{\mathcal{L}}{=}$ symbolizes equality in distribution.
In such a market model, an important consequence of the fractal hypothesis is the absence of any preferred observation scale, i.e. of any characteristic time that would be privileged for statistical observation. It then becomes possible to estimate the probability law for a long horizon from the study of stock market fluctuations over a short horizon: the distribution of returns over the long-term horizon T is obtained from the distribution of returns over the short-term horizon τ by means of relation (13.17). In other words, by observing the market at any scale, we can access its fundamental behavioral structure: the probability law that characterizes stock market fluctuations is independent of the scale of these fluctuations.
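One simple way to look for the self-similarity (13.17) in data is to aggregate short-horizon returns over blocks of n periods and see how their typical size grows with n. The sketch below is an illustration under stated assumptions, not the procedure used in the works cited: the returns are simulated iid Gaussian values, the function names are hypothetical, and the mean absolute return is used as the size measure because it remains finite for any law with a finite first moment, whereas the variance may not.

```python
import math
import random

def aggregate(returns, n):
    """Sum non-overlapping blocks of n short-horizon returns, as in (13.12)."""
    return [sum(returns[i:i + n]) for i in range(0, len(returns) - n + 1, n)]

def mean_abs(xs):
    return sum(abs(x) for x in xs) / len(xs)

def estimate_h(returns, scales):
    """Least-squares slope of ln(mean |r(n tau)|) against ln n."""
    xs = [math.log(n) for n in scales]
    ys = [math.log(mean_abs(aggregate(returns, n))) for n in scales]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

rng = random.Random(1)                                  # fixed seed, illustration only
daily = [rng.gauss(0.0, 0.01) for _ in range(20000)]    # simulated iid Gaussian returns

h = estimate_h(daily, scales=[1, 2, 4, 8, 16, 32])
print(round(h, 2))   # expected to be close to 1/2 for iid Gaussian returns
```

A slope close to 1/2 is the Brownian case; a stable straight line with a different slope would point to another exponent H, while a curved plot would signal that a single scaling law does not hold over the range of scales examined.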
13.2.2.2. Implicit scaling invariances of traditional financial modeling

Traditional financial modeling, however, already has fractal properties: Brownian motion is a self-similar process of exponent H = 1/2. In particular, its increment Δr over a time τ follows a scaling law such that:
$$\Delta r(t, \tau) \sim \tau^{1/2} \tag{13.18}$$
The distribution of the ratio $\Delta r/\tau^{1/2}$ is independent of time. In financial terms, the order of magnitude of the return on a security over a given time is proportional to the square root of this time. Finance theory holds that returns and the associated risk grow with the time elapsed. Relation (13.18) specifies this dependence, irrespective of the time scale (duration) considered. Hence, the law of returns is invariant under a change of scale: the law of security returns does not depend on the holding period of the security. The theoretical risk of a financial asset thus grows as the square root (exponent 1/2) of the holding period of this asset. Market practitioners apply this fractal property constantly when they “annualize” volatility by means of formula (13.18). For example, the volatility over 12 months is equal to the volatility over one month multiplied by the square root of 12. This calculation of a long-term risk level from a short-term risk is also at the basis of the banking industry’s prudential thinking on the control of risks in market operations (see [BAL 94, BAL 96, BAL 98, BAL 99, IOS 94]).

13.3. Modeling postulating independence on stock market returns

13.3.1. 1960-1970: from Pareto’s law to Lévy’s distributions

13.3.1.1. Leptokurtic problem and Mandelbrot’s first model

The scaling character of the fluctuations of quoted prices in stock markets was first established by Mandelbrot through the study of variations in the price of cotton between 1880 and 1958. This is the first trace of an explicit statement about the existence of scaling phenomena in stock market variations. Their existence was brought out by the study of distribution tails, which revealed the connection between the discovery of scaling laws and the appropriate treatment of large stock market variations.
From the first statistical studies of stock market fluctuations, it was established that the empirical distributions of successive returns contained too many points in the tails to be fitted by Gaussian densities: the empirical distributions obtained were all leptokurtic. This problem of large stock market variations was not solved and was temporarily abandoned by research for lack of appropriate means to model it. Moreover, this problem of abnormal distribution tails was old and dated back to Pareto, who had invented the law that bears his name, a power law, precisely to account for the distribution of incomes in an economy on a given date. However, Pareto’s law did not seem to have the status of a limit law in probability and was not used in finance.

Mandelbrot thus tackled the problem of the large values of the empirical distribution functions of returns Δr(t, τ) = ln S(t) − ln S(t − τ), where S(t) is the closing price of cotton on date t, with two values for τ: one month and one day. By calculating the expressions Fr(Δr(t, τ) > u) for positive values and Fr(Δr(t, τ) < −u) for negative values, where Fr designates the cumulative frequency of the variations Δr(t, τ), a double fit to a Pareto law of exponent α is obtained:
$$\ln \mathrm{Fr}\big(\Delta r(t,\tau) > u\big) \approx -\alpha \ln u + \ln C(\tau) \tag{13.19a}$$
$$\ln \mathrm{Fr}\big(\Delta r(t,\tau) < -u\big) \approx -\alpha \ln u + \ln C(\tau) \tag{13.19b}$$
Noting that the fitted straight lines for τ = one day and τ = one month are parallel, Mandelbrot deduced that the distribution laws of Δr for τ = one day and τ = one month differ only by a change of scale, and hence proposed a new model of price variation: keeping the iid assumptions, the stability of the phenomenon between a day and a month is interpreted as a trace of stability in Lévy’s sense.

A random variable X is called stable in Lévy’s sense, or α-stable, if, for any pair $c_1, c_2 > 0$, there exist α ∈ ]0, 2] and d ∈ R such that:
$$c_1 X_1 + c_2 X_2 \equiv cX + d, \qquad c^{\alpha} = c_1^{\alpha} + c_2^{\alpha} \tag{13.20}$$
where ≡ denotes equality in distribution and where $X_1$ and $X_2$ are two independent copies of X. In the case where d = 0, X is said to be strictly stable. The exponent α ∈ ]0, 2] is the characteristic exponent of stable laws. Mandelbrot’s inference comes from the following property of stable laws. If X follows a stable law with characteristic exponent α, then it can be shown that:
$$P(X \geq x) = 1 - F(x) = x^{-\alpha}\left[\frac{A_1}{\pi\alpha} + \frac{A_2}{2\pi\alpha}\,x^{-\alpha} + O\big(x^{-2\alpha}\big)\right] \tag{13.21}$$
where $A_1$ and $A_2$ are quantities independent of α. In addition, by definition, a random variable follows Pareto’s law in the upper tail if:
$$P\big(X \geq x \mid x \geq x(0)\big) = 1 - F(x) = x^{-\alpha}\,h(x) \tag{13.22}$$
where α is called Pareto’s index and where h(x) is a slowly varying function, i.e., lim h(tx)/h(x) = 1 when x → ∞ for any t > 0. When h(x) is a constant function, the law is said to be Pareto’s in the strict sense. Writing $h(x) = [A_1/(\pi\alpha) + (A_2/(2\pi\alpha))x^{-\alpha} + O(x^{-2\alpha})]$, relations (13.21) and (13.22) show that α-stable laws are asymptotically Paretian with tail
index α: this is the reason why, in his 1962 communication, Mandelbrot concludes that “Paretian character [. . .] is “predicted” or “confirmed” by stability”. The second important fact of this empirical study concerns the value of α, equal to 1.7. No moment of order higher than α exists. Thus, since α is lower than 2, the variance of Δr(t, τ) is infinite.
13.3.1.2. First emphasis of Lévy’s α-stable distributions in finance
Mandelbrot [MAND 63] developed and summarized the modeling of price variations proposed in 1962: “this was the first model that I have elaborated to describe the price variation practiced on certain stock exchanges of raw materials in a realistic way,” (see [MAND 97a], French edition, p.128). We can describe this first model as “iid-α-stable,” insofar as the iid hypotheses are kept and the characteristic exponent α of stable laws goes from 2 (Gauss) to a value α < 2. This model, which can produce sudden, unforeseen “floods” on stock markets, was named “Noah’s effect” by Mandelbrot, in reference to the biblical flood (see [MAND 73b]). Fama [FAM 65] and then Mandelbrot [MAND 67a] continued the initial investigations and validated the model. Finally, in 1968, the first tabulations of symmetric α-stable laws were carried out by Fama and Roll [FAM 68], which made it possible to build the first parameter estimators for these distributions.
13.3.2. 1970–1990: experimental difficulties of iid-α-stable model
13.3.2.1. Statistical problem of parameter estimation of stable laws
As Fama indicated in 1965, this first evidence for Lévy’s distributions was fragile because the methods for estimating the characteristic exponent α were not very reliable: fitting distribution tails on a log-log graph is very sensitive to the subjective choice of the point from which the distribution tail is taken to begin.
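The sensitivity of the log-log tail fit to the choice of starting point can be illustrated with a minimal sketch. It is not the procedure used by Mandelbrot or Fama: the function name `loglog_tail_slope`, the `tail_fraction` parameter and the simulated Lomax sample (Paretian only asymptotically, with exponent 1.7) are assumptions introduced here purely for illustration.

```python
import numpy as np

def loglog_tail_slope(sample, tail_fraction):
    """Fit ln Fr(X >= u) = -alpha * ln u + const on the upper tail; return alpha."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = x.size
    k = max(int(n * tail_fraction), 3)      # number of tail points kept
    tail = x[n - k:]                        # k largest values, in ascending order
    # Cumulative frequency of values >= each tail point: k/n, ..., 1/n
    freq = np.arange(k, 0, -1) / n
    slope, _ = np.polyfit(np.log(tail), np.log(freq), 1)
    return -slope

rng = np.random.default_rng(0)
# Hypothetical data: Lomax sample, P(X > x) = (1 + x)^(-1.7)
sample = rng.pareto(1.7, size=50_000)

# The estimate depends on where the tail is taken to begin
for fraction in (0.5, 0.1, 0.01):
    print(fraction, round(loglog_tail_slope(sample, fraction), 2))
```

With a sample that is Paretian only in the limit, the fitted exponent drifts as the starting point of the tail is moved, which is the subjectivity described above.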
Fama [FAM 65] had proposed two other estimators based on invariance property by applied addition, be it an interquantile interval measure, or dilation law of empirical variance. However, these two estimators were equally fragile: the former presumed the independence of growth and the latter was very sensitive to the selection of sample size. A stage was reached in 1971: Fama and Roll, using properties relating to quantiles, which were detected with the help of previously made tabulations of symmetric stable distributions, proposed new estimate methods of the parameters α and γ of symmetric stable laws [FAM 71]. These first statistical tools allowed the implementation of the first tests of iid-α-stable model in 1970. Then, a second generation of estimators appeared during the 1970s. Successively, Press [PRE 72], DuMouchel [DUM 73, DUM 75], Paulson et al. [PAU 75], Arad
[ARA 80], Koutrouvélis [KOU 80] and McCulloch [MCC 81] developed new estimation methods of parameter, using the characteristic function of stable laws8. Simultaneously, generators of stable random variables were designed by Chambers et al. [CHA 76], whose algorithms allow an improvement of the simulation possibilities on the financial markets. These new theoretical stages make it possible to improve the tests for the hypothesis of scale invariance. However, DuMouchel [DUM 83] showed that it is possible, by means of the preceding methods, to separate Lévy-stable from Pareto-unstable distributions (i.e., with convergence towards a normal law). He shows that these methods are good when the “true” distribution is stable, but are skewed when this is not the case, which lets a doubt remain regarding the validity of scale invariance, when this invariance is verified by means of these methods. In addition, the sample size can affect the results of estimations made with Koutrouvélis method and a fortiori with older methods9. For example, Walter [WAL 99] verifies that α increases according to a decrease in sample size but remains nearly constant when tests on sub-samples of constant size are carried out. More generally, we can say that the difficulties of characteristic exponent estimation make it very delicate to determine a definitive position. Thus, we find the following remark in a recent manual: “we think that estimate methods of parameter α are not precise enough to infer a clear conclusion on the real nature of distributions from estimates made on various time scales” (see [EMB 97], p. 406). When it occurs, the rejection of stability of α will not appear as “conclusive”, as Taylor affirms (see later on). In addition, other more recent studies, like those of Belkacem et al. [BEL 00], have shown the partial validations of this invariance. 13.3.2.2. 
Non-normality and controversies on scaling invariance In a general way, all work which will be undertaken on stock markets will not only confirm the abnormality of distributions of returns on various scales, and the possible adjustment in Lévy’s distributions on each scale, but also the difficulty in validating the fractal hypothesis. In fact, a scaling anomaly will quickly appear, which is a tendency towards the systematic increase in value of α(τ ) according to τ . The differences between these works will entail the choice of replacement process to give
8. For a review of these methods, see the works of Akgiray and Lamoureux [AKG 89] or Walter [WAL 94], who arrived at the same conclusion on selecting the best estimate method: that of Koutrouvélis [KOU 80]. 9. See the work of Koutrouvélis [KOU 80] and Akgiray and Lamoureux [AKG 89] for illustrations of this sample size problem, which has been known since the first works of Mandelbrot and Fama.
an account of this failure, by means of non-fractal modeling, i.e., of a multi-scale market analysis. Here we present the main articles relating to this emphasis. Teichmoeller [TEI 71], Officer [OFF 72], Fielitz and Smith [FIE 72], Praetz [PRA 72] and Barnea and Downes [BAR 73] obtain all the values of α which increase on average from 1.6 in high frequency to 1.8 in low frequency. This increase led Hsu et al. [HSU 74], who also verified it, to estimate that “in an economy where factors affecting price levels (technical developments, government policies, etc.) can undergo movements on a great scale, it seems unreasonable (our emphasis) to want to try to represent price variations by a single probability distribution” (see [HSU 74], p.1). Brenner [BRE 74], Blattberg and Gonedes [BLA 74] and Hagerman [HAG 78] continued the investigations by observing the same phenomenon. Hagerman concludes that “the symmetric stable model cannot reasonably (our emphasis) be regarded as a suitable description of stock market returns” (see [HAG 78], p. 1220). We can see a similarity of arguments between Hagerman and Hsu et al. [HSU 74] for whom it does not seem to be “reasonable” to retain a model with infinite variance. This argument was used by Bienaymé against Cauchy as early as 1853. Zajdenweber [ZAJ 76] verifies the adjustment in Lévy’s distribution but does not test the scale invariance. Upton and Shannon [UPT 79] take up the question in a different way by seeking to estimate the violation degree in normality based on observation scale by using the Kolmogorov-Smirnov (KS) method, which is a calculation of curve coefficients K and skewness S. The scale invariance is not retained. A new study by Fielitz and Rozelle [FIE 83] confirms the scaling anomaly. Other investigations are carried out on exchange markets. Wasserfallen and Zimmermann [WAS 85], Boothe and Glassman [BOO 87], Akgiray and Booth [AKG 88a], Tucker and Pond [TUC 88] and Hall et al. 
[HAL 89] tackled, for their part, the increase of α according to the decrease of observation frequency. At the end of the 1980s, the iid-α-stable model of stock market returns appeared to be rejected by all the research in this field. In 1986, as we read in a summarized work on the analysis of stock market variations: “many researchers estimated that the hypothesis of infinite variance was not acceptable. Detailed studies on stock market variations rejected Lévy’s distributions in a conclusive way. [. . .] Ten years after his article in 1965, Fama himself preferred to use a normal distribution for monthly variations and thus to give up stable distributions for daily variations” (see [TAY 86], p. 46). However, we can observe that theoretical scale invariance of Gaussian modeling (scaling law in square root of time) is not validated by real markets in all cases and that generalization by iid-α-stable model represents a good compromise between modeling
power and statistical cost of estimation. We find such an argument, for example, in McCulloch [MCC 78], who points to the small number of parameters required by stable laws, as compared with the five parameters needed for jump models such as those proposed by Merton [MER 76]. In other words, the question remains open, even if it is probable that the “true” process of returns is more complex than iid-α-stable modeling. Certain works show that the values of α can change in time10 (stationarity problem of Δr), which leads us to raise the question of dependence between increments of the price process and to look for other forms of scaling laws on financial series.
13.3.2.3. Scaling anomalies of parameters under iid hypothesis
The systematic increase of the characteristic exponent α(τ) of stable laws with τ constitutes what is called a “scaling anomaly.” Indeed, in iid-α-stable modeling, the following relation must be verified:
$$\alpha(T) = \alpha(n\tau) = \alpha(\tau), \qquad T = n\tau \tag{13.23}$$
The fact that this relation is not found for all values of n shows that scale invariance does not hold on all time scales, or that the iid hypothesis is not valid. More generally, a way of highlighting the invariance of the probability law under a change of scale, and thus of testing the fractal hypothesis, is to examine whether its characteristic parameters have a scaling behavior, i.e., to seek a dilation (or contraction) law of the parameters according to the time scale. This idea is the beginning of an important trend in theoretical research in finance. Let λ(τ) be a statistical parameter of the distribution of Δr(t, τ): λ(τ) is a function of τ, and searching for scaling laws on a market between 0 and T therefore leads to estimating the parameter for each value of τ, then to studying the scale relation, or function λ: τ → λ(τ). All the statistical parameters of the distribution are a priori usable in the search for scaling laws on distributions. The parameters most analyzed in research works are either a scaling parameter or the kurtosis coefficient K. In the Gaussian case, the scaling parameter is the standard deviation and, in the case of iid increments, we must have the relation:
$$\sigma(T) = \sigma(n\tau) = n^{1/2}\,\sigma(\tau), \qquad T = n\tau \tag{13.24}$$
This scaling relation, already postulated on variance by Bachelier [BAC 00], was introduced into research during the 1980s, and is known under the name of “test of
10. See an example in [WAL 94].
variance ratio”.11 Relation (13.24) shows that, in the case of iid returns, we must have a proportionality $\sigma(\tau) \sim \tau^{1/2}$. Some works have highlighted a slight violation of this relation, bringing to light a proportionality of type $\sigma(\tau) \sim \tau^{H}$ with H > 0.5. For example, Mantegna [MANT 91] and Mantegna and Stanley [MANT 00] report values close to 0.53 or 0.57. In the case of non-Gaussian α-stable laws, the scaling parameter, denoted by γ, is tested and we must have the relation12:
$$\gamma(T) = \gamma(n\tau) = n^{1/\alpha}\,\gamma(\tau), \qquad T = n\tau \tag{13.25}$$
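A scaling law of the type $\sigma(\tau) \sim \tau^{H}$ can be checked on a return series by summing returns over n base periods and regressing ln σ(nτ) on ln n. The sketch below is only an illustration under stated assumptions: the function names, the use of non-overlapping sums and the simulated iid Gaussian returns are not taken from the works cited. The standard deviation is meaningful only when the variance is finite; for non-Gaussian α-stable laws the scale parameter γ of (13.25) would have to be estimated instead.

```python
import numpy as np

def aggregated_std(returns, n):
    """Standard deviation of returns summed over non-overlapping blocks of n periods."""
    r = np.asarray(returns, dtype=float)
    m = r.size // n                          # number of complete blocks
    sums = r[:m * n].reshape(m, n).sum(axis=1)
    return sums.std(ddof=1)

def scaling_exponent(returns, horizons):
    """Slope H of the regression of ln sigma(n * tau) on ln n."""
    sigmas = [aggregated_std(returns, n) for n in horizons]
    slope, _ = np.polyfit(np.log(horizons), np.log(sigmas), 1)
    return slope

rng = np.random.default_rng(0)
# Hypothetical data: iid Gaussian returns, for which (13.24) predicts H = 1/2
returns = rng.standard_normal(100_000)
horizons = [1, 2, 4, 8, 16, 32]

H_est = scaling_exponent(returns, horizons)
print(round(H_est, 3))
```

On real data, a slope persistently above 0.5 would correspond to the slight violation of (13.24) mentioned above; the sketch does not attempt to separate this effect from estimation noise.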
An important parameter is Pearson’s coefficient, or kurtosis K, defined by $K_X = E[(X - E(X))^4]/E[(X - E(X))^2]^2 - 3$, as this makes it possible to highlight a departure of the observed distribution from normality. For a normal distribution, we have $K_X = 0$. In the case of iid-Gaussian returns, we must have:
$$K(T) = K(n\tau) = K(\tau)/n, \qquad T = n\tau \tag{13.26}$$
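Relation (13.26) can be recovered from the additivity of cumulants. The short derivation below assumes iid returns with a finite fourth moment (an assumption made here so that K is defined); $\kappa_2$ and $\kappa_4$ denote the second and fourth cumulants, a notation that is not used elsewhere in this chapter.

```latex
% Log-returns add up over consecutive periods:
\Delta r(t, n\tau) = \sum_{i=0}^{n-1} \Delta r(t - i\tau, \tau)

% For independent summands, cumulants are additive:
\kappa_2(n\tau) = n\,\kappa_2(\tau), \qquad \kappa_4(n\tau) = n\,\kappa_4(\tau)

% The kurtosis defined above (with the -3 term) equals \kappa_4 / \kappa_2^2, hence:
K(n\tau) = \frac{\kappa_4(n\tau)}{\kappa_2(n\tau)^2}
         = \frac{n\,\kappa_4(\tau)}{n^2\,\kappa_2(\tau)^2}
         = \frac{K(\tau)}{n}
```

A decay slower than 1/n therefore points to a failure of one of the two assumptions, independence or identical distribution.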
Yet, for example, Cont [CON 97] finds that the kurtosis coefficient K(τ) does not decrease as 1/n but rather as $n^{-\alpha}$ with α ≈ 0.5, indicating the presence of a possible non-linear dependence between variations (see section 13.4). Generally, the more we improve our knowledge of the scaling behaviors of the various parameters, the easier it becomes to choose between the two alternatives, scale invariance or characteristic scales. The study of the scaling behaviors of parameters thus helps in the modeling of stock market fluctuations. The existence of a scaling anomaly on the parameter α in the investigations carried out between 1970 and 1980, then on the parameter K during the following decade, led certain authors to try to modify Mandelbrot’s model by limiting scale invariance, either to certain time scales, by introducing regime changes (cross-overs), or to certain parts of the distributions, namely the extreme values only. In both of these variants of the fractal model, this led to the introduction of a multiscale market analysis.
13.3.3. Unstable iid models in partial scaling invariance
13.3.3.1. Partial scaling invariances by regime switching models
The question of regime changes, or partial scaling invariance on a given frequency band, had already been dealt with by Mandelbrot [MAND 63], who assumed the
11. For example, see Lo and MacKinlay’s work [LO 88], who gave a list of previous works on the calculation of the variance ratio. 12. This relation is verified by Walter [WAL 91, WAL 94, WAL 99] and Belkacem et al. [BEL 00].
existence of upper and lower limits (cut-offs) in the fractality of markets (see also [MAND 97a], p. 51 and pp. 64–66) and introduced the concept of scaling range. Akgiray and Booth [AKG 88b] used this idea to reinforce McCulloch’s argument [MCC 78] on the cost-benefit ratio of a scaling-invariant model. Using stable distributions between two cut-offs is appropriate because it is less costly in parameter estimation than other models, which are perhaps finer (like mixtures of normal laws or mixed diffusion-jump processes) but also more complex and therefore the source of a greater number of estimation errors. The issue to be solved is therefore the detection of the points where the regime changes. Bouchaud and Potters [BOU 97] and Mantegna and Stanley [MANT 00] propose such a model, combining Lévy’s distributions and an exponential law beyond a given value.
13.3.3.2. Partial scaling invariances as compared with extremes
DuMouchel [DUM 83] suggests, without making an a priori hypothesis on overall scaling invariance, “letting distribution tails speak for themselves” (see [DUM 83], p. 1025). For this, he uses the generalized Pareto distribution introduced by Pickands [PIC 75], whose distribution function is:
$$F(x) = \begin{cases} 1 - (1 - kx/\sigma)^{1/k}, & k \neq 0 \\ 1 - \exp(-x/\sigma), & k = 0 \end{cases} \tag{13.27}$$
where σ > 0 and k are the form parameters: the bigger k is, the thicker the distribution tail. In the case where the distribution is stable with characteristic exponent α < 2 (scaling invariance), we have α ≅ 1/k. We can observe that, while Pareto’s laws had been Mandelbrot’s initial step in his introduction of the concept of scaling invariance on stock market variations, DuMouchel operated in a manner similar to his predecessors and rediscovered Pareto’s law without the invariance sought by Mandelbrot. Mittnik and Rachev [MIT 89] propose to replace scale invariance under summation of iid-α-stable variables by another invariance structure, invariance with respect to the minimum:
$$X(1) \overset{L}{=} a_n \min_{1 \leq i \leq n} X(i) + b_n \tag{13.28}$$
in which the stability property by addition is replaced by the stability property for an extreme value, i.e. the minimum. Weibull’s distribution corresponds to this structure. This was the beginning of a research trend that would lead to the rediscovery in finance, during the 1990s, of the theory of extreme values,13 which depicts another form of invariance: invariance compared with consideration of the maxima and minima.
13. For the application of the theory of extreme values in finance, see [LON 96, LON 00].
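The invariance structure (13.28) can be checked numerically for Weibull’s distribution. For a standard Weibull law with shape c, whose survival function is exp(−x^c), the minimum of n independent copies has survival function exp(−n x^c), so that (13.28) holds with $a_n = n^{1/c}$ and $b_n = 0$. The sketch below compares a few quantiles of the original and rescaled-minimum samples; the function name and the parameter values are chosen here purely for illustration.

```python
import numpy as np

def rescaled_minimum(shape, n, size, rng):
    """Draw `size` values of a_n * min(X_1, ..., X_n), with a_n = n**(1/shape), b_n = 0."""
    draws = rng.weibull(shape, size=(size, n))   # standard Weibull, unit scale
    return n ** (1.0 / shape) * draws.min(axis=1)

rng = np.random.default_rng(0)
shape, n = 1.5, 25                               # illustrative values
original = rng.weibull(shape, size=20_000)
rescaled = rescaled_minimum(shape, n, 20_000, rng)

probs = np.array([0.1, 0.5, 0.9])
q_original = np.quantile(original, probs)
q_rescaled = np.quantile(rescaled, probs)
# Theoretical quantiles of the standard Weibull law with this shape
q_theory = (-np.log(1.0 - probs)) ** (1.0 / shape)

print(np.round(q_original, 3))
print(np.round(q_rescaled, 3))
print(np.round(q_theory, 3))
```

The three rows should agree up to sampling noise, which is the sense in which the law is stable under the minimum.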
13.4. Research of dependency and memory of markets
13.4.1. Linear dependence: testing of H-correlative models on returns
13.4.1.1. Question of dependency of stock market returns
The standard model of stock market variations assumed that the returns Δr(t, τ) = ln S(t) − ln S(t − τ) were iid according to a normal law of variance σ²τ. The question of validating the independence hypothesis emerged very early in the empirical works dealing with the characterization of stock market fluctuations. Generally, dependency between two random variables X and Y is measured by the quantity $C_{f,g}(X, Y) = E[f(X)g(Y)] - E[f(X)]E[g(Y)]$ and we have the relation:
$$X \text{ and } Y \text{ independent} \iff C_{f,g}(X, Y) = 0 \text{ for all functions } f \text{ and } g$$
The case f(x) = g(x) = x corresponds to the usual covariance. Other cases cover all the possible (non-linear) correlations between the variables X and Y. Applied to stock market variations, this measure implies that the returns Δr(t, τ) are independent only if we have $C(h) = C_{f,f}(\Delta r(t), \Delta r(t+h)) = 0$ for any function f. Therefore, studying the independence of stock market variations amounts to analyzing the function:
$$C(h) = E\big[f(\Delta r(t))\,f(\Delta r(t+h))\big] - E\big[f(\Delta r(t))\big]\,E\big[f(\Delta r(t+h))\big] \tag{13.29}$$
The chronology of the studies follows the successive choices made for the function f(·). The earliest works (1930-1970) on the verification of the independence of increments considered only f(x) = x. In this case, C(h) becomes the usual autocovariance function:
$$C(h) = \gamma(h) = E\big[\Delta r(t)\,\Delta r(t+h)\big] - E\big[\Delta r(t)\big]\,E\big[\Delta r(t+h)\big] \tag{13.30}$$
and the independence of increments corresponds to a zero linear correlation coefficient. Overall, the conclusions of the initial works established the absence of serial autocorrelation and contributed to the formation of the concept of informational efficiency of stock markets.14
14. See, for example, [CAM 97, TAY 86] for a review of this form of independence and [WAL 96] for the historical formation of the efficiency concept from initial works.
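A sample version of the function C(h) of (13.29) can be sketched as follows, with f passed as an argument. The function name and the synthetic series are hypothetical: the series is built with independent Gaussian shocks and a volatility that alternates between two levels, so that it is linearly uncorrelated without being independent. It is not a model proposed in the works cited.

```python
import numpy as np

def dependence_measure(returns, h, f=lambda x: x):
    """Sample version of C(h) = E[f(r_t) f(r_{t+h})] - E[f(r_t)] E[f(r_{t+h})], h >= 1."""
    r = np.asarray(returns, dtype=float)
    a, b = f(r[:-h]), f(r[h:])               # f applied to r_t and to r_{t+h}
    return np.mean(a * b) - np.mean(a) * np.mean(b)

rng = np.random.default_rng(0)
n = 20_000
# Hypothetical series: volatility switches between 0.5 and 2.0 every 500 steps
sigma = np.where((np.arange(n) // 500) % 2 == 0, 0.5, 2.0)
returns = sigma * rng.standard_normal(n)

c_linear = dependence_measure(returns, 1)              # f(x) = x, usual autocovariance
c_abs = dependence_measure(returns, 1, f=np.abs)       # f(x) = |x|
print(round(c_linear, 3), round(c_abs, 3))
```

On such a series the choice f(x) = x detects nothing, whereas f(x) = |x| does, which shows why the absence of serial autocorrelation alone does not establish independence.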
13.4.1.2. Problem of slow cycles and Mandelbrot’s second model
However, by the end of the 1970s, certain results contrary to this relation came up in the study of the behavior of returns over a long-term horizon, which led to so-called “long memory” tests. Denoting by γ(h) = E(Δr(t)Δr(t + h)) − E(Δr(t))E(Δr(t + h)) the usual autocovariance function and by ρ(h) = γ(h)/γ(0) the associated autocorrelation function, the standard model of stock market variations implies that ρ(h) must decrease geometrically, i.e., $\rho(h) \leq c\,r^{-h}$ with c > 0. However, it seemed that, in some cases, we obtain a hyperbolic decay $\rho(h) \sim c\,h^{2H-2}$ with c > 0 and 0 < H < 1, corresponding to a phenomenon of “long memory” or “long dependence.” This phenomenon of long memory was observed in the 1960s by Adelman [ADE 65] and Granger [GRA 66]; the latter described it as “the characteristic of fluctuating economic variables”. Besides, this led Mandelbrot [MAND 65] to rediscover Hurst’s law [HUR 51] by introducing the concept of “self-similar process”, which later became [MAND 68] fractional Brownian motion (FBM), whose increments are self-similar with exponent H and have the autocovariance function $\gamma(h) = \frac{1}{2}\big[|h+1|^{2H} - 2|h|^{2H} + |h-1|^{2H}\big]$. Hence, Mandelbrot’s model can be described as an “H-correlative” model. Mandelbrot called it “Joseph’s effect” with reference to the slow and aperiodic cycles evoked in the biblical story of Joseph and the fluctuations in the Egyptian harvest [MAND 73a]. Summers [SUM 86], Fama and French [FAM 88], Poterba and Summers [POT 88] and DeBondt and Thaler [DEB 89] highlighted phenomena of “mean reversion” in successive returns, introducing the concept of long-term horizon on markets. Although divergent, the interpretations of these long-horizon autocorrelation phenomena tended to question the usual independence hypothesis and to find a form of long memory in stock market returns.
13.4.1.3. Introduction of fractional differentiation in econometrics
Since the 1970s, the econometric limits of ARMA (p, q) and ARIMA (p, d, q) stationary processes in the description of financial series had progressively led to a generalization of these models by introducing a non-integer differentiation degree 0 < d < 1/2 with the ARFIMA process, which found a great echo in finance in the 1980s. The fractional differentiation operator $\nabla^d = (1 - L)^d$, where ∇ is defined by ∇X(t) = X(t) − X(t − 1) = (1 − L)X(t) and:
$$\nabla^d = (1 - L)^d = \sum_{k=0}^{\infty} (-1)^k\, C_d^k\, L^k$$
where $C_d^k$ is the binomial coefficient, made it possible to obtain “long memory” on the economic series studied and met the demand for a new characterization of some of the properties observed in these series. Baillie [BAI 96] presents a complete synthesis
of the usage of these processes in the econometrics of finance. The ARFIMA and FBM trends came together and led to the search for long memory in returns.
13.4.1.4. Experimental difficulties of H-correlative model on returns
The tests carried out to investigate these anomalies used Hurst’s R/S statistic, improved by Mandelbrot [MAND 72]. This statistic gives the value of the self-similarity exponent H insofar as the ratio R/S is asymptotically proportional to $n^H$: H ≈ ln(R/S)/ln n. Thus, between 1980 and 1990, several works revealed values of H greater than 0.5, indicating the presence of long memory on markets, which seemed to corroborate the observations concerning the “abnormal” behavior of returns over long periods. However, Lo [LO 91] showed that this statistic is also sensitive to short memory effects: in the case of an AR (1) process, the R/S result can be biased upward by 73%. Lo proposed a modified R/S statistic, obtained by adding weighted autocovariance terms to the denominator. The new values of H obtained in this way turned out to be close to 0.5. Thus, for example, Corazza et al. [COR 97], Batten et al. [BAT 99] and Howe et al. [HOW 99] verify that the traditional R/S analysis gives values of H greater than 0.5 but that the modified R/S statistic of Lo [LO 91] makes the values of H drop towards 0.5: “what is more astonishing in this result is not the absence of long memory but rather the radical change in judgment that we are led to implement when we use Lo’s modified statistics” (see [HOW 99], p.149). Further studies on independence consider, in the function C(h) defined in (13.29), the cases f(x) = x² and f(x) = |x|. Absolute price variations and their squares represent a measurement of price “volatility”. It is on this form of dependence, i.e. dependence on volatility, that scaling laws in finance will appear.
13.4.2. Non-linear dependence: validating H-correlative model on volatilities
13.4.2.1. The 1980s: ARCH modeling and its limits
A common starting point of all the studies conducted in the 1990s is the observation of the limits of the iid-α-stable and H-correlative models applied to stock market returns. This observation leads to the search for a form of dependence in their volatility, by first introducing short memory on variances with the ARCH15 modeling trend, which produced a great number of models in this family, developing the initial logic of conditioning of the variance in various directions (for a synthesis review see [BOL 92]). However, in 1997, we could read this comment on
15. Auto-regressive conditional heteroscedasticity: modeling introduced by Engle [ENG 82]. See an ARCH presentation in [GOU 97a, GUE 94].
the ARCH trend: “Yet, the recent inflation of basic model varieties and terminology GARCH, IGARCH, EGARCH, TARCH, QTARCH, SWARCH, ACD-ARCH reveals that this approach appears to have reached its limits, cannot adequately answer some questions, or does not make it possible to reproduce some stylized facts” (see [GOU 97b], p.8). These “stylized facts” relate in particular to the hyperbolic decay of the correlation of volatilities, i.e., long memory, or a scaling law on volatility.
13.4.2.2. The 1990s: emphasis of long dependence on volatility
Baillie [BAI 96] and Bollerslev et al. [BOL 00] present a review of the evidence of long memory in volatility. This scaling law on volatility makes it possible to understand the scaling anomalies observed on kurtosis K. In fact, as Cont [CON 97] shows, if we assume that the correlations of volatility follow a power law of the type $g(k) \simeq g(0)\,k^{-\alpha}$, then we obtain a scaling relation for kurtosis K:
$$K(n\tau) = \frac{K(\tau)}{n} + \frac{6\,\big(K(\tau) + 2\big)}{(2 - \alpha)(1 - \alpha)\,n^{\alpha}}$$
which explains the phenomenon of abnormal decrease of kurtosis. Mandelbrot [MAND 71] showed the importance of taking the horizon into consideration in markets whose variations can be modeled by long dependence processes: in particular, the probability of huge losses decreases less rapidly than in an iid-Gaussian world. Financiers often say that “patience reduces risk”: what long dependence shows is that this decrease is much slower than it appears and that it is necessary to be very patient.
13.5. Towards a rediscovery of scaling laws in finance
After 40 years of financial modeling of stock market prices, we can observe that one of the new intellectual aspects of the 1990s in terms of describing stock market variations was a change in perspective on markets that appears in the research in finance. We can find a trace of this change in the emergence of new vocabulary.
Although since Zajdenweber [ZAJ 76], all reference to fractals had disappeared from articles on finance (fractals developed in other research fields), Peters [PET 89], who estimated the value of Hurst’s exponent H on index SP500, and Walter [WAL 89, WAL 90] reintroduced Mandelbrot’s terminology and the concept of fractal structure of markets by considering “Noah” and “Joseph” effects simultaneously in their implications for understanding the nature of stock market variations. It is especially with long memory of volatilities that the concept of fractal structure reappeared and Baillie [BAI 96] can recall the relation between Mandelbrot’s terminology and the recent econometric studies. Richards [RIC 00] is directly interested in the fractal dimension of the market.
This is, in fact, the beginning of a progressive rediscovery of scaling laws and of a growing interest in these laws. However, following Peters’ works, we can draw attention to the confusion that may emerge among the professional financial community between the concept of fractals and that of chaos. Peters [PET 91] associated these two concepts in an approach that is more spontaneous than rigorous and consolidated them in his second work [PET 94], in which fractals and chaos are mistakenly unified in the title by presenting the application of chaos theory to investment policies, from a fractal description of stock market variations. Insofar as a great number of studies highlighted the non-applicability of approaches using chaos for the description of stock market variations16, this confusion, introduced by Peters, contributed (and perhaps still contributes) to clouding the understanding of the fractal hypothesis by the professional community, which often associates the concept of chaos with fractals. Notwithstanding this conceptual hesitation, we can conclude that, faced with the success of fractal modeling of volatility and with recent attempts to apply Brownian motion in deformed time [DAC 93, EVE 95a, MUL 90, MUL 93], the financial modeling of stock market variations must make way in the coming years for a partial rediscovery of scaling invariances, no longer in the context of a unique fractal dimension (as in the case of the iid-α-stable and H-correlative models) but through the introduction of deformed time models, which make it possible to understand market time by replacing physical time with intrinsic stock market time. The most recent modeling explores this promising method (see, for example, [ANE 00, MAND 97b]).
13.6. Bibliography
[ADE 65] ADELMAN I., “Long cycles: facts or artefacts?”, American Economic Review, vol. 50, p. 444–463, 1965.
[AKG 88a] AKGIRAY V., BOOTH G., “Mixed diffusion-jump process modeling of exchange rate movements”, The Review of Economics and Statistics, p. 631–637, 1988.
[AKG 88b] AKGIRAY V., BOOTH G., “The stable-law model of stock returns”, Journal of Business and Economic Statistics, vol. 6, no. 1, p. 51–57, 1988.
[AKG 89] AKGIRAY V., LAMOUREUX C., “Estimation of stable-law parameters: a comparative study”, Journal of Business and Economic Statistics, vol. 7, no. 1, p. 85–93, 1989.
[ALL 74] ALLAIS M., “The psychological rate of interest”, Journal of Money, Credit, and Banking, vol. 3, p. 285–331, 1974.
[ANE 00] ANÉ T., GEMAN H., “Order flow, transaction clock, and normality of asset returns”, Journal of Finance, vol. 55, no. 4, 2000.
16. For a synthesis, see, for example, [MIG 98].
[ARA 80] ARAD R., “Parameter estimation for symmetric stable distributions”, International Economic Review, vol. 21, no. 1, p. 209–220, 1980.
[BAC 00] BACHELIER L., Théorie de la spéculation, PhD Thesis in Mathematical Sciences, Ecole normale supérieure, 1900.
[BAI 96] BAILLIE R., “Long memory processes and fractional integration in econometrics”, Journal of Econometrics, vol. 73, p. 5–59, 1996.
[BAL 94] BÂLE, Risk management guidelines for derivatives, Basle Committee on Banking Supervision, July 1994.
[BAL 96] BÂLE, Amendment to the capital accord to incorporate market risks, Basle Committee on Banking Supervision, January 1996.
[BAL 98] BÂLE, Framework for supervisory information about derivatives and trading activities, Joint report, Basle Committee on Banking Supervision and Technical Committee of the IOSCO, September 1998.
[BAL 99] BÂLE, Trading and derivatives disclosures of banks and securities firms, Joint report, Basle Committee on Banking Supervision and Technical Committee of the IOSCO, December 1999.
[BAR 73] BARNEA A., DOWNES D., “A reexamination of the empirical distribution of stock price changes”, Journal of the American Statistical Association, vol. 68, no. 342, p. 348–350, 1973.
[BAT 99] BATTEN J., ELLIS C., MELLOR R., “Scaling laws in variance as a measure of long-term dependence”, International Review of Financial Analysis, vol. 8, no. 2, p. 123–138, 1999.
[BEL 00] BELKACEM L., LÉVY VÉHEL J., WALTER C., “CAPM, risk, and portfolio selection in α-stable markets”, Fractals, vol. 8, no. 1, p. 99–115, 2000.
[BLA 74] BLATTBERG R., GONEDES N., “A comparison of the stable and Student distributions as statistical models for stock prices”, Journal of Business, vol. 47, p. 244–280, 1974.
[BOL 92] BOLLERSLEV T., CHOU R., KRONER K., “ARCH modeling in finance: A review of the theory and empirical evidence”, Journal of Econometrics, vol. 52, no. 1-2, p. 5–59, 1992.
[BOL 00] B OLLERSLEV T., C AI J., S ONG F., “Intraday periodicity, long memory volatility, and macroeconomic announcements effects in the US treasury bond market”, Journal of Empirical Finance, vol. 7, p. 37–55, 2000. [BOO 87] B OOTHE P., G LASSMAN D., “The statistical distribution of exchange rates: Empirical evidence and economic implications”, Journal of International Economics, vol. 22, p. 297–319, 1987. [BOU 97] B OUCHAUD J.P., P OTTERS M., Théorie des risques financiers, Collection Aléas, Saclay, 1997. [BRE 74] B RENNER M., “On the stability of the distribution of the market component in stock price changes”, Journal of Financial and Quantitative Analysis, vol. 9, p. 945–961, 1974.
[CAM 97] CAMPBELL J., LO A., MACKINLAY A.C., The Econometrics of Financial Markets, Princeton University Press, 1997.
[CHA 76] CHAMBERS J., MALLOWS C., STUCK B., “A method for simulating stable random variables”, Journal of the American Statistical Association, vol. 71, no. 354, p. 340–344, 1976.
[CLA 73] CLARK P., “A subordinated stochastic process model with finite variance for speculative prices”, Econometrica, vol. 41, no. 1, p. 135–155, 1973.
[CON 97] CONT R., “Scaling properties of intraday price changes”, Science and Finance Working Paper, June 1997.
[COR 97] CORAZZA M., MALLIARIS A.G., NARDELLI C., “Searching for fractal structure in agricultural futures markets”, The Journal of Futures Markets, vol. 17, no. 4, p. 433–473, 1997.
[COU 00] COURTAULT J.M., KABANOV Y., BRU B., CRÉPEL P., LEBON I., LE MARCHAND A., “Louis Bachelier on the centenary of Théorie de la spéculation”, Mathematical Finance, vol. 10, no. 3, p. 341–353, 2000.
[DAC 93] DACOROGNA M., MÜLLER U., NAGLER R., OLSEN R., PICTET O., “A geographical model for the daily and weekly seasonal volatility in the foreign exchange market”, Journal of International Money and Finance, vol. 12, p. 413–438, 1993.
[DEB 89] DE BONDT W., THALER R., “Anomalies: A mean-reverting walk down Wall Street”, Journal of Economic Perspectives, vol. 3, no. 1, p. 189–202, 1989.
[DUM 73] DUMOUCHEL W., “Stable distributions in statistical inference: 1. Symmetric stable distributions compared to other long-tailed distributions”, Journal of the American Statistical Association, vol. 68, no. 342, p. 469–477, 1973.
[DUM 75] DUMOUCHEL W., “Stable distributions in statistical inference: 2. Information from stably distributed samples”, Journal of the American Statistical Association, vol. 70, no. 350, p. 386–393, 1975.
[DUM 83] DUMOUCHEL W., “Estimating the stable index in order to measure tail thickness: A critique”, The Annals of Statistics, vol. 11, no. 4, p. 1019–1031, 1983.
[ELL 38] ELLIOTT R., The Wave Principle, Collins, New York, 1938.
[EMB 97] EMBRECHTS P., KLÜPPELBERG C., MIKOSCH T., Modelling Extremal Events for Insurance and Finance, Springer, 1997.
[ENG 82] ENGLE R., “Autoregressive conditional heteroskedasticity with estimates of the variance in United Kingdom inflation”, Econometrica, vol. 50, p. 987–1008, 1982.
[EVE 95a] EVERTSZ C.G., “Fractal geometry of financial time series”, Fractals, vol. 3, no. 3, p. 609–616, 1995.
[EVE 95b] EVERTSZ C.G., “Self-similarity of high-frequency USD-DEM exchange rates”, in Proceedings of the First International Conference on High Frequency Data in Finance (Zurich, Switzerland), vol. 3, March 1995.
[FAM 65] FAMA E., “The behavior of Stock Market prices”, Journal of Business, vol. 38, no. 1, p. 34–105, 1965.
[FAM 68] FAMA E., ROLL R., “Some properties of symmetric stable distributions”, Journal of the American Statistical Association, vol. 63, p. 817–836, 1968.
[FAM 71] FAMA E., ROLL R., “Parameter estimates for symmetric stable distributions”, Journal of the American Statistical Association, vol. 66, no. 334, p. 331–336, 1971.
[FAM 88] FAMA E., FRENCH K., “Permanent and temporary components of stock prices”, Journal of Political Economy, vol. 96, no. 2, p. 246–273, 1988.
[FIE 72] FIELITZ B., SMITH E., “Asymmetric stable distributions of stock price changes”, Journal of the American Statistical Association, vol. 67, no. 340, p. 813–814, 1972.
[FIE 83] FIELITZ B., ROZELLE J., “Stable distributions and the mixture of distributions hypotheses for common stock returns”, Journal of the American Statistical Association, vol. 78, no. 381, p. 28–36, 1983.
[GHY 97] GHYSELS E., GOURIÉROUX C., JASIAK J., “Market time and asset price movements: Theory and estimation”, in HAND D., JACKA S. (Eds.), Statistics in Finance, Arnold, London, p. 307–322, 1997.
[GOU 97a] GOURIÉROUX C., ARCH Models and Financial Applications, Springer-Verlag, 1997.
[GOU 97b] GOURIÉROUX C., LE FOL G., “Volatilités et mesures du risque”, Journal de la Société de statistique de Paris, vol. 38, no. 4, p. 7–32, 1997.
[GRA 66] GRANGER C.W.J., “The typical spectral shape of an economic variable”, Econometrica, vol. 34, p. 150–161, 1966.
[GUE 94] GUEGAN D., “Séries chronologiques non linéaires à temps discret”, Economica, 1994.
[HAG 78] HAGERMAN R., “More evidence of the distribution of security returns”, Journal of Finance, vol. 33, p. 1213–1221, 1978.
[HAL 89] HALL J., BRORSEN B., IRWIN S., “The distribution of future prices: a test of the stable paretian and mixture of normal hypotheses”, Journal of Financial and Quantitative Analysis, vol. 24, no. 1, p. 105–116, 1989.
[HAS 87] HASBROUCK J., HO T., “Order arrival, quote behavior, and the return generating process”, Journal of Finance, vol. 42, p. 1035–1048, 1987.
[HOW 99] HOWE J.S., MARTIN D., WOOD B., “Much ado about nothing: long-term memory in Pacific Rim equity markets”, International Review of Financial Analysis, vol. 8, no. 2, p. 139–151, 1999.
[HSU 74] HSU D.A., MILLER R., WICHERN D., “On the stable paretian behavior of stock-market prices”, Journal of the American Statistical Association, vol. 69, no. 345, p. 108–113, 1974.
[HUR 51] HURST H.E., “Long term storage capacity of reservoirs”, Transactions of the American Society of Civil Engineers, vol. 116, p. 770–799, 1951.
[IOS 94] IOSCO, Operational and financial risk management control mechanisms for over-the-counter derivatives activities of regulated securities firms, Technical Committee of the International Organization of Securities Commissions, July 1994.
[KOU 80] KOUTROUVÉLIS I., “Regression-type estimation of the parameters of stable laws”, Journal of the American Statistical Association, vol. 75, no. 372, p. 918–928, 1980.
[LAM 94] LAMOUREUX C., LASTRAPES W., “Endogenous trading volume and momentum in stock-return volatility”, Journal of Business and Economic Statistics, vol. 12, no. 2, p. 225–234, 1994.
[LEV 02] LÉVY VÉHEL J., WALTER C., Les marchés fractals, PUF, Paris, 2002.
[LO 88] LO A.W., MACKINLAY A., “Stock prices do not follow random walks: evidence from a simple specification test”, Review of Financial Studies, vol. 1, p. 41–66, 1988.
[LO 91] LO A.W., “Long-term memory in stock market prices”, Econometrica, vol. 59, no. 5, p. 1279–1313, 1991.
[LON 96] LONGIN F., “The asymptotic distribution of extreme stock market returns”, Journal of Business, vol. 69, no. 3, p. 383–408, 1996.
[LON 00] LONGIN F., “From value at risk to stress testing approach: the extreme value theory”, Journal of Banking and Finance, p. 1097–1130, 2000.
[MAI 97] MAILLET B., MICHEL T., “Mesures de temps, information et distribution des rendements intrajournaliers”, Journal de la Société de statistique de Paris, vol. 138, no. 4, p. 89–120, 1997.
[MAND 63] MANDELBROT B., “The variation of certain speculative prices”, Journal of Business, vol. 36, p. 394–419, 1963.
[MAND 65] MANDELBROT B., “Une classe de processus stochastiques homothétiques à soi ; application à la loi climatologique de H.E. Hurst”, Comptes rendus de l’Académie des sciences, vol. 260, p. 3274–3277, 1965.
[MAND 67a] MANDELBROT B., “The variation of some other speculative prices”, Journal of Business, vol. 40, p. 393–413, 1967.
[MAND 67b] MANDELBROT B., TAYLOR H., “On the distribution of stock price differences”, Operations Research, vol. 15, p. 1057–1062, 1967.
[MAND 68] MANDELBROT B., VAN NESS J.W., “Fractional Brownian motion, fractional noises, and applications”, SIAM Review, vol. 10, no. 4, p. 422–437, 1968.
[MAND 71] MANDELBROT B., “When can price be arbitraged efficiently? A limit to the validity of random walk and martingale models”, Review of Economics and Statistics, vol. 53, p. 225–236, 1971.
[MAND 72] MANDELBROT B., “Statistical methodology for non-periodic cycles: from the covariance to R/S analysis”, Annals of Economic and Social Measurement, vol. 1, p. 259–290, 1972.
[MAND 73a] MANDELBROT B., “Le problème de la réalité des cycles lents et le syndrome de Joseph”, Economie appliquée, vol. 26, p. 349–365, 1973.
[MAND 73b] MANDELBROT B., “Le syndrome de la variance infinie et ses rapports avec la discontinuité des prix”, Economie appliquée, vol. 26, p. 349–365, 1973.
[MAND 97a] MANDELBROT B., Fractals and Scaling in Finance, Springer, New York, 1997 (abridged French version: Fractales, Hasard et Finances, Flammarion, Paris).
[MAND 97b] MANDELBROT B., FISHER A., CALVET L., “A multifractal model of asset returns”, Cowles Foundation Discussion Paper, no. 1164, September 1997.
[MANT 91] MANTEGNA R., “Lévy walks and enhanced diffusion in Milan stock exchange”, Physica A, vol. 179, p. 232–242, 1991.
[MANT 00] MANTEGNA R., STANLEY E., An Introduction to Econophysics: Correlations and Complexity in Finance, Cambridge University Press, 2000.
[MCC 78] MCCULLOCH J.H., “Continuous time processes with stable increments”, Journal of Business, vol. 51, no. 4, p. 601–619, 1978.
[MCC 81] MCCULLOCH J.H., “Simple consistent estimators of the stable distributions”, in Proceedings of the Annual Meeting of the Econometric Society, 1981.
[MER 76] MERTON R., “Option pricing when underlying stock returns are discontinuous”, Journal of Financial Economics, vol. 3, p. 125–144, 1976.
[MIG 98] MIGNON V., “Marchés financiers et modélisation des rentabilités boursières”, Economica, 1998.
[MIT 89] MITTNIK S., RACHEV S., “Stable distributions for asset returns”, Applied Mathematics Letters, vol. 2, no. 3, p. 301–304, 1989.
[MUL 90] MÜLLER U., DACOROGNA M., MORGENEGG C., PICTET O., SCHWARZ M., OLSEN R., “Statistical study of foreign exchange rates: empirical evidence of a price change scaling law and intraday pattern”, Journal of Banking and Finance, vol. 14, p. 1189–1208, 1990.
[MUL 93] MÜLLER U., DACOROGNA M., DAVÉ R., PICTET O., OLSEN R., WARD J., Fractals and intrinsic time – A challenge to econometricians, Olsen and Associates Research Group, UAM 1993-08-16, 1993.
[OFF 72] OFFICER R., “The distribution of stock returns”, Journal of the American Statistical Association, vol. 67, no. 340, p. 807–812, 1972.
[PAU 75] PAULSON A., HOLCOMB E., LEITCH R., “The estimation of the parameters of the stable laws”, Biometrika, vol. 62, no. 1, p. 163–170, 1975.
[PET 89] PETERS E., “Fractal structure in the capital markets”, Financial Analysts Journal, p. 32–37, July-August 1989.
[PET 91] PETERS E., Chaos and Order in the Capital Markets: A New View of Cycles, Prices, and Market Volatility, John Wiley & Sons, New York, 1991.
[PET 94] PETERS E., Fractal Market Analysis: Applying Chaos Theory to Investment and Economics, John Wiley & Sons, New York, 1994.
[PIC 75] PICKANDS J., “Statistical inference using extreme order statistics”, Annals of Statistics, vol. 3, p. 119–131, 1975.
[POT 88] POTERBA J.M., SUMMERS L., “Mean reversion in stock prices: Evidence and implications”, Journal of Financial Economics, vol. 22, p. 27–59, 1988.
[PRA 72] PRAETZ P., “The distribution of share price changes”, Journal of Business, vol. 45, p. 49–55, 1972.
[PRE 72] PRESS S.J., “Estimation in univariate and multivariate stable distributions”, Journal of the American Statistical Association, vol. 67, no. 340, p. 842–846, 1972.
[RIC 00] RICHARDS G., “The fractal structure of exchange rates: Measurement and forecasting”, Journal of International Financial Markets, Institutions, and Money, vol. 10, p. 163–180, 2000.
[SUM 86] SUMMERS L., “Does the stock market rationally reflect fundamental values?”, Journal of Finance, vol. 41, no. 3, p. 591–601, 1986.
[TAQ 00] TAQQU M., “Bachelier et son époque: une conversation avec Bernard Bru”, in Proceedings of the First World Congress of the Bachelier Finance Society (Paris, France), June 2000.
[TAY 86] TAYLOR S., Modelling Financial Time Series, John Wiley & Sons, 1986.
[TEI 71] TEICHMOELLER J., “A note on the distribution of stock price changes”, Journal of the American Statistical Association, vol. 66, no. 334, p. 282–284, 1971.
[TUC 88] TUCKER A., POND L., “The probability distribution of foreign exchange price changes: test of candidate processes”, Review of Economics and Statistics, p. 638–647, 1988.
[UPT 79] UPTON D., SHANNON D., “The stable paretian distribution, subordinated stochastic processes, and asymptotic lognormality: an empirical investigation”, Journal of Finance, vol. 34, no. 4, p. 1031–1039, 1979.
[WAL 89] WALTER C., “Les risques de marché et les distributions de Lévy”, Analyse financière, vol. 78, p. 40–50, 1989.
[WAL 90] WALTER C., “Mise en évidence de distributions Lévy-stables et d’une structure fractale sur le marché de Paris”, in Actes du premier colloque international AFIR (Paris, France), vol. 3, p. 241–259, 1990.
[WAL 91] WALTER C., “L’utilisation des lois Lévy-stables en finance: une solution possible au problème posé par les discontinuités des trajectoires boursières”, Bulletin de l’IAF, vol. 349-350, p. 3–32 and 4–23, 1991.
[WAL 94] WALTER C., Les structures du hasard en économie: efficience des marchés, lois stables et processus fractals, PhD Thesis, IEP Paris, 1994.
[WAL 96] WALTER C., “Une histoire du concept d’efficience sur les marchés financiers”, Annales HSS, vol. 4, p. 873–905, July-August 1996.
[WAL 99] WALTER C., “Lévy-stability-under-addition and fractal structure of markets: Implications for the investment management industry and emphasized examination of MATIF notional contract”, Mathematical and Computer Modelling, vol. 29, no. 10-12, p. 37–56, 1999.
[WAS 85] WASSERFALLEN W., ZIMMERMANN H., “The behavior of intra-daily exchange rates”, Journal of Banking and Finance, vol. 9, p. 55–72, 1985.
[ZAJ 76] ZAJDENWEBER D., “Hasard et prévision”, Economica, 1976.
Chapter 14
Scale Relativity, Non-differentiability and Fractal Space-time
14.1. Introduction

The theory of scale relativity [NOT 93] applies the principle of relativity to scale transformations (in particular, to transformations of spatio-temporal resolutions). In Einstein’s formulation [EIN 16], the principle of relativity requires that the laws of nature be valid in every coordinate system, whatever its state. Since Galileo, this principle has been applied to the states of position (origin and orientation) and of motion (velocity and acceleration) of the coordinate system, i.e. to states which can never be defined in an absolute way, but only in a relative way. The state of one reference system can be defined only with regard to another system. The same is true of scale: the scale of one system can be defined only with regard to another system, and so it possesses the fundamental property of relativity: only scale ratios have a meaning, never an absolute scale. In the new approach, we reinterpret resolutions not only as a property of the measuring device and/or of the measured system, but more generally as an intrinsic property of space-time, characterizing the state of scale of the reference system in the same way as velocity characterizes its state of motion. The principle of scale relativity requires that the fundamental laws of nature apply whatever the state of scale of the coordinate system.
Chapter written by Laurent NOTTALE.
What is the motivation for adding such a first principle to fundamental physics? It becomes imperative as soon as we want to generalize the current description of space and time. The present description is usually restricted to differentiable manifolds (even though singularities are possible at certain particular points). One way to generalize current physics is therefore to abandon the hypothesis that spatio-temporal coordinates are differentiable. As we will see, the main consequence of this abandonment is that space-time becomes fractal, i.e. it acquires an explicit scale dependence (more precisely, it becomes scale-divergent) in terms of the spatio-temporal resolutions.

14.2. Abandonment of the hypothesis of space-time differentiability

If we analyze the state of the physics that is based on the principle of relativity, as developed up to Einstein, we note that it is the whole of classical physics, including the theory of gravitation via the general relativity of motion, that rests on this principle. Quantum physics, although compatible with the Galilean relativity of motion, does not seem to rely on it for its foundations. We may ask whether a new generalization of relativity, one that would include quantum effects (or at least some of them) among its consequences, remains possible. However, in order to generalize relativity, it is necessary to generalize the possible transformations between coordinate systems, as well as the definition of what the possible coordinate systems are, and finally to generalize the concepts of space and space-time. Einstein’s general relativity is based on the hypothesis that space-time is Riemannian, i.e. describable by a manifold that is at least twice differentiable: in other words, we can define a continuum of spatio-temporal events, then velocities, which are their derivatives, and then accelerations by a further differentiation. Within this framework, Einstein’s equations are the most general of the simplest equations that are covariant under twice-differentiable coordinate transformations. Just as the passage from special relativity to general relativity was made possible by abandoning a restrictive hypothesis (the flatness of space-time, replaced by the consideration of curved space-times), a new opening becomes possible by abandoning the assumption of differentiability. The issue now is to describe a space-time continuum which is no longer necessarily differentiable everywhere or almost everywhere.

14.3. Towards a fractal space-time

The second stage of the construction consists of “recovering” a mathematical tool that seems to be lost in such a generalization. The essential tool of physics since Galileo, Leibniz and Newton is the differential equation. Does abandoning the assumption of differentiability of space-time, and therefore of the coordinate systems and of the transformations between these systems, amount to abandoning differential equations?
This crucial problem can be circumvented by introducing the concept of fractal geometry into space-time physics. Through it, non-differentiability can be treated using differential equations.

14.3.1. Explicit dependence of coordinates on spatio-temporal resolutions

This possibility results from the following theorem [NOT 93, NOT 96a, NOT 97a], which is itself a consequence of a theorem of Lebesgue: a continuous and almost nowhere differentiable curve has a length that depends explicitly on the resolution at which it is considered and that tends to infinity when the resolution interval tends to zero. In other words, such a curve is fractal in the general sense given to this term by Mandelbrot [MAN 75, MAN 82]. Applied to the coordinate system of a non-differentiable space-time, this theorem implies a fractal geometry for this space-time [ELN 95, NOT 84, ORD 83], as well as for the reference frame. Moreover, it is this very dependence on resolution which solves the problem posed. Indeed, let us consider the definition of the derivative applied, for example, to a coordinate (which defines the velocity):

$$v(t) = \lim_{dt \to 0} \frac{x(t + dt) - x(t)}{dt} \tag{14.1}$$

Non-differentiability is the non-existence of this limit. Since the limit is, in any case, physically unattainable (an infinite energy would be required to reach it, according to the Heisenberg time-energy relation), v is redefined as v(t, dt), a function of time t and of the differential element dt, which is identified with a resolution interval and regarded as a new variable. The issue is no longer the description of what happens in the limit, but the behavior of this function during successive zooms on the interval dt.

14.3.2. From continuity and non-differentiability to fractality

It can be proved [BEN 00, NOT 93, NOT 96a] that the length L of a continuous and nowhere (or almost nowhere) differentiable curve depends explicitly on the resolution ε at which it is considered and, further, that L(ε) is strictly increasing and tends to infinity when ε → 0. In other words, this curve is fractal (we will use the word “fractal” in this general sense throughout this chapter). Let us consider a curve (chosen as a function f(x) for the sake of simplicity) in the Euclidean plane, which is continuous but nowhere differentiable between two points A_0 {x_0, f(x_0)} and A_Ω {x_Ω, f(x_Ω)}. Since f is non-differentiable, there is a point A_1 of coordinates {x_1, f(x_1)}, with x_0 < x_1 < x_Ω, such that A_1 is not on the segment A_0 A_Ω. Thus, the total length becomes L_1 = L(A_0 A_1) + L(A_1 A_Ω) > L_0 = L(A_0 A_Ω). We can now iterate the argument and find two coordinates x_01 and x_11, with x_0 < x_01 < x_1 and x_1 < x_11 < x_Ω, such that L_2 = L(A_0 A_01) + L(A_01 A_1) + L(A_1 A_11) + L(A_11 A_Ω) > L_1 > L_0. By iteration we finally construct successive
approximations f_0, f_1, ..., f_n of the function f(x) under study, whose lengths L_0, L_1, ..., L_n increase monotonically when the resolution ε ≈ (x_Ω − x_0) × 2^(−n) tends to zero. In other words, continuity and non-differentiability imply a monotonic scale dependence of f in terms of the resolution ε. However, the function L(ε) could be increasing and yet converge when ε → 0. This is not the case for such a continuous and non-differentiable curve: indeed, the second stage of the demonstration, which establishes the divergence of L(ε), is a consequence of Lebesgue’s theorem (1903), which states that a curve of finite length is differentiable almost everywhere (see for example [TRI 93]). Consequently, a non-differentiable curve necessarily has an infinite length. These two results, taken together, establish the above theorem on the scale divergence of non-differentiable continuous functions. A direct demonstration, using non-standard analysis, was given in [NOT 93], p. 82. This theorem can easily be generalized to curves, surfaces, volumes and, more generally, to spaces of any dimension.

Regarding the converse proposition, the question remains whether a continuous function whose length is scale-divergent between any two points such that δx = x_A − x_B is finite (i.e. everywhere or nearly everywhere scale-divergent) is non-differentiable. In order to prepare the answer, let us remark that the scale-dependent length L(δx) can easily be related to the average value of the scale-dependent slope v(δx). Indeed, we have L(δx) = ⟨√(1 + v²(δx))⟩. Since we consider curves such that L(δx) → ∞ when δx → 0, this means that L(δx) ≈ ⟨v(δx)⟩ at large enough resolution, so that L(δx) and v(δx) share the same kind of divergence when δx → 0. On the basis of this simple result, the answer to the question of the non-differentiability of scale-divergent curves is as follows (correcting and updating here previously published results [NOT 08]):

1) Homogenous divergence. Let us first consider the case where the slopes diverge in the same way at all points of the curve, which we call “homogenous divergence”. In other words, we assume that, for any pair of points, the absolute values v_1 and v_2 of their scale-dependent slopes satisfy: ∃ K_1 and K_2 finite such that, ∀δx, K_1 < v_2(δx)/v_1(δx) < K_2. Then the mode of divergence of the mean is the same as that of the slope at the various points, and it is also the mode of divergence of the length. In this case the converse theorem is true: in the case of homogenous divergence, the length of a continuous curve f is such that L infinite (i.e. L = L(δx) → ∞ when δx → 0) ⇔ f non-differentiable.

2) Inhomogenous divergence. In this case there may exist curves such that only a subset of zero measure of their points have divergent slopes, in such a way that the
length is nevertheless infinite in the limit δx → 0. Such a function may therefore be almost everywhere differentiable and, at the same time, be characterized by a genuine fractal law of scale dependence of its length, i.e. by a power-law divergence characterized by a fractal dimension D_F. The same reasoning may be applied to other types of divergence, such as logarithmic, exponential, etc. Therefore, when the divergence is inhomogenous, an infinite curve may be either differentiable or non-differentiable whatever its mode of divergence: there is no converse theorem in this case.

Applied to physics, this result means that a fractal behavior may result from the action of singularities (infinite in number, even though they form a subset of zero measure) in a space or space-time that nevertheless remains almost everywhere differentiable (such as, for example, the Riemannian manifolds of Einstein’s general relativity). This supports Mandelbrot’s view about the origin of fractals, which are known to be extremely frequent in many natural phenomena that nevertheless seem to be well described by standard differential equations: they could come from the existence of singularities in differentiable physics (see e.g. [MAN 82], Chapter 11). However, the viewpoint of the theory of scale relativity is more radical, since the main problem we aim to solve in its framework is not the (albeit very interesting) question of the origin of fractals, but the issue of the foundation of quantum theory and of gauge fields from geometric first principles. As we shall recall, a fractal space-time is not sufficient to reach this goal (specifically concerning the emergence of complex numbers). We need to work in the framework of non-differentiable manifolds, which are indeed fractal (i.e. scale-divergent), as has been shown above.
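The dyadic construction of section 14.3.2 can be illustrated numerically. The following Python sketch is only an illustration under stated assumptions: the function `rough_curve` is a truncated Weierstrass-type sum chosen here as a stand-in for a continuous, nowhere differentiable function (it does not appear in the original text), and the values a = 0.5, b = 3 are arbitrary. It computes the lengths L_n of the polygonal approximations f_n on 2^n equal steps as the mean of √(1 + v²(δx)), where v(δx) is the scale-dependent slope.

```python
import numpy as np

def rough_curve(x, a=0.5, b=3.0, n_terms=20):
    """Truncated Weierstrass-type sum: sum_k a**k * sin(b**k * pi * x).

    Hypothetical test function (not from the text): with 0 < a < 1 and
    a * b >= 1 the infinite sum is continuous and nowhere differentiable;
    the truncation only mimics it down to scales of about b**(-n_terms).
    """
    k = np.arange(n_terms)[:, None]
    return np.sum(a**k * np.sin(b**k * np.pi * x[None, :]), axis=0)

def polygonal_length(f, x0, x_omega, n):
    """Length L_n of the polygonal approximation f_n on 2**n equal steps."""
    x = np.linspace(x0, x_omega, 2**n + 1)
    dx = (x_omega - x0) / 2**n      # resolution eps = (x_omega - x0) * 2**(-n)
    v = np.diff(f(x)) / dx          # scale-dependent slope v(dx)
    return (x_omega - x0) * np.mean(np.sqrt(1.0 + v**2))

levels = list(range(13))
lengths = [polygonal_length(rough_curve, 0.0, 1.0, n) for n in levels]

for n, length in zip(levels, lengths):
    print(f"n = {n:2d}   eps = {2.0**-n:.6f}   L_n = {length:.4f}")
```

Because each dyadic grid contains the previous one, the triangle inequality guarantees that the printed lengths never decrease (up to rounding) as ε is halved. A finite computation of this kind can only suggest the divergence of L(ε); it cannot prove it.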
In this context, however, fractality is not central: it mainly appears as a derived (and very useful) geometric property of such continuous non-differentiable manifolds.

14.3.3. Description of non-differentiable processes by differential equations

This result is the key to describing non-differentiable processes in terms of differential equations. We introduce the resolutions explicitly into the expressions of the main physical quantities and, as a consequence, into the fundamental equations of physics. This means that a quantity f, usually expressed in terms of space-time variables x, i.e. f = f(x), must now be described as also depending on the resolutions ε, i.e. f = f(x, ε). In other words, rather than considering only the strictly non-differentiable mathematical object f(x), we shall consider its various “approximations” obtained by smoothing or averaging it at various resolutions ε:

$$f(x, \varepsilon) = \int_{-\infty}^{+\infty} \Phi(x, y, \varepsilon)\, f(x + y)\, dy \tag{14.2}$$
where Φ(x, y, ε) is a smoothing function centered on x, for example a Gaussian function of standard deviation ε. More generally, we can use wavelet transformations based on a filter that is not necessarily conservative. Such a point of view is particularly well adapted to applications in physics: any real measurement is always performed at finite resolution (see [NOT 93] for additional comments on this point). In this framework, f(x) becomes the limit for ε → 0 of the family of functions f_ε(x), in other words of the function of two variables f(x, ε). However, whereas f(x, 0) is non-differentiable (in the sense of the non-existence of the limit df/dx when ε tends to zero), f(x, ε), which we call a “fractal function” (and which is, in fact, defined using a class of equivalence that takes into account the fact that ε is a resolution, see [NOT 93]), is now differentiable for all ε ≠ 0.

The problem of the physical description of the various processes in which such a function f intervenes is now posed differently. In standard differentiable physics, it amounts to finding differential equations involving the derivatives of f with respect to the space-time coordinates, i.e. ∂f/∂x, ∂²f/∂x², namely the derivatives which intervene in the laws of displacement and motion. The integro-differential method amounts to performing such a local description of elementary space-time displacements and of their effect on physical quantities, and then integrating in order to obtain the large-scale properties of the system under consideration. Such a method has often been called “reductionist” and it was indeed adapted to traditional problems where no new information appears at different scales. The situation is completely different for systems characterized by a fractal geometry and/or non-differentiability. Such behaviors are found towards very small and very large scales but also, more generally, in chaotic and/or turbulent systems and probably in essentially all living systems.
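Equation (14.2) can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not the author's procedure: the test signal `rough_signal` (a truncated Weierstrass-type sum), the periodic grid and the FFT-based implementation of the Gaussian smoothing are all choices made here. The sketch suggests that each smoothed version f(x, ε) is an ordinary curve of finite length, and that this length grows as the resolution ε decreases.

```python
import numpy as np

def rough_signal(x, a=0.5, b=3.0, n_terms=8):
    """Hypothetical rough test signal with period 2 (truncated Weierstrass-type sum)."""
    k = np.arange(n_terms)[:, None]
    return np.sum(a**k * np.sin(b**k * np.pi * x[None, :]), axis=0)

def gaussian_smooth(samples, dx, eps):
    """Circular convolution of periodic samples with a Gaussian of standard deviation eps.

    The convolution is done in Fourier space: the transform of a Gaussian of
    standard deviation eps is exp(-(2*pi*nu*eps)**2 / 2) at frequency nu.
    """
    freqs = np.fft.rfftfreq(samples.size, d=dx)
    transfer = np.exp(-0.5 * (2.0 * np.pi * freqs * eps) ** 2)
    return np.fft.irfft(np.fft.rfft(samples) * transfer, n=samples.size)

def closed_length(samples, dx):
    """Polygonal length of one full period of the sampled curve."""
    steps = np.diff(np.append(samples, samples[0]))
    return float(np.sum(np.hypot(dx, steps)))

n_points = 2**14
period = 2.0
dx = period / n_points
x = np.arange(n_points) * dx
f = rough_signal(x)

resolutions = [0.1, 0.03, 0.01, 0.003, 0.001]
lengths = [closed_length(gaussian_smooth(f, dx, eps), dx) for eps in resolutions]

for eps, length in zip(resolutions, lengths):
    print(f"eps = {eps:<6}  length of f(x, eps) over one period = {length:.3f}")
```

A periodic signal is used only to avoid edge effects in the convolution. The smallest ε is kept well above the grid step `dx`, so that the sampled Gaussian remains a reasonable approximation of Φ.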
For such fractal or non-differentiable systems, new, original information exists at different scales, and the project of reducing the behavior of a system at one scale (in general, a large scale) to its description at another scale (in general, the smallest possible scale, δx → 0) seems to lose its meaning and to become hopeless. Our suggestion consists precisely of giving up such a hope and introducing a new frame of thought in which all scales co-exist simultaneously inside a unique scale-space and are connected together by scale differential equations acting in this scale-space. Indeed, in non-differentiable physics, ∂f(x)/∂x = ∂f(x, 0)/∂x no longer exists. However, the physics of the given process will be completely described provided we succeed in knowing f(x, ε), which is differentiable (with respect to x and ε) for all finite values of ε. Such a function of two variables (which is written more precisely, to be complete, as f[x(ε), ε]) can be the solution of differential equations involving ∂f(x, ε)/∂x but also ∂f(x, ε)/∂ ln ε. More generally, with non-linear laws, the equations of physics take the form of second-order differential equations, which will
then contain, in addition to the previous first derivatives, operators like ∂²/∂x² (laws of motion), ∂²/∂(ln ε)² (laws of scale), but also ∂²/∂x ∂ ln ε, which corresponds to a coupling between motion and scale (see below).
What is the physical meaning of the derivative ∂f(x, ε)/∂ ln ε? It is simply the variation of the physical quantity f under an infinitesimal scale transformation, i.e., a resolution dilation. More precisely, let us consider the length of a non-differentiable curve L(ε), which can represent more generally a fractal curvilinear coordinate L(x, ε). Such a coordinate generalizes in a non-differentiable and fractal space-time the concept of curvilinear coordinates introduced for curved Riemannian space-time in Einstein’s general relativity [NOT 89].
14.3.4. Differential dilation operator
Let us apply an infinitesimal dilation ε → ε′ = ε(1 + dρ) to the resolution. We omit the dependence on x to simplify the notation in what follows, since for the moment we are interested in pure scale laws. We obtain:
\[ L(\varepsilon') = L(\varepsilon + \varepsilon\,d\rho) = L(\varepsilon) + \frac{\partial L(\varepsilon)}{\partial \varepsilon}\,\varepsilon\,d\rho = (1 + \tilde{D}\,d\rho)\,L(\varepsilon) \tag{14.3} \]
where D̃ is by definition the dilation operator. The comparison of the last two members of this equation thus yields:
\[ \tilde{D} = \varepsilon\,\frac{\partial}{\partial \varepsilon} = \frac{\partial}{\partial \ln \varepsilon} \tag{14.4} \]
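As a purely illustrative numerical check of relation (14.4), the following Python sketch compares ε ∂L/∂ε with ∂L/∂ ln ε by finite differences. The test function (a power law with arbitrarily chosen constants), the step size and all function names are assumptions made for this example only.

```python
import math

def dilation_operator(L, eps, h=1.0e-6):
    """Approximate dL/d(ln eps) with a central difference in ln(eps)."""
    return (L(eps * math.exp(h)) - L(eps * math.exp(-h))) / (2.0 * h)

def eps_times_derivative(L, eps, h=1.0e-6):
    """Approximate eps * dL/d(eps) with a central difference in eps."""
    step = eps * h
    return eps * (L(eps + step) - L(eps - step)) / (2.0 * step)

def power_law(eps, L0=2.0, lam0=1.0, delta=0.5):
    """Assumed test function L0 * (lam0/eps)**delta; the constants are arbitrary."""
    return L0 * (lam0 / eps) ** delta

eps = 0.01
lhs = dilation_operator(power_law, eps)
rhs = eps_times_derivative(power_law, eps)
# For this test function both values should be close to -delta * L(eps).
print(lhs, rhs, -0.5 * power_law(eps))
```

For a power law the dilation operator simply multiplies the function by a constant, which is why this family of functions is a natural test case for the operator.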
This well-known form of the infinitesimal dilation operator, obtained by applying the Gell-Mann–Lévy method (see [AIT 82]), shows that the “natural” variable for resolution changes is ln ε and that the scale differential equations to be built will indeed involve expressions such as ∂L(x, ε)/∂ ln ε. What form will these equations take? In fact, equations describing the scale dependence of physical quantities have already been introduced in physics: these are the renormalization group equations, particularly developed in the framework of Wilson’s “multiple-scale-of-length” approach [WIL 83]. In its simplest form, a “renormalization group”-like equation for a physical quantity L can be interpreted as stating that the variation of L under an infinitesimal scale transformation d ln ε depends only on L itself; in other words, L determines the whole physical behavior, including the behavior under scale transformations. This is written:
\[ \frac{\partial L(x, \varepsilon)}{\partial \ln \varepsilon} = \beta(L) \tag{14.5} \]
Such an equation (and its generalization), the behavior of which we will analyze in more detail later on, is the differential equivalent of the generators in the case of
fractal objects built by iterations (for example, the von Koch curve). However, instead of passing from one stage of the construction to another by means of discrete finite dilations (successive factors of 3 in the case of the von Koch curve), we pass from ln ε to ln ε + d ln ε. In other words, the differential calculus made in the scale-space allows us to describe a non-differentiable behavior (in the limit) by differential equations.
14.4. Relativity and scale covariance
We complete our current description, which is made in terms of space (positions), space-time or phase-space, by a scale space. We now consider that resolutions characterize the state of scale of the coordinate system, just as velocities characterize its state of motion. The relative nature of temporal and spatial resolution intervals is a universal law of nature: only a ratio of length or time intervals can be defined, never their absolute value, as reflected in the constant need to refer to units. This allows us to set the principle of scale relativity, according to which the fundamental laws of nature apply whatever the state of scale of the reference system. In this framework, we shall call scale covariance the invariance of the equations of physics under transformations of spatio-temporal resolutions (let us note that this expression was introduced by other authors in a slightly different sense, as a generalization of scale invariance). Care is also needed because a multiple covariance must be implemented in such an attempt: it will be necessary to combine the covariance of motion and the new scale covariance, as well as a covariance under scale-motion coupling.
We shall thus develop different types of covariant derivatives, which should be clearly distinguished: one acting strictly on the scales, then a “quantum-covariant” derivative which describes the effects induced on the dynamics by the internal scale structures (and which transforms traditional mechanics into quantum mechanics), and finally a covariant derivative which is identified with that of gauge theories and which describes non-linear effects of scale-motion coupling.
14.5. Scale differential equations
We now pass on to the next stage and construct scale differential equations with a physical significance, then look at their solutions. For this we shall be guided by an analogy with the construction of the laws of motion and by the constraint that such equations must satisfy the scale relativity principle. We shall find, at first, the self-similar fractal behavior at a constant dimension. In a scale transformation, such a law possesses the mathematical structure of the Galileo group and thus satisfies, in a simple way, the relativity principle.
The analogy with motion can be pushed further. We know, on the one hand, that the Galileo group is only an approximation of the Lorentz group (corresponding to the limit c → ∞) and, on the other hand, that both remain a description of an inertial behavior, whereas it is with dynamics that motion physics finds its complexity. The same is true for scale laws. Fractals with constant dimension constitute for scales the counterpart of what Galilean inertia is for motion. We can then suggest generalizing the usual dilation and contraction laws in two ways:
1) one way is to introduce a Lorentz group of scale transformation [NOT 92]. In its framework, there appears a finite resolution scale, minimal or maximal, invariant under dilation, which replaces zero or infinity while maintaining their physical properties. We have suggested identifying these scales, respectively, with the Planck length and with the scale of the cosmological constant [NOT 92, NOT 93, NOT 96a]. This situation, however, still corresponds to a linear transformation of scale on the resolutions;
2) another way is to take into account non-linear transformations of scale, i.e., to move to a “scale dynamics” and if possible to a generalized scale relativity [NOT 97a].
We shall consider in what follows some examples of these kinds of generalized laws, after finding the standard fractal (scale-invariant) behavior (and the breaking of this symmetry) as a solution of the simplest possible first-order scale differential equation.
14.5.1. Constant fractal dimension: “Galilean” scale relativity
Power laws, which are typical of the self-similar fractal behavior, can be identified as the simplest of the laws sought. Let us consider the simplest possible scale equation, which is written in terms of an eigenvalue equation for the dilation operator:
\[ \tilde{D} L = b L \tag{14.6} \]
Its solution is a standard divergent fractal law:
\[ L = L_0 (\lambda_0/\varepsilon)^{\delta} \tag{14.7} \]
where δ = −b = D − DT, D being the fractal dimension, assumed to be constant, and DT the topological dimension. The variable L can indicate, for example, the length measured on a fractal curve (which will describe in particular a coordinate in the fractal reference system). Such a law corresponds, with regard to scales, to inertia from the point of view of motion. We can verify this easily by applying a resolution transformation to it. Under such a transformation ε → ε′, we obtain:
\[ \ln(L'/\lambda) = \ln(L/\lambda) + \delta \ln(\varepsilon/\varepsilon'), \qquad \delta' = \delta \tag{14.8} \]
where we recognize the mathematical structure of the Galileo transformation group between inertial systems: the substitution (motion → scale) results in the correspondences x → ln(L/λ), t → δ and v → ln(ε/ε′). Let us note the manifestation of the relativity of resolutions from the mathematical point of view: ε and ε′ intervene only through their ratio, while the reference scale λ0 has disappeared from relation (14.8). In agreement with the preceding analysis of the status of resolutions in physics, the scale exponent δ plays for scales the role played by time with regard to motion, and the logarithm of the ratio of resolutions plays the role of velocity. The composition law of dilations, written in logarithmic form, confirms this identification with the Galileo group:
\[ \ln(\varepsilon''/\varepsilon) = \ln(\varepsilon''/\varepsilon') + \ln(\varepsilon'/\varepsilon) \tag{14.9} \]
formally identical to the Galilean composition of velocities, w = u + v.
14.5.2. Breaking scale invariance: transition scales
Statement (14.7) is scale invariant. This invariance is spontaneously broken by the existence of displacement and motion. Let us change the origin of the coordinate system. We obtain:
\[ L = L_0 (\lambda_0/\varepsilon)^{\delta} + L_1 = L_1 \left[ 1 + (\lambda_1/\varepsilon)^{\delta} \right] \tag{14.10} \]
where λ1 = λ0 (L0/L1)^(1/δ). Whereas the scale λ0 remains arbitrary, the scale λ1 (which remains relative in terms of position and motion relativity) displays a break in scale symmetry (in other words, a fractal to non-fractal transition in the space of scales). Indeed, it is easy to establish that, for ε ≫ λ1, we have L ≈ L1 and L no longer depends on resolution, whereas for ε ≪ λ1, we recover the scale dependence given by (14.7), which is asymptotically scale invariant. However, this behavior (equation (14.10)), which thus satisfies the double principle of relativity of motion and scale, is precisely obtained as the solution to the simplest scale differential equation that can be written (a first-order equation, depending only on L itself, this dependence being expandable in a Taylor series; the preceding case corresponds to the simplification a = 0):
\[ \frac{dL}{d \ln \varepsilon} = \beta(L) = a + bL + \cdots \tag{14.11} \]
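The link between the first-order scale equation (14.11) and the transition law (14.10) can be checked numerically. The following Python sketch, given only as an illustration, integrates dL/d ln ε = a + bL with a classical Runge-Kutta scheme and compares the result with the closed form L1[1 + (λ1/ε)^δ], using δ = −b and L1 = −a/b. The parameter values, the integration range and the function names are assumptions chosen for the example.

```python
import math

def beta(L, a, b):
    """Right-hand side of eq. (14.11) truncated at first order: a + b*L."""
    return a + b * L

def integrate_scale_equation(L_start, ln_eps_start, ln_eps_end, a, b, steps=2000):
    """Integrate dL/d(ln eps) = a + b*L with a classical RK4 scheme."""
    h = (ln_eps_end - ln_eps_start) / steps
    L = L_start
    for _ in range(steps):
        k1 = beta(L, a, b)
        k2 = beta(L + 0.5 * h * k1, a, b)
        k3 = beta(L + 0.5 * h * k2, a, b)
        k4 = beta(L + h * k3, a, b)
        L += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return L

def closed_form(eps, L1, lam1, delta):
    """Eq. (14.10): L = L1 * (1 + (lam1/eps)**delta)."""
    return L1 * (1.0 + (lam1 / eps) ** delta)

# Illustrative (assumed) parameters: delta = -b and L1 = -a/b.
delta, L1, lam1 = 0.5, 1.0, 1.0
b = -delta
a = -b * L1
eps_large, eps_small = 1.0e3, 1.0e-3

# Start on the closed form at a large scale, integrate down to a small scale.
L_numeric = integrate_scale_equation(
    closed_form(eps_large, L1, lam1, delta),
    math.log(eps_large), math.log(eps_small), a, b)
L_exact = closed_form(eps_small, L1, lam1, delta)
print(L_numeric, L_exact)
```

With these values the two regimes are visible directly: at ε = 10³ the closed form is close to L1 (scale independence), while at ε = 10⁻³ it is dominated by the power-law term L1 (λ1/ε)^δ.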
The solution of (14.11) is indeed given by expression (14.10), with δ = −b and L1 = −a/b, λ1 being an integration constant. Let us note that, if we push the Taylor series further, we obtain a solution yielding several transition scales, in agreement with the behaviors observed for many
natural fractal objects [MAN 82]. In particular, going up to the second order, we find fractal structures with a lower and an upper cut-off. We can also obtain behaviors which are scale-dependent toward the small and large scales, but which become scale-independent at intermediate scales.
14.5.3. Non-linear scale laws: second order equations, discrete scale invariance, log-periodic laws
Among the corrections to scale invariance (characterized by power laws), one is likely to play a potentially important role in many domains, not limited to physics: the log-periodic laws, which can be defined by the appearance of complex scale exponents or complex fractal dimensions. Sornette et al. (see [SOR 97, SOR 98] and references therein) have shown that such behavior provides a very satisfactory and possibly predictive model of some earthquakes and market crashes. Chaline et al. [CHA 99] used such scale laws to model the chronology of major jumps in the evolution of species, and Nottale et al. [NOT 01a] showed that they also apply to the chronology of the main economic crises since the Neolithic era (see [NOT 00c] for more details). More recently, Cash et al. [CAS 02] showed that these laws describe the chronology of the main steps of embryogenesis and child development. This may be a first step towards a description of the temporal evolution of “crises” (in the general sense of this word), which could prove very general, all the more so as recent works have validated these first results [SOR 01]. An intermittency model of this behavior was recently proposed [QUE 00]. Let us show how to obtain a log-periodic correction to power laws [NOT 97b] using scale covariance [NOT 89], i.e. conservation of the form of scale-dependent equations (see also [POC 97]). Let us consider a quantity Φ explicitly dependent on resolution, Φ(ε).
In the application under consideration, the scale variable is identified with a time interval ε = T − Tc, where Tc is the date of the crisis. Let us assume that Φ satisfies a renormalization-group-like first-order differential equation:
\[ \frac{d\Phi}{d \ln \varepsilon} - D\Phi = 0 \tag{14.12} \]
whose solution is a power law, Φ(ε) ∝ ε^D.
In the quest to correct this law, we note that directly introducing a complex exponent is not enough, since it would lead to large log-periodic fluctuations rather than to a controllable correction to the power law. So let us assume that the cancellation of the difference in (14.12) is only approximate and that the right-hand side of this equation actually differs from zero:
\[ \frac{d\Phi}{d \ln \varepsilon} - D\Phi = \chi \tag{14.13} \]
We require that the new function χ be a solution of an equation that keeps the same form as the initial equation:
\[ \frac{d\chi}{d \ln \varepsilon} - D'\chi = 0 \tag{14.14} \]
Setting D′ = D + δ, we find that Φ is a solution of a general second-order equation:
\[ \frac{d^2\Phi}{(d \ln \varepsilon)^2} - B\,\frac{d\Phi}{d \ln \varepsilon} + C\Phi = 0 \tag{14.15} \]
where we have B = 2D + δ and C = D(D + δ). Its solution is written Φ(ε) = a ε^D (1 + b ε^δ), where b can now be arbitrarily small. Finally, the choice of an imaginary exponent δ = iω yields a solution whose real part includes a log-periodic correction:
\[ \Phi(\varepsilon) = a\,\varepsilon^{D} \left[ 1 + b \cos(\omega \ln \varepsilon) \right] \tag{14.16} \]
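The passage from (14.15) to (14.16) can be verified numerically. The Python sketch below is an illustration only: it evaluates Φ(ε) = a ε^D (1 + b ε^δ) with δ = iω using complex arithmetic, checks by finite differences in ln ε that the second-order equation (14.15) is satisfied with B = 2D + δ and C = D(D + δ), and compares the real part with the log-periodic form (14.16). The numerical values of a, b, D and ω, the step size and the function names are assumptions.

```python
import cmath
import math

def phi_complex(eps, a=1.0, b=0.1, D=1.5, delta=6.0j):
    """Phi = a * eps**D * (1 + b * eps**delta); delta may be complex."""
    ln_eps = math.log(eps)
    return a * cmath.exp(D * ln_eps) * (1.0 + b * cmath.exp(delta * ln_eps))

def log_periodic(eps, a=1.0, b=0.1, D=1.5, omega=6.0):
    """Eq. (14.16): a * eps**D * (1 + b * cos(omega * ln eps))."""
    return a * eps ** D * (1.0 + b * math.cos(omega * math.log(eps)))

def residual(eps, D=1.5, delta=6.0j, h=1.0e-4):
    """Finite-difference residual of eq. (14.15) in the variable ln(eps)."""
    B = 2.0 * D + delta
    C = D * (D + delta)
    t = math.log(eps)
    f = lambda s: phi_complex(math.exp(s), D=D, delta=delta)
    first = (f(t + h) - f(t - h)) / (2.0 * h)
    second = (f(t + h) - 2.0 * f(t) + f(t - h)) / (h * h)
    return second - B * first + C * f(t)

eps = 0.5
# The residual should be small (limited by the finite-difference step),
# and the real part of the complex solution should match eq. (14.16).
print(abs(residual(eps)), phi_complex(eps).real, log_periodic(eps))
```

Because a and b are taken real here, the real part of the complex solution coincides with (14.16); a complex b would only add a phase to the cosine.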
Log-periodic fluctuations were also obtained within the approach of scale relativity through a reinterpretation of gauge invariance and of the nature of electromagnetism which can be proposed in this framework (see below and [NOT 96a, NOT 06]).
14.5.4. Variable fractal dimension: Euler-Lagrange scale equations
Let us now consider the case of “scale dynamics”. As we have indicated earlier, the strictly scale-invariant behavior with constant fractal dimension corresponds to a free behavior from the point of view of scale physics. Thus, just as there are forces which cause departures from inertial motion, we also expect to see natural fractal systems displaying distortions compared with self-similar behavior. By analogy, such distortions can, in a first stage, be attributed to the effect of a “scale force” or even a “scale field”. Before introducing this concept, let us recall how we should reverse the viewpoint as regards the meaning of scale variables, in comparison with the usual description of fractal objects. This reversal is parallel, with respect to scales, to that which was operated for motion laws in the conversion from “Aristotelian” laws to Galilean laws. From the Aristotelian viewpoint, time is the measurement of motion: it is thus defined by taking space and velocity as primary concepts. In the same way, fractal dimension is defined, generally, from the “measure” of the fractal object (for example, curve length, surface area, etc.) and from the resolution:
\[ \text{“}t = x/v\text{”} \quad \longleftrightarrow \quad \delta = D - D_T = \frac{d \ln L}{d \ln(\lambda/\varepsilon)} \tag{14.17} \]
With Galileo, time becomes a primary variable and velocity is derived from a ratio of space over time, which are now considered on the same footing, in terms of a space-time (which remains, however, degenerate, since the limiting speed c is implicitly infinite there). This involves the vectorial character of velocity and its local aspect (finally implemented by its definition as the derivative of the position with respect to time). The same reversal can be applied to scales. The scale dimension δ itself becomes a primary variable, treated on the same footing as space and time, and the resolutions are therefore defined as derivatives of the fractal coordinate with respect to δ (i.e. as a “scale-velocity”):
\[ V = \ln\frac{\lambda}{\varepsilon} = \frac{d \ln L}{d\delta} \tag{14.18} \]
This new and fundamental meaning given to the scale exponent δ = D − DT, now treated as a variable, makes it necessary to give it a new name. Henceforth, we will call it djinn (in preceding articles, we had proposed the word zoom, but this applies more naturally to the scale transformations themselves, ln(ε′/ε)). This will lead us to work in terms of a generalized 5D space, the “space-time-djinn”. In analogy with the vectorial character of velocity, the vectorial character of the zoom (i.e., of the scale transformations) is then apparent because the four spatio-temporal resolutions can now be defined starting from the four coordinates of space-time and from the djinn:
\[ v^i = \frac{dx^i}{dt} \quad \longleftrightarrow \quad \ln\frac{\lambda^{\mu}}{\varepsilon^{\mu}} = \frac{d \ln L^{\mu}}{d\delta} \tag{14.19} \]
Note, however, that in more recent works a new generalization of the physical nature of the resolutions has been introduced, which attributes a tensorial nature to them, analogous to that of a variance-covariance error matrix [NOT 06, NOT 08]. It could be objected, against this reversal of the meaning of the scale variables, that from the point of view of measurements it is only through L and ε that we have access to the djinn δ, which is deduced from them. However, we notice that the same is true of the time variable, which, though a primary variable, is always measured indirectly (through changes of position or state in space). A final advantage of this inversion will appear later on in the attempts to construct a generalized scale relativity. It allows the definition of a new concept, that of scale-acceleration Γ^μ = d² ln L^μ/dδ², which is necessary for the passage to non-linear scale laws and to a scale “dynamics”. The introduction of this concept makes it possible to further reinforce the identification of fractals of constant fractal dimension with “scale inertia”. Indeed,
the free scale equation can be written (in one dimension to simplify the writing):
\[ \Gamma = \frac{d^2 \ln L}{d\delta^2} = 0 \tag{14.20} \]
It integrates as:
\[ \frac{d \ln L}{d\delta} = \ln\frac{\lambda}{\varepsilon} = \text{constant} \tag{14.21} \]
The constancy of the resolution means here that it is independent of the djinn δ. The solution therefore takes the expected form L = L0 (λ/ε)^δ. More generally, we can then make the assumption that the scale laws can be constructed from a least action principle. A scale Lagrange function, L(ln L, V, δ), with V = ln(λ/ε), is introduced, and then a scale action:
\[ S = \int_{\delta_1}^{\delta_2} L(\ln L, V, \delta)\, d\delta \tag{14.22} \]
The principle of stationary action then leads to Euler-Lagrange scale equations:
\[ \frac{d}{d\delta}\,\frac{\partial L}{\partial V} = \frac{\partial L}{\partial \ln L} \tag{14.23} \]
14.5.5. Scale dynamics and scale force
The simplest possible form of these equations corresponds to a vanishing right-hand side (absence of scale force), and to the case where the Lagrange function takes the Newtonian form L ∝ V². We once again recover, in this other way, the “scale inertia” power law behavior. Indeed, the Lagrange equation becomes in this case:
\[ \frac{dV}{d\delta} = 0 \quad \Rightarrow \quad V = \text{constant} \tag{14.24} \]
The constancy of V = ln(λ/ε) means here, as we have already noticed, that it is independent of δ. Equation (14.23) can therefore be integrated under the usual fractal form L = L0 (λ/ε)^δ. However, the principal advantage of this representation is that it makes it possible to pass to the following order, i.e., to non-linear scale dynamic behaviors. We consider that the resolution ε can now become a function of the djinn δ. The fact of having
identified the resolution logarithm with a “scale-velocity”, V = ln(λ/ε), then results naturally in defining a scale acceleration:
\[ \Gamma = \frac{d^2 \ln L}{d\delta^2} = \frac{d \ln(\lambda/\varepsilon)}{d\delta} \tag{14.25} \]
The introduction of a scale force then makes it possible to write a scale analog of Newton’s dynamic equation (which is simply the preceding Lagrange equation (14.23)):
\[ F = \mu\Gamma = \mu\,\frac{d^2 \ln L}{d\delta^2} \tag{14.26} \]
where μ is a “scale-mass” which measures how the system resists the scale force.
14.5.5.1. Constant scale force
Let us first consider the case of a constant scale-force. Continuing with the analogy with motion laws, such a force derives from a “scale-potential” ϕ = F ln L. We can write equation (14.26) in the form:
\[ \frac{d^2 \ln L}{d\delta^2} = G \tag{14.27} \]
where G = F/μ = constant. This is the scale equivalent of parabolic motion in constant gravity. Its solution is a parabolic behavior:
\[ V = V_0 + G\,\delta, \qquad \ln L = \ln L_0 + V_0\,\delta + \tfrac{1}{2}\,G\,\delta^2 \tag{14.28} \]
The physical meaning of this result is not clear in this form. Indeed, from the experimental point of view, ln L and possibly δ are functions of V = ln(λ/ε). After redefinition of the integration constants, this solution is therefore expressed in the form:
\[ \delta = \frac{1}{G} \ln\frac{\lambda}{\varepsilon}, \qquad \ln\frac{L}{L_0} = \frac{1}{2G} \ln^2\frac{\lambda}{\varepsilon} \tag{14.29} \]
Thus, fractal dimension, usually constant, becomes a linear function of the log-resolution, and the logarithm of length no longer varies linearly, but in a parabolic way. This result is potentially applicable to many situations, in all the fields where fractal analysis prevails (physics, chemistry, biology, medicine, geography, etc.). Frequently, after careful examination of scale dependence for a given quantity, the power law model is rejected because of the variation of the slope in the plane (ln L, ln ε). In such a case, the conclusion that the phenomenon considered is not fractal could be premature. It could, on the contrary, be a non-linear fractal behavior relevant to scale dynamics, in which case the identification and the study of the scale force responsible for the distortion would be most interesting.
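To make the preceding remark on the varying slope concrete, here is a minimal Python sketch of solution (14.29), offered as an illustration rather than as a data-analysis recipe. It evaluates ln(L/L0) as a function of V = ln(λ/ε) and checks that the local slope in the (ln L, ln(λ/ε)) plane equals the djinn δ = V/G, so that a slope drifting linearly with V is the signature of a constant scale force. The value of G and the function names are assumptions.

```python
def djinn(V, G):
    """Eq. (14.29), first relation: delta = V / G, with V = ln(lambda/eps)."""
    return V / G

def log_length(V, G):
    """Eq. (14.29), second relation: ln(L/L0) = V**2 / (2*G)."""
    return V * V / (2.0 * G)

def local_slope(V, G, h=1.0e-5):
    """Central-difference estimate of d ln(L/L0) / dV."""
    return (log_length(V + h, G) - log_length(V - h, G)) / (2.0 * h)

G = 4.0  # assumed value of the constant scale acceleration F/mu
for V in (0.5, 1.0, 2.0, 4.0):
    # The measured slope grows linearly with V instead of staying constant.
    print(V, local_slope(V, G), djinn(V, G))
```

In practice one would estimate the local slope from measured pairs (ln ε, ln L); a straight-line fit of that slope against ln(λ/ε) would then give 1/G under this model.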
14.5.5.2. Scale harmonic oscillator
Another interesting case of scale potential is that of the harmonic oscillator. In the case where it is “attractive”, the scale equation is written as:
\[ (\ln L)'' + \alpha^2 \ln L = 0 \tag{14.30} \]
where the notation ″ indicates the second derivative with respect to the variable δ. Setting α = ln(λ/Λ), the solution is written as:
\[ \ln\frac{L}{L_0} = \left[ 1 - \frac{\ln^2(\lambda/\varepsilon)}{\ln^2(\lambda/\Lambda)} \right]^{1/2} \tag{14.31} \]
Thus, there is a minimal or maximal scale Λ for the considered system, whereas the slope d ln L/d ln ε (which can no longer be identified with the djinn δ in this non-linear situation) varies between zero and infinity in the field of resolutions allowed between λ and Λ. More interesting still is the “repulsive” case, corresponding to a potential which we can write as ϕ = −(ln L/δ0)²/2. The solution is written as:
\[ \ln\frac{L}{L_0} = \delta_0 \sqrt{ \ln^2\frac{\lambda}{\varepsilon} - \ln^2\frac{\lambda}{\Lambda} } \tag{14.32} \]
This solution is more general than that given in previous publications, where we had considered only the case ln(λ/Λ) = δ0⁻¹. The interest of this solution is that it again yields, as asymptotic behavior at very large or very small scales (ε ≪ λ or ε ≫ λ), the standard solution L = L0 (λ/ε)^δ0, of constant fractal dimension D = 1 + δ0. On the other hand, this behavior undergoes increasing distortions when the resolution approaches a maximum scale εmax = Λ, for which the slope (which we can identify with an effective fractal dimension minus the topological dimension) becomes infinite. In physics, we suggested that such a behavior could shed new light on quark confinement: indeed, within the framework of the reinterpretation of gauge symmetries as symmetries on the spatio-temporal resolutions (see below), the gauge group of quantum chromodynamics is SU(3), which is precisely the dynamic symmetry group of the harmonic oscillator. Solutions of this type could also be of interest in the biological field, because we can interpret the existence of a maximum scale where the effective fractal dimension becomes infinite as that of a wall, which could provide models, for example, of cell walls. At scales lower than this maximum scale (for small components which evolve inside the system considered), we tend either towards scale-independence (zero slope) in the first case, or towards “free” fractal behavior with constant slope in the second case, which is still in agreement with this interpretation.
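The two limits just described for the “repulsive” solution (14.32) can be made explicit with a short Python sketch, given here for illustration. Writing V = ln(λ/ε) and V_wall = ln(λ/Λ), the effective slope d ln(L/L0)/dV equals δ0 V/√(V² − V_wall²): it tends to δ0 far from the limiting scale and grows without bound as V approaches V_wall. The numerical values and the function names are assumptions, and the sketch assumes V > V_wall > 0.

```python
import math

def log_length_repulsive(V, delta0, V_wall):
    """Eq. (14.32): ln(L/L0) = delta0 * sqrt(V**2 - V_wall**2)."""
    return delta0 * math.sqrt(V * V - V_wall * V_wall)

def effective_slope(V, delta0, V_wall):
    """Analytic derivative of eq. (14.32) with respect to V = ln(lambda/eps)."""
    return delta0 * V / math.sqrt(V * V - V_wall * V_wall)

delta0, V_wall = 0.5, 1.0  # assumed illustrative values
for V in (1.0001, 1.01, 1.5, 10.0, 100.0):
    # Close to V_wall the slope becomes very large; far from it, it tends to delta0.
    print(V, log_length_repulsive(V, delta0, V_wall), effective_slope(V, delta0, V_wall))
```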
14.5.6. Special scale relativity – log-Lorentzian dilation laws, invariant scale limit under dilations
It is with special scale relativity that the concept of “space-time-djinn” takes its full meaning. However, this has only been developed, until now, in two dimensions: one space-time dimension and one for the djinn. A complete treatment in five dimensions remains to be made. The previous comment, according to which the standard fractal laws (in constant fractal dimension) have the structure of the Galileo group, immediately implies the possibility of generalizing these laws. Indeed, we know since the work of Poincaré [POI 05] and Einstein [EIN 05] that, as regards motion, this group is a particular and degenerate case of the Lorentz group. However, we can show [NOT 92, NOT 93] that, in two dimensions, assuming only that the transformation law sought is linear, internal and invariant under reflection (hypotheses deducible from the principle of special relativity alone), we find the Lorentz group as the only physically acceptable solution: namely, it corresponds to a Minkowskian metric. The other possible solution is the Euclidean metric, which correctly yields a relativity group (that of rotations in space), but is excluded in the space-time and space-djinn cases since it contradicts the experimental ordering found for velocities (the sum of two positive velocities yields a larger positive velocity) and for scale transformations (two successive dilations yield a larger dilation, not a contraction). In what follows, let us indicate by L the asymptotic part of the fractal coordinate. In order to take into account the fractal to non-fractal transition, it can be replaced in all equations by a difference of the type L − L0. The new log-Lorentzian scale transformation is written, in terms of the ratio of dilation ρ between the resolution scales ε → ε′ [NOT 92]:
\[ \ln(L'/L_0) = \frac{\ln(L/L_0) + \delta \ln\rho}{\sqrt{1 - \ln^2\rho / \ln^2(\lambda/\Lambda)}} \tag{14.33} \]
\[ \delta' = \frac{\delta + \ln\rho\,\ln(L/L_0)/\ln^2(\lambda/\Lambda)}{\sqrt{1 - \ln^2\rho / \ln^2(\lambda/\Lambda)}} \tag{14.34} \]
The law of composition of dilations takes the form:
\[ \ln\frac{\varepsilon'}{\lambda} = \frac{\ln(\varepsilon/\lambda) + \ln\rho}{1 + \dfrac{\ln\rho\,\ln(\varepsilon/\lambda)}{\ln^2(\lambda/\Lambda)}} \tag{14.35} \]
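The composition law (14.35) can be explored with a few lines of Python. The sketch below, which is only an illustration, implements the log-Lorentzian composition with C = ln(λ/Λ) and compares it with the Galilean (purely additive) law. It shows that a resolution equal to Λ is left unchanged by any dilation, and that the additive law is recovered when C becomes very large. The numerical values and the function names are assumptions.

```python
def compose_log_lorentz(ln_eps_over_lam, ln_rho, C):
    """Eq. (14.35): returns ln(eps'/lambda), with C = ln(lambda/Lambda)."""
    return (ln_eps_over_lam + ln_rho) / (1.0 + ln_rho * ln_eps_over_lam / (C * C))

def compose_galilean(ln_eps_over_lam, ln_rho):
    """Additive (Galilean) composition: ln(eps'/lambda) = ln(eps/lambda) + ln(rho)."""
    return ln_eps_over_lam + ln_rho

C = 10.0       # assumed value of ln(lambda/Lambda)
ln_rho = -3.0  # assumed contraction applied to the resolution

# A resolution equal to Lambda (ln(eps/lambda) = -C) is invariant under the dilation.
print(compose_log_lorentz(-C, ln_rho, C))

# An ordinary resolution is contracted by less than the additive law would give.
print(compose_log_lorentz(-2.0, ln_rho, C), compose_galilean(-2.0, ln_rho))

# For very large C the additive law is recovered.
print(compose_log_lorentz(-2.0, ln_rho, 1.0e6))
```

The structure is the same as that of the relativistic composition of velocities, with ln(λ/Λ) playing the role of the limiting speed.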
Let us specify that these laws are valid only at scales smaller than the transition scale λ (respectively, at scales larger than it when this law is applied
to very large scales). As we can establish from these formulae, the scale Λ is a resolution scale invariant under dilations, unattainable (we would need an infinite dilation from any finite scale to reach it) and uncrossable. We proposed to identify it, towards very small scales, with the space and time Planck scale, lP = (ħG/c³)^(1/2) = 1.616 05(10) × 10⁻³⁵ m and tP = lP/c, which would then possess all the physical properties of the zero point while remaining finite. In the macroscopic case, it is identified with the cosmic length scale given by the inverse of the square root of the cosmological constant, LU = Λ^(−1/2) [NOT 93, NOT 96a, NOT 03]. We have theoretically predicted this scale to be LU = (2.7761 ± 0.0004) Gpc [NOT 93], and the now observed value, LU(obs) = (2.72 ± 0.10) Gpc, is in very good agreement with this prediction (see [NOT 08] for more details). This type of “log-Lorentzian” law was also used by Dubrulle and Graner [DUB 96] in turbulence models, but with a different interpretation of the variables.
To what extent does this new dilation law change our view of space-time? At a certain level, it implies a complication because of the need to introduce the fifth dimension. Thus, the scale metric is written with two variables:
\[ d\sigma^2 = d\delta^2 - \frac{(d \ln L)^2}{C_0^2}, \qquad \text{with } C_0 = \ln\frac{\lambda_0}{\Lambda} \tag{14.36} \]
The invariant dσ defines a “proper djinn”, which means that, although the effective fractal dimension, given by D = 1 + δ′ according to (14.34), has become variable, the fractal dimension remains constant in the proper reference system. However, we can also note that the fractal dimension now tends to infinity when the resolution interval tends to the Planck scale. While going to increasingly small resolutions, a fractal dimension will thus successively pass the values 2, 3, 4, which would make it possible to cover a surface, then space, then space-time using a single coordinate.
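As a simple arithmetic illustration of the Planck scale mentioned above, the following Python sketch evaluates lP = (ħG/c³)^(1/2) and tP = lP/c. The values of the constants are the CODATA 2018 recommended values, which is an assumption of this example: they are more recent than the value quoted in the text, so the last digits differ slightly from it.

```python
import math

# Assumed constants (CODATA 2018 recommended values, SI units).
HBAR = 1.054571817e-34   # reduced Planck constant, J s
G_NEWTON = 6.67430e-11   # Newtonian constant of gravitation, m^3 kg^-1 s^-2
C_LIGHT = 299792458.0    # speed of light in vacuum, m/s

def planck_length():
    """l_P = sqrt(hbar * G / c**3), in meters."""
    return math.sqrt(HBAR * G_NEWTON / C_LIGHT ** 3)

def planck_time():
    """t_P = l_P / c, in seconds."""
    return planck_length() / C_LIGHT

print(planck_length(), planck_time())
```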
It is thus possible to define a Minkowskian space-time-djinn requiring, in adequate fractal reference systems, only two dimensions on very small scales. By tending towards large resolutions, the space-time-djinn metric signature (+, −, −, −, −) sees its fifth dimension vary less and less to become almost constant on scales currently accessible to accelerators (see [NOT 96a, Figure 4]). It finally vanishes beyond the Compton scale of the system under consideration, which is identified with the fractal to non-fractal transition in the rest frame. At this scale the temporal metric coefficient also changes sign, which generates the traditional Minkowskian space-time of metric signature (+, −, −, −).
14.5.7. Generalized scale relativity and scale-motion coupling
This is a vast field of study. We saw how we could introduce non-linear scale transformations and a scale dynamics. This approach is, however, only a first step towards a deeper “entirely geometric” level in which scale forces are but manifestations of the fractal and non-differentiable geometry. This level of
description also implies taking resolutions into consideration, which would in turn depend on space and time variables. The first aspect leads to the new concept of scale field, which corresponds to a distortion in scale space compared with usual self-similar laws [NOT 97b]. It can also be represented in terms of curved scale space. It is intended that this approach will be developed in more detail in future research. The second aspect, of which we now point out some of the principal results, leads to a new interpretation of gauge invariance and thus gauge fields themselves. This in turn proves the existence of general relations between mass scale and coupling constants (generalized charge) in particle physics [NOT 96a]. One of these relations makes it possible, as we will see, to predict the value of the electron mass theoretically (considered as primarily of electromagnetic origin, in this approach), as a function of its charge. Lastly, to be complete, let us point out that even these two levels are only transitory stages from the perspective of the theory we intend to build. A more comprehensive version will deal with motion and scales on the same footing and thus see the principles of scale relativity and of motion relativity unified into a single principle. This will be done by working in a 5D space-time-djinn provided with a metric, in which all the transformations between the reference points identify with rotations: in the planes (xy, yz, zx), they are ordinary rotations of 3D space; in the planes (xt, yt, zt) they are motion effects (which are reduced to Lorentz boosts when the space-time-djinn is reduced to 4D space-time on macroscopic scales); finally, four rotations in the planes (xδ, yδ, zδ, tδ) identify with changes of space-time resolutions.
14.5.7.1. A reminder about gauge invariance
At the outset, let us recall briefly the nature of the problem set by gauge invariance in current physics. This problem already appears in traditional electromagnetic theory.
This theory, starting from experimental constraints, has led to the introduction of a four-vector potential, A_μ, then of a tensorial field given by the derivative of the potential, F_μν = ∂_μ A_ν − ∂_ν A_μ. However, Maxwell’s field equations (contrary to what occurs in Einstein’s general relativity for motion in a gravitational field) are not enough to characterize the motion of a charge in an electromagnetic field. It is necessary to add the expression for the Lorentz force, which is written in 4D form f^μ = (e/c) F^μν u_ν, where u_ν is the four-velocity. It is seen that only the fields intervene in this and not the potentials. This implies that the motion will be unaffected by any transformation of the potentials which leaves the fields invariant. This is obviously the case if we add to the four-potential the gradient of any function of the coordinates: A′_μ = A_μ + ∂_μ χ(x, y, z, t). This transformation is called, following Weyl, a gauge transformation, and the invariance law which results from it is gauge invariance. What was apparently only a simple latitude left in the choice of the potentials takes within the quantum mechanics framework a deeper meaning. Indeed, gauge
invariance in quantum electrodynamics becomes an invariance under the phase transformations of wave functions and is linked to current conservation through Noether’s theorem. It is known that this theorem connects fundamental symmetries to the appearance of conserved quantities, which are manifestations of these symmetries (thus the conservation of energy results from the uniformity of time, that of momentum from the homogeneity of space, etc.). In the case of electrodynamics, it appears that the existence of the electric charge itself results from gauge symmetry. This fact is apparent in the writing of the Lagrangian which describes Dirac’s electron field coupled to an electromagnetic field. This Lagrangian is not invariant under the gauge transformation of the electromagnetic field A′_μ = A_μ + ∂_μ χ(x), but becomes invariant provided it is completed by a local gauge transformation on the phase of the electron wave function, ψ → e^(−ieχ(x)) ψ. This result can be interpreted by saying that the existence of the electromagnetic field (and its gauge symmetry) implies that of the electric charge. However, although impressive (particularly through its capacity for generalization to non-Abelian gauge theories, which include the weak and strong fields and allow a description of the electroweak field), this progress in comprehending the nature of the electromagnetic field and of the charge remains incomplete, in our opinion. Indeed, the gauge transformation keeps an arbitrary nature. The essential point is that no explicit physical meaning is given to the function χ(x): however, this is the conjugate variable of the charge in the electron phase (just as energy is the conjugate of time and momentum of space), so that it is from an understanding of its nature that an authentic comprehension of the nature of charge could arise. Moreover, the quantization of charge remains unexplained within the framework of the current theory. However, its conjugate variable still holds the key to this problem.
The example of angular momentum is clear in this regard: its conjugate quantity is the angle, so that its conservation results from the isotropy of space. Moreover, the fact that angle variations cannot exceed $2\pi$ implies that the differences in angular momentum are quantized in units of $\hbar$. In the same way, we can expect that the existence of a limitation on the variable $\chi(x)$, once its nature is elucidated, would imply charge quantization and lead to new quantitative results. As we will see, scale relativity does make it possible to put forward proposals in this direction.

14.5.7.2. Nature of gauge fields

Let us consider an electron or any other charged particle. In scale relativity, we identify "particles" with the geodesics of a non-differentiable space-time. These paths are characterized by internal (fractal) structures (beyond the Compton scale $\lambda_c = \hbar/mc$ of the particle in its rest frame). Now consider any one of these structures (which is defined only in a relative way), lying at a resolution $\varepsilon < \lambda_c$. In a displacement of the electron, the relativity of scales will imply the appearance of a field induced by this displacement.
To understand it, we can take as a model one aspect of the construction, from the general relativity of motion, of Einstein's theory of gravitation. In this theory, gravitation is identified with the manifestation of the curvature of space-time, which results in a rotation of vectors of geometric origin. However, this general rotation of any vector during a translation can result simply from the generalized relativity of motion alone. Indeed, since space-time is relative, a vector $V^\mu$ subjected to a displacement $dx^\rho$ cannot remain identical to itself (the reverse would mean absolute space-time). It will thus undergo a rotation, which is written, using the Einstein summation convention on identical lower and upper indices, $\delta V^\mu = \Gamma^\mu_{\nu\rho} V^\nu dx^\rho$. The Christoffel symbols $\Gamma^\mu_{\nu\rho}$, which emerge naturally in this transformation, can then be calculated, in the course of this construction, in terms of derivatives of the metric potentials $g_{\mu\nu}$, which makes it possible to regard them as components of the gravitational field generalizing Newton's gravitational force. Similarly, in the case of fractal electron structures, we expect that a structure which was initially characterized by a certain scale jumps to another scale after the electron displacement (if not, the scale space would be absolute, which would be in contradiction with the principle of scale relativity). A dilation field of resolution induced by the translations is then expected to appear, which is written:

$$ e\,\frac{\delta\varepsilon}{\varepsilon} = -A_\mu\,\delta x^\mu \tag{14.37} $$
This effect can be described in terms of the introduction of a covariant derivative:

$$ e\,D_\mu \ln(\lambda/\varepsilon) = e\,\partial_\mu \ln(\lambda/\varepsilon) + A_\mu \tag{14.38} $$
Now, this field of dilation must be defined irrespective of the initial scale from which we started, i.e., whatever the substructure considered. Therefore, starting from another scale $\varepsilon'$, deduced from $\varepsilon$ by a dilation of ratio $\varrho$ (here we take into account, as a first step, only the Galilean scale relativity law, in which the product of two dilations is the standard one), we get during the same translation of the electron:

$$ e\,\frac{\delta\varepsilon'}{\varepsilon'} = -A'_\mu\,\delta x^\mu \tag{14.39} $$
The two expressions for the potential $A_\mu$ are then connected by the relation:

$$ A'_\mu = A_\mu + e\,\partial_\mu \ln\varrho \tag{14.40} $$

where $\ln\varrho(x) = \ln(\varepsilon/\varepsilon')$ is the relative scale state (it depends only on the ratio between the resolutions $\varepsilon$ and $\varepsilon'$), which now depends explicitly on the coordinates. In this regard, this approach already comes under the framework of general scale relativity and of non-linear scale transformations, since the "scale velocity" has been redefined as a first derivative with respect to the djinn, $\ln\varrho = d\ln L/d\delta$, so that equation (14.40) involves a second-order derivative of the fractal coordinate, $d^2 \ln L/dx^\mu\,d\delta$.
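The step from the two dilation laws to relation (14.40) can be made explicit. The following is only a sketch, under the assumption that the relative scale state varies with position through $\delta\ln\varrho = \partial_\mu\ln\varrho\,\delta x^\mu$; it uses $\delta\varepsilon/\varepsilon = \delta\ln\varepsilon$ and subtracts (14.39) from (14.37).

```latex
% Sketch: subtract the dilation laws written at the two resolutions.
\begin{align*}
e\,\delta\ln\varepsilon  &= -A_\mu\,\delta x^\mu   && \text{(14.37)}\\
e\,\delta\ln\varepsilon' &= -A'_\mu\,\delta x^\mu  && \text{(14.39)}\\
(A'_\mu - A_\mu)\,\delta x^\mu
   &= e\,\delta\ln(\varepsilon/\varepsilon')
    = e\,\partial_\mu\ln\varrho\;\delta x^\mu ,
   && \varrho \equiv \varepsilon/\varepsilon' \\
\Longrightarrow\quad A'_\mu &= A_\mu + e\,\partial_\mu\ln\varrho && \text{(14.40)}
\end{align*}
```

Since the displacement $\delta x^\mu$ is arbitrary, the coefficients of $\delta x^\mu$ can be identified on both sides, which is what the last line expresses.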
If we consider a translation along two different coordinates (or, equivalently, a displacement on a closed loop), we may write a commutator relation:

$$ e\,(\partial_\mu D_\nu - \partial_\nu D_\mu)\ln\varrho = \partial_\mu A_\nu - \partial_\nu A_\mu \tag{14.41} $$
This relation defines a tensor field $F_{\mu\nu} = \partial_\mu A_\nu - \partial_\nu A_\mu$, which, unlike $A_\mu$, is independent of the initial scale from which we started. We recognize in $F_{\mu\nu}$ the analog of an electromagnetic field, in $A_\mu$ that of an electromagnetic potential, in $e$ that of the electric charge and in equation (14.40) the property of gauge invariance which, in accordance with Weyl's initial ideas and their development by Dirac [DIR 73], recovers its initial status of scale invariance. However, equation (14.40) represents progress compared with these early attempts and with the status of gauge invariance in today's physics. Indeed, the gauge function $\chi(x, y, z, t)$ which intervenes in the standard formulation of gauge invariance, $A'_\mu = A_\mu + e\,\partial_\mu\chi$, and which has, up to now, been considered arbitrary, is identified with the logarithm of internal resolutions, $\chi = \ln\varrho(x, y, z, t)$. Another advantage with respect to Weyl's theory is that we are now allowed to define four different and independent dilations along the four space-time resolutions instead of only one global dilation. Therefore, we expect that the field above (which corresponds to a group U(1) of electromagnetic field type) is embedded into a larger field, in accordance with the electroweak theory and grand unification attempts. In the same way, we expect that the charge $e$ is an element of a more complicated, "vectorial" charge. These early remarks have now developed into a full theory of non-Abelian gauge fields [NOT 06], in which the main tools and results of Yang-Mills theories can be recovered as a manifestation of fractal geometry. Moreover, this generalized approach makes it possible to suggest a new and more completely unified preliminary version of electroweak theory [NOT 00b], in which the Higgs boson mass can be predicted theoretically (we find $m_H = \sqrt{2}\,m_W = 113.73 \pm 0.06$ GeV, where $m_W$ is the W gauge boson mass).
Moreover, our interpretation of gauge invariance yields new insights about the nature of the electric charge and, when it is combined with the Lorentzian structure of dilations of special scale relativity, it makes it possible to obtain new relations between the charges and the masses of elementary particles [NOT 94, NOT 96a], as recalled in what follows.

14.5.7.3. Nature of the charges

In a gauge transformation $A'_\mu = A_\mu - \partial_\mu\chi$, the wave function of an electron of charge $e$ becomes:

$$ \psi' = \psi\,e^{ie\chi} \tag{14.42} $$
In this expression, the essential role played by the gauge function is clear. It is the conjugate variable of the electric charge, in the same way as position, time and angle are the conjugate variables of momentum, energy and angular momentum (respectively) in the expressions of the action and/or the quantum phase of a free particle, $\theta = i(px - Et + \sigma\varphi)/\hbar$. Our knowledge of what constitutes energy, momentum and angular momentum comes from our understanding of the nature of space, time, angles and their symmetries (translations and rotations), using Noether's theorem. Conversely, the fact that we still do not really know what an electric charge is, despite all the development of gauge theories, comes, in our view, from the fact that the gauge function $\chi$ is considered devoid of physical meaning. We have interpreted in the previous section the gauge transformation as a scale transformation of resolution, $\varepsilon \to \varepsilon'$, $\ln\varrho = \ln(\varepsilon/\varepsilon')$. In such an interpretation, the specific property that characterizes a charged particle is the explicit scale dependence of its action, and therefore of its wave function, as a function of resolution. The result is that the electron's wave function is written:

$$ \psi' = \psi\,\exp\!\left( i\,\frac{e^2}{\hbar c}\,\ln\varrho \right) \tag{14.43} $$

Since, by definition (in the system of units where the permittivity of vacuum is 1):

$$ e^2 = 4\pi\alpha\,\hbar c \tag{14.44} $$

equation (14.43) becomes:

$$ \psi' = \psi\,e^{\,i\,4\pi\alpha\,\ln\varrho} \tag{14.45} $$
Now considering the wave function of the electron as an explicitly dependent function on resolution ratios, we can write the scale differential equation of which $\psi$ is a solution as:

$$ -i\hbar\,\frac{\partial\psi}{\partial\!\left(\frac{e}{c}\ln\varrho\right)} = e\,\psi \tag{14.46} $$

We recognize in $\tilde{D} = -i\hbar\,\partial/\partial\!\left(\frac{e}{c}\ln\varrho\right)$ a dilation operator. Equation (14.46) can then be read as an eigenvalue equation:

$$ \tilde{D}\,\psi = e\,\psi \tag{14.47} $$
In such a framework, the electric charge is understood as the conserved quantity that comes from the new scale symmetry, namely, the uniformity of the resolution variable $\ln\varepsilon$.
14.5.7.4. Mass-charge relations

In the previous section, we have stated the wave function of a charged particle in the form:

$$ \psi' = \psi\,e^{\,i\,4\pi\alpha\,\ln\varrho} \tag{14.48} $$

In the Galilean case such a relation leads to no new result, since $\ln\varrho$ is unlimited. However, in the special scale-relativistic framework (see previous section), scale laws become Lorentzian below the Compton scale $\lambda_c$ of the particle, so that $\ln\varrho$ becomes limited by the fundamental constant $C = \ln(\lambda_c/l_P)$, which characterizes the considered particle (where $l_P = (\hbar G/c^3)^{1/2}$ is the Planck length scale). This implies a quantization of the charge, which amounts to the relation $4\pi\alpha C = 2k\pi$, i.e.:

$$ \alpha\,C = \frac{k}{2} \tag{14.49} $$

where $k$ is an integer. This equation defines a general form for relations between masses and charges (coupling constants) of elementary particles.
For example, in the case of the electron, the ratio of its Compton length $\hbar/m_e c$ to the Planck length is equal to the ratio of the Planck mass ($m_P = (\hbar c/G)^{1/2}$) to the electron mass. Moreover, within the framework of the electroweak theory, it appears that the coupling constant of electrodynamics at low energy (i.e., the fine structure constant) results from a "running" electroweak coupling dependent on the energy scale. This running coupling is decreased by a factor 3/8 owing to the fact that the gauge bosons W and Z become massive and no longer contribute to the interaction at energies lower than their mass energy. We thus obtain a mass-charge relation for the electron which is written:

$$ \frac{8}{3}\,\alpha\,\ln\frac{m_P}{m_e} = 1 \tag{14.50} $$
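As a rough numerical check of relation (14.50), the left-hand side can be evaluated directly. The sketch below is illustrative only: the function name is hypothetical, and the constants are CODATA 2018 values inserted here as assumptions (they are not given in the text). It uses the low-energy fine structure constant and does not model any threshold correction.

```python
import math

# Assumed inputs (CODATA 2018 recommended values; not taken from the text).
ALPHA = 7.2973525693e-3            # fine structure constant, dimensionless
M_PLANCK_KG = 2.176434e-8          # Planck mass (hbar c / G)^(1/2), in kg
M_ELECTRON_KG = 9.1093837015e-31   # electron mass, in kg


def mass_charge_product(alpha=ALPHA, m_planck=M_PLANCK_KG, m_particle=M_ELECTRON_KG):
    """Left-hand side of (14.50): (8/3) * alpha * ln(m_P / m_e)."""
    return (8.0 / 3.0) * alpha * math.log(m_planck / m_particle)


if __name__ == "__main__":
    # Relation (14.50) predicts a value of 1 for this product.
    print(f"(8/3) alpha ln(m_P/m_e) = {mass_charge_product():.4f}")
```

The product depends only logarithmically on the mass ratio, so it is far more sensitive to the value of $\alpha$ than to the precise value of the Planck mass.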
Such a theoretical relation between the mass and the charge of the electron is supported by the experimental data, which lead to a value of 1.0027 for this product, and 1.00014 when the threshold effects at the Compton transition are taken into account. Such a relation accounts for many other structures observed in particle physics and suggests solutions to the questions of the origin of the masses of certain particles, of the coupling values and of the hierarchy problem between electroweak and grand unification scales [NOT 96a, NOT 00a, NOT 00b].

14.6. Quantum-like induced dynamics

14.6.1. Generalized Schrödinger equation

In scale relativity, as we have seen, it is necessary to generalize the concept of space-time and once again to work within the framework of fractal space-time. We
consider the coordinate systems (and paths, particularly the geodesics of fractal space) which are themselves fractal, i.e., having internal structures at all scales. We concentrated, in the preceding sections, on possible descriptions of such structures, which relate to scale space. We will now briefly consider, to finish, their induced effects on displacements in ordinary space. The combination of these effects leads to the introduction of a description tool of the quantum mechanical type. In its framework, we give up the traditional description in terms of initial conditions and deterministic individual trajectories, in favor of a statistical description in terms of probability amplitudes. Let us point out the essence of the method used within the framework of scale relativity to pass from a traditional dynamics to a quantum-like dynamics. The three minimal conditions which make it possible to transform the fundamental equation of dynamics into a Schrödinger equation are as follows:
1) there is an infinity of potential paths; this first condition is a natural outcome of non-differentiability and space fractality, if the paths can be identified with the geodesics of this space;
2) the paths are fractal curves (dimension D = 2, which corresponds to a complete loss of information on elementary displacements, playing a special role here). In the case of a space and its geodesics, the fractal character of the space directly implies the fractality of its geodesics;
3) there is irreversibility at the infinitesimal level, i.e., non-invariance under the reflection of the time differential element dt → −dt. Again, this condition is an immediate consequence of the abandonment of the differentiability hypothesis.
Let us recall that one of the fundamental tools which enable us to manage non-differentiability consists of reinterpreting differential elements as variables.
Thus, the space coordinate becomes a fractal function $X(t, dt)$ and its velocity, although becoming undefined at the limit $dt \to 0$, is now also defined as a fractal function. The difference is that there are two definitions instead of one (which are transformed one into the other by the reflection $dt \leftrightarrow -dt$), and thus the velocity concept becomes two-valued:

$$ V_+(t, dt) = \frac{X(t + dt, dt) - X(t, dt)}{dt} \tag{14.51} $$

$$ V_-(t, dt) = \frac{X(t, dt) - X(t - dt, dt)}{dt} \tag{14.52} $$

The first condition leads us to use a "fluid"-like description, where we no longer consider only the velocity of an individual path, but rather the mean velocity field $v[x(t), t]$ of all the potential paths. The second condition brings us back to the preceding work concerning scale laws satisfying the relativity principle. We saw that, in the simplest "scale-Galilean" case,
the coordinate (which is a solution of a scale differential equation) decomposes in the form of a traditional, scale-independent, differentiable part and of a fractal, non-differentiable part. We use this result here, after having differentiated the coordinate. This leads us to decompose the elementary displacements, $dX = dx + d\xi$, into a mean, scale-independent part, $dx = v\,dt$, and a fluctuation $d\xi$ characterized by a law of fractal behavior, $d\xi \propto dt^{1/D_F}$, where $D_F$ is the fractal dimension of the path. The third condition implies, as we have seen, a two-valuedness of the velocity field. Defined by $V = dX/dt = v + d\xi/dt$, it decomposes, in the case of both $V_+$ and $V_-$, in terms of a non-fractal component $v$ (thus differentiable in the ordinary sense) and of a divergent fractal component $d\xi/dt$ of zero mean. We are thus led to introduce a 3D twin process:

$$ dX^i_\pm = dx^i_\pm + d\xi^i_\pm \tag{14.53} $$

in which $dx^i_\pm = v^i_\pm\,dt$, $\langle d\xi^i_\pm \rangle = 0$ and:

$$ \left\langle \frac{d\xi^i_\pm}{dt}\,\frac{d\xi^j_\pm}{dt} \right\rangle = \pm\,\delta^{ij} \left( \frac{2\mathcal{D}}{dt} \right)^{2 - 2/D_F} \tag{14.54} $$
($c = 1$ is used here to simplify the writing; $\delta^{ij}$ is the Kronecker symbol). The symbol $\mathcal{D}$ is a fundamental scale parameter which characterizes the behavior of the fractal trajectories (it is nothing other than a different notation for the fractal to non-fractal transition scale introduced previously). This parameter determines the essential transition which appears in such a process between the fractal behavior on a small scale (where the fluctuations dominate) and the non-fractal behavior on a large scale (where the mean traditional motion dominates). A natural representation of the two-valuedness of variables due to irreversibility consists of using complex numbers (we can show that this choice is "covariant" in the sense that it preserves the form of the equations [CEL 04]). A complex time derivative operator is defined (which relates to the scale-independent differentiable parts):

$$ \frac{\hat{d}}{dt} = \frac{1}{2} \left( \frac{d_+ + d_-}{dt} - i\,\frac{d_+ - d_-}{dt} \right) \tag{14.55} $$
Then we define an average complex velocity which results from the action of this operator on the position variable:

$$ \mathcal{V}^i = \frac{\hat{d}}{dt}\,x^i = V^i - i\,U^i = \frac{v^i_+ + v^i_-}{2} - i\,\frac{v^i_+ - v^i_-}{2} \tag{14.56} $$
Note that, in more recent works, we have constructed such an operator from the whole velocity field, including the non-differentiable part, and still obtained
the standard Schrödinger equation as the equation of motion [NOT 05, NOT 07], therefore allowing for the possible existence of fractal and non-differentiable wave functions in quantum mechanics [BER 96]. After having defined the laws of elementary displacements in such a fractal and locally irreversible process, it is necessary for us now to analyze the effects of these displacements on other physical functions. Let us consider a differentiable function $f(X(t), t)$. Its total derivative with respect to time is written:

$$ \frac{df}{dt} = \frac{\partial f}{\partial t} + \nabla f \cdot \frac{dX}{dt} + \frac{1}{2}\,\frac{\partial^2 f}{\partial X^i\,\partial X^j}\,\frac{dX^i\,dX^j}{dt} \tag{14.57} $$
We may now calculate the (+) and (−) derivatives of $f$. In this procedure, the mean value of $dX/dt$ amounts to $d_\pm x/dt = v_\pm$, while $\langle dX^i\,dX^j \rangle$ reduces to $\langle d\xi^i_\pm\,d\xi^j_\pm \rangle$. Finally, in the particular case when the fractal dimension of the paths is $D_F = 2$, we have $\langle d\xi^2 \rangle = 2\mathcal{D}\,dt$, and the last term of equation (14.57) is transformed into a Laplacian. We obtain in this case:

$$ \frac{d_\pm f}{dt} = \left( \frac{\partial}{\partial t} + v_\pm \cdot \nabla \pm \mathcal{D}\,\Delta \right) f \tag{14.58} $$
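The appearance of the Laplacian term in (14.58) can be illustrated numerically in one dimension. The sketch below is not the authors' procedure: it assumes Gaussian fluctuations (the text only fixes their mean, zero, and their variance, $2\mathcal{D}\,dt$ for $D_F = 2$), treats the (+) derivative only, and uses hypothetical function and parameter names.

```python
import math
import random


def mean_forward_derivative(f, x, v, D, dt, n_samples=400_000, seed=0):
    """Monte Carlo estimate of <f(x + dX) - f(x)> / dt for dX = v*dt + dxi.

    Assumption: dxi is drawn from a Gaussian with zero mean and variance
    2*D*dt, i.e. the D_F = 2 case <dxi^2> = 2 D dt quoted in the text.
    """
    rng = random.Random(seed)
    sigma = math.sqrt(2.0 * D * dt)
    total = 0.0
    for _ in range(n_samples):
        dX = v * dt + rng.gauss(0.0, sigma)
        total += f(x + dX) - f(x)
    return total / (n_samples * dt)


def operator_prediction_quadratic(x, v, D):
    """(v d/dx + D d^2/dx^2) applied to f(x) = x^2, i.e. 2*x*v + 2*D."""
    return 2.0 * x * v + 2.0 * D


if __name__ == "__main__":
    x, v, D, dt = 1.0, 0.5, 0.3, 0.01
    estimate = mean_forward_derivative(lambda y: y * y, x, v, D, dt)
    print("Monte Carlo estimate :", estimate)
    print("v f' + D f''         :", operator_prediction_quadratic(x, v, D))
```

Without the fluctuation term the estimate would approach $2xv$ only; the extra contribution $2\mathcal{D}$ comes entirely from $\langle d\xi^2 \rangle$ being of first order in $dt$, which is the point made in the text.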
Although we consider here only the fractal dimension $D_F = 2$, we recall that all the results obtained can be generalized to other values of the dimension [NOT 96a]. By combining these two derivatives, we obtain the complex derivation operator with respect to time:

$$ \frac{\hat{d}}{dt} = \frac{\partial}{\partial t} + \mathcal{V} \cdot \nabla - i\,\mathcal{D}\,\Delta \tag{14.59} $$
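The combination leading to (14.59) can be written out term by term. This is a short worked sketch that only uses definitions (14.55), (14.56) and (14.58), with $V = (v_+ + v_-)/2$ and $U = (v_+ - v_-)/2$.

```latex
\begin{align*}
\frac{1}{2}\,\frac{d_+ + d_-}{dt}
  &= \frac{\partial}{\partial t} + \frac{v_+ + v_-}{2}\cdot\nabla
   = \frac{\partial}{\partial t} + V\cdot\nabla
   && \text{(the $\pm\mathcal{D}\Delta$ terms cancel)}\\
\frac{1}{2}\,\frac{d_+ - d_-}{dt}
  &= \frac{v_+ - v_-}{2}\cdot\nabla + \mathcal{D}\,\Delta
   = U\cdot\nabla + \mathcal{D}\,\Delta
   && \text{(the $\partial/\partial t$ terms cancel)}\\
\frac{\hat{d}}{dt}
  &= \frac{\partial}{\partial t} + V\cdot\nabla
     - i\left(U\cdot\nabla + \mathcal{D}\,\Delta\right)
   = \frac{\partial}{\partial t} + (V - iU)\cdot\nabla - i\,\mathcal{D}\,\Delta
   = \frac{\partial}{\partial t} + \mathcal{V}\cdot\nabla - i\,\mathcal{D}\,\Delta
\end{align*}
```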
It has two imaginary terms, $-iU \cdot \nabla$ and $-i\mathcal{D}\Delta$, in addition to the standard Eulerian total derivative operator, $d/dt = \partial/\partial t + V \cdot \nabla$. We can now rewrite the fundamental dynamic equation using this derivative operator: this will then automatically take into account the new effects considered. It keeps the Newtonian form:

$$ m\,\frac{\hat{d}^2}{dt^2}\,x = -\nabla\phi \tag{14.60} $$
where φ is an exterior potential. If the potential is either zero or a gravitational potential, this equation is nothing other than a geodesic equation. We have therefore implemented a generalized equivalence principle, thanks to which the motion (gravitational and quantum) remains locally described under an inertial form: indeed, as we will see now, this equation can be integrated under the form of a Schrödinger equation.
More generally, we can generalize Lagrangian mechanics with this new tool (see [NOT 93, NOT 96a, NOT 97a, NOT 07]). The complex character of the velocity $\mathcal{V}$ implies that the same is true of the Lagrange function and therefore of the action $S$. The wave function $\psi$ is then introduced very simply as a re-expression of this complex action:

$$ \psi = e^{\,iS/2m\mathcal{D}} \tag{14.61} $$
It is related to the complex velocity in the following manner:

$$ \mathcal{V} = -2i\,\mathcal{D}\,\nabla(\ln\psi) \tag{14.62} $$
We can now change the descriptive tool and write the Euler-Newton equation (14.60) in terms of this wave function:

$$ 2im\mathcal{D}\,\frac{\hat{d}}{dt}(\nabla\ln\psi) = \nabla\phi \tag{14.63} $$
After some calculations, this equation can be integrated in the form of a Schrödinger equation [NOT 93]:

$$ \mathcal{D}^2\,\Delta\psi + i\,\mathcal{D}\,\frac{\partial\psi}{\partial t} - \frac{\phi}{2m}\,\psi = 0 \tag{14.64} $$
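The calculations that lead from (14.63) to (14.64) can be outlined as follows. This is only a sketch of one possible route, not necessarily the one followed in [NOT 93]; it relies on three standard identities for a gradient field, namely $(\nabla g\cdot\nabla)\nabla g = \tfrac{1}{2}\nabla(\nabla g)^2$, $\Delta\nabla g = \nabla\Delta g$ and $\Delta\psi/\psi = \Delta\ln\psi + (\nabla\ln\psi)^2$, and the integration constant is assumed to be absorbed in the phase of $\psi$.

```latex
\begin{align*}
% insert (14.59) and (14.62) into (14.63)
\nabla\phi
  &= 2im\mathcal{D}\left[\frac{\partial}{\partial t}\nabla\ln\psi
     - 2i\mathcal{D}\,(\nabla\ln\psi\cdot\nabla)\nabla\ln\psi
     - i\mathcal{D}\,\Delta\nabla\ln\psi\right]\\
% use the three identities quoted in the lead-in
  &= 2im\mathcal{D}\,\nabla\!\left[\frac{\partial\ln\psi}{\partial t}
     - i\mathcal{D}\left((\nabla\ln\psi)^2 + \Delta\ln\psi\right)\right]
   = \nabla\!\left[2im\mathcal{D}\,\frac{1}{\psi}\frac{\partial\psi}{\partial t}
     + 2m\mathcal{D}^2\,\frac{\Delta\psi}{\psi}\right]\\
% integrate the gradient and multiply by psi / 2m
\Longrightarrow\quad
  & \mathcal{D}^2\,\Delta\psi + i\,\mathcal{D}\,\frac{\partial\psi}{\partial t}
    - \frac{\phi}{2m}\,\psi = 0
\end{align*}
```

Setting $\mathcal{D} = \hbar/2m$ and multiplying by $2m$ gives $\frac{\hbar^2}{2m}\Delta\psi + i\hbar\,\partial\psi/\partial t - \phi\,\psi = 0$, the usual form of the Schrödinger equation.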
We find the standard quantum mechanics equation by selecting $\mathcal{D} = \hbar/2m$. By setting $\psi\psi^\dagger = \rho$, we find that the imaginary part of this equation is the continuity equation:

$$ \frac{\partial\rho}{\partial t} + \operatorname{div}(\rho V) = 0 \tag{14.65} $$

which justifies the interpretation of $\rho$ as a probability density [NOT 05, NOT 07].
14.6.2. Application in gravitational structure formation

Physics has for a long time been confronted with the problem of the very non-homogeneous spatial distribution of matter in the universe. This distribution of spatial structures often shows a hierarchy of organization, whether it is in the microscopic domain (quarks in the nucleons, nucleons in the nucleus, nucleus and electrons in the atom, atoms in the molecule, etc.) or in the macroscopic domain (stars and their planetary systems, star groups and clusters gathering with the interstellar matter, itself fractal, in galaxies which form groups and clusters of galaxies, which belong to superclusters of galaxies, themselves subsets of the large-scale structures of the universe). What is striking, in these two cases, is that it is vacuum rather than matter which dominates, even on very large scales where we thought we would find a homogeneous distribution.
The theory of scale relativity was built, among other aims, to deal with questions of scale structuring. We take into account an explicit intervention of the observation scales (which amounts to working within the framework of a fractal geometry), or more generally of the scales which are characteristic of the phenomena under consideration, as well as relations between these scales, by the introduction of a resolution space. As we saw, such a description of structures over all the scales (or on a broad range) induces a new dynamics whose behavior becomes quantum rather than traditional. However, the conditions under which Newton’s equation is integrated in the form of a Schrödinger equation (which correspond to a complete loss of information on individual trajectories, from the viewpoint of angles, space and time) do not manifest themselves only on microscopic scales. Certain macroscopic systems, such as protoplanetary nebulae, which created our solar system, could satisfy these conditions and thus be described statistically by a Schrödinger-type equation (but with, of course, an interpretation different from that of standard quantum mechanics). Such a dynamics leads naturally to a “morphogenesis” [DAR 03] since it generates organized structures in a hierarchical way, dependent on the external conditions (forces and boundary conditions). A great example of the application of such an approach is in planetary system formation. It is fascinating that theoretical predictions about it could be made [NOT 93], then validated in our solar system, several years before the discovery [MAY 95, WOL 94] of the first extrasolar planets. 
We theoretically predicted that the distribution of semi-major axes of planetary orbits should show probability peaks for values $a_n/GM = (n/w_0)^2$, where $M$ is the mass of the star, $w_0 = 144$ km/s is a universal constant which characterizes the structuring of the inner solar system and which is observed (with its multiples and sub-multiples) from the planetary scales to extragalactic scales, and $n$ is an integer. It is also expected that the eccentricities show probability peaks for values $e = k/n$, where $k$ is an integer ranging between 0 and $n - 1$. Since then more than 250 exoplanets have been discovered, whose observed distributions of semi-major axes (Figure 14.1) and eccentricities (Figure 14.2) show a highly significant statistical agreement with the theoretically expected probability distributions [NOT 96b, NOT 97c, NOT 00d, NOT 01b, DAR 03].

14.7. Conclusion

The present contribution has mainly focused on the detailed principle and theoretical development of the "scale-relativistic" approach. However, we have not been able to touch on everything. For example, the construction of an equation of the Schrödinger type starting from the abandonment of differentiability, explicitly shown above in the case of Newton's fundamental equation of motion, can be generalized to all cases where the equations of traditional physics can be put in the form of Euler-Lagrange equations. This was done explicitly for the equations of rotational
[Figure 14.1: histogram. Vertical axis: Number. Lower horizontal axis: $4.83\,(a/M)^{1/2}$. Upper horizontal axis: $a/M$ (AU/$M_\odot$), with ticks at 0.043, 0.171, 0.385, 0.685, 1.07, 1.54 and 2.09.]

Figure 14.1. Histogram of the observed distribution of the variable $\tilde{n} = 4.83\,(a/M)^{1/2}$, where $a$ indicates the semi-major axis and $M$ the star mass (in solar system units, i.e., astronomical unit AU and solar mass $M_\odot$), for the recently discovered exoplanets and the planets of our inner solar system. We theoretically expect probability peaks for integer values of this variable. The probability of obtaining such a statistical agreement by chance is lower than $4 \times 10^{-5}$
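The values of $a/M$ marked on the upper axis of Figure 14.1, and the coefficient 4.83 in the definition of $\tilde{n}$, follow from the peak law $a_n/GM = (n/w_0)^2$ with $w_0 = 144$ km/s. The sketch below recomputes them; the function names are hypothetical, and the solar gravitational parameter and the astronomical unit are standard values supplied here as assumptions (they are not given in the text).

```python
import math

# Assumed constants (standard astronomical values, not quoted in the text).
GM_SUN = 1.32712440018e20   # G * (solar mass), in m^3 / s^2
AU = 1.495978707e11         # astronomical unit, in m
W0 = 144.0e3                # w0 = 144 km/s, as quoted in the text, in m/s


def peak_semi_major_axis_au(n, mass_solar=1.0, w0=W0):
    """Semi-major axis (in AU) of the n-th peak, a_n = G M (n / w0)^2."""
    return GM_SUN * mass_solar * (n / w0) ** 2 / AU


def n_tilde(a_au, mass_solar=1.0, w0=W0):
    """Rank variable w0 * sqrt(a / (G M)) for a in AU and M in solar masses."""
    return w0 * math.sqrt(a_au * AU / (GM_SUN * mass_solar))


if __name__ == "__main__":
    for n in range(1, 8):
        print(f"n = {n}: a/M = {peak_semi_major_axis_au(n):.3f} AU per solar mass")
    # Coefficient multiplying (a/M)^(1/2) in the definition of n-tilde.
    print(f"coefficient = {n_tilde(1.0):.2f}")
```

As a usage example, a planet at about 0.387 AU around a one-solar-mass star (roughly Mercury's semi-major axis, a value supplied here for illustration) gives `n_tilde(0.387)` close to 3.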
[Figure 14.2: histogram. Vertical axis: Number. Horizontal axis: $n \cdot e$.]

Figure 14.2. Histogram of the distribution of $\tilde{k} = n \cdot e$, where $n$ is the principal quantum number (which characterizes the semi-major axes) and $e$ the eccentricity, for the exoplanets and the planets of the inner solar system. The theory predicts probability peaks for integer values of this variable. The probability of obtaining such an agreement by chance is lower than $10^{-4}$. The combined probability of obtaining the two distributions (semi-major axes and eccentricities) by chance is $3 \times 10^{-7}$, i.e., a level of statistical significance reaching $5\sigma$
motion of a solid, for the equation of motion with a dissipation function, for the Euler and Navier-Stokes equations, and even for scalar field equations [NOT 97a, NOT 08]. Among the possible generalizations of the theory, we can also mention abandoning differentiability not only in the usual space (which leads, as we saw, to the introduction of a scale space governed by differential equations acting on scale variables, in particular on the spatio-temporal resolutions), but also in the space of scales itself. The whole of the previous construction can again be applied at this deeper level of description. This leads to the introduction of a "scale-quantum mechanics" [NOT 08]. In this framework, which is equivalent to a "third quantization," fractal "objects" of a new type can be defined: rather than having structures at well-defined scales (the case of ordinary fractal objects), or having variable scale structures described by traditional laws (the case of the "scale-relativistic" fractals considered in this chapter), they are now characterized by a probability amplitude for scale ratios ("quantum" fractals). With regard to the applications of this approach, we gave only two examples, concerning the electron mass and planetary systems. Let us recall, nevertheless, that it has been applied successfully to a large number of problems of physics and astrophysics which were unresolved with the usual methods, and that it has also allowed the theoretical prediction of structures and of new relations [NOT 96a, NOT 08]. Thus, the transformation of the fundamental dynamic equation into a Schrödinger equation under very general conditions (loss of information on the individual paths and irreversibility) leads to a renewed comprehension of the formation and evolution of gravitational structures.
This method, besides the semi-major axes and eccentricities of planets discovered around solar-type stars, briefly considered earlier, was also applied successfully to the three planets observed around the pulsar PSR 1257+12 [NOT 96b], to the obliquities and inclinations of planets and satellites of the solar system [NOT 98a], to the satellites of giant planets [HER 98], to double stars, double galaxies, the distribution of galaxies on a very large scale and other gravitational structures [NOT 96a, NOT 98b, DAR 03].

14.8. Bibliography

[AIT 82] AITCHISON I., An Informal Introduction to Gauge Field Theories, Cambridge University Press, Cambridge, 1982.
[BEN 00] BEN ADDA F., CRESSON J., “Divergence d’échelle et différentiabilité”, Comptes rendus de l’Académie des sciences de Paris, série I, vol. 330, p. 261–264, 2000.
[BER 96] BERRY M.V., “Quantum fractals in boxes”, J. Phys. A: Math. Gen., vol. 29, p. 6617–6629, 1996.
[CAS 02] CASH R., CHALINE J., NOTTALE L., GROU P., “Human development and log-periodic laws”, C.R. Biologies, vol. 325, p. 585, 2002.
[CEL 04] CÉLÉRIER M.N., NOTTALE L., “Quantum-classical transition in scale relativity”, J. Phys. A: Math. Gen., vol. 37, p. 931, 2004.
[CHA 99] CHALINE J., NOTTALE L., GROU P., “Is the evolutionary tree a fractal structure?”, Comptes rendus de l’Académie des sciences de Paris, vol. 328, p. 717, 1999.
[DAR 03] DA ROCHA D., NOTTALE L., “Gravitational structure formation in scale-relativity”, Chaos, Solitons & Fractals, vol. 16, p. 565, 2003.
[DIR 73] DIRAC P.A.M., “Long range forces and broken symmetries”, Proc. Roy. Soc. Lond., vol. A333, p. 403–418, 1973.
[DUB 96] DUBRULLE B., GRANER F., “Possible statistics of scale invariant systems”, J. Phys. (Fr.), vol. 6, p. 797–816, 1996.
[EIN 05] EINSTEIN A., “Zur Elektrodynamik bewegter Körper”, Annalen der Physik, vol. 17, p. 891–921, 1905.
[EIN 16] EINSTEIN A., “Die Grundlage der allgemeinen Relativitätstheorie”, Annalen der Physik, vol. 49, p. 769–822, 1916.
[ELN 95] EL NASCHIE M.S., ROSSLER O.E., PRIGOGINE I. (Eds.), Quantum Mechanics, Diffusion, and Chaotic Fractals, Pergamon, Cambridge, p. 93, 1995.
[HER 98] HERMANN R., SCHUMACHER G., GUYARD R., “Scale relativity and quantization of the solar system. Orbit quantization of the planet’s satellites”, Astronomy and Astrophysics, vol. 335, p. 281, 1998.
[MAN 75] MANDELBROT B., Les objets fractals, Flammarion, Paris, 1975.
[MAN 82] MANDELBROT B., The Fractal Geometry of Nature, Freeman, San Francisco, 1982.
[MAY 95] MAYOR M., QUELOZ D., “A Jupiter-mass companion to a solar-type star”, Nature, vol. 378, p. 355–359, 1995.
[NOT 84] NOTTALE L., SCHNEIDER J., “Fractals and non-standard analysis”, Journal of Mathematical Physics, vol. 25, p. 1296, 1984.
[NOT 89] NOTTALE L., “Fractals and the quantum theory of space-time”, International Journal of Modern Physics, vol. A4, p. 5047, 1989.
[NOT 92] NOTTALE L., “The theory of scale relativity”, International Journal of Modern Physics, vol. A7, p. 4899, 1992.
[NOT 93] NOTTALE L., Fractal Space-Time and Microphysics: Towards a Theory of Scale Relativity, World Scientific, Singapore, 1993.
[NOT 94] NOTTALE L., “Scale relativity: first steps toward a field theory”, in DIAZ ALONSO J., LORENTE PARAMO M. (Eds.), Relativity in General, E.R.E.’93, Spanish relativity meetings, Editions Frontières, p. 121, 1994.
[NOT 96a] NOTTALE L., “Scale relativity and fractal space-time: application to quantum physics, cosmology and chaotic systems”, Chaos, Solitons, and Fractals, vol. 7, p. 877, 1996.
[NOT 96b] NOTTALE L., “Scale-relativity and quantization of extrasolar planetary systems”, Astronomy and Astrophysics Letters, vol. 315, p. L9, 1996.
[NOT 97a] NOTTALE L., “Scale relativity and quantization of the universe. I. Theoretical framework”, Astronomy and Astrophysics, vol. 327, p. 867, 1997.
[NOT 97b] NOTTALE L., “Scale relativity”, in DUBRULLE B., GRANER F., SORNETTE D. (Eds.), Scale Invariance and Beyond, Les Houches workshop, EDP Sciences and Springer, p. 249, 1997.
[NOT 97c] NOTTALE L., SCHUMACHER G., GAY J., “Scale relativity and quantization of the solar system”, Astronomy and Astrophysics, vol. 322, p. 1018, 1997.
[NOT 98a] NOTTALE L., “Scale relativity and quantization of planet obliquities”, Chaos, Solitons, and Fractals, vol. 9, p. 1035, 1998.
[NOT 98b] NOTTALE L., SCHUMACHER G., “Scale relativity, fractal space-time, and gravitational structures”, in NOVAK M.M. (Ed.), Fractals and Beyond: Complexities in the Sciences, World Scientific, p. 149, 1998.
[NOT 00a] NOTTALE L., “Scale relativity and non-differentiable fractal space-time”, in SIDHARTH B.G., ALTAISKY M. (Eds.), Frontiers of Fundamental Physics 4, Kluwer Academic and Plenum Publishers, International Symposia on Frontiers of Fundamental Physics, 2000.
[NOT 00b] NOTTALE L., “Scale relativity, fractal space-time, and morphogenesis of structures”, in DIEBNER H., DRUCKREY T., WEIBEL P. (Eds.), Sciences of the Interface, Genista, Tübingen, p. 38, 2000.
[NOT 00c] NOTTALE L., CHALINE J., GROU P., Les arbres de l’évolution, Hachette, Paris, 2000.
[NOT 00d] NOTTALE L., SCHUMACHER G., LEFÊVRE E.T., “Scale relativity and quantization of exoplanet orbital semi-major axes”, Astronomy and Astrophysics, vol. 361, p. 379, 2000.
[NOT 01a] NOTTALE L., CHALINE J., GROU P., “On the fractal structure of evolutionary trees”, in LOSA G. (Ed.), Fractals in Biology and Medicine, Birkhäuser Press, Mathematics and Biosciences in Interaction, 2001.
[NOT 01b] NOTTALE L., TRAN MINH N., “Theoretical prediction of orbits of planets and exoplanets”, Scientific News, Paris Observatory, 2002, http://www.obspm.fr/actual/nouvelle/nottale/nouv.fr.shtml.
[NOT 03] NOTTALE L., “Scale-relativistic cosmology”, Chaos, Solitons & Fractals, vol. 16, p. 539, 2003.
[NOT 05] NOTTALE L., “Origin of complex and quaternionic wavefunctions in quantum mechanics: the scale-relativistic view”, in ANGLÈS P. (Ed.), Proceedings of 7th International Colloquium on Clifford Algebra and their Applications, 19-29 May 2005, Toulouse, Birkhäuser.
[NOT 06] NOTTALE L., CÉLÉRIER M.N., LEHNER T., “Non-Abelian gauge field theories in scale relativity”, J. Math. Phys., vol. 47, p. 032303, 2006.
[NOT 07] NOTTALE L., CÉLÉRIER M.N., “Derivation of the postulates of quantum mechanics from the first principles of scale relativity”, J. Phys. A: Math. Theor., vol. 40, p. 14471, 2007.
[NOT 08] NOTTALE L., The Theory of Scale Relativity, 528 pp., 2008, forthcoming.
[ORD 83] ORD G.N., “Fractal space-time: a geometric analogue of relativistic quantum mechanics”, Journal of Physics A: Mathematical and General, vol. 16, p. 1869, 1983.
[POC 97] POCHEAU A., “From scale-invariance to scale covariance”, in DUBRULLE B., GRANER F., SORNETTE D. (Eds.), Scale Invariance and Beyond, Les Houches workshop, EDP Sciences and Springer, p. 209, 1997.
[POI 05] POINCARÉ H., “Sur la dynamique de l’electron”, Comptes Rendus de l’Académie des Sciences de Paris, vol. 140, p. 1504–1508, 1905.
[QUE 00] QUEIROS-CONDÉ D., “Principle of flux entropy conservation for species evolution / Principe de conservation du flux d’entropie pour l’évolution des espèces”, Comptes Rendus de l’Académie des Sciences de Paris, vol. 330, p. 445–449, 2000.
[SOR 97] SORNETTE D., “Discrete scale invariance”, in DUBRULLE B., GRANER F., SORNETTE D. (Eds.), Scale Invariance and Beyond, Les Houches workshop, EDP Sciences and Springer, p. 235, 1997.
[SOR 98] SORNETTE D., “Discrete scale invariance and complex dimensions”, Physics Reports, vol. 297, p. 239–270, 1998.
[SOR 01] SORNETTE D., JOHANSEN A., “Finite-time singularity in the dynamics of the world population and economic indices”, Physica, vol. A 294, p. 465–502, 2001.
[TRI 93] TRICOT C., Courbes et dimensions fractales, Springer-Verlag, Paris, 1993.
[WIL 83] WILSON K.G., “The Renormalization Group and critical phenomena”, American Journal of Physics, vol. 55, p. 583–600, 1983.
[WOL 94] WOLSZCZAN A., “Confirmation of Earth-mass planets orbiting the millisecond pulsar PSR B1257+12”, Science, vol. 264, p. 538, 1994.
List of Authors
Patrice ABRY, Laboratoire de physique de l’ENS, CNRS, Lyon, France
Liliane BEL, Laboratoire de mathématiques, Paris-Sud University, Orsay, France
Albert BENASSI, Department of Mathematics, Blaise Pascal University, Clermont-Ferrand, France
Jean-Marc CHASSERY, LIS, CNRS, Grenoble, France
Serge COHEN, LSP, Paul Sabatier University, Toulouse, France
Khalid DAOUDI, INRIA, Nancy, France
Franck DAVOINE, Heudiasyc, University of Technology of Compiègne, France
Patrick FLANDRIN, Laboratoire de physique de l’ENS, CNRS, Lyon, France
Paulo GONÇALVES, Laboratoire de l’Informatique du Parallélisme de l’ENS, INRIA, Lyon, France
Jacques ISTAS, Département IMSS, Pierre Mendès France University, Grenoble, France
Stéphane JAFFARD, Department of Mathematics, University of Paris XII, Créteil, France
Pierrick LEGRAND, IMB, University of Bordeaux 1, Talence, France
Jacques LÉVY VÉHEL, INRIA, Centre de recherche Saclay - Île-de-France, Orsay, France
Denis MATIGNON, LTCI, CNRS, Ecole nationale supérieure des télécommunications, Paris, France
Laurent NOTTALE, Observatoire de Paris-Meudon, CNRS, Meudon, France
Georges OPPENHEIM, Laboratoire de mathématiques, Paris-Sud University, Orsay, France
Rudolf RIEDI, Department of Telecommunications, University of Applied Sciences of Western Switzerland, Fribourg, Switzerland
Luc ROBBIANO, Laboratoire de mathématiques, Paris-Sud University, Orsay, France
Claude TRICOT, LMP, Blaise Pascal University, Clermont-Ferrand, France
Darryl VEITCH, Department of Electrical and Electronic Engineering, University of Melbourne, Victoria, Australia
Marie-Claude VIANO, Laboratoire de mathématiques, Paris-Sud University, Orsay, France
Christian WALTER, PricewaterhouseCoopers, Paris, University of Evry, France
Index
A
aggregation, 421, 422, 432, 443

B
binomial measure, 62–65, 68, 154, 157, 158, 160, 164, 166

C
cascade, 80, 81, 92, 125, 156, 158, 425, 430
  binomial, 140, 156, 162, 169
computer network traffic
  bursty, 415, 417, 419, 420
  fractal, 420, 424, 426, 430, 431

D
diffusive representation, 238–240, 258, 262–264, 271, 273
dimension
  fractal, 145, 163, 428, 457, 458, 473, 475–477, 479–482
  Hausdorff, 41–45, 53–55, 63, 78, 79, 106, 123, 128, 132, 145, 146, 281, 294, 302, 303, 335, 369, 370
distribution
  processes, 294–297
  Weibull, 453

E
equation
  fractional differential, 239, 240, 251–273
  fractional partial differential, 239, 240, 266–273
  Schrödinger, 488–493, 495
exponent
  Hölder, 33–48, 61, 67, 77, 105–107, 112–116, 119, 123, 127, 132, 222, 223, 226, 301, 303, 304, 306, 308, 311–316, 320, 322, 326, 328, 368, 369, 372, 373, 383–387, 396, 398, 406, 407
    pointwise, 33, 37, 38, 53, 226, 311, 315, 369, 372, 384, 386, 390, 392, 394
  Hurst, 77
  oscillation, 39, 127, 132, 140

F
fractal space-time, 466, 488
fractional
  derivative, 113, 124, 237–240, 242–251, 267, 273
  filter, 279, 281–284, 296
  integration, 115, 211, 240–242, 439
function
  partition, 79, 126, 128, 131, 132, 147–149, 151, 153, 163, 166, 302
  Weierstrass, 32, 41, 44, 65, 66, 185, 188, 302, 316, 317, 336

G
gravitational structure, 492, 495

H
Hausdorff measure, 27, 61
heavy tail, 423, 427–430, 432, 433

I
increment
  r-stationary, 189–191
  stationary, 73–77, 79, 86–88, 91, 96, 163, 181, 189, 190, 194–196, 198–200, 205, 209, 212, 431

L
large deviations, 57, 127, 320, 371
long range dependence, 75, 76, 88, 89, 92, 93, 95–97, 139, 167, 200, 422, 425, 427–429, 431–433

M
motion
  fractional Brownian, 77, 103, 116, 169, 170, 172, 191, 193, 194, 205, 207, 214, 222, 223, 280, 302, 388, 431, 432, 455
  multifractional Brownian, 116, 218–226, 232, 375, 396–398
multiplexing, 414, 416, 417, 430
multiresolution analysis, 83–85, 95, 207, 211

N
noise
  filtered white, 201, 214, 215, 217, 218, 228, 230
  fractional Gaussian, 78, 99, 167, 173, 421, 424, 425

P
process
  α-stable, 88, 181, 186, 196, 197
  Censov, 198
  distribution, 294–297
  increments, 73, 75–77, 88, 156, 160, 167, 281
  Lévy, 104, 117–120
  non-differentiable, 469
  self-similar, 73–75, 86, 88, 96, 179–202, 213, 222, 446
  semi-stable, 186
  Takenaka, 195, 198
pseudo-differential operator, 38, 40, 112, 190, 192, 197, 215, 216, 238–240, 263, 273

Q
quadratic variations, 188, 193, 194, 226–228, 231, 233

R
renormalization of sums of random variables, 186, 187

S
sample path regularity, 91, 181, 281
scale
  change of, 205, 465
  dynamics, 473, 476, 478–480, 482
  equation, 473, 478, 480
  invariance, 71–81, 92–100, 179, 180, 420–434
  relativity, 465–493
scaling law, 76, 80, 87, 92, 93, 96, 100, 168, 269, 270, 419–430, 439
segmentation, 66, 67, 222, 301, 324–328
self-similarity
  local, 206, 213, 215, 218, 220
source
  on/off, 427–429, 431, 432
space
  Besov, 123–126, 129, 144, 295, 377
  Sobolev, 124, 144, 295
spectrum
  large deviation, 51, 52, 54, 60, 65, 67, 127, 370
  Legendre, 54, 60, 128, 147–149
  singularity, 106, 112, 118, 119, 123, 125, 128, 129, 131
system
  iterated function, 23, 32, 301–320
    hyperbolic, 335

W
wavelet analysis, 85–92, 98, 107, 159, 322