The Strength of Nonstandard Analysis

imme van den berg vitor neves (eds.) .fl SpringerWienNewYork fl SpringerWienNewYork Imme van den Berg Vitor Neves ...

Author: Imme van den Berg | Vitor Neves

109 downloads 2807 Views 7MB Size Report

This content was uploaded by our users and we assume good faith they have the permission to share this book. If you own the copyright to this book and it is wrongfully on our website, we offer a simple DMCA procedure to remove your content from our site. Start by pressing the button below!

Report copyright / DMCA form

DOWNLOAD PDF

imme van den berg vitor neves (eds.)

.fl SpringerWienNewYork

fl

SpringerWienNewYork

Imme van den Berg Vitor Neves (eds.) The Strength of Nonstandard Analysis

SpringerWienNewYork

Imme van den Berg Departamento de Matematica Universidade de Evora, Evora, Portugal

Vitor Neves Departamento de Matematica Universidade de Aveiro, Aveiro, Portugal

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically those of translation, reprinting, re-use of illustrations, broadcasting, reproduction by photocopying machines or similar means, and storage in data banks. Product Liability: The publisher can give no guarantee for the information contained in this book. This also refers to that on drug dosage and application thereof. In each individual case the respective user must check the accuracy of the information given by consulting other pharmaceutical literature. The use of registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regu lations and therefore free for general use. © 2007 Springer-Verlag Wien Printed in Germany

SpringerWienNewYork is part of Springer Science+ Business Media springeronline.com Typesetting: Camera ready by authors Printing: Strauss GmbH, Mi:irlenbach, Germany Cover image: Gottfried Wilhelm Freiherr von Leibniz

(1646-1716),

tinted lithograph

by Josef Anton Seib. ©ONE, Vienna

Printed on acid-free and chlorine-free bleached paper SPIN

11923817

With 16 Figures

Library of Congress Control Number

2006938803

ISBN-10 3-211-49904-0 SpringerWienNewYork ISBN-13 978-3-211-49904-7 SpringerWienNewYork

Foreword Willst du ins Unendliche schreiten? Gch nur im Endlichcn nach allc Scitcn! Willst du dich am Ganzen erquicken, So must du das Ganze im Kleinsten erblicken.

J. W. Goethe

(Gott, Gemut

und

Welt, 1815)

Forty-five years ago, an article appeared in the Proceedings of the Royal Academy of Sciences of the Netherlands Series A, 64, 432-440 and lndagationes Math. 23 ( 4) , 196 1 , with the mysterious title "Non-standard Analysis" authored by the eminent mathematician and logician Abraham Robinson ( 1 908- 1974) . The title of the paper turned out to be a contraction of the two terms "Non standard Model" used in model theory and "Analysis". It presents a treatment of classical analysis based on a theory of infinitesimals in the context of a non-standard model of the real number system R In the Introduction of the article, Robinson states: "It is our main purpose to show that the models provide a natural approach to the age old problem of producing a calculus involving infinitesimal (infinitely small) and infinitely large quantities. As is well-known the use of infinitesimals strongly advocated by Leibniz and unhesitatingly accepted by Euler fell into disrepute after the advent of Cauchy ' s methods which put Mathematical Analysis on a firm foundation". To bring out more clearly the importance of Robinson ' s creation of a rigor ous theory of infinitesimals and their reciprocals, the infinitely large quantities, that has changed the landscape of analysis, I will briefly share with the reader a few highlights of the historical facts that arc involved. The invention of the "Infinitesimal Calculus" in the second half of the sev enteenth century by Newton and Lcibniz can be looked upon as the first funda-

vi

Foreword

mental new discovery in mathematics of revolutionary nature since the death of Archimedes in 2 1 2 BC. The fundamental discovery that the operations of differentiation (flux) and integration (sums of infinitesimal increments) are inverse operations using the intuitive idea that infinitesimals of higher order compared to those of lower order may be neglected became an object of severe criticism. In the "Analyst", section 35, Bishop G. Berkeley states: "And what are these fluxions? The velocities of evanescent incre ments. And what are these same evanescent increments? They are neither finite quantities, nor quantities infinitesimally small, nor yet nothing. May we call them ghosts of departed quantities?" The unrest and criticism concerning the lack of a rigorous foundation of the in finitesimal calculus led the Academy of Sciences of Berlin, at its public meeting on June 3, 1774, and well on the insistence of the Head of the rviathematics Sec tion, .J. L. Lagrange, to call upon the mathematical community to solve this important problem. To this end, it announced a prize contest dealing with the problem of "Infinity" in the broadest sense possible in mathematics. The announcement read: "The utility derived from Mathematics, the esteem it is held in and the honorable name of 'exact science' par excellence, that it justly deserves, are all due to the clarity of its principles, the rigor of its proofs and the precision of its theorems. In order to ensure the continuation of these valuable attributes in this important part of our knowledge the prize of a 50 ducat gold medal is for: A clear and precise theory of what is known as 'Infinity' in Mathe matics. It is well-known that higher mathematics regularly makes use of the infinitely large and infinitely small. The geometers of antiquity and even the ancient analysts, however, took great pains to avoid anything approaching the infinity, whereas today's emi nent modern analysts admit to the statement 'infinite magnitude' is a contradiction in terms. For this reason the Academy desires an explanation why it is that so many correct theorems have been deduced from a contradictory assumption, together with a formula tion of a truly clear mathematical principle that may replace that of infinity without , however, rendering investigations by its usc overly difficult and overly lengthy. It is requested that the sub ject be treated in all possible generality and with all possible rigor, clarity and simplicity."

Foreword

vii

Twenty-three answers were received before the deadline of January 1 , 1 786. The prize was awarded to the Swiss mathematician Simon L'Huilier for his essay with motto: "The infinite is the abyss in which our thoughts are engulfed." The members of the "Prize Committee" made the following noteworthy points: None of the submitted essays dealt with the question raised "why so many correct theorems have been derived from a contradictory assumption?" Furthermore , the request for clarity, simplicity and, above all , rigor was not met by the contenders, and almost all of them did not address the request for a newly formulated principle of infinity that would reach beyond the infinitesimal calculus to be meaningful also for algebra and geometry. For a detailed account of the prize contest we refer the reader to the interest ing biography of Lazare Nicolas M. Carnot ( 1 753-1823) , the father of the ther modynamicist Sadi Carnot , entitled "Lazare Carnot Savant" by Ch. C . Gille spie ( Princeton Univ. Press, 1971 ) , which contains a thorough discussion of Carnot's entry "Dissertation sur la theorie de l'infini mathematique", received by the Academy after the deadline. The above text of the query was adapted from the biography. In retrospect , the outcome of the contest is not surprising. Nevertheless around that time the understanding of infinitesimals had reached a more so phisticated level as the books of J. L. Lagrange and L. N. Carnot published in Paris in 1797 show. From our present state of the art, it seems that the natural place to look for a "general principle of infinity" is set theory. Not however for an "intrinsic" definition of infinity. Indeed, as Gian- Carlo Rota expressed not too long ago: "God created infinity and man, unable to understand it , had to invent finite sets." At this point let me digress a little for further clarification about the infinity we are dealing with. During the early development of Cantor's creation of set theory, it was E. Zermelo who realized that the attempts to prove the existence of "infinite" sets, short of assuming there is an "infinite" set or a non-finite set as in Proposition 66 of Dedekind's famous "Was sind und was sollen die Zahlen?" were fallacious. For this reason, Zermelo in his important paper "Sur les ensembles finis et le principe de !'induction complete", Acta Math. 32 ( 1909) , 185-193 ( submitted in 1907) , introduced an axiom of"infinity" by postulating the existence of a set , say A , non-empty, and that for each of its elements x, the singleton { x} is an element of it. Returning to the request of the Academy: To discover a property that all infinite sets would have in common with the finite sets that would facilitate

viii

Foreword

their use in all branches of mathematics. What comes to mind is Zermelo's well ordering principle. Needless to say that this principle and the manifold results and consequences in all branches of mathematics have had an enormous impact on the development of mathematics since its introduction. One may ask what has this to do with the topic at hand? It so happens that the existence of non-standard models depends essentially on it as well and consequently non standard analysis too. The construction of the real number system (linear continuum) by Cantor and Dedekind in 1872 and the Weierstrass c:-0 technique gradually replaced the use of infinitesimals. Hilbert's characterization in 1899 of the real num ber system as a (Dedekind) complete field led to the discovery, in 1907, by H. Hahn, of non-archimedian totally ordered field extensions of the reals. This development brought about a renewed interest in the theory of infinitesimals. The resulting "calculus", certainly of interest by itself, lacked a process of defin ing extensions of the elementary and special functions, etc. , of the objects of classical analysis. It is interesting that Cantor strongly rejected the existence of non-archimedian totally ordered fields. He expressed the view that no ac tual infinities could exist other than his transfinite cardinal numbers and that, other than 0 , infinitesimals did not exist . He also offered a "proof" in which he actually assumed order completeness. It took one hundred and seventy-five years from the time of the deadline of the Berlin Academy contest to the publication of Robinson's paper "Non standard Analysis". As Robinson told us, his discovery did not come about as a result of his efforts to solve Leibniz ' problem; far from it . Working on a paper on formal languages where the length of the sentences could be countable, it occurred to him to look up again the important paper by T. Skolem ''tJber die Nichtcharakterisierbarkeit der Zahlenreihe mittcls endlich oder abzahlbar unendlich vieler Aussagen mit ausschliesslich Zahlenvariablen, Fund. Math. 23 ( 1 934) , 1 50-161 * . Briefly, Skolem showed in his paper the existence of models of Peano arith metic having "infinitely large numbers". Nevertheless in his models the prin ciple of induction holds only for subsets determined by admissible formulas from the chosen formal language used to describe Peano's axiom system. The non-empty set of the infinitely large numbers has no smallest element and so cannot be determined by a formula of the formal language and is called an external set; those that can were baptized as internal sets of the model. Robinson, rereading Skolem's paper, wondered what systems of numbers would emerge if he would apply Skolem 's method to the axiom system of the real numbers . In doing so, Robinson immediately realized that the real number *

sec abo: T. Skolcm "Pcano'� Axiom� and Modcb of Arithmetic". in Sympo�ium on the

Mathematical Interpretation of Formal Systems, North-Holland, Amsterdam

1955, 1-14.

Foreword

ix

system was a non-archimedian totally ordered field extension of the reals whose structure satisfies all the properties of the reals, and that , in particular, the set of infinitesimals lacking a least upper bound was an external set. This is how it all started and the Academy would certainly award Robinson the gold medal. At the end of the fifties at Caltech (California Institute of Technology) Arthur Erdelyi FRS ( 1908- 1977) conducted a lively seminar entitled "Gener alized Functions". It dealt with various areas of current research at that time in such fields as J. Mikusinski's rigorous foundation of the so-called Heaviside operational calculus and L. Schwartz' t heory of distributions. In connection with Schwartz' distribution theory, Erdelyi urged us to read the just appeared papers by Laugwitz and Schmieden dealing with the representations of the Dirac-delta functions by sequences of point-functions converging to 0 point wise except at 0 where they run to infinity. Robinson's paper fully clarified this phenomenon. Reduced powers of � instead of ultrapowers, as in Robin son's paper, were at play here. In my 1962 Notes on Non-standard Analysis the ultrapower construction was used, but at that time without using explicitly the Transfer Principle. In 1967 the first International Symposium on Non-standard Analysis took place at Caltech with the support of the U.S. Office of Naval Research. At the time the use of non-standard models in other branches of mathematics started to blossom. This is the reason that the Proceedings of the Symposium carries the title: Applications of Model Theory to Algebra. Analysis and Probability. A little anecdote about the meeting. When I opened the newspaper one morning during the week of the meeting, I discovered to my surprise that it had attracted the attention of the Managing Editor of the Pasadena Star News; his daily "Conversation Piece" read: "A Stanford Professor spoke in Pasadena this week on the subject 'Axiomatizations of Non-standard Analysis which are Conservative Extensions of Formal Systems for Standard Classical Analysis', a fact which I shall tuck away for reassurance on those days when I despair of communicating clearly." may add here that from the beginning Robinson was very interested in the formulation of an axiom system catching his non-standard methodology. Unfortunately he did not live to see the solution of his problem by E. Nelson presented in the 1977 paper entitled "Internal Set Theory". A presentation by Nelson, "The virtue of Simplicity", can be found in this book. A final observation. During the last sixty years we have all seen come about the solutions of a number of outstanding problems and conjectures,

X

Foreword

some centuries old, that have enriched mathematics. The century-old problem to create a rigorous theory of infinitesimals no doubt belongs in this category. It is somewhat surprising that the appreciation of Robinson's creation was slow in coming. Is it possible that the finding of the solution in model theory, a branch of mathematical logic, had something to do with that? The answer may perhaps have been given by Augustus de Morgan ( 1 806-1871 ) , who is well-known from De 1Iorgan's Law , and who in collabo ration with George Boolc ( 1 805- 1864) reestablished formal logic as a branch of exact science in the nineteenth century, when he wrote: "We know that mathematicians care no more for logic than logicians for mathematics. The two eyes of exact sciences are mathematics and logic: the mathematical sect puts out the logical eye, the logical sect puts out the mathematical eye; each believing that it can sec better with one eye than with two." We owe Abraham Robinson a great deal for having taught us the use of both eyes. This book shows clearly that we have learned our lesson well. All the contributors are to be commended for the way they have made an effort to make their contributions that are based on the talks at the meeting "Nonstandard Mathematics 2004" as self-contained as can be expected. For further facilitating the readers, the editors have divided the papers in categories according to the subject. The whole presents a very rich assortment of the non standard approach to diverse areas of mathematical analysis. I wish it many readers. Wilhelmus A. .J. Luxemburg Pasadena, California September 2006

Acknowledgements

The wide range of applicability of Mathematical Logic to classical Mathe matics, beyond Analysis - as Robinson's terminology Non-standard Analysis might imply - is already apparent in the very first symposium on the area, held in Pasadena in 1967, as mentioned in the foreword. Important indicators of the maturity of this field are the high level of foun dational and pure or applied mathematics presented in this book, congresses which take place approximately every two years , as well as the experiences of teaching, which proliferated at graduate , undergraduate and secondary school level all over the world. This book, made of peer reviewed contributions, grew out of the meet ing Non Standard Mathematics 2004, which took place in .July 2004 at the Department of Mathematics of the University of Aveiro (Portugal) . The ar ticles are organized into five groups, ( 1 ) Foundations, (2) Number theory, (3) Statistics, probability and measures, ( 4) Dynamical systems and equations and (5) lnfinitcsimals and education. Its cohesion is enhanced by many cross overs. We thank the authors for the care they took with their contributions as well as the three Portuguese Research Institutes which funded the production cost: the Centro lnternacional de Matematica CIM at Coimbra, the Centro de Estudos em Optimiza.:;ao e Controlo CEO C of the University of Aveiro and the Centro de lnvestiga.:;ao em Matematica e Aplica.:;oes CIMA-UE of the University of Evora. Moreover, NS1I2004 would not have taken place were it not for the support of the CIM, CEOC and CIMA themselves , Funda.:;ao Luso-Americana para o Desenvolvimento, the Centro de Analise Matematica, Geomctria e Sistemas Dinamicos of the lnstituto Superior Tccnic:o of Lisbon and the Mathematics Department at Aveiro. Our special thanks go to Wilhclmus Luxemburg of Caltech, USA, who, being one of the first to recognize the importance of Robinson's discovery of

xii

Acknowledgements

Nonstandard Analysis , honoured us by writing the foreword. Last but not least , Salvador Sanchez-Pedreiio of the University of Murcia, Spain, joined his expert knowledge of Nonstandard Analysis to a high dominion of the 11\'JEX editing possibilities in order to improve the consistency of the text and to obtain an outstanding typographic result. Imme van den Berg Vftor Neves Editors

Contents I

Foundations

1 The strength of nonstandard analysis by H . .JEROME KEISLER 1.1 Introduction . . . . . 1.2 The theory PRAw . . 1.3 The theory NPRAw . 1.4 The theory WNA . . 1.5 Bounded minima and overs pill . 1.6 Standard parts . . . . . . . . 1.7 Liftings of formulas . . . . . . 1.8 Choice principles in L(PRAw) 1 .9 Saturation principles . . . . 1 . 1 0 Saturation and choice 1 . 1 1 Second order standard parts 1 . 1 2 Functional choice and (3 2 ) 1 . 13 Conclusion . . . . . . . 2 The virtue of simplicity by EDWARD NELSON Part I. Technical Part II. General .

1

3 3 4 6 7 9 11 14 16 17 20 21 23 25 27 27 30

3 Analysis of various practices of referring in classical or non standard mathematics by YVES PERAIRE 3. 1 Introduction . . . . . . . . . . . . . . . 3. 2 Generalites sur la referentiation . . . . 3.3 Le calcul de Dirac. L'egalite de Dirac . 3.4 Calcul de Heaviside sans transformee de Laplace. L'egalite de Laplace . 3 . 5 Exemples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

33 33 34 37 41 44

xiv

Contents

4 Stratified analysis? by KAREL H RBACEK 4.1 The Robinsonian framework . . . . . . . . . . 4.2 Stratified analysis . . . . . . . . . . . . . . . . 4.3 An axiomatic system for stratified set theory

47

5 ERNA at work by C . IMPENS and S . SANDERS 5.1 Introduction . . . . . . 5.2 The system . . . . . . 5.2.1 The language 5.2.2 The axioms . .

64

6 The Sousa Pinto approach to nonstandard generalised functions by R. F. HOSKINS 6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . 6.1.1 Generalised functions and N.S.A. . . . . . . 6.2 Distributions, ultradistributions and hyperfunctions . 6.2.1 Schwartz distributions . . . . . . . . . . . 6.2.2 The Silva axioms 6.2.3 Fourier transforms and ultradistributions 6.2.4 Sato hyperfunctions . . . . . . . . . . . . 6.2.5 Harmonic representation of hyperfunctions 6.3 Prehyperfunctions and predistributions . 6.4 The differential algebra A(D,J ....... . 6.4.1 Predistributions of finite order . . . 6.4.2 Predistributions of local finite order Predistributions of infinite order 6.4.3 6.5 Conclusion . . . . . . . . . . . 7 N eutrices in more dimensions by IMME VAN DEN BERG 7.1 Introduction . . . . . . . . . . 7.1.1 Motivation and objective 7.1.2 Setting . . . . . . . . . 7.1.3 Structure of this article . 7.2 The decomposition theorem . . . 7.3 Geometry of neutrices in � 2 and proof of the decomposition theorem . . . . . . . . . . . . . . . . . . . . . . . Thickness, width and length of neutrices 7.3.1 On the division of neutrices 7.3.2 Proof of the decomposition theorem . . . 7.3.3

48 53 58

64 65 65 67 76 76 77 77 77 78 80 81 83 84 86 86 88 88 90 92 92 92 94 95 95 96 97 103 111

Contents

II

Number theory

8 Nonstandard methods for additive and combinatorial number theory. A survey by RENLING .liN 8.1 The beginning . . . . . . 8.2 Duality between null ideal and meager ideal 8.3 Buy-one-get-one-free scheme . . . . . . . . . 8.4 From Kneser to Banach . . . . . . . . . . . 8.5 Inverse problem for upper asymptotic density 8.6 Freiman's 3k- 3 + b conjecture . . . . . . . .

XV

117 119 119 120 121 124 125 129

9 Nonstandard methods and the Erdos-Turan conjecture by STEVEN C . LETH 9.1 Introduction . . . . . . . . . . . 9.2 Near arithmetic progressions . . 9.3 The interval-measure property .

133

III

143

Statistics, probability and measures

133 134 140

10 Nonstandard likeliho od ratio test in exponential families 145 by JACQUES BosciRAUD 10.1 Introduction . . . . . . . . . . . . . . . . . 145 145 10 .1.1 A most powerful nonstandard test 10.2 Some basic concepts of statistics 146 10.2.1 Main definitions 146 146 10.2.2 Tests . . . . . 10.3 Exponential families . . 147 147 10.3.1 Basic concepts . 148 10.3.2 Kullback-Leibler information number 10.3.3 The nonstandard test . 149 150 10.3.4 Large deviations for X . . . . . . . . 151 10.3.5 n-regular sets . . . . . . . . . . . . . 10.3.6 n-regular sets defined by Kullback-Leibler information . 156 159 10.4 The nonstandard likelihood ratio test . a 162 10.4 .1 lnn infinitesimal 10.4.2 0 � � l ln ex I �co . . . . . . . . 163 10.4.3 � l ln ex I� co . . . . . . . . . . . 164 10.5 Comparison with nonstandard tests based on X . 165 165 10.5.1 Regular non ;:;tandard tests 10.5.2 Case when 8o is convex . . . . . . . . . . 167

xvi

Contents

1 1 A finitary approach for the representation of the infinitesimal generator of a markovian semigroup by S CHERAZADE BENHABIB 11.1 Introduction . . . . . . . . . . . . . . . . . . . . . . 11.2 Construction of the least upper bound of sums in 1ST 11.3 The global part of the infinitesimal generator 11.4 Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . 12 On two recent applications of nonstandard analysis to the theory of financial markets by FREDERIK S . HERZBERG 12.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 12.2 A fair price for a multiply traded asset . . . . . . . . . . 12.3 Fairness-enhancing effects of a currency transaction tax 12.4 How to minimize "unfairness" . . . . . . . . . . 13 Quantum Bernoulli experiments and quantum stochastic pro cesses by MANFRED WOLFF 13.1 Introduction . . . . . . . . . . . . . . 13.2 Abstract quantum probability spaces 13. 3 Quantum Bernoulli experiments . . . 13.4 The internal quantum processes . . . 13.5 From the internal to the standard world 13.5.1 Brownian motion . . . . . . . . 13.5.2 The nonstandard hulls of the basic internal processes 13.6 The symmetric Fock space and its embedding into £ . . . . . 14 Applications of rich measure spaces formed from nonstandard models by PETER LOEB 14.1 Introduction . . . . . . . . . . . . . . 14.2 Recent work of Yeneng Sun . . . . . 14.3 Purification of measure-valued maps 15 More on S-measures by DAVID A . Ross 15.1 Introduction . . . . 15.2 Loeb measures and S-measures 15.3 Egoroff"s Theorem . . . . . . . 15.4 A Theorem of Riesz . . . . . . 15.4.1 Conditional expectation .

170 170 172 174 176

177 177 178 180 181

189 189 191 192 194 196 197 198 202

206 206 207 210

2 17 217 217 221 222 225

Contents 16 A Radon-Nikodym theorem for a vector-valued reference measure by G . BEATE ZIMMER 16.1 Introduction and notation 16.2 The existing literature . . 16.3 The nonstandard approach . 16.4 A nonstandard vector-vector integral 16.5 Uniform convexity . . . . . . . . . . 16.6 Vector-vector derivatives without uniform convexity . 16.7 Remarks . . . . . . . . . . . . . . 17 Differentiability of Loeb measures by EVA AIGNER 17.1 Introduction . . . . . . . . . . . . 17.2 S-differentiability of internal measures 17.3 Differentiability of Loeb measures . . .

xvii

227 227 228 230 232 233 234 236

238 238 240 243

Differential systems and equations

2 51

18 The power of Gateaux differentiability by VfTOR NEVES 18.1 Preliminaries . . . . . . . . . 18.2 Smoothness . . . . . . . . . . 18.3 Smoothness and finite points 18.4 Smoothness and the nonstandard hull 18.4.1 Strong uniform differentiability . 18.4.2 The non-standard hull

253

19 Nonstandard Palais-Smale conditions by NATALIA MARTINS and VfTOR NEVES 19.1 Preliminaries . . . . . . . . . . . . . 19.2 The Palais-Smale condition . . . . . 19.3 Nonstandard Palais-Smale conditions 19.4 Palais-Smale conditions per level . . 19.5 Nonstandard variants of Palais-Smale conditions per level 19.6 Mountain Pass Theorems . . . . . . . . . . . . . .

271

IV

20 Averaging for ordinary differential equations and functional differential equations by TEWFIK SARI 20.1 Introduction . . . . . . . . . . .

253 257 261 263 264 266

271 273 274 280 281 282

286 286

xviii

Contents

20. 2 Deformations and perturbations . 20.2.1 Deformations . . . . . . . 20.2.2 Perturbations . . . . . . 20.3 Averaging in ordinary differential equations 20.3.1 KBM vector fields . . . . . . . . . . 20.3.2 Almost solutions . . . . . . . . . . . 20.3.3 The stroboscopic method for ODEs 20.3.4 Proof of Theorem 2 for almost periodic vector fields 20.3.5 Proof of Theorem 2 for KBM vector fields . . . . 20.4 Functional differential equations . . . . . . . . . . . . . . . 20.4.1 Averaging for FDEs in the formz1(T) = cj (T,zT) 20.4.2 The stroboscopic method for ODEs revisited . . . 20.4.3 Averaging for FDEs in the form x(t) = f ( t/ c , Xt ) 20.4.4 The stroboscopic method for FDEs . . . . . . . .

21 Path-space measure for stochastic differential equation with a co efficient of polynomial growth by T ORU NAKAMURA 21.1 Heuristic arguments and definitions . . . . . . . . . . 21. 2 Bounds for the *-measure and the *-Green function . 21.3 Solution to the Fokker-Planck equation . . . . 22 Optimal control for Navier-Stokes equations by NIGEL J . CuTLAND and K ATARZYNA G RZESIAK 22.1 Introduction . . . . . . . . . . 22.2 Preliminaries . . . . . . . . . . . . . . . . . . . 22.2.1 Nonstandard analysis . . . . . . . . . . 22.2. 2 The stochastic Navier-Stokes equations 22.2.3 Controls . . . . . . . . . . 22.3 Optimal control for d = 2 . . . . . 22.3.1 Controls with no feedback 22.3.2 Costs . . . . . . . . . . . . 22.3.3 Solutions for internal controls 22.3.4 Optimal controls . . . . . . . . 22.3.5 Holder continuous feedback controls (d = 2) 22.3.6 Controls based on digital observations (d = 2) 22.3.7 The space 1-i . . . . . . . . . . . . . . . . . . . 22.3.8 The observations . . . . . . . . . . . . . . . . . 22.3.9 Ordinary and relaxed feedback controls for digital observations . . . . . . . . . . . . . 22.3.10 Costs for digitally observed controls . . . . . . . .

287 287 288 290 291 292 294 294 295 297 298 299 300 301

306 306 309 310

3 17 317 319 319 319 323 325 325 326 327 328 329 331 331 332 332 334

Contents 22.3. 1 1 Solution of the equations 22.3. 12 Optimal control . . . . . 22.4 Optimal control for d = 3 . . . . 22.4. 1 Existence of solutions for any control 22.4.2 The control problem for 3D stochastic Navier-Stokes equations . . . . . . . . 22.4.3 The space 0 . . . . . . 22.4.4 Approximate solutions 22.4.5 Optimal control . . . . 22.4.6 Holder continuous feedback controls (d = 3) 22.4.7 Approximate solutions for Holder continuous controls Appendix: Nonstandard representations of the spaces Hr 23 Lo cal-in-time existence of strong solutions of the n-dimensional Burgers equation via discretizations by .JOAO PAULO TEIXEIRA 23. 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . 23.2 A discretization for the diffusion-advection equations in the torus . . . . . . . . . . . . . . . . . . . . 23.3 Some standard estimates for the solution of the discrete problem . . . . . . . . . . . . . . . . . 23.4 Main estimates on the hyperfinite discrete problem 23.5 Existence and uniqueness of solution . . . . . . . .

V

Infinitesimals and education

24 Calculus with infinitesimals by KEITH D. STROYAN 24. 1 Intuitive proofs with "small" quantities 24. 1 . 1 Continuity and extreme values 24. 1 . 2 Microscopic tangency in one variable 24. 1 .3 The Fundamental Theorem of Integral Calculus 24. 1 . 4 Telescoping sums and derivatives . . . . . . . . 24. 1 . 5 Continuity of the derivative . . . . . . . . . . . . 24. 1 . 6 Trig, polar coordinates, and Holditch's formula . 24. 1 .7 The polar area differential . . . . . . . . 24. 1.8 Leibniz's formula for radius of curvature 24. 1 . 9 Changes . . . . . . . . . 24. 1 . 10 Small changes . . . . . . 24. 1 . 1 1 The natural exponential

xix 334 336 337 338 338 339 340 342 343 344 345 349 349 351 353 356 361 367 369 369 369 370 372 372 373 375 376 379 379 380 381

XX

Contents 24. 1 . 1 2 Concerning the history of the calculus . . . . 24.2 Keisler's axioms . . . . . . . . . . . . . . . . . . . . . 24.2. 1 Small, medium, and large hyperreal numbers 24.2.2 Keisler's algebra axiom . . . . . . 24.2.3 The uniform derivative of x 3 . . . 24.2.4 Keisler's function extension axiom 24.2.5 Logical real expressions 24.2.6 Logical real formulas . . . . . 24.2. 7 Logical real statements . . . . 24.2.8 Continuity and extreme values 24.2.9 Microscopic tangency in one variable 24. 2 . 1 0 The Fundamental Theorem of Integral Calculus 24.2 . 1 1 The Local Inverse Function Theorem . . . . . . 24. 2 . 1 2 Second differences and higher order smoothness

25 Pre-University Analysis by RICHARD O' DONOVAN 25. 1 Introduction . . . . 25.2 Standard part . . . 25.3 Stratified analysis 25.4 Derivative . . . . . 25.5 Transfer and closure .

382 382 382 383 385 385 386 386 387 388 389 390 391 392 395 395 396 397 399 400

The strength of nonstandard analysis H. Jerome Keisler *

A

Abstract

weak theory nonstandard analysis, witL types at. all finite levels over

both the integers and hyperintegers, is developed

as a

possible framework

for reverse mathematics. In this weak theory we investigate .

of standard part principles and saturation principles which

the strength

are

often used

in practice along with first order reasoning about the hyperintegers to obtain secon d order conclusions about the integers.

1.1

Introduction

In this paper we revisit the work in [5] and [6], where the strength of nonstandard analysis is studied. In those papers it was shown that ·weak fragments of set theory become stronger when one adds saturation principles commonly used in nonstandard analysis. The purpose of this paper is to develop a framework for reverse mathe matics in nonstandard analysis. Vvc ·will introduce a base theory, "weal<: non standard analysis" ( WNA), which is proof theoretically weak but has types at all finite levels over both the integers and the hyperintegers. In WNA we study the strength of two principles that an• prominent in nonstandard analy sis, the standard part principle in Section 1.6, and the saturation principle in Section 1.9. These principles arc ofteu used in practice along with first order reasoning about the hyperintegers to obtain second order conclusions about the integers, and for this reason they can lead to the discovery of new results. The standard part principle (STP) says that a function on the integers exists if and only if it is coded by a hyperiuteger. Our main results show that in WNA, STP implies the axiom of choice for quantifier-free formulas (Theorem 17), STP+saturation for quantifier-free formulas implies choice for arithmetical formulas (Theorem 23), and STP+satmation for formulas with ·

university of Wisconsin, Madison. [email protected]

4

1 . The strength of nonstandard analysis

first order quantifiers implies choice for formulas with second order quanti fiers (Theorem 25) . The last result might be used to identify theorems that are proved using nonstandard analysis but cannot be proved by the methods commonly used in classical mathematics. The natural models of WNA will have a superstructure over the standard integers N, a superstructure over the hyperintegers *N, and an inclusion map j : N --+ *N. With the two superstructures, it makes sense to ask whether a higher order statement over the hyperintegers implies a higher order state ment over the integers. As is commonly done in the standard literature on weak theories in higher types, we use functional superstructures with types of functions rather than sets. The base theory WNA is neutral between the internal set theory approach and the superstructure approach to nonstandard analysis, and the standard part and saturation principles considered here arise in both approaches. For background in model theory, see [2, Section 4.4] . The theory WNA is related to the weak nonstandard theory NPRAw of Avigad [ 1 ] , and the base theory RCA0 for higher order reverse mathematics proposed by Kohlenbach [7] . The paper [ 1] shows that the theory NPRAw is weak in the sense that it is conservative over primitive recursive arithmetic (PRA) for Ih sentences, but is still sufficient for the development of much of analysis. The theory WNA is also conservative over PRA for Ib sentences, but has more expressive power. In Sections 1 . 1 1 and 1.12 we will introduce a stronger, second order Standard Part Principle, and give some relationships between this principle and the theories NPRAw and RCA0 .

1.2

The theory PRAw

Our starting point is the theory PRA of primitive recursive arithmetic, introduced by Skolcm. It is a first order theory which has function symbols for each primitive recursive function (in finitely many variables) , and the equality relation = . The axioms arc the rules defining each primitive recursive function, and induction for quantifier-free formulas. This theory is much weaker than Peano arithmetic, which has induction for all first order formulas. An extension of PRA with all finite types was introduced by Godel [4] , and several variations of this extension have been studied in the literature. Here we use the finite type theory PRAw as defined in Avigad [1] . There is a rich literature on constructive theories in intuitionistic logic that are very similar to PRAw , such as the finite type theory HAw over Heyting arithmetic (see, for example, [9] ) . However, in this paper we work exclusively in classical logic. We first introduce a formal object N and define a collection of formal objects called types over N.

1 . 2 . The theory

PRAw

5

( 1 ) The base type over N is N. (2) If J, T are types over N, then

O"-+

T is a type over N.

We now build the formal language L(PRAw) . L(PRAw) is a many-sorted first order language with countably many variables of each type O" over N, and the equality symbol = at the base type N only. It has the usual rules of many sorted logic, including the rule 3f\lu f( u) = t( u, . . .) where u, f are variables of type J, O"-+ N and t( u, . . .) is a term of type N in which f does not occur. We first describe the symbols and then the corresponding axioms. L(PRAw) has the following function symbols: • •

•

•

•

A function symbol for each primitive recursive function. n

The primitive recursion operator which builds a term N from terms of type N, N -+ N, and N.

R( m, j, ) of type

The definition by cases operator which builds a term from terms of type N, O", and O".

c( n, 11, v ) of type O"

The >. operator which builds a term v of type O" and a term t of type T.

>.v.t of type O" -+ T from a variable

The application operator which builds a term s of type O" and t of type O" -+ T.

t( s) of type T from terms

Given terms r, t and a variable v of the appropriate types, r (t j v) denotes the result of substituting t for v in r. Given two terms s, t of type O", s = t will denote the infinite scheme of formulas r ( s/ v ) = r(tjv) where v is a variable of type O" and r ( v ) is an arbitrary term of type N. = is a substitute for the missing equality relations at higher types. The axioms for PRAw are as follows . •

Each axiom of

•

The induction scheme for quantifier-free formulas of L(PRAw).

•

Primitive recursion:

PRA. = m,

=

R(m, f, s(n )) f (n, R(m, j, n)). R(m, j, 0) Cases: c(O, u, v ) = 11, c(s(m). u, v ) = v. Lambda abstraction: (>.u.t)(s) = t(sju) . The order relations < , ::; on type N can be defined in the usual way by •

•

quantifier-free formulas. In [ 1] additional types O" x T, and term-building operations for pairing and projections with corresponding axioms were also included in the language, but

6

1 . The strength of nonstandard analysis

as explained in [ 1 ] , these symbols are redundant and are often omitted in the literature. On the other hand, in [ 1] the symbols for primitive recursive functions are not included in the language. These symbols are redundant because they can be defined from the primitive recursive operator R, but they are included here for convenience. We state a conservative extension result from [ 1] , which shows that PRAw is very weak. Proposition 1 PRAw is a co nservative extensio n of PRA, that is, PRAw and PRA have the same co nsequences in L(PRA) . The natural model of PRAw is the full functional superstructure V (N), which is defined as follows. N is the set of natural numbers. Define VN (N) = N , and inductively define Va_,T (N) t o b e the set of all mappings from Va (N) into VT (N). Finally, V (N) = Ua Va (N). The superstructure V (N) becomes a model of PRAw when each of the symbols of L(PRAw) is interpreted in the obvious way indicated by the axioms. In fact. V (N) is a model of much stronger theories than PRAw, since it satisfies full induction and higher order choice and comprehension principles.

1.3

The theory NPRAw

In [1] , Avigad introduced a weak nonstandard counterpart of PRAw, called NPRAw. NPRAw adds to PRAw a new predicate symbol S(·) for the standard integers (and S-rclativizcd quantifiers V5 , 3 5), and a constant H for an infinite integer, axioms saying that S(·) is an initial segment not containing H and is closed under each primitive recursive function, and a transfer axiom scheme for universal formulas. In the following sections we will use a weakening of NPRAw as a part of our base theory. In order to make NPRAw fit better with the present paper, we will build the formal language L(NPRAw) with types over a new formal object *N instead of over N. The base type over *N is *N, and if J, T are types over *N then O"--+ T is a type over *N. For each type O" over N, let *J be the type over *N built in the same way. For each function symbol u in L(PRAW) from types a to type T, L(NPRAW) has a corresponding function symbol *u from types *(J to type *T . L(NPRAw) also has the equality relation = for the base type *N, and the extra constant symbol H and the standardness predicate symbol S of type *N. We will usc the following conventions throughout this paper. When we write a formula A ( v) , it is understood that vis a tuple of variables that contains

1 .4.

The theory

WNA

7

all the free variables of A. If we want to allow additional free variables we write We will always let:

A(v, . . .) . •

m, n, . . . be variables of type N, x, y, . . . be variables of type *N, j, g, . . . be variables of type N --+ N. To describe the axioms of NPRAw we introduce the star of a formula of L(PRAw). Given a formula A of L(PRAw), a star of A is a formula *A of L(NPRAw) which is obtained from A by replacing each variable of type J in A by a variable of type *J in a one to one fashion, and replacing each function symbol in A by its star. The order relations on *N will be written <, :::_: •

•

without stars.

NPRAw are as follows: The star of each axiom of PRAw.

The axioms of • •

S is an initial segment:

•

S is closed

-,S(H) 1\ Vx\ly [S(x) 1\ y :::_: x--+ S(y)] .

under primitive recursion .

V5x *A(x)--+ Vx *A(x) , A(m) quantifier-free in L(PRAw) . It is shown in [1] that if A( m , n ) is quantifier-free in L(PRA) and NPRAw proves V5 x3y *A(x, y), then PRA proves Vm3n A(m, n) . It follows that NPRAW is conservative over PRA for Ih sentences. The natural models of NPRAw arc the internal structures *V(N), which arc •

Transfer:

proper elementary extensions of V (N) in the many-sorted sense, with additional symbols S for N and H for an clement of * N \ N.

1 .4

The theory WNA

We now introduce our base theory WNA, weak nonstandard analysis . The idea is to combine the theory PRAw with types over N with a weakening of the theory NPRAw with types over *N, and form a link between the two by identifying the standardness predicate S of NPRAw with the lowest type N of PRAw . In this setting, it will make sense to ask whether a formula with types over *N implies a formula with types over N. The language L( WNA) of WNA has both types over N and types over *N. It has all of the symbols of L(PRAw), all the symbols of L(NPRAw) except the primitive recursion operator *R, and has one more function symbol j which goes from type N to type *N.

1.

8

The strength o f nonstandard analysis

We make the axioms of WNA as weak as we can so as to serve as a blank screen for viewing the relative strengths of additional statements which arise in nonstandard analysis. The axioms of

WNA arc as follows:

•

The axioms of

•

The star o f each axiom o f PRA.

•

The stars of the Cases and Lambda abstraction axioms of

•

S is an initial segment: ,s(H) 1\ Vx\ly

•

S is closed under primitive recursion.

•

j maps S onto N:

•

Lifting: j (a (m))

PRAw. PRAw.

[S(x) 1\ y :::; x--+ S(y)] .

Vx [S(x) f--7 ::lm x j(m)] .

=

=

*a (j (m)) for each primitive recursive function a.

The star of a quantifier-free formula of L ( PRA ) , possibly with some vari ables replaced by H, will be called an internal quantifier-free formula. The stars of the axioms of PRA include the star of the defining rule for each prim itive recursive function, and the induction scheme for internal quantifier-free formulas (which we will call internal induction) . The axioms of NPRAw that are left out of WNA are the star of the Primitive Recursion scheme, the star of the quantifier-free induction scheme of PRAw, and Transfer. These axioms are statements about the hyperintegers which involve terms of higher type. Note that WNA is noncommittal on whether the characteristic function of S exists in type *N --+ *N, while the quantifier-free induction scheme of NPRAw precludes this possibility. In practice, nonstandard analysis uses very strong transfer axioms, and extends the mapping j to higher types. Strong axioms of this type will not be considered here.

WNA + NPRAw is a co nservative extensio n of NPRAw, that is, NPRAw and WNA + NPRAw have the same co nsequences in L ( NPRAw ) .

Theorem 2

Proof. Let M be a model of NPRAw, and let M 5 be the restriction of M to the standardncss predicate S. Then M 5 is a model of PRA. By Proposition 1 , the complete theory of M 5 is consistent with PRAw . Therefore PRAw has a model K whose restriction K N to type N is elementarily equivalent to A1 5 . By the compactness theorem for many-sorted logic, there is a model M1 elementarily

1.5. Bounded minima and overspill

9

equivalent to M and a model K1 elementarily equivalent to K with an iso morphism j: Mf � K{", such that (K1 , M1 , j) is a model of WNA + NPRAw . Thus every complete extension of NPRAw is consistent with WNA + NPRAw , and the theorem follows. D

WNA is a conservative extension of PRA for Ih formulas. That zs, if A(m, n) is quantifier-free in L(PRA) and WNA f- \im3n A(m, n), then PRA f- \im3n A(m , n) .

Corollary 3

Suppose WNA f- \im3n A(m, n). By the Lifting Axiom, WNA f 5 5 \i x3 y *A(x, y). By Theorem 2, NPRAw f- \i5 x3 5 y *A(x, y) . Then PRA f \im3n A(m, n) by Corollary 2.3 in [1] . D Each model of WNA has a V(N) part formed by restricting to the objects with types over N, and a V(*N) part formed by restricting to the objects with types over *N. Intuitively, the V(N) and V(*N) parts of WNA are completely Proof.

independent of each other, except for the inclusion map j at the zeroth level. The standard part principles introduced later in this paper will provide links between types N ----+ N and (N ----+ N) ----+ N in the V(N) part and types *N and *N ----+ *N in the V(*N) part. WNA has two natural models, the "internal model" (V(N) , *V(N) , j) which contains the natural model *V(N) of NPRAw, and the "full model" (V(N) , V(*N) , j) which contains the full superstructure V(*N) over *N. In both models, j is the inclusion map from N into *N. The full natural model (V(N) , V(*N) , j) of WNA does not satisfy the axioms NPRAw. In particular, the star of quantifier-free induction fails in this modeL because the character istic function of S exists an object of type *N ----+ *N. as

1.5

Bounded minima and overs pill

In this section we prove some useful consequences of the WNA axioms. Given a formula A(x, ...) of L ( WNA), the bounded minimum operator is defined by

u

=

(tLx < y) A(x, .. .)

----+

[u ::; y 1\ (\ix < u) - ,A(x, ... ) 1\ [A(u, . . .) V u

=

yl ] ,

where u is a new variable. By this we mean that the expression to the left of the symbol is an abbreviation for the formula to the right of the symbol. In particular, if z docs not occur in A, (tLZ < 1) A( . . .) is the (inverted) characteristic function of A, which has the value 0 when A is true and the value 1 when A is false. In PRA , the bounded minimum operator is defined similarly. ----+

----+

1. The strength of nonstandard analysis

10

Let A(m, n) be a quantifier-free formula of L(PRA) and let a(p, n) be the primitive recursive function such that in PRA,

Lemma 4

a(p, r7)

=

(p,m < p) A( m, n).

Then (i) WNA f- *a(y, Z)

=

(p,x < y) *A(x, z).

(ii) In WNA, there is a quantifier-free formula B (p, ...) such that (Vm < p) A(m, .. . )

�

B (p, . . . ) ,

Similarly for (3x < y) *A(x, ...), and u

(Vx < y ) *A(x, . .. ) =

(p,x

<

y) *A(x, ...)

�

*B (y, .. . ).

.

Proof. ( i ) By the axioms of WNA, the defining rule for *a is the star of the defining rule for a. (ii) Apply ( i ) and observe that in WNA,

(Vx < y) *A(x, ...)

�

y

=

(p,x < y) - , *A(x, ...).

Let us write \/00x A(x, . .. ) for Vx [- ,S(x) 3x [- ,S (x) 1\ A(x, ...) ] . Lemma 5

In WNA,

____,

D

A(x, . . . )] and 300x A(x, ... ) for

(Overspill) Let A(x, ...) be an internal quantifier-free forrn·ula.

Work in WNA. If A(H, .. . ) we may take x = H. Assume V5 xA(x, .. . ) and -,A(H, ...). By Lemma 4 ( ii ) we may take u (p,x < H) -,A(x . ...). Then -,S (u) . Let x u - L We have x < u, so A(x, . .. ). Since S is closed under the successor function, -,S (x) . D Proof.

=

=

We now give a consequence of WNA in the language of PRA which is similar to Proposition 4.3 in [1] for NPRAw. I:1 -collection in L(PRA) is the scheme

(Vm < p)3n B (m, n, f')

____,

::Jk(Vm < p) (3n < k)B (m, n, r)

where B is a formula of L(PRA) of the form 3q C, C quantifier-free. Proposition 6

l::1 -collection in L(PRA) is provable in WNA.

1.6.

11

Standard parts

Proof. We work in WNA . By pairing existential quantifiers, we may assume that B (m, n, T) is quantifier-free. Assume (Vm < p)3 n B (m, n, f') . Let *B be the formula obtained by starring each function symbol in B and replacing variables of type N by variables of type *N. By the Lifting Axiom and the axiom that S is an initial segment , (Vx < p) 3 5 y *B (x, y , j (r) ) . Then

v= w (Vx < p)(3y < w) *B (x, y, j(r) ) .

B y Lemma 4 and Overspill, 3 5 w (Vx < p) (3y < w) *B (x, y, j (T) ) . B y the Lifting Axiom again, 3k(Vm < p) (3 n < k) B (m, n, r) .

1.6

D

Standard parts

This section introduces a standard part notion which formalizes a construc tion commonly used in nonstandard analysis. and provides a link between the type N ----+ N and the type *N. In type N let ( n ) k be the power of the k-th prime in n, and in type *N let (x)y be the power of the y-th prime in x. The function (n, k) r--+ ( n ) k is primitive recursive, and its star is the function (x, y) r--+ (x)y . HereafteT, when it is cleaT fmrn the context, we will WTite t instead of j ( t)

in formulas of L ( WNA).

Intuitively, we identify j ( t) with t , but officially, they are different because t has type N while j (t) has type *N. This will make formulas easier to read. When a term t of type N appears in a place of type *N, it really is j ( t ) . In the theory WNA , we say that x is near-standard, i n symbols ns(x) , if \15 z S ( (x)z ) . Note that this is equivalent to VnS( (x)n) · We employ the usual convention for relativized quantifiers, so that vns x B means Vx [ ns ( x) ----+ B] and 3nsx B means 3x [ns(x) 1\ B] . We write x ::::::; y if

ns (x) 1\ \15 z (x) z = (y) 2 •

This i s equivalent t o ns(x) 1\ \In (x)n = (y)n. We write the standard part of x and x is a lifting of f. if

ns(x) 1\ Vnf(n)

=

(x)n.

f

=

0 X , and say f is

12

1. The strength of nonstandard analysis r--+

Note that the operation x 0x goes from type *N to type N ----+ N. In nonstandard analysis, this often allows one to obtain results about functions of type N ----+ N by reasoning about hyperintegers of type *N. Lemma 7

In WNA, suppose that x is near-standard. Then

(i) If x � y then ns(y) and y � x. (ii) (3y < H) x � y. =

(i) Suppose x � y. I f S(z) then S((x)z) and ( Y )z ( x )z , so S((y)z). Therefore ns(y), and y � x follows trivially. (ii) Let (3 be the primitive recursive function (J(m, n) = ITi<m P�n ) , . By Lifting and defining rules for (3 and *{3 , \ix\iu\iz [z < u ----+ (x)z (*(J(u, x )) z ] . Therefore v= u \f5 z ( x )z (*(J(u, x))z, and hence v= u x � *(J(u, x). We have \i5 w w w < H, and by Overspill, there exists w with -,S(w) 1\ w w < H. Since x is near-standard, \i5 u [u :::_: w 1\ (\iz < ) pix ) z < w] . By Overspill, Proof.

=

=

u

Let y

=

*(J(u, x). Then x � y. By internal induction, D

We now state the Standard Part Principle, which says that every near standard x has a standard part and every f has a lifting. Standard Part Principle (STP ) :

The following corollary is an easy consequence of Lemma 7. Corollary 8

In WNA, STP is equivalent to

The Weak Koenig Lemma is the statement that every infinite binary tree has an infinite branch. The work in reverse mathematics shows that many classical mathematical statements arc equivalent to the Weak Koenig Lemma. Theorem 9

The Weak Koenig Lemma is provable in WNA + STP.

1.6. Standard parts Proof.

13

Work in WNA + STP. Let B(n) be the formula (Vm < n) [( n) m < 3 1\ (Vk < rn ) [ (n) k

=

0 ---+ ( n ) m

=

Ol] .

B(n) says that n codes a finite sequence of l ' s and 2 ' s. Write rn 0 ---+ (rn) k = (n) k ] ·

This says the sequence coded by rn is an initial segment of the sequence coded by n. Suppose that {n : f(n) = 0 } codes an infinite binary tree T, that is, Vrn3n [m < n 1\ f(n) = OJ 1\ \In [f(n) = 0 ---+ B(n) 1\ Vm[m <1 n ---+ f(rn) = Ol] .

The formulas B ( n) and rn
=

Ol] .

Then ns(y) , and by the STP there exists g = 0 y . It follows that g codes an infinite branch of T. D The next proposition gives a necessary and sufficient condition for STP in WNA + NPRAw. Let cjJ be a variable of type *N ---+ *N, and write f C cjJ for Vnf(n) cj;(n). =

Proposition

10 In WNA + NPRAw, STP is equivalent to Vf3cj;f

c

cjJ 1\ V¢ 3f [V5 x S(cj;(x)) ---+ f c ¢] .

Work in WNA + NPRAw. Call the displayed sentence STP'. Assume STP. Take any f. By STP, f has a lifting u. Since ( u, y) ---+ (u)y is primitive recursive, 3¢\/ycj;(y) = (u)y . Then Vn f(n) = ( u ) n = cj;(n), so f C ¢ . Now take any cjJ and assume that V5x S(cj;(x) ) . Using the star of the primitive recursion scheme in NPRAw , there exists 7/J such that Vx(Vy < x) cj;(y) = (7/J(x))y · Let u = '1/J (H) . We then have (Vy < H) cj;(y) = ( u ) y , so V5y cj;(y) (u)y. It follows that u is near-standard, and by STP there exists f with f = 0U and hence f C ¢ . Now assume STP' . Take any f. By STP' there exist cjJ with f C ¢ . As before there exists 7/J such that Vx(Vy < x) cj;(y) = (7/J(x))y. Let u = 7/J( H ) . Then \In ( u ) n = cj;(n) = f(n), so u is a lifting of f. Now let u be near-standard. Since ( u, y) ---+ ( u )y is primitive recursive, 3 ¢\/y cj;(y) = (u)y . Then V5x S(cj;(x)), so by STP' there exists f with f c ¢ . Then \In f(n) = ( u ) n = cj;(n), so f = 0U. D Proof.

=

1. The strength of nonstandard analysis

14 1.7

Liftings of formulas

In this section we will define some hierarchies of formulas with variables of type N and N N, and corresponding hierarchies of formulas with variables of type *N. We will then define the lifting of a formula and show that liftings preserve the hierarchy levels and truth values of formulas. In the following we restrict ourselves to formulas of L(PRAw) with variables of types N and N N. We now introduce a restricted class of terms, the basic terms, which behave well with respect to liftings. By a basic term over N we mean a term of the form a(u 1 , . . . , uk ) where a is a primitive recursive function of k variables and each u2 is either a variable n of type N or an expression of the form f ( n). These basic terms capture all primitive recursive functionals (3( m, f) in the sense that there is a basic term t(m, f� n) over N which gives the nth value in the computation of (J(m , [) for each input m, f� n. Let Q F be the set of Boolean combinations of equations between basic terms over N. If A E QF, then (Vm < n) A, (3m < n) A, and u (p,x < y) A are ?RAw-equivalent to formulas in QF. The set 115 = I:6 of arithmetical formulas is the set of all formulas which are built from formulas in QF using first order quantifiers Vm, 3m and propositional connectives. For each natural number k, 111 + 1 is the set of formulas of the form Vf A where A E I:k, and I:1 + l is the set of formulas of the form 3f A where A E 111 . We observe that up to ?RAW-equivalence, 111 c::; 111 + 1 n I:1 + 1 ' 111 is closed under finite conjunction and disjunction, and that negations of sentences in 111 belong to I:1 (and vice versa) . ____,

____,

=

In the following we restrict our attention to formulas with variables of type *N. We build a hierarchy of formulas of this kind. By a basic term over *N we mean a term of the form *a(u 1 , . . . , u k ) where a is a primitive recursive function of k variables and each Ui is either a variable of type *N or the constant symbol H. NQF is the set of finite Boolean combinations of equations s t and formulas S(t) where s, t are basic terms over *N. Note that the constant symbol H and the predicate symbol S are allowed in formulas of NQF, but the symbol j is not allowed. The internal quantifier-free formulas are just the formulas B E NQF in which the symbol S does not occur. Let Nl18 = NI:g be the set of formulas which are built from formulas in NQF using the rclativized quantifiers V5 , 3 5 and propositional connectives. Note that the relations ns(x) and x � y are definable by Nl18 formulas. =

1. 7. Liftings of formulas

15

For each natural number k, NIT�+1 is the set of formulas of the form vnsx A where A E N'E.� . N'E.�+l is the set of formulas of the form 3n s x A where A E NIT� . Up to WNA-equivalence, NIT� � NIT�+l n N'E.�+l , NIT� is closed under finite conjunction and disjunction, and negations of sentences in NIT� belong to N'E.� (and vice versa) . We now define the lifting mapping on formulas, which sends ITl to NIT� .

Let A ( m, n be a formula in ITL where m, f contain all the variables of A, both free and bound. The lifting A(z, x) is defined as follows, where z and x are tuples of variables of type *N of the same length as m, {

Definition 1 1

•

Replace each primitive recursive function symbol in A by its star.

•

Replace each mi by Zi .

•

Replace each quantifier Vmi by \15 z�, and similarly for 3 .

•

Replace each quantifier vfi by vnsXi, and similarly for 3.

Lemma 12 (Zeroth Order Lifting) For each formula A(m, [) A(z, x) E NITg, and

E

IT6, we have

WNA f- o x = f---+ [A ( m, /) ---+ A (m, x) ] .

Moreover, if A E QF then A(z, x) is an internal quantifier-free formula. Proof. It is clear from the definition that A(z, x) E NITg , and if A E QF then A(z, x) is an internal quantifier-free formula. In the case that A is an equation

between basic terms, the lemma follows from the Lifting Axiom. The general case is then proved by induction on the complexity of A, using the axiom that j maps N onto S. D Lemma 13 (Fir-s t A(z, x) E NIT� and

Order- Lifting) For- each for·mula A(m, n

E

ITL we have

WNA + STP f- 0x = f ---+ [A(m, l) ---+ A ( m , x)] .

Proof. Zeroth Order Lifting gives the result for k by induction on k, using STP.

=

0.

The general case follows D

1. The strength of nonstandard analysis

16

1.8

Choice principles in L ( PRAw )

In this section we state two choice principles in the language L (PRAw), and show that for quantifier-free formulas they are consequences of the Standard Part Principle. Given a function g of type N ----+ N, let g (m ) be the function

g (m) (n)

=

g(2m 3n ) .

I n each principle, r is a class of formulas with variables of types N and N ----+ N, and A(m, n, . . . ) denotes an arbitrary formula in r. (r, 0 ) -choice \im3n A(m, n, . . . ) ----+ 3g\im A(m, g(m) , . . . ) . (r, 1 ) -choice \im3f A(m, j, . . . ) ----+ 3g\im A(m, g (m ) , . . . ) . When r is the set of all quantifier-free formulas of PRAw, [ 7] calls these schemes QF - AC 0 · 0 and QF - AC0 ) respectively. A related principle is r-comprehension 3j\im j(m) ( p,z < 1) A(m, . . . ) . lib -comprehension is called Arithmetical Comprehension. The follow =

ing well-known fact is proved by pairing existential quantifiers. Proposition 14

In PRAw :

(II � , 0)-choice, (116 , 0)-choice, and Arithmetical Comprehension are equivalent; ( II � , 1)-choice is equivalent to (I:�+1 , 1)-choice, and implies (I:� + 1 , 0)-choice; (Ilk , 1 )-choice is equivalent to (I:i+ 1 , 1) -choice and implies (I:i+ 1 , 0) -choice; ( I:i+ 1 , 0) -choice implies II i -comprehension. In PRAw , one can define a subset of N to be a function f such that \in f (n) ::; 1, and define n E f as f(n) 0. With these definitions, (Ilk , I ) choice implies II i -choice and IIi -comprehension in the sense of second order number theory (see [8] ) . =

FaT each intenwl quantijieT-jree joTmula A(x, y, z), WNA f-- \i5 x3 5 y A(x, y, z) ----+ (3nsy < H)\i5 x A (x, (y) x , z) . Proof. Work in WNA. Assume that \i5x3 5 y A (x, y, z) . By Lemma 4, there is a primitive recmsive function a such that *a( u, z, w) ( p,v < w) A( u, v, Z) . By internal induction there exists w such that w w < H 1\ -,S(w) . Then \i5 u S(*a(u, z, w)) and \i5 x(3y < H) (\iu < x) ( Y )u *a (u, z, w). By internal induction, there exists an x such that -,S (x) and a y < H such that (\iu < x) (y)u *a(u, z, w). It follows that y is near-standard, and by the definition of a, (\iu < X ) A( u, (y ) u , z) . Then Lemma 1 5

=

=

=

D

1 .9.

17

Saturation principles

Theorem 16 (QF, 0 ) -choice

is provable in WNA + STP. Proof. We work in WNA + STP. Let A(m, n, r, h ) E QF and assume \im::ln A(m, n, r, h) . Then A is an internal quantifier-free formula. By STP, h has a lifting z. By First Order Lifting,

A(m, n, r, h) <---+ A(m, n, r, z) .

Then \i5 u3 5 v A(u, v, r, Z) . By Lemma 15, there is a near-standard y such that \i5 u A(u, (y) u , r, Z) . By STP, 3g g = 0y. Then by First Order Lifting,

\im A(m, g(m)J, h).

D

Theorem 1 7 (QF, 1 ) -choice

is provable in WNA + STP. Proof. We use (QF, 0)-choice. Let A(m, j, r, h ) E QF. Assume for simplicity that the tuple r is a single variable r. Suppose that \im::lf A(m, j, r, h ) . By the definition of Q F formulas, f occurs in A only in terms of the form f ( m) and f(r) . Then A(m, j, r, h ) B (m, f(m) , f(r) , r, h ) where B E QF. Hence \im3k B (m, (k) m , (k) r , r, h) . By ( QF, 0)-choice, 3 f\im B (m , (f(m)) m , (f (m)) r , r, h ). <---+

Applying (QF, O)-choice to the formula \ip 3q q =

3g\ip g (p)

=

(f((p)o)) (p )l - we have

(f((P)o )) (p) p

3g\im\in g (m) (n) g(2m 3n ) (f(m)) n . Then \im B (m, g (ml (m) , g (m) (r) , r, h) , and \im A(m, g (ml , r, h ) . =

1. 9

=

D

Saturation principles

We state two saturation principles which formalize methods commonly used in nonstandard analysis. In each principle, r is a class of formulas with vari ables of type *N, and A(x, y, 11) denotes an arbitrary formula in the class r. (r, 0)-saturation (r, I)-saturation

\in'11 [\i5 x3 5 y A (x, y, 11) vns 11 [\i5 x3 n sy A(x, y, 11)

3y\i5 x A(x, (y) x . 11) ] . 3y\i5 x A(x, ( Y )x, 11)] . (r, 0)-saturation. (NIT� , I)-saturation ___,

___,

Note that (r, !)-saturation implies is weaker than the *ilk-saturation principle in the paper [5] . *ilk-saturation

1.

18

The strength of nonstandard analysis

is the same as (NIT� , 1 )-saturation except that the quantifiers vns, 3ns are replaced by V, 3. In the rest of this section we prove some consequences of (NQF, D) saturation. Proposition 18

Let us write w = st( v) for S(v) ----+ w

In WNA,

(NQF, D)-saturation

In WNA + STP,

=

v 1\ -,S(v) ----+ w = D.

implies that

(NQF, D) -saturation

implies that

Vx3f\lm [f(m)

=

st((x) m )].

Proof. Work in WNA. Note that w = st(v) stands for a formula in NQF. Take any X . We have V5 z3 5 w w = st( (x) z ) . Then by (NQF, D)-saturation, ::Jy\15 z (y)z = st( (x)2), and it follows that ns (y) . The second assertion follows by taking f = 0y. D Lemma 19 WNA f-

(Vv > 1) (3w < v 211 H ) (Vx < v) (w) x

=

(p,u < H) [(y) 2x 3u

=

D] .

Proof. Use internal induction on v . The result is clear for v = 2. Let a ( x) = (p,u < H) [(y) 2x 3 u = D] . Assume the result holds for v, that is, w < v 2v H 1\ (\/x < v) (w)x = a(x) . Let z = w * p� ( v ) . We have P u < v 2 and a(v) < H, so z < v 2vH * v 2H ::; (v + 1) 2 (v + l ) H and (Vx < v + 1) ( z) x = a(x) . This proves the result for v + 1 and completes the induction. D Lemma 20 In WNA, (NQF, D)-saturation implies that for A(x, il) E NIIg , vnsil (3y < H)V5 x (y) x = (p,z < 1)A(x, il) .

every formula

Proof. Work in WNA and assume (NQF, D)-saturation. Let be the set of formulas A(x, il) such that vnsil (3y < H)V5 x (Y) x = (p,z < 1)A(x. il) . We prove that NII 8 C::: by induction on quantifier rank. Suppose first that A E NQF. Let C (x , w, il) be the formula w = (p,z < 1) A(x, il) . Then C is a propositional combination of A, w = D, and w = 1 , so C E NQF. We clearly have vns i1 V5 x 3 5 w C(x, w, il) . By (NQF, D)-saturation and Lemma 7 (ii) ,

so

A E .

1.9. Saturation principles

19

It is clear that the set of formulas is closed under propositional connec tives. Suppose all formulas of NII8 of quantifier rank at most n belong to , and A(x, u) = 35wB (x, w, u) where B (x, w, u) E NII8 has quantifier rank at most n. There is a formula D (v, u) with the same quantifier rank as B such that in WNA, D ( 2x 3w, u) B (x, w, u) . Then D E , so <----7

vnsu (3t < H )\i5 v (t) v

Then

=

(p,z < 1) D(v , u) . =

vnsa (3t < H )\7'5 x\7'5 w (t ) 2:r3w (p,z < 1) B (x, w, u) . Assume that ns ( u) and take t as in the above formula. By Lemma 19 there exists s such that (\ix < H ) ( s ) x = (p,w < H ) [ ( t ) 2x3w

=

a] .

It is trivial that \i5x35y y = (p,z < 1) S( (s) x ) . By (NQF, a)-saturation and Lemma 7, (3y < H )\i5 x (y) x = (p,z < 1) S( (s) x ) . Thus whenever S(x) , It follows that so A E . Theorem 21

D

In WNA, (NQF, a) -saturation implies (NITg, a ) -saturation.

Proof. We continue to work in WNA and assume (NQF, a)-saturation and ns ( u) . Assume that A(x, y, u) E NII8 and \i5x35y A(x, y, u) . There is a for mula B(v, u) E NII8 with B (2 x 3y , u) WNA-equivalent to A(x, y, u) . Applying

Lemma 2a to B, we obtain w such that

\7'5 ( w ) = (p,z < 1) B ( u) , v

v

v,

so By Lemma 19 there exists w' such that (\ix < H) (w ' ) x = (p,y < H ) [(w) 2 x3y = a] .

Then \i5x A(x, (w') x , u) , and w is near-standard because \i5x35y A(x, y, u) .

o

20

1.

The strength of nonstandard analysis

Theorem 22 In WNA, (NQF, a) -saturation implies that for every formula A(u) E NIT�, there is a formula B E NQF such that vnsu [A(u) vns x35 y B (x , y . i1) ] . �

Work in WNA + (NQF, a)-saturation. Suppose A E NIT�. Then there is a least k such that A is equivalent to a formula vnsx C where C is a prenex formula in NIT8 of quantifier rank k. If C has the form \i5 y D , the quantifier \i5y can be absorbed into the quantifier vnsx, contradicting the assumption that k is minimal. Suppose C has the form 35y \i5 z D (x, y, z, u) and assume that ns(u) . Then -C is equivalent to \i5y 35z -,D (x, y, z, i1) . By (NITg, a)-saturation, -,c is equivalent to 3n8z\i5y -,D(x, y, (z)y , i1) . Then A is equivalent to vnsx\ins z35y D(x, y, (z)y, u) . By combining the quantifiers vnsxvns z, we contradict the assumption that k is minimal. Therefore C must have the form 35y B where B E NQF. as required. D Proof.

1 . 10

Saturation and choice

In this section we prove results showing that in WNA + STP, saturation principles with quantifiers of type *N imply the corresponding choice principles with quantifiers of type N ----+ N. Theorem 23

prehension.

In WNA+STP, (NQF, a)-saturation implies Arithmetical Com

Proof. Work in WNA + STP and assume (NQF, a)-saturation. By Proposi tion 14, Arithmetical Comprehension is equivalent to (IT6, a)-choice. By Theo rem 2 1 , (NITg, a)-saturation holds. Let A (m, n, i, h) be an arithmetical formula such that \im3n A(m, n, i, h) . By STP, h has a lifting u. By First Order Lift

ing, we have \i5x35y A(x, y, r7, u) . and A E NITg . By (NITg, a)-saturation, there exists y such that \i5x [S( (Y) x ) 1\ A(x, (Y) x , i, U:)] . Then y is near-standard, and by STP there exists g = 0y. By First Order Lifting again, \imA(m, g(m) , i, h) . D We remark that the axioms of Pcano Arithmetic arc consequences of Arith metical Comprehension, so (NQF, a)-saturation implies Peano Arithmetic.

In WNA+STP, (NIT� , a) -saturation implies (ITk , a)-choice, and (N'£� . a) -saturatwn implies ('£1 . a)-chmce.

Theorem 24

Work in WNA + STP. For the IT1 case, assume (NIT� , a)-saturation. Let A(m, n, i, h) E IT1 and suppose that \im3n A(m, n, i, h) . Now argue as in the proof of Theorem 23. The '£1 case is similar. D Proof.

1 . 1 1 . Second order standard parts Theorem 25

21

In WNA + STP, (NIT� , !)-saturation implies

(I:�+l' 1)-choice.

Work in WNA + STP and assume (NIT� , I)-saturation. It suffices to prove (Ilk , 1)-choice. Let A(m, j, f, h) E Ill and suppose that Vm3f A(m, f, f, h) . By STP, h has a lifting il. By First Order Lifting, V5x3nsy A(x, y, f, il) and A E NIT� . We may rewrite this as V5x3n8y [ns(y) 1\ A ( x , y , f, il)] and note that ns(y) /\ A E NIT� . By (NIT� , 1 )-saturation, there exists y such that Proof.

V5 x [ns ( (Y) x ) 1\ A(x, (Y)x, r, 11) ] .

Applying ( NIT� , 0)-saturation to the formula V5x35 z z = ( (Y) (x ) o ) (x) l , we get a near-standard z such that V5x (z) x = ( (Y) ( x) o ) ( x h . Then By STP, there exists g = 0Z. Then for each m, n, g (m ) (n) = g(2m 3n ) = (z) 2m3n ( (Y) m ) n · Therefore g ( m) 0 ( (Y) m ) for each m . By First Order Lifting, we get the desired conclusion Vm A( m, g (m) , f, il) . D The literature in reverse mathematics shows that IIi-comprehension is strong enough for almost all of classical mathematics (see [8] ) . Let us work in WNA + STP and aim for IIi-comprehension. By The orem 25, ( NIT� , 1 )-saturation implies IIi-comprehension. By Theorem 22, ( NIT� , I)-saturation is equivalent to (r, I)-saturation where r is the set of formulas of the form vnsv35w B with B E NQF, so (r, I)-saturation also im plies IIi-comprehension. By Theorem 25 at the next level, ( NIIg , 1 )-saturation implies II�-comprehension, which is stronger than the methods used in most of classical mathematics. =

1.11

=

Second order standard parts

In this section we introduce second order standard parts, which provide a link between the second level of V (N) (type ( N ---+ N) ---+ N) , and the first level of V ( * N) (type *N ---+ *N) . We will use F, G, . . . for variables of type ( N ---+ N ) ---+ N, and ¢, '1/J , . . . for variables of type *N ---+ *N. ¢ is called near-standard, in symbols ns ( ¢) , if vnsx S(cjJ(x)) 1\ Vx\ly [x � y ---+ cjJ(x)

We write

=

¢ ( y )] .

cP � '1/J if n s ( cP) 1\ \In s X cjJ ( X ) = '1/J ( X ) .

22

1. The strength of nonstandard analysis

We write G = 0 c/J , and say that G is the standard part of ¢ and that ¢ is a lifting of G, if

Nate that the operation ¢ f---7 °¢ goes from type *N N) ----+ N. The following lemma is straightforward. Lemma 26 lf ns (¢)

f---7

*N to type ( N ----+

and ¢ � 'ljJ then n s ( ?jJ ) and 'ljJ � ¢ .

We now state the Second Order Standard Part Principle, which says that every near-standard ¢ has a standard part and every F has a lifting. Second Order Standard Part Principle:

By WNA +STP(2) we mean the theory WNA plus both the first and second order standard part principles. We now take a brief look at the consequences of STP(2) in WNA + NPRAw . Roughly speaking, in WNA + NPRAw , the second order standard part principle imposes restrictions of the set of functionals which are reminiscent of construc tive analysis. Besides the axioms of WNA , the only axiom of NPRAw that will be used in this section is the star of quantifier-free induction. A functional G is continuous if it is continuous in the Baire topology, that is,

Vf3nVh [[(Vm < n) h(m)

Proposition 27 Proof. Work in f . Then

=

f ( m ) ] ----+ G(h)

=

G(f)] .

WNA + NPRAw + STP(2) f- VG G is continuous. WNA + NPRAw + STP (2).

Suppose G is not continuous at

Vn3 h [[(Vm < n) h(m) f (m)] 1\ G(f) # G( h ) ] . By STP(2) there arc liftings ¢ of G and x of f . By Lemma 7 and STP, =

Vn(3 y < H ) [[(Vm < n) ( Y ) m = ( x ) m ] 1\ cjJ( x) =/: ¢( y) ] . By the star of QF induction,

300w(3y < H ) [ [ (Vu < w) (x)u

=

( Y)u] 1\ c/J(x ) # ¢(y)] .

But then y � x, contradicting the assumption that ¢ is ncar-standard. This result is closely related to Proposition 5.2 NPRAw , every function f E � ----+ � is continuous.

D

in [ 1 ] , which says that in

1 . 12. Functional choice and (3 2 )

23

The sentence ( 32 )

=

3 GVf [G (f)

=

0

+--+

3 nf (n)

=

OJ

played a central role in the paper [7] . where many statements are shown to be equivalent to (3 2 ) in RCA0 . Similar sentences are prominent in earlier papers, such as Fcferman [3] . It is well-known that

PRAw f- (3 2 ) ----+ 3 G G is not continuous. Corollary 28

1.12

WNA + NPRAw + STP(2) f- ---, (3 2 ) .

Functional choice and ( :3 2 )

In this section we obtain connections between WNA and two statements which play a central role in the paper of Kohlenbach [7] , the statement (3 2 ) and the functional choice principle QF - AC1 0 . In [7] , Kohlenbach proposed a base theory RCA0 for higher order reverse mathematics which is somewhat stronger than PRAw , and is a conservative extension of the second order base theory RCAo. Its main axioms are the axioms of PRAw and the scheme

QF - AC 10 :

Vf3n A(f, n, . . . )

____,

3 GVJ A (f, G(f) , . . . )

where A (f, n, . . . ) is quantifier-free. In [7] , the formula A in the Q F - AC1• 0 scheme is allowed to be an arbitrary quantifier-free formula in the language L (PRAw). Here we will make the addi tional restriction that A (f, n, . . . ) is in the class QF as defined in Section 1 . 7, that is, A(f, n, . . . ) is a Boolean combination of equations and inequalities be tween basic terms. These formulas only have variables of type N and N ----+ N, and do not have functional variables. We show now that QF - AC 1 • 0 restricted in this way follows from WNA plus the standard part principles. Theorem 29

WNA + STP(2) f- QF - AC1• 0 .

Proof. Work in WNA + STP (2) . Assume Vf3n A (f, n, m, h) . By Zeroth Order Lifting, A( x, u, v, z) is an internal quantifier-free formula, and

0x = f A o z = h

____,

[A (f, n, m, h)

+--+

A ( x, n, m, z) ] .

By Lemma 4 there is a primitive recursive function a such that

*a ( x, w, v, z) = (p,u < w) A ( x, u, v, z) .

24 By

1. The strength of nonstandard analysis

STP,

there exists

z such that h = 0 z. 3¢\/x cjJ(x)

Then

=

By the Lambda Abstraction axiom,

*a ( x, H, m , z).

vnsx [ S(¢(x)) A A(x, ¢(x) , m , z)] .

I t follows that ¢ is near-standard. By STP(2), there exists G such that G = 0 c/J . Therefore Vf A(f, G(f), m , h) . D

One of the advantages of WNA over NPRAw is that one can add hypotheses which produce external functions and still keep the standard part principles. The simplest hypothesis of this kind is the following statement , which says that the characteristic function of S exists: ( l s exists) :

It is clear that

3¢\/y cjJ (y) = (tLZ < 1 ) S(y) .

NPRAw f- -. ( 1 s exists)

because by the star of quantifier-free induction. V5y cjJ(y)

=

0 implies

3y [-. S(y) 1\ cjJ(y) = OJ . However, ( 1 s exists) is true in the full natural model ( V (N) , V(*N) , j) of We now connect this principle with the statement (3 2 ) . Theorem 3 0

WNA + STP(2)

f- ( 1 s

WNA.

exists) ----+ (3 2 ) .

Proof. Work in WNA + STP(2). Let a be the primitive recursive function such that *a(x, w) = (tLu < w) (x)u = 0. Let ¢ be the function 1s, so that Vy cjJ ( y) = (tLZ < 1) S(y) . Then there exists 'ljJ such that Vx '1/J(x) = cjJ(*a(x, H ) ) . Observe that

cjJ(*a(x, H)) = 0 ----+ 3 5 u (x)u = 0, 35u (x)u = 0. Moreover , Vx 'ljJ(x) < 2. We show that '1/J is

so 'ljJ (x) = 0 ----+ near-standard. Suppose ns(x) and x � y. We always have S(cjJ(x)) since ¢( x) < 2. If '1/J(x) = 0 then there exists u such that S(u) and (x)u = 0, so (Y)u = 0 and hence ?jJ(y) = 0. This shows that ns('lj1 ) . By STP(2) there exists G such that G = 0 '1/J . Consider any f. By STP, f has a lifting x. Then G(f) = 0 iff ?jJ ( x) = 0 iff 35u(x)u = 0 iff 3n f(n) = 0, and thus ( 3 2 ) holds. D Let us now go back to Section 1 . 7 and redefine the set QF of formulas by allowing basic terms of the form G2 (fk ) in addition to the previous basic terms, and redefining the hierarchy IIl by starting with the new QF. Also redefine

25

1 . 13. Conclusion

the set NQ F and the hierarchy NIT� by allowing additional basic terms of the form cPi (xk). When STP(2) is assumed, the lifting lemmas from Section 1 . 7 and the results of Section 1 . 9 can b e extended t o the larger classes of formulas just defined. The hierarchies Ilk and NII1 at the next level can now be defined in the natural way. One can then obtain the following result, with a proof similar to the proofs in Section 1 .9. Theorem 3 1 In WNA + STP(2) , (NII1, D ) -saturation and (NIIk, ! ) -saturation implies (Ilk , 1 ) -choice.

1.13

implies (Ilk, 0 ) -choice,

Conclusion

We have proposed weak nonstandard analysis, reverse mathematics in nonstandard analysis. In WNA + STP, one can prove:

WNA,

as a base theory for

The Weak Koenig Lemma,

( Q F, 0)-choice and ( Q F, 1 )-choice, (NQ F. 0)-saturation implies (II6 , 0)-choice. (NIT�, i)-saturation implies (Ilk, i)-choice, i In

=

0, 1 .

=

0, 1.

WNA + STP(2) one can prove: Q F - AC 1 • 0 , NPRAw implies VG G is continuous, ls exists implies (3 2 ) ,

(NIIk, i)-saturation implies (Ilk, i) -choice, i

We envision the use o f these results t o calibrate the strength o f particular theorems proved using nonstandard analysis. At the higher levels. this could give a way to show that a theorem cannot be proved with methods commonly used in classical mathematics. Look again at the natural models of WNA discussed at the end of Sec tion 1 . 4. Let * V (N) be an N 1 -saturated elementary extension of V (N) in the model-theoretic sense , and consider the internal natural model (V(N) ,*V(N) , j) and the full natural model (V (N) , V (*N) , j ) . Both of these models satisfy the axioms of WNA, the STP, the statement (3 2 ) , and (NIIk, I)-saturation. In view of Corollary 28, in the internal natural model the axioms of NPRAw hold and STP(2) fails, while in the full natural model STP(2) holds and the axioms of NPRAw fail.

26

1 . The strength o f nonstandard analysis

References [1] .J . AviGAD, Weak theories of nonstandard arithmetic and analysis, to ap pear in Reverse Mathematics, ed. by S.C. Simpson. [2] C . C . CHANG and H . .J . KEISLER, 1 990.

Model Theory.

Third Edition, Elsevier

[3] S . FEFERMAN. Theories of finite type related to mathematical practice, in Handbook of Mathematical Logic, ed. by J . Barwise, 1977. [4] K . C6DEL. " Uber eine bisher noch nicht benutzte Erweiterung des finiten Standpunktes", Dialectica, 12 ( 1 958) 280- 287. [5] C. W. HENSON , M. KAUFJ\IANN and H . .J . KEISLER, "The strength of proof theoretic methods in arithmetic", J. Symb. Logic, 49 ( 1984) 1039-1058. [6] C . W . HENSON and H . .J . KEISLER, "The strength of nonstandard analysis", J. Symb. Logic, 51 ( 1986) 377-386. [7] U. KoHLENBACH, Higher order reverse mathematics, to appear in Mathematzcs, ed. by S . C . Simpson.

Reverse

Subsystems of Second Order A rithrnetic. Perspectives in Mathematical Logic, Springer-Verlag, 1999.

[8] S . C . SIMPSON,

[9] A . S . TROELSTRA and D. VAN DALEN . Constructivism An Introduction, Volume I I , North-Holland, 1988.

in Mathematics.

The virtue of simplicity Edward Nelson*

Part I. Technical

It is known that IST (internal set theory) is a conservative extension of ZFC (Zermelo-Fraenkel set theory with the axiom of choice) ; see for example the appendix to [21 for a proof using ultrapowers and ultralimits. But these semantic constructions leave one wondering what actually makes the theory work-what arc the inner mechanisms of Abraham Robinson's new logic. Let us examine the question syntactically. Notational conventions: we use x to stand for a variable and other lowercase letters to stand for a sequence of zero or more variables; variables with a prime 1 range over finite sets; variables with a tilde - range over functions. We t'.a.ke as the axioms of IST the axioms of ZFC together with the following, in which A is an internal formula: (T) vstt [\lst xA -+ VxA] , where A has free variables x and the ww:iables of t, ( I ) vsty13xVyE y1 A +---> 3x\15t y A , (S) VStx3sty A (x, y) -+ 35tyvstx A (x, y(x)) .

"'lle

have written the standardization principle (S) in functional form and re quired A to be internal; we call this the restricted standardization principle. It can be shown that the general standardization principle is a consequence. All fim<:tions must have a domain. There is a neat way, using t.he rcHce tion principle of set theory, to ensure that y has a domain, but let me avoid discussion of this point. vVe do not take the predicate symbol standard as basic, but introduce it by x

is standard

+-r

35t y[y

=

x] .

this way vst and 3st. are new logical symbols and (I) , (S) , (T) arc logical axioms of Abraham Robinson 's new logic. In

*

Department of .Mathematics, Princeton University.

nelson@math . princeton. edu

2.

28

The virtue of simplicity

For any formula A of IST we define a formula A+ , called the partial reduc tion of A . It will always be of the form vstu38t vA. where A. is internal. It is

defined recursively as follows:

if A is internal, A+ lS A ( , A)+ is vst v3 8t u ,A ( u, v(u)) (A 1 v A 2 )+ lS vst u l U2 38t Vl V2 [A i v A2 ] (\ixA)+ is vst u35 t v1\ix3vEV1 A. (\i5t xA)+ lS vst xuvst vA •

• .

(We take ', V , and \i as the basic logical operators-the others can b e defined in terms of them.) It is understood when forming (A 1 V A 2 )+ that a variant may be taken (bound variables changed) to avoid colliding variables. If z are the free variables of A, then the reduction of A, denoted by A is the internal formula \iu3v1\iz3vEv1 A • . This is the same as the partial reduction of the closure of A with vst and 3 st replaced by \i and 3. We need only show that if A is an axiom of IST. then A is a theorem, and that for every rule of inference with premise A 1 (or premises A 1 and A 2 ) and conclusion B , if A� is a theorem (or A� and A2 arc theorems) , then B0 is a theorem. This turns out to be quite straightforward in the main, but there is one exception. When I spoke in Aveiro I thought I could present a truly simple syntactical proof of conservativity, but I was mistaken. This remains a desirable goal. So the first part of this paper celebrates the virtue of simplicity by its absence. The complication lies with the rule of detachment. or modus ponens. First we need a purely internal lemma. o,

o

Lemma 1

(Cross-section) Let A be internal. Then

3.;,\iu13z\ivE.;, (u) A(u, v, z)

+--+

3v\iu13z\iuEU1 A (u, v(u) , z) .

The backward direction is trivial: let .;, (u) forward direction, fix .;, and let n = IJ .;, (u) . Proof.

u

=

{v(u) } . To prove the

; Then n is the set of all cross-sections of ;1 • Each .;, (u) is a finite set; give it the discrete topology, so it is compact. Give n the product topology, so it is compact by Tychonov ' s theorem.

29

Part I. Technical

By hypothesis, for each u' there exists an element Vu' of n such that we have 3z'v'uEu1 A ( u, v(u) , z) (let Vu' be arbitrary outside u') . The u' are a directed set under inclusion, so u' vu' is a net in D . Since n is compact, this net has a limit point v, which has the desired property. D c--+

Corollary

2 (Dual form of cross-section) Again let A be internal. Then

v-0 ::Ju' \iz::lvE ;1 (u) A(u, v, z) Theorem 3

+--+

\iv::Ju'\iz::luEU1 A (u, v(u) , z) .

(Detachment) If A0 and (A ____, B)0 are theorems, so is B0 .

Proof. Let y be the free variables common to A and B, let w be the remaining free variables of A, and let z be the remaining free variables of B . We shall derive a contradiction from A0, (A ____, B)0, and '(B0) . These formulas arc (1) \iuo::lv0\iwoyo::lvoEv0 A " (uo , vo , wo , yo) (2) \fvi r i::JU�s�\fWlYIZI::JUI EU�SIES� [-,A" (ui , vl (ul) , wi , y1) V B• (r i , s i , YI , z l )] (3) ::Jr2 \is� ::Jy2 z2 \is 2ES� -,B• (r2 , S2 , Y2 , Z2 ) . Fix r2 and let r i r2 . (That is, delete ::lr2 in (3) , replace the variables r2 by constants also denoted by r2 , delete \iri in (2) and replace each occurrence of I"I by r2 .) �ow apply choice to (1) to pull out v0 as an existentially quantified function v0 of uo . Notice that (2) has the form of the right hand side of the dual form of the cross-section lemma, so replace it by the left hand side. In this way we obtain ( 1') 3v0\iuozo::lvoEv0 (uo) A• (uo, vo, wo, Yo) (21) \fv� ::Ju� S� \fzaui Eli� SI ES�\fvi EV� ( lii , sl) [-,A • ( lii , VI , WI , yl ) V B • (r2 , s 1 , YI , z l )] (3') \is�::Jz2\iS2ES� -, B• (r2 , S2 , Y2 , Z2 ) . =

=

Fix v0; let v� b e defined by v� (u , s ) v0 (u ) for all u and s ; fix u� and s� ; let s� s� ; fix Y2 and z2 ; let YI Y2 and ZI z2 , and let WI be arbitrary; let wo = WI and zo = z2 ; fix lii and s i ; let uo = lii and s2 = si ; fix vo; let VI = vo . Then we have ( 1") A• ( lii, vo, WI , Y2 ) (2") -, A" (ui , vo , WI , Y2 ) V B • (r2 . s 1 . Y2 , z 2 ) (3") -, B• (r2 , SI , Y2 , z2 ), which is a contradiction. D =

=

=

2. The virtue of simplicity

30

I have sketched the main step in a syntactical proof of the conservativity of IST over ZFC . But a better argument is needed, one that gives a practical method for converting external proofs into internal proofs. This should be possible. Whenever one uses an ideal object, such as an infinitesimal or a finite set of unlimited cardinal, it depends on the free variables in only a finite way. I expect it to be possible to develop a syntactical procedure that examines the external proof and establishes this dependence in an internal fashion.

Part II. General Much of mathematics is intrinsically complex, and there is delight to be found in mastering complexity. But there can also be an extrinsic complexity arising from unnecessarily complicated ways of expressing intuitive mathemati cal ideas. Heretofore nonstandard analysis has been used primarily to simplify proofs of theorems. But it can also be used to simplify theories. There arc several reasons for doing this. First and foremost is the aesthetic impulse, to create beauty. Second and very important is our obligation to the larger scien tific community, to make our theories more accessible to those who need to use them. To simplify theories we need to have the courage to leave results in sim ple, external form-fully to embrace nonstandard analysis as a new paradigm for mathematics. Much can be done with what may be called minimal nonstandard analysis. Introduce a new predicate symbol standard applying only to natural numbers, with the axioms: is standard, (2) if n is standard then n + 1 is standard, (3) there exists a nonstandard number, (4) if A (O) and if for all standard n whenever A (n) then A (n + 1 ) , then for all standard n, A(n) . (1)

0

A prime example of unnecessary complication in mathematics is, in my opinion, Kolmogorov ' s foundational work on probability expressed in terms of Cantor ' s set theory and Lebesgue ' s measure theory. A beautiful treatise using these methods is [1] , but some probabilists find the alternate treatment in [3] more transparent. Please do not misunderstand what I am saying; these re marks are not polemical. Simplicity is not the only virtue in mathematics and I wish in no way to discount other approaches to the usc of nonstandard analysis in probability. I just want to encourage a few others to explore the possibility of using minimal nonstandard analysis in probability theory, functional analysis, differential geometry, or whatever field engages your passion.

Part II. General

31

I n this spirit I shall give a few examples from [3] . A finite probability space is a finite set n and a strictly positive function pr on n such that

L pr ( w )

w Ell

=

1.

( The set n is finite but we do not require its cardinal to be standard. ) An event is a subset lvf of n, and its probability is Pr ( M )

=

:

A random variable is a function X n

L pr ( w ) .

w EM --+

� ' and its expectation is

Ex = L x ( w)pr ( w). w Ell

If a E � ' we define

x ( a) (w)

=

{

x(w) , lx(w) l :S a 0, otherwise.

A random variable x is L 1 in case

Elx - x ( a) I � 0 for all a � oo. (Radon-Nikodym) A random variable x is L 1 if and only if we have E lx l « oo and for all events M with Pr ( M ) 0 we have ElxlxM 0.

Theorem 4

�

�

Suppose that x is L 1 . We have E lx - x ( a) I :S 1 for all a oo, so by overspill this is true for some a « oo. Then E l xl :S E l x - x ( a) I + Elx (a ) I :S 1 + a « oo. Now let Pr ( M ) 0. Let a oo be such that aPr ( M ) 0-for example, let a = 1/ JPr ( M ) . Then �

Proof.

�

�

�

Elx lxM :S Elx ( a ) IX M + E lx - x (a ) I X M :S aPr ( M ) + Elx - x (a ) I � 0.

Conversely, suppose that E lxl « oo and that for all M with Pr ( M ) 0 we have E lxl x111 � 0. Let a � oo and let M = { l x l > a}. Then we have Pr ( M ) :S E lxl /a � 0, so that Elx i Xli.J � 0; that is, E lx - x (a ) I � 0. D A property holds almost everywhere ( a.e. ) in case for all » 0 there in an event N with Pr ( N) :S such that the property holds everywhere except possibly on N. c;

Theorem 5

�

c;

(Lebesgue) If x and y are L 1 and x � y a. e., then Ex � Ey.

32

2. =

The virtue of simplicity

::o

Proof. Let z x-y. Then z 0 a.e. For all A » 0 we have Pr( { l z l so by overspill this holds for some infinitesimal A. But then

:::0:

.A})

<::; A,

and since z is L l , E l z l 0 by the previous theorem. Hence Ex Ey. D One final example, useful in probability theory but more general. Let I be a finite subset of [0, 1] of the form ::o

0 = to < t 1

::o

·

·

·

<

tv- 1

<

tv = 1

::o

such that t11 tfl+ l for all 0 <::; p, < v. To the naked eye, I looks just like [0, 1] . Although I is finite, it is "uncountable" in the following sense:

(Cantor) For any sequence x : N --+ I there exists t E I such that t is not infinitely close to any X n with n standard.

Theorem 6

Construct to by changing the nth decimal digit of Xn , so that l to - x n l :::0: w- n for all n. Let t be the greatest clement of I that is less than to; then t is in I and has the desired property. D

Proof.

References [1]

PATRI C K BILLINGSLEY, Convergence of Probability Measures, second edi tion, Wiley & Sons, Inc., New York ( 1 999) . "Internal set theory: A new approach to nonstandard analysis", Bull. Amer. Math. Soc., 83 ( 1 987) 1 165-1 198.

[2] EDWARD N ELSON ,

Radically Elementary Pmbability Theory, Annals of Mathematics Studies # 1 17, Princeton University Press, Princeton, New Jersey ( 1987) . http : I /www . math . princeton . edu/�ne lson/books . html

[3] EDWARD NELSON,

Analysis of various practices of referring in classical or non standard mathematics Yves Pera.ire *

3.1

Introduction

The thesis underlying this text is that the various approaches of mal:he matics, both the conventional or the diverse non standard approaches, pure or applied, are characterized primarily by their mode of referring and in particular by the more or less important usc of the reconstructed reference, the reference to the sets and collections, that I will distinguish from the direct reference, the reference to the world of the facts in a broad sense The direct reference, in traditional mathematics as well as in non standard mathematics ( for the main part) is ritually performed in the dassical form of modelling, consisting in confronting tlw facts to a small paradise ( a set ) correctly structured. So the discourse on the model acts like a metaphor of the modelized reality. I will defend another approac:h, the relative approach, which consists in using the mathematical languages like genuine languages of communication, referring directly to the facts, but accepting the usage of the metaphor of sets ( rcveleacl as such) too strongly culturally established. My thesis will be illustrated by the presentation of two articles .

1. A mathematical framework for Dirac's calculus 1141. 2. Heaviside calculus with no Laplace transform 1 151. The first one starts wit.h a semantical analysis of Dirac's article introducing the Delta function. Dirac said: "o is improper, o' is more improper". If we give it the meaning: "o is uot completely known, and 81 is less known", it works. So it is nec:essary to translate it in Relative Mathematic:;' language. We know that the non relative attitude consists in the formal a.Ifirmation that a rnodel of the delta function is perfectly determined in a paradise, a space of generalized *Dcpartement de Ma1.h6matiques, Universitc Blaise Pascal Yves . PERAIRE@math . univ-bpclermont . fr

(Clermont TT) .

3.

34

Analysis of various practices of referring

functions. The continuation of the metaphor requires to provide the paradise with a topology. In the relative approach, we prefer to explore more deeply the concepts of a point, infinity, equality with words of relative mathematics. Finally the opposition NON STANDARD/ S TANDARD

is replaced by the antagonism RELATIVE/NON RELATIVE.

3.2

G€meralites sur l a referentiation

Cette conference presente sous un autre angle, en les precisant, certaines des idees exposees au congres PILM 2002 et qui paraitront dans [13] . Les consi derations generales concernant la semantique des langues mathematiques, qui sous-tendent mon expose, ne sont pas nouvelles. Toutefois, il est necessaire de les reactualiser pour tenir compte des pratiques specifiques des mathematiciens non standard. On a dit quelquefois que la demarcation entre les diverses pratiques des mathematiques non standard, se ferait a partir du materiel linguistique mis en oeuvre. En gros, les differentes ecoles non standard parleraient des meme chases mais utiliseraient des dialectes differents. Nous savons bien pourtant que le langage des mathematiques. classiques ou non standard, est contenu dans le langage du premier ordre du calcul des predicats. Il est vrai cependant que les differentes ecoles non standard n ' uti lisent pas la meme partie de ce langage ; le seul predicat binaire E » pour les classiques, le predicat E » et le predicat unaire st » pour l ' ecole ' Nelson-Reeb, j utilise personnellement le predicat E » et un autre predicat binaire, le predicat de Wallet « SR » . En realite, ces differents choix induisent (aut ant qu ' ils sont induits par) des pratiques sernantiques differenciees et c ' est l ' analyse de ce rapport au sens qui permet de comprendre la nature des differentes ecoles mathematiques, standard ou non standard qu ' elles se veuillent appliquees ou se pretendent pures. Je distinguerai schematiquement trois mode d ' attribution du sens, que je designerai par les terrnes reference directe, - reference reconstruite ou - reference directe elargie. Ce que j ' appelle reference directe, c ' est la reference aux faits au sens large, ne se lirnitant pas a la description des phenomenes, s ' autorisant aussi la des cription des concepts. C'est en gros le mode de reference des physiciens. La reference reconstruite, c ' est la reference aux entites mentales stables engendrees par le discours en langue mathematique les ensembles introduits formellement par le quantificateur 3. les collections. «

·

·

·

·

«

·

·

:

«

·

«

·

·

3.2. Generalites sur la referentiation

35

On pourrait dire que le referent des mathematiques pures consiste essentiel lement en ces entites, ejectees a partir des axiomes de la theorie des ensembles. Dans les faits, les chases sont beaucoup plus melangees. En effet, meme si elle n'est pas mise en avant, la reference au monde des "faits elargis" aux concepts naturels, concept d ' ensemble, concept naturel de nombre, a une certaine idee du fini ct de l ' infini, n ' est pas absente de la pensee des mathematiciens purs. La pratique des mathematiques appliques constitue un pas vers le 3eme mode d ' attribution du sens ; la modelisation en est la forme la plus classique. Le refe rent est bien le mode des faits, mais le mode d ' expression est systematiquement celui de la metaphore, prise dans l'univers reconstruit. Les lignes precedentes illustrent ce que j ' entend par referent « ce a quai le signe linguistique renvoie dans la "realite extra-linguistique" soit l ' univers "reel", soit l'univers imaginaire ». Mais de quels signes parlons-nous ? A quelle langue appartiennent-ils ? Il est tout a fait clair que c ' est de la langue de communication, l'anglais ou le fran<;ais, dont je parle et non pas des languages bien construites, de ZF de IST de RIST ou autre. En realite ces langues sont utilisees comme matrices pour produire des entites structurees et egalement comme une ecriture abregee de la langue naturelle. une sorte de stenographie. La pratique que je vais essayer de decrire maintenant, que j ' appelle refe rence directe elargie, prend acte des pratiques semantiques d ' une partie du reseau Georges Reeb, les rend explicites et ouvre la voie a une generalisation. Ce qui caracterise cette approche, ce n ' est pas la double reference - au monde des faits d ' une part, au monde reconstruit de l ' autre - qui est une pratique courante des mathematiques appliquees, mais le fait que ce processus de desi gnation du sens concerne non plus seulement la langue vernaculaire, mais aussi la langue mathematique, que l ' on se met a pratiquer alors, comme le fran<;ais ou l ' anglais, en alternant style direct et style metaphorique en quantite variable selon ses preferences. Voici deux exemples de pratiques semantiques utilisant une mctaphore. Considerons l ' enonce en fran<;ais « grand pere est mort » . Il possede clai rement une reference dans le monde des faits. On peut transcrire cet enonce par un autre, metaphorique, « grand-pere est au paradis ». Admettons maintenant que ces deux enonces disent la meme chose et que l ' on veuille a partir de l ' un ou l ' autre explorer plus finement le concept de la mort. On peut le faire de plusieurs manicres. La methode la plus directe consiste a utiliser le mot « mort » et a preciser encore le concept par ! ' introduction d ' autre mots. On peut aussi tenter de « pousser la metaphore » un peu plus loin et proposer une description du paradis. Le second exemple provient des mathematiques de la physique. Le point de depart est !'article de Paul Dirac dans [ 2] , dans lequcl est intro duite la fonction cl'. Si on ne regarde que les formules et les calculs qui figurent dans cct article, alors la definition donnec conduit sans nul doute a une contra:

:

36

3. Analysis of various practices of referring

diction. Il reste a expliquer pourquoi, malgre cela, ils ne conduisent pas a des absurdite physiques et semblent meme avoir une certaine puissance explicative. Le probleme de la contradiction tel qu'il a ete resolu par Laurent Schwartz [19] a consiste a parler de la fonction de Dirac, qui relie des grandeurs physiques, en termes d'un element d'un paradis repute sur, l'espace des dis tributions. C'est une metaphore similaire a celle utilisee precedemment pour parler de la mort de grand-pere, en effet dans les deux cas (a) On veut rendre compte d'une relation entree/sortie : vie/mort d'un cote, [valeur 0] / [valeur 1] de l'autre. Il y a une breve phase de transition de l'etat initial a l'etat final, que l'on ne peut pas decrire. On pense que l'etat des chases pendant cette phase doit expliquer le changement d'etat. (b) Dans chaque exemple l 'indetermination factuelle de la phase de transi tion est exprimee par une determination dans un univers mythique, qui pretend expliquer le phenomene. Il faut rernarquer toutefois que le mythe utilise dans la demarche mathematique est d'une bien plus grande cohe rence, peut-ctre parce que la langue mathematique, qui l'a engendre, est exempte d'ambigui'te. ( c) Il y a dans les deux cas une tentation de pousser trap loin la metaphore. La metaphore de l'ensemble des distributions fonctionne assez bien, elle per met de faire correspondre a certaines fonnules posees par Dirac des transcrip tions acceptables et utilisables mais on sait que certains faits ne sont pas pris en compte par ce modele, la multiplication n'est pas perrnise, l'egalite au sens des distributions s'eloigne assez de la realite de l'egalite physique . . . D ' autres espaces abstraits, fonctions generalisees de Colombeau, ultrapuissances satu rees . . . permettent de tenir un discours formellement coherent sur la multipli cation des fonctions de Dirac. Cependant, malgre la reelle efficacite mathematique de ces modelisations, on peut souhaiter une approche plus directe, dans laquelle le champ seman tique serait mains encombre par les representations ensemblistes. Il ne s'agit pas de renoncer complctement a ce type de methodes, mais plutot de briser l'automatisme du recours a ces dernieres. Ma position sur ce point diverge done de celle de Gilles Gaston Granger dans [4] pour qui ce qu'il appelle la sortie de l 'irrationnel ne peut se faire que par l 'introduction d'un espace fonction nel. J'ai choisi une approche plus directe du rneme problerne ; elle commence par une analyse semantique du texte de Paul Dirac. Cette analyse permet de decouvrir plusieurs faits, qu'il faut introduire dans la description en langue mathematique. En voici quelques uns. le point materiel a une epaisseur non nulle, physiquement infinitesimale, - la fonction cl' est partiellement indeterminee dans le point,

3. 3. Le calcul de Dirac. L'egalite de Dirac

37

- l'egalite dans les formules d e Dirac n' est pas l'egalite classique des en sembles. Notre travail consistera done a introduire, dans une langue bien construite, les mots pour exprimer ces faits (par une sorte de traduction de la langue naturelle) . En ce qui concerne l'egalite de Dirac, l'analyse du texte fait appa raitre que Dirac identifie a cl' toute fonction nulle en dehors du point origine et d'integrale egale a 1, je donnerais done une definition de l'egalite qui integre ce fait . Jusqu'a present , le langage de la theorie relative des ensembles (voir [ 1 2] ) m ' a suffi pour fabriquer tout l e lexique necessaire. Je propose d'appeller MATHEMATIQUES RELATIVES la pratique que je viens de decrire. En consequence les mathematiques classiques devront etre quali fiees , pour une large part , de non relatives. En guise d'illustration je vais main tenant presenter les resultats et la philosophie de deux articles concernant les mathematiques de la physique. 1 . A mathematical framework for Dirac's calculus [14] . 2. Heaviside calculus with no Laplace transform [ 1 5] . Une partie importante de chacun de ces articles consiste a mettre en place une definition ad hoc de l'egalite.

3.3

Le calcul de Dirac. L 'egalite de Dirac

Une description precise des notions evoquees plus bas, niveaux d 'impro priete, derivees observees, egalite de Dirac est disponible dans /14}, chapitres 1 et 2 ainsi que de nombreux exemples de couples de fonctions Dirac-egales. Dans la discussion qui va suivre nous utiliserons des definitions incompletes pour ne pas cacher l 'essentiel, qui est la philosophie de ce travail. En parti culieT la defim:tion de l'egalite de Dirac, Definition 4 de la section 2.4, est plus complexe. Comment justifier les calculs de Dirac ? On a compris qu'il fallait recon naitre un sens physique a la notion de point, eclairer aussi la signification du signe « = » et exprimer cela en langue mathematique. L'etape initiale neces saire pour y parvenir consiste a introd uire des predicats pour dire l 'indetermi nation. Cette expression de !'indetermination est cachee dans les formulations de Dirac. En effet Considerons l'enonce tire de [2] : « Strictly of course, c\'(x) is not a proper function of x, . . . cl'' (x) , cl'" ( x) . . . are even more discontinuous and less proper than cl' ( x) itself. »

.J'ignore quelle signification precise Dirac donnait au mot « improper » , cependant si on l' interprete, en fon;ant un peu le sens, par « partiellement

3. Analysis of various practices of referring

38

indetermine », la transcription fonctionne. Par contre, la traduction par le mot « anormal » qui a ete donnee dans [3[, pout provoquer m1 bloquage. Voici rapidement cmmncnt, dans l'articlc l14j , j ' ai cxprimc les choses. 1 . .J 'ai utilise lc pr6dicat de standardicit6 relative ·SR- pour definir les ni veaux d'impropri6tc. On pourra trouver dans [14] , premiere sect ion , la definition precbe des niveatDc d'impropriete. Rapidcment , nous clirons que plus un objet est impropre moins il est standard. 2 . A tout nombre p j 'associe le point analyse 1 p 1 , c'est un halo dont le clegTc d'infinit6simalite est d'autant plus petit que p est improprc. 3 . .Jc definis Ia fonction de Dirac principale o de la maniere suivante : (a) Je fixe Ull infinitesimal h1 strictement positif et impropre, 1 (b) je pose o 2 1 Ind [-h1 ' h.] . =

tl

obt ient. pour o le graphe suivant, que l'on pout t;rouvcr aussi dans les livres de physique . .J'ai reprcsent 6 SUI' lc memc graphiquc la jonction echelon de Heavis•ide, elle vaut 1 pour les valems positives de la variable et 0 partout ailleurs. On

1/2h1

h1

Figure 3.1: Dirac

La fm1ction

de

Heaviside ct

sa derivee observee, la fonction

de

Le rcftexe de tout matbematicien sera d'obscrver que rna definition de o n'est pas indcpcndante de la valeur de h1 . La r6ponse a cette objection n eccssite une meilleure investigation de la notion d 'egalite.

4 . .Je definis la Dirac-egalite, « g » sur un ensemble de fonctions c= par morceaux, standard on pas. de telle sortc que toute fondion ayant (;outes ses d erivees infinitesirnales pour les vaJeurs appreciablcs et d 'intcgTale infiniment prochcs de 1 sur tout. int.ervallc [x, y] contentmL 0 ct; ayant des bornes appreciables, soit Dirac-egale a la fonction de Dirac principale.

39

3.3. Le calcul de Dirac. L'egalite de Dirac

5. J ' ai defini la derivee observee d'une fonction, standard ou pas, pouvant presenter des discontinuites du premier ordre par une fonnule de la forme

f(x + h) - f(x - h) 2h qui, a premiere vue, depend de h, ou h est infiniment petit d'un ordre de petitesse adapte au niveau d'impropriete de f. Apres avoir pose cette jl'l (x)

=

definition,

il est pn?jerable de resister a la tentation de cherche·r 1Lne limite quelque-part quand h tend vers 0. On repousse ainsi !'intrusion dans le champ semantique d'un objet encom brant. La derivee observee de la fonction de Heaviside est precisement notre fonction de Dirac principale. On montre facilement que la valeur de jl ' l est Dirac-independante de h :

deux valeurs distinctes de h donnent des valeurs Dirac-egales de la derivee observee. Cela repond a !'objection evoquee a l'item 3 et explique aussi pourquoi dans la notation de la derivee observee je n'ai pas fait appara1tre l'accroissement h. O n s e convaincra facilement que cette approche donne une description plus proche de la realite physique. Le chapitre 3 presente les proprietes de base de la derivation observee. Panni celles-ci nous avons :

Compatibilite avec la derivee classique. Si

f est derivable f' g jl'l .

Linearite au sens de Dime, Si

f et g ant le meme niveau d'impropriete,

alors

Formule de Leibniz au sens de Dirac. Sous la meme condition sur le niveau d'impropriete, alors

(j X g) J ' [ g j X gl ' [ + jl ' i X g , On trouvera dans [14] , section 3 theoreme 4, une formulation d'ecriture plus complexe mais qui s'applique aux cas ou les fonctions f et g ont des niveaux d'impropriete distincts.

40

3. Analysis of various practices of referring

On remarquera que ces egalites ne sont pas testees sur des espaces fonction nels comme chez Schwartz ou Colombeau et que toutes les multiplications sont autorisees. Cela permet de legitimer des formules de physiques ou interviennent des carres de fonction delta, cornme celle-ci dans laquelle 6 est la fonction de Dirac principale :

Il existe une formule analogue dans [1] dans laquelle 6 est representee dans l'espace des fonctions generalisees de Colombeau et ou l'egalite est une egalite faible. La formule doit etre modifiee si on utilise une autre fonction de Dirac que la fonction de Dirac principale. Remarques J 'ai qualifie d'egalite la relation £. On peut preferer voir en cette relation une indiscernabilite ou une approximation de " quelque chose quelque part". Je pretends pourtant que la relation de Dirac merite, autant que l'egalite clas sique des ensembles, le nom d'egalite. En effet que dit la formule precedente ? En dehors du point origine, suppose d 'cpaisseur radicalerncnt indctectable, ct pour toute valeur accessible de la variable, les fonctions de part et d'autre du signe £ ainsi que leur derivees, a tout ordre accessible, prennent des va leurs indistinguables a w -n pres quelle que soit la valeur de n a laquelle on puisse acceder. Les integrales iterees a tout ordre materiellement possible, avec des barnes d'integrations de part et d' autre du point I 0 I sont egalement in distinguables. On peut dire que deux fonctions Dirac-egales representent des phenomenes physiquement identiques. En revanche, si on tourne le regard vers le monde des entites, la relation £ n'est pas une egalite, c'est une relation d'equivalence : mais on sait bien que ce qui est presente comme egalite, dans le monde reconstruit, n'est qu'un avatar de !'equivalence logique. On remarquera que je n'ai pas hesite, dans rna description du calcul de Dirac, a faire usage de !'ensemble des nombres reels, pour parler des grandeurs physiques . .Je n 'ai commence a renoncer a la metaphore ensembliste qu'a partir du moment ou elle m'a semble genante. Kinoshita dans [6] et Grenier dans [5] ont prefere pour leur part utiliser un " continuum discret".

Chacun est libre d 'utiliser la quantite et le type de representations ensem blistes qui lui plait, chacun a droit a son style propre. On remarquera que les forrnules prcccdentes parlent en premier lieu de

faits concernant des grandeurs physiques, pas des ensembles. ll vaut mieux cviter, bien que ccla reste possible, et mcme souhaitable, pour d'autres tra vaux, d ' attribuer un statut ontologique dans la "realite mathematique" aux

3.3. Calcul de Heaviside sans transformee de Laplace

41

infinitesimaux d e differents niveaux dont o n fait usage, cela risque d e devenir vite desagreable. L'enrichissement du vocabulaire que nous avons realise ici avait pour but de rendre plus precis le discours en langue mathematique . . . mais si on tournait les yeux vers les ensembles on pourrait y voir un monde d'entites de plus en plus ideales, un paradis etrange d ' idealites stratifiees . . . La fonction de Dirac principale est celle que l'on obtient . prenant au serieux notre ignorance ce qu'il se passe dans le point I 0 I , quand on applique la definition de la derivee observee a la fonction echelon de Heavisidc. Dans la section suivante, j 'utiliserai des fonctions de Dirac a droite de zero et de classe c= . Cela peut sembler beaucoup de precisions, pour une fonction mal connue a l ' origine. Cela peut se justifier de la maniere suivante : dans l'article que je vais presenter le but n'est pas de decrire une partie de la realite physique mais de donner une plus ample "justification" pour le concept de Hea viside d ' operateur de derivation, ayant pour inverse un operateur d'integration.

3.4

Calcul de Heaviside sans transformee de Laplace. L'egalite de Laplace

Comme pour la section precedente, Finviterai le lecteur a se reporter a un article complet, l'article {1 5}. C'est la philosophique sous-jacente a cet article que je veux presenter ici, je devrai done passer sous silence les demonstrations, et mJ�me donner des definitions incompletes. Le calcul operationnel, quand on l'applique formellement, sans verifier !'existence d'une transformation de Laplace, donne les bonnes solutions. Ces solutions elles meme n'ont le plus souvent pas d'integrale de Laplace, du mains dans les cas que j "ai pu traiter. D"autre part on a le sentiment que ces methodes ne devraient pas dependre de la convergence de l'integrale de Laplace. Aussi ai-je tente (avec succcs) de mettre en place une notion d ' image generalisee qui opere meme quand les fonctions en jeu ne sont pas Laplace-transformables. Bien sur. ce n'est pas la premiere tentative dans ce sens. Aprcs le premier tra vail de Vigneaux publie en 1929 dans [20] d'autres auteurs dont recemment , Komatsu dans [7] , G.Lumer et F. Neubrander dans [8, 9] se sont attaques a ce probleme par des methodes classiques. L 'approche relative permet d'obtenir une solution complete et beaucoup plus directe. Les resultats obtenus sont les suivants. 1 . Les difficultes dues a la divergence de l'integrale de Laplace disparaissent . 2 . Les derivations par rapport aux parametres sont toujours permises. 3. On peut utiliser des series divergentes comme images generalisees. 4. Les calculs peuvent faire intervenir des fonctions de Dirac, des peignes de Dirac, etc. sans qu'il soit necessaire d'introduire des espaces fonctionnels.

3. Analysis of various practices of referring

42

5. L'application des methodes de ce calcul de Heaviside rejustifie, aboutit a une description plus fine des solutions trouvees. En particulier, si il apparalt de !'indetermination dans une equation au derivees partielles, a cause de la presence d'une fonction delta, alors on retrouve la trace de cette indetermination dans les solutions (Voir l'exemple 1 plus bas, a titre d'illustration) . lndiquons maintenant plus precisement le chemin suivi. Notre calcul va s' appliquer a une classe de bonnes fonctions, par exemple de classe c= mais ' pas necessairement integrables. Definition de l ' e galite de Laplace Contrairement a rna definition de l'egalite de Dirac, dont la description s'inspirait de !'observation de l'usage que faisait Dirac de l'egalite, rna definition de ce que j 'appelle egalite de Laplace n 'est pas justifiee par une pratique de Laplace. La definition utilise deux niveaux d'improprietes definis au moyen du predicat SR Cette classification permet de definir des ordres de grandeur relatifs dans R Nous connaissons les definitions suivantes. Un nombre X est infinitesimal et on ecrit X 0 si ·

·.

rv

(1)

Le nombre X est infiniment grand e t on ecrit X

rv

+oo si

vstl l x l > l. Nous dirons maintenant qu'un nombre x est un infinitesimal ccrirons X � 0, si

( 1 ')

relatij, et nous (2)

et que

x est un infiniment grand relatif si vimpl > 0 l x l > l

(2')

ce qui s'ecrira x � +oo. (VImpx F(x) est une abreviation pour Vx (Imp(x) =? F(x)) , ce qui s'oralise « pour tout x impropre, F(x) » ) . On peut montrer que tout nombre relativement infinitesimal est infinitesi mal. Un nombre non relativement infiniment grand est dit relativement limite. On fixe ensuite une fois pour toute : un nombre N relativement infiniment grand, - une fonction de Dirac r-elative a droite de 0, 6.

43

3.4. Calcul de Heaviside sans transformee de Laplace

Cela signifie que 6 reste une fonction de Dirac meme pour un observateur ideal capable de "voir" des fonctions de Dirac impropres. Cette fonction 6 est tres impropre, elle est nulle en dehors d'un intervalle [0, h] avec h � 0. J 'ai defini ensuite dans [15] une collection N de fonctions negligeables de classe c= de deux variables X et t. La generalisation a un nombre plus grand de variables ne pose pas de probleme. On peut meme etendre nos resultats aux cas ou on a un nombre infiniment grand, impropre, de variables. Cela peut etre utile si on a besoin de faire intervenir des parametres qui sont des fonctions standard, representees par un ensemble fini impropre contenant toutes leurs valeurs standard. La classe N vcrifie les proprictcs suivantes. Si a(x, t) E N alors a(x, t ) � 0 pour tout x relativement limite et tout t rclativement appreciable .

t · N c N.

Si * designe le produit de convolution des fonctions, alors

- N * N c N. 1 * N C N pour toute bonne fonction f. an +mN c N, pour taus n E N et m E Z. limites axm atn

On remarque que dans la derniere propriete m peut etre negatif. La derivee d'ordre negatif - k d'une fonction a s'obtient en calculant l'integrale

On integre k fois

Remarque.

N * N, 1 * N ou

La classe N, "n'est pas" un ensemble pas plus que t N, Les inclusions precedentes sont des inclusions de ·

�::';-;!: .

collections. On dcfinit alors l'cgalitc de Laplace £: en posant

F

£

=

G

Def ¢'?

::Ja E N F

-

G

=

!:a

ou !:a est la transformee de Laplace classique de a, qui doit done exister. On pose ensuite

La fonction (J:cl") (p) n'est pas cgalc a 1 , comme pour la distribution de Dirac. On trouve parfois dans les livres de physique, concernant les fonctions de Dirac, l'cgalitc (J:ll) (p) = 1 . En rcalitc pour une fonction de Dirac D. non standard, on a ( 1:6) (p) 1 pour les valeurs limitees de p. Notre fonction 6, fixee plus haut , vcrifie quant a elle rv

3. Analysis of various practices of referring

44

pour tout

p > 0 relativement limite, ( £.6) (p)

�

1.

Dcmontrons ce dcrnier point. La deuxicmc formule de la moyenne donne, pour chaque p > 0,

( L. 6) (p)

=

l

epeh h 6 ( t ) dt, e E [0, 1] .

h � 0 implique Bh � 0, pBh � 0 pour p relativement limite et done, epeh � 1 . Comme J0h 6(t ) dt � 1 , on en deduit que ( L. 6) (p) � 1 pour tout p relative

ment limite.

Notation. On ecrira .f transformation de Laplace.

:::JJ

F si F

£

L.f et .f :::J F si .f admet F comme

Quelques resultats obtenus Pour toutes bonnes fonctions f et g standard ou impropre 1 . Lf est Laplace-independant de 6. Des choix distincts de la fonction donneraient des transformations Laplace-egales. 2. Si f :::JJ F et g :::JJ F, alors f = g. 3. S i f est de type exponentiel, f :::J F =? f :::JJ F .

4. 5. 6.

Ce qui precede implique que toutes les fonctions qui figurent dans les tables de transformations de Laplace sont des images generalisees. (Lf') (p) � p(Lf) (p) - f(O+) · ( L. 6) (p) . t L f(s) ds (p) � (Lf) (p) .

(i

)

j(t )

diverge.

·

L

£

n =O

aj £ aLj .

8 . L ax

�

£ (Lf) (Lg) . += = an t n alors LF

L(f * g)

7. Si

3.5

6

ax

N

L

n =O

an

+= , mcme si la scrie an --n+ L n. l pnn.+ 1 =O p I

I

n

. . etc.

Exemples

Exemple 1 . Recherche d e l a solution d e l'equation aux derivees partielles d€dinies pour X > 0, t > 0. 2 a u (x, t ) ax 2

-

a u (x, t) = at

e t2

45

References

u ( O+ , t )

=

u ( x, 0 + )

� ( t) ,

=

0 pour tout x.

� est une fonction impropre de Dirac. L' application du calcul formel generalise donne la solution "exacte"

u ( x , t)

=

[� (t) - H (t)

*

et2

J 2� e *

-4�2 +

H ( t) et2 • *

Cependant LA REPONSE PHYSIQUE CORRECTE au probleme est la suivante : Pour taus x > 0 et t > 0. (a) Si t est appreciable la solution est indiscernable de

(b) On ne connait pas avec precision le comportement de la solution quand est tres petit . Cette imprecision est heritee du celle de la fonction � .

t

Exemple 2. La fonction e - t2 a une transformation d e Laplace e t admet u n developpe ment en serie de rayon infini, cependant sa transformee de Laplace n ' est pas egale a la somme - qui n'est pas definie - des transformees de Laplace des termes de la serie. On a cependant

Pour la fonction e t 2 il n 'y a pas de transformee de Laplace mais on peut ecrire une image generalisee : t2 "" (2n) !

e :::JJ

6

2 n < txl

n l n+ l "

.p

En conclusion, je dirai que l'examen systematique des questions de seman tique peut accroitre considerablement l'efficacite de l'outil mathematique. Nous venons de le constater pour les mathcmatiques de la physique.

References [1] B . DAMYANOV, "Multiplication of Schwartz distributions and Colombeau Generalized functions", Journal of Appl. Anal . , 5 ( 1 999) , 249-60. [2] P . A . l\1 . DIRAC, "The physical interpretation of the Quantum Dynamics", Proc. of the Royal Society, section A, 1 1 3 ( 1 926-27) 62 1-641. [3] P . A . l\ 1 . DIRAC, Les Gabay, Paris, 1990.

principes de la mecaniq1L e quantique

( 1 93 1 ) , .Jacques

46

3. Analysis of various practices of referring

[4] G . G . G RANGER,

L 'irrationnel,

Editions Odile Jacob, 1 998.

[5] .J . P . G RENIER, "Representation discrete des distributions standard", Osaka J. of Math. , 32 ( 1 995) 799-8 1 5 . [6] l\1 . KINOSHITA, "Non-standard representations of distributions". Osaka .J. Math. , I 25 (1988) 805-824 and II 27 ( 1 990) 843-861 . [7] H . KoMATSU, "Laplace transform of Hyperfunctions a new foundation of the Heaviside Calculus", .J . Fac. Sci. Tokyo, Sect . lA. Math. , 34 ( 1 987) 805-820. [8] G . LUMER and F. NEUBRANDER, The Asymptotic Laplace Transform and evolution equations, Advances in Partial differential equations, Math. Topics Vol. 16. Wiley-VCR 1999. [9] G . LUMER and F. NEUBRANDER, Asymptotic Laplace Transform and re lation to Komatsu's Laplace Transform of Hyperfunctions, preprint 2000. [10] .J . MIKUSINSKI. Bull. Acad. Pol. Scr. Sci. Math. Astron. Phys. , 43 (1966) 5 1 1-13. [ 1 1] E. NELSON, " Internal set theory : a new approach to nonstandard analy sis", Bull. A.M . S . , 83 ( 1 977) 1 165- 1 1 98. [12] Y . P ERAIRE, "Theorie relative des ensembles internes", Osaka .J. Math. , 29 ( 1 992) 267-297. [13] y. P ERAIRE, "Le replacement du referent dans les pratiques de l'analyse issues de Edward Nelson et de Georges Reeb", Archives Henri Poincare, Philosophia Scientiae (2005 ) , a paraitre. [14] Y. PERAIRE, "A mathematical framework for Dirac's calculus", Bulletin of the Belgian Mathematical Society Simon Stevin, (2005 ) , a paraitre. [ 1 5] Y . PERAIRE, Heaviside calculus with no Laplace transform, Integral Transform and Special Functions, a paraitre. [16] C . R A J U , "Product and compositions with the Dirac delta function", J. Phys. A : Marh. Gen . , 1 5 ( 1 982) 381-96. [ 1 7] K . D . STROYAN and W .A . .J . LUXEMBURG, Introduction infinitesimals, Academic Press. New-York, 1976 .

to the theory of

[18] L . ScHWARTZ, "Sur l'irnpossibilite de la multiplication des distributions", C. R. Acad. Sci. Paris, 239 ( 1 954) 847-848. [19] L . ScHWARTZ,

Theorie des distributions,

Hermann, Paris ( 1 966)

[20] J. C . VIGNEAUX, "Sugli integrali di Laplace asintotici", Atti Accad. N az. Lincei, Rend. Cl. Sci. Fis. Math. , 6 (29) (1939) 396-402.

Stratified analysis? Karel Hrbacck*

It is now over forty years since Abraham Robinson Tealizcd that " the con cepts and methods of Mathematical Logic aTe capable of pmvid·ing a suitable .frnmewoTk .fo-r the development of the D·iffemntial o.nd Jntegml Calc·u.l·us by means of infinitely small and i1Jfin'itely laTge numbeTsn ( Robinson [29] , Intro

duction. p. 2). The mag11itude of Robinson's achievement. cannot be overstated. Not only docs his framework allow rigorous paraphrases of many arguments of Leibuiz, Euler a.nd other mathematicians from the classical period of calcu lus; it has enabled the development of entirely new, important mathematical techniques and constructs not anticipated by the classics. Researcl1ers work ing with the methods of nonstandru·d analysis have discovered new significant results in diverse areas of pure aud applied mathematics, from number theory to mathematical physics and economics. It seems fair to say, however, that acceptance of "nonstandard" methods by t.he lcu·gcr mathematical commnnit;y lags far behind their successes. [n particu lru·, the oft-expressed hope that infinitesimals would now replace the notorious c:-t5 mei,hocl in teaching calculus remains umoalized, in spite of notable efi'orts by Keisler [20] , Stroyan [311, I3enci and Di Nasso [4] , and others. Sociological reasons - the inherent conscrvativity of the mathematical community, t.he lack of a concentrated effort at proselytizing - are often mentioned as an explana t-ion. There is also the fact that "nonstandard" methods, at least in the form in which they are usually presented, require heavier reliru1ce on formal logic thru1 is customary in mathematics at large. While acknowledging much truth to all of the above, here I shall concentrate on another contributing difficulty. At the risk of an overstatement, it is this: while it is undoubtedly possible to do calculus by means of infinitcsimals in the Robinsonian framework, it does not seem possible to do calculus only by means of infinitesimals in it. In particular, the promise to replace the c:-6 method by the use of infinitesimals cannot be carried out in full. *

Department o f 1'vlathcmat.ics, The Chy College o f CUNY, New York, NY 1 0031 . khrbacek@ccny . cuny . edu

4. Stratified analysis?

48

In Section 4 . 1 I examine this shortcoming in detail, review earlier relevant work, and propose a general plan for extending the Robinsonian framework with the goal of remedying this problem and - possibly - diminishing the need for formal logic as well. Section 4.2 contains a few examples intended to illustrate how mathematical arguments can be conducted in this extended framework. Section 4.3 presents an axiomatic system in which the techniques of Section 4.2 can be formalized, and discusses the motivation and prospects for its further extension.

4. 1

The Robinsonian framework

Here and in the rest of the paper, by Robinsonian framework I mean any presentation of "nonstandard" methods that postulates a fixed hierarchy of standard, internal and, in most cases, also external sets. Thus the origi nal type-theoretic foundations of Robinson [29] , the superstructure method of Robinson and Zakon [30] ( also Chang and Keisler [6] ) , and direct use of ul trafilters a la Luxemburg [21] , as well as axiomatic nonstandard set theories like HST [13, 18, 19] or Nelson's 1ST [23] , are covered by the term, and the dis cussion in this section applies to all of them. I present the arguments in the "internal picture" employed in 1ST and HST; that is, for example, R denotes the set of all ( internal ) real numbers, and is referred to as the standard set of reals ; if needed, 0R denotes the external set of all standard reals. Superstruc ture afficionados would use *R for R and R for 0R. The same conventions apply to the standard set of natural numbers N and other standard sets. The paradigmatic: example below, the familiar nonstandaTd definition of continuity, illustrates the difficulty I am concerned about . Definition 4. 1 . 1

Let f

:

R ---+ R

be a standard function, and x

ER

a stan

dard real number. (i) f is continuous at x iff for all infinitesimal h, f ( x + h) - f ( x) is infinitesimal. (ii) f is (pointwise) continuous iff for all standard x E R, f is continuous at x. Explicitly: (\istx E R) (\i infinitesimal h) (! ( x+h) -f ( x) is infinitesimal) .

It is a basic and useful fact of nonstandard analysis that the notion of continuity of a standard function at a standard point defined above can be extended in a natural way to the notion of continuity at a nonstandard point, and that a standard continuous function f : R ---+ R is continuous at all x E R, even the nonstandard ones. But, what precisely does this mean in the Robinsonian framework, and how do we know that it is true? Certainly not by transfer! In the Robinsonian framework, for a statement about standard

49

4. 1 . The Robinsonian framework

obj ects to be transferable from the standard to the internal universe, all of its quantifiers have to range over standard sets; formally, it has to be of the form yst where y is an E-formula ( internal formula) . Definition 4. 1 . 1 is not of this form; the quantifier (V infinitesimal h) ranges over internal sets. Briefly, Definition 4 . 1 . 1 is not transferable. A naive attempt to transfer 4. 1 . 1 ( ii ) will likely produce something along the lines of

(Vx E R) (V infinitesimal h) (f( x + h) - f( x) is infinitesimal ) . As is well known. this statement is equivalent to ·uniform continuity of f ( for standard f) . How then do we arrive at our "basic and useful fact"? Every treatment of elementary nonstandard analysis has to answer this question somehow. More over, similar difficulties appear with derivatives, integrals -in fact, with every concept defined by nonstandard methods. There seems to be little explicit attention paid to this issue in the literature. Important exceptions are the writings of Peraire [25]-[27] , Gordon [ 1 1 1 2] , and Andreev's thesis [ 1 ] ; dis cussions of a number of points considered in this paper can be found there. I realized the crucial importance of this issue for teaching of nonstandard anal ysis during O'Donovan's talk in Aveiro. While describing his experiences with the nonstandard definition of derivative, O'Donovan recounted some questions his students typically ask: "Can we use this formula when x is not standard? When f is not standard?" The answer of course is NO -but what then are they supposed to use? After alL a standard function like sin x does have a derivative at all x!1 Three implicit responses applicable i n Robinsonian framework can b e dis cerned. ,

Response I ( Robinson [29] , Goldblatt [ 1 0] ) . Although Definition 4. 1 . 1 is not expressible by an E-formula, it is to an E-formula, namely, to the standar-d definition of continuity.

equivalent

Definition 4 . 1 .2 Let f : R ---+ R be a standard function, and x E R standard. f is continuous at x iff ( V8t c > 0) (38tcl > O) (V8t y E R) ( I Y - x i < c5 =?

l f(y ) - f( x) l < c ) .

The formula on the right side of Definition 4 . 1 . 2 is transferable, and yields a natural notion of continuity for all f : R ---+ R and all x E R that agrees with Definition 4. 1 . 1 for standard f and x. 1 I a m grateful t o

R.

O 'Donovan for many subsequent email exchanges that have been

extremely helpful in further clarification of the difficulties with using infinitesimals to teach calculus.

50

4. Stratified analysis?

Definition 4 . 1 . 3 Let f : R --+ R real. f is continuous at x iff

(Vc > 0) (3 el

>

be any (internal) function, and x E R any

O ) (Vy E R) ( I Y - x i < el ==;. i f(y) - f(x) i < c ).

The problem i s resolved, but at the cost o f a relapse t o the usual E-el definition of continuity, at the internal level. This response also illustrates one of the chief reasons why nonstandard methods have to rely so heavily on formal logic. In Definitions 4. 1 . l (i) and 4 . 1 . 2 we have two equivalent formulas, of which one is transferable and the other is not . Transferability is an attribute of formulas; it is a logical, metamathematical concept. Response II (Nelson [23] ) . First, we use Definition 4. 1 . l (i) to define continuity for standard j, x . Then we let C := s { (!, x) : f : R --+ R, x E R, j, x st andard, f continuous at x} (C is a standard set ) , and finally, for any (internal) f and x, define Definition 4. 1 .4 f

is

continuous at x

iff (! , x) E C.

If f is standard and continuous at all standard x E R then (vst x) ( (f. x) E C) , this formula is transferable, and gives (Vx) ( (!, x) E C ) , i.e . , f is continuous at all x E R, as desired. The problem here is that this ("somewhat implicit" [23] ) definition of con tinuity is completely divorced from the usual intuition (captured for standard r X, in different ways, both by the standard E-el definition and by the non standard definition using infinitesimals) : a function f is continuous at x if arguments "ncar" x yield values ..ncar" f(x). According to 4.1 .4, the im plicit meaning of the statement "f is continuous at x" for standard f and unlimited x is "x E 8{y E R : y standard, f continuous at y } " ; it is not re lated to the behavior of f near x in the sense of the order topology on R . Similarly, continuity of f(x) : = xv , where v E N i s unlimited, translates t o " v E 8{n E N : f (x) : = xn i s continuous }"; i.e., xv i s continuous "because" n xn is a continuous function for all finite n! Definition 4 . 1 . 4 can be decoded via Nelson's reduction algorithm; it then gives the usual E-el definition, at the internal level, as in Response I. We note that here too there are two equivalent formulas, one of which transfers and the other does not. Response III. This is a feasible response in an approach based on superstructures, even though I found no discussion of it in the literature. While considering it I switch to the asterisk notation.

51

4. 1 . The Robinsonian framework

We recall that the definition of (pointwise) continuity in terms of infinitesi mals can be generalized to standard f : T1 ----+ T2 , where T1 and T2 are arbitrary standard topological spaces. The internal open sets of * R form a base for an (external) topology on * R , the Q-topology. The meaning of "f is continuous at x " for internal f and x E * R can be given by "f is continuous at x in the Q-topology on * R . " As noted above, this concept has a nonstandard def inition, although applying it requires working with 0 ( * R ) in a "second-order" cnlargemcnt 2 of the superstructure that contains * R . Yet the difficulty is not resolved. In order to prove that the two definitions of continuity are equivalent for standard f and x, one needs to apply transfer to the (equivalent) standard definitions in terms of open neighborhoods, i.e., fall back on the c;-6 method, in topological disguise. The Robinsonian framework does not provide for direct transfer of nonstandard definitions. This response also begs the question, how do we know that ® ( * f ) is continuous, and so on. Clearly, an infinite sequence of consecutive enlargements would be needed. It seems that every attempt to define continuity ultimately has to be grounded on the E-6 method. As remarked above, the same difficulty appears with derivatives, integrals -in fact, with all standard concepts introduced by nonstandard methods. I sec it as a serious problem for the Robinsonian frame work, if not as a research tool, surely as a teaching tool and, fundamentally, as a satisfactory answer to the question about the place of infinitcsimals, and nonstandard objects in general, in mathematics. What is to be done? Contemplation of the three responses suggests some ideas. First, we need to abandon the fixed distinction between standard and nonstandard and be able to treat any (internal) object as if it were standard, and in this capacity subject to application of nonstandard definitions and theorems. This is the idea of relativization of standardness. Second, we need to be able to transfer properties described by arbitrary (external) formulas, not just E-formulas; for emphasis, I refer to this facility as general transfer. Both ideas have some history in the literature of nonstandard analysis. A definition of relative standardness seems to appear first in Cherlin and Hirschfeld [7] , although its model-theoretic roots can be discerned in [6] ; but the subsequent development occurred mostly in the axiomatic setting. Gor don [ 1 1] defined two notions of relative standardness in 1ST (one of them is essentially the same as in [7] ) . Wallet [24] proposed to use a binary relative 2

For some applications of "second-order" enlargements see Molchanov

[22] .

In the alpha

the * -embedding is defined for all sets, but * * w1-saturated, and monads in the Q-topology on * are trivial. theory of Benci and DiNasso

[4] .

R

(

R) is only

52

4.

Stratified analysis?

standardness predicate as a primitive in an axiomatic treatment, an idea that was developed systematically by Peraire in [25] . The notion of relative stan dardness stratifies the universe into levels of standardness. Both Gordon and Peraire give a nonstandard definition of continuity applicable to all f and x (see Definition 4.2.2), and numerous other examples (see also [ 1 , 12, 26, 27, 28] ) . Gordon's approach docs not work as smoothly for concepts whose definitions involve shadows, such as derivative , because standardization does not hold in full. A sufficiently strong (for this purpose) standardization docs hold in Peraire's theory RIST. Another difference is that, unlike Gordon's, Peraire's relative standardness predicate is a total preordering. Yet another stratified nonstandard set theory (not employing a binary relative standardness predi cate) was put forward by Fletcher [9] . None of [9, 11, 25] give an explicit formulation of transfer for more than just E-formulas. To my knowledge, the idea of (more) general transfer appears first in the work of Benninghofen and Richter [5] . The main result of [5] is a transfer theorem for a certain (complicated) class of E-st-formulas; Cutland [8] gives some simpler special cases. Although the class of transferable formulas is limited, it has led to interesting applications (see the proof of l'Hopital rule in Section 4.2, and [5, 32 , 33] ) . However, the idea of relative standardness is not explicit in [5] . Further discussion of the mutual relationship of these various approaches can be found in [16] . In my opinion, the decisive step needed to resolve the difficulty discussed above is to combine relative standardncss with general transfer. This step was taken by Peraire in [26] with his proof (in RIST) of "polytransfer," essentially, transfer for all formulas that do not quantify over levels of standardncss. Many standard concepts have satisfactory nonstandard, transferable definitions in RIST. Nevertheless, there are situations (see Example 4 in Section 4.2) where quantification over levels of standardness is both natural and necessary. More over, the need to single out the special classes of formulas to which principles of RIST apply increases reliance of the framework on formal logic. It is my belief that for a theory of the "nonstandard " to be fully satis factory, both foundationally and practically, all of its principles need to apply uniformly at all levels, and to all formulas. In addition to a complete resolution of the difficulty that is the subject of this discussion, such framework would also diminish the need for appeals to formal logic in practical work: all for mulas would be transferable, and equivalent formulas would have equivalent transfers. The axiomatic system FRIST presented in Section 4.3 achieves these objectives for internal sets. The examples in the next section show some of the power of internal methods extended by relativizcd standardncss and general transfer.

4.2.

53

Stratified analysis

4.2

Stratified analysis

This section gives several examples intended to illustrate "nonstandard" mathematics in stratified framework. The presentation is informal; an ax iomatic system FRIST in which all of the arguments in this section can be formalized is described fully in Section 4.3. Our basic assumption is that the universe of mathematical objects (= sets) is stratified by a binary relation � into "levels of standardness." The notation x � y is to be read "x is standard relative to y" or "x is y-standard", and it is a dense total preordering with a least element 0. The 0-standard sets are called simply standard; they form the lowest level of standardness. The class of x-standard sets is denoted §x . Let a be a set; relativization of standardness to level a consists in regarding §a , rather than §o , as the lowest level of standardness. A statement about standard sets is relativized to level a by replacing all references to "standard" with " a-standard" (more explicitly, by replacing "x is y-standard" with "x is

(y, a)-standard") .

The key principle that governs the stratified universe is general transfer:

All valid statements about standard sets remain valid when relativized to any level a. Mathematical practice proceeds by enriching the language with new defini tions. We make it a general rule that, whenever some standard notion (a new predicate or function) is defined for standard x in terms of some property of x, the definition of the notion is extended to all x by rclativizing the defining statement to level x (the definition has to be fully relativized -see Section 4.3) . In addition, we make the familiar assumptions that •

•

•

all the usual mathematical operations preserve standardness §o of all standard sets satisfies ZFC) ;

( the class

given any property of x and any standard set A, there is a standard set B, whose standard elements are precisely those standard x E A having that property (standardization) : for every level a :::J 0 there exist a-standard unlimited natural numbers (a much stronger idealization is available - see Section 4.3 - but this suffices for calculus) .

The examples that follow illustrate the formulation of relativizations and the general rule. Definition 4.2. 1

(i) r E R is a-limited iff l r l

<

n for some a-standard n E N .

4.

54

(ii) h E R is a-infinitesimal iff I h i

<

Stratified analysis?

� for all a-standard n E

N,

n of. 0.

(iii) r �a s iff r - s is a-infinitesimal.

(iv) For a-limited r E R, the a-shadow ofr, sha (r) , is the unique a-standard s E R such that r �a s . (Remark. As usual, every limited (i. e. , 0-limited) real has a unique 0-shadow, by standardization. By general transfer, every a-limited real has a unique a shadow.) EXAMPLE 1. Continuity (Gordon Definition 4. 1 . 1 relativizes as follows: Definition 4.2.2

Let f

:

R ---+ R

[11, 12] ,

Peraire

[25, 27] ) .

and x E R.

(i) f is continuous at x iff for all (!, x) -infinitesimal h, f ( x + h) - f ( x) is (! , x) -infinitesimal. (ii) f is (pointwise) continuous iff it is continuous at all f -standard x. Ex plicitly: for all f -standard x and all (!, x) -infinitesimal h, f ( x +h) - f ( x ) is (!, x ) -infinitesimal. The closure of (!, x)-standard sets under set-theoretic operations (due to transfer of this property from §o ) implies that f and x are (!, x)-standard. In particular, for !-standard x, " (! , x)-standard" is equivalent to "!-standard", and we have: • f is continuous iff for all !-standard x and all /-infinitesimal h, f(x + h) f ( x) is f- infini tesirnal. If f is a-standard, transfer gives: f is continuous iff it is continuous at all a-standard x E R. But every x E R is a-standard for some a :::J j, so we have also: f is continuous iff it is continuous at all x E R. This reasoning is a special case of a general global transfer principle (Sec tion 4.3, Proposition 4.3. 1 ) . I n contrast with • , relativized definition of uniform continuity is :

Let f R ---+ R. f is uniformly continuous iff for all x and all ! -infinitesimal h, f(x + h) - f(x) is !-infinitesimal.

Definition 4.2.3

EXAMPLE 2. Derivative (Peraire [27] ) . Relativization of the usual nonstandard definition of derivative gives

4.2.

55

Stratified analysis

f is differentiable at x iff there f (x+h2-f (x) L is (!, x) -infinitesimal, sh(f,x) e(x+h2-f (x) ) .

Let f : R --+ R and x E R . is an (!, x) -standard L E R such that for all (!, x) -infinitesimal h of. 0. If this is the case, f' (x) := L = Definition 4.2.4

-

We next give two proofs of an elementary result from calculus, in order to illustrate two styles of work that are supported by the stratified framework.

is differentiable at x then f is continuous at x . Pro of 1 . B y Definition 4.2.4, for any (!, x)-infinitesimal h, f(x + h) f (x) = Lh + kh where k is (!, x)-infinitesimal. The usual arguments show that the product of an (!, x)-limited real and an (!, x)-infinitesimal is an (!, x)-infinitesimal, and that the sum of two (!, x) -infinitesimals is (!, x)

Proposition 4.2.5 If f

D

infinitesimal.

Pro of 2. For any standard f and x and any infinitesimal h, f ( x + h) - f ( x) = Hence a standard function f differentiable at a standard x is continuous at x. By transfer, any function f differentiable at any x is continuous at x. D

L h + kh and Lh + kh is infinitesimal.

The style of the first proof is to give the argument uniformly for all f, x. The advantage is that there is no explicit evocation of transfer; the disadvantage is the need to keep track of the level (!, x) throughout the argument . The second proof is Robinson's, i.e. , for standard f, x; followed by transfer of the result to all f, x. The formula being transferred is not an E-forrnula, but it is easily seen that our general transfer applies to it. This is another instance of the global transfer principle from Proposition 4.3.1 in Section 4.3: any state ment that invokes relative standardness only via previously defined standard notions and is valid for all standard sets remains valid for all (internal) sets. One can, if one so chooses, work in stratified analysis exactly as one would in the Robinsonian framework; that is, give the nonstandard definitions and proofs for standard arguments only. This is what we do in the remaining examples. But in the stratified framework, all such definitions and proofs au tomatically transfer to definitions and proofs that are meaningful and natural for all arguments (as long as no essential use is made of external sets) . Stratified analysis also provides opportunities for proofs and constructions that are not readily available in the Robinsonian framework. They have not been much explored as yet; two examples of what is possible are given below. First we list some general results about infinitesimals. Lemma 4.2.6

(a)

If x E R

is a-infinitesimal and (3 � a then x is (3 -infinitesimal.

4. Stratified analysis?

56

(b) Every a-limited natural number is a-standard. (c) If y is infinitesimal then there is an infinitesimal x such that y is x infinitesimal. Pro of. (a) is trivial from transitivity of � and (b) is just transfer (to level a ) ' of the well-known fact that every limited natural number is standard. (c) is a consequence of (b) and density of � · Let y be a positive infinitesimal and let v E N be such that v ::; � < v + 1 . Then 0 C v; we fix a such that 0 C a C v. By (b) , v is 0'- unlimited and so y < � for all a-standard n. So y is a-infinitesimal, hence x-infinitesimal for any a-standard infinitesimal x. D EXAMPLE 3. l'Hopital Rule Proposition 4.2. 7

(Bcnninghofcn and Richter [5] ; sec also [12] ) .

( l 'H6pital Rule)

. (x) _- d . _- d E R th en l.2m ' x _,a fg' (x) ' x _,a fg(x) /'f l lmx _,a I g ( X ) 1 - 00 and l.2m ' (x) Pro of. It suffices to prove the proposition for standard j, g, a. d. Also, w.l.o.g. we can let a 0 (replace x by x - a) . Let x be infinitesimal and y be x infinitcsimal. By Cauchy ' s Theorem, there is T) between x and y (hence, T) is f (x) f' (TJ ) � d. Now factor (y)--g(x) infinitesimal) such that fg(y) g( ) =

=

d

� �

f (y) -f (x) - f (y) -f (x) g(y) g(y) -g(x)

x

' TJ

g(y) f (x ) g(y) -g( x ) and observe that g(y)

�

0, gg ((x )) y

�

0.

=

(limx_, O lg(x) l oo implies that for all infinitesimal z, g(z) is unlimited. By transfer to x-level, for all x-infinitesimal z, g(z) is x-unlimited. As y is xinfinitesirnal, ��\ and ��\ are x-infinitesirnal.) It follows that the first factor is infinitely close to �/t\ and the second to 1. From properties of infinitesimals we conclude that ��t\ � d. By Lemma 4.2.6(c) , every infinitesimal y is x-infinitesimal for some in finitesimal x. Hence ��t\ � d holds for every infinitesimal y, and we arc donc.D

Remar-k. The assumption of density of levels allows for a simpler argument than that of [5, 12] , but it is not essential. EXAMPLE 4. Higher derivatives.

=

We assume that j, x arc standard and f' (y) exists for all y � x. If f" ( x) L E R exists, then L � f (x+ 2h) - 2hJx + h)+f (x) holds for all h � 0, h of. 0. However, the converse of this statement is false; existence of a standard L E R with the above property does not imply that f"(x) exists. In the stratified framework, we can give a description of f" ( x) in terms of values of j, analogous to Definition 4.2.4, using two levels of standardness.

4.2. Stratified analysis

57

Proposition 4.2.8 Assume that f and x are standard and f' (y) exists for all y � x. Then f" ( x) exists iff there is a standard L E R such that

! ( x- +_h_o_+_h_1-'--)_ x_ f (-'x) --'-_!-'-(x_+_ho--'-)_-_f___c(_ +_h___lc)_+_ L � _-'-_ hoh1

for all ho � 0, h1 �h o 0, ho, h1 of. 0. If this is the case, f" (x) = L. As usual in the stratified framework, the proposition holds for arbitrary f and x provided � is replaced by �(f. x) and "standard" by " (!, x)-standard." Pro of. We assume that f' (y) exists for all y � x; in particular, f' ( x) and f' ( x + ho) exist. Hence

f (x+ho+h�; - f (x+ho)

=

J'( x + ho) + k1 , f (x+h�; - f (x)

=

J'(x) + k2 ,

where k1 , k2 � h 0. From this we get o

f (x+hl)+ f (x) = f' (x+ho)- f' (x) + (k1-k2) · Q ·· = f (x+ho+hl)- f (x+ho)ho ho hoh 1 We note that ( k1 - k2 ) � ho 0 and hence ( k 1f:ok2 ) � ho 0, by transfer to the level ho of the fact that a quotient of an infinitesimal by standard real of. 0 is infinitesimal. In particular, ( k 1f:ok2 ) � 0. If L f"(x) exists, we have f' ( x+h�� - f' ( x ) L + k for k � 0 and hence Q � L. Conversely, if Q � L E 0R for all ho, h 1 as above, we have f' (x+ h�� - f' (x) � D L for all ho � 0, ho of. 0, and hence J"(x) exists and equals to L. By induction we get a characterization of j (n l (x) valid for any standard a

:=

=

n E N.

Assume that f and x are standard and j (n - 1 l (y) exists for all y � x. Then j ( nl (x) exists iff there is a standard L E R s11ch that Proposition 4.2.9

holds for all (ho, . . . , h n - 1) , where i = ( i o , . . . , in - 1 ) E {0, l} n , hi k : = h k if i k = 0, hi k := 0 if i k = 1;n ho � 0, h k �h k - l 0 for 0 < k < n, and all h k of. 0. If this is the case, j ( l (x) L. =

General transfer implies that Proposition 4.2.9 holds for all n, even the unlimited (hyperfinite) ones, provided � is replaced everywhere by �n and

58

4.

Stratified analysis?

"standard L" by "n-standard L". As there are functions that have derivatives of all orders, there have to exist "strongly decreasing" sequences of infinitesimals of any hyperfinite length n: (ho, . . . , hn - 1 ) where each h k is h k 1 -infinitesimal. Such constructions are not available in RIST or in any other framework for nonstandard mathematics. -

4.3

An axiomatic system for stratified set theory

The theory FRIST (Fully Relativized Internal Set Theory) presented here is formalized in first-order logic with equality and two primitive binary predi cates, E (membership) and � (relative standardness) . It postulates the axioms of ZFC (the schemata of separation and replacement for E-formulas only) , and (\ix) (x � x) , (\ix, y, z) (x � y 1\ y � z x � z) , (\ix, y)(x � y V y � x), (\ix ) ( 0 � x), (::Jx) ( -, x � 0 ) . Thus � is a nontrivial total preordering with a least element 0 = 0; x C: y reads "x is standard relative to y" or "x is y standard." We also write x C y for x � y 1\ -, y � x, and postulate that � is ___,

dense: (\ix, y) (x C y ___, (3z) (x C z C y) ) . Let a be a set; we let §a := { x : x C: a} be the class of all a-standard sets; in particular § := §o = { x : x C: 0} is the class of standard sets. If a

is a list a 1 , . . . , an, the statement that X is a-standard is shorthand for " X is ( a 1 , . . . , an)-standard"; it is easy to see that (in FRIST) this is equivalent to "x is /)-standard" for p := max {a 1 , . . . , an } · For any a let x C:a y = ( x � a 1\ y � a ) V ( x C: y). This "relativized" relative standardness predicate treats §a as the "standard" (i.e. "level 0") universe, and keeps the higher levels unchanged. Let be any E-�-formula. a denotes the relativization of to level a, the formula obtained from by replacing each occurrence of � by �a · Informally, each occurrence of "x is y-standard" is replaced by "x is (y, a) standard." Clearly x � 0 y is equivalent to x � y, and <1> 0 to . We now state the principal axioms of FRIST. Transfer:

For all a, (\ix E §0 ) ( 0 (x) __, a (x) ).

For all x, (\ix E §o) (::Jy E §o) (\iz E §o) (z E y

Standardization:

Idealization: For all 0 C a , A, B (\fa E Afin n §o) (::lx E B) (\iy E a)

<-+

Z

E x 1\ 0 (z, x, x) ) .

E §o and x, a(x, y, x ) (3x E B) (\iy E A n §o) a (x, y, x). __,

4.3. An axiomatic system for stratified set theory

59

Transfer captures the idea of full relativization: whatever is true about the standard universe §o is also true about each relativized standard universe §a · Precisely. for any a and any statement of the E-c;;;-language, if FRIST f- then also FRIST f- a. In particular, it follows that we can replace 0 by any a in standardization, and by any (3 C:::: a in the other two schemata. It is also easy to show that each §a satisfies ZFC. Details of these and other results about FRIST, as well as a discussion of its relationship to RIST and 1ST , can be found in [15] . We prove a version of transfer from the standard universe to the entire internal universe. Let (x) be an E-c;;;-forrnula. We define a new predicate P(x) by postulating P( x) x ( x) , and say that the definition of P is fully relativized if it has this form. We note that for any a and any x E §a , P(x) ......, a (x); in particular, P(x) ......, (x) for standard x. Let the definitions of P1 , . . . , Pn be fully relativized. Given any formula w(x) in the E-P-language, we denote the formula obtained by restricting all quantifiers in 1]! to §a by §a F= 1]! ( x) . =

(Vx E §a ) [ ( §a F= w (x)) ......, w (x)] . is a formula in the E-P-languagc, we let w (a ) be the formula

Proposition 4.3.1

( Global Transfer)

If 1]! obtained from 1]! by replacing (first) each occurrence of (::lx) [(Vx), resp.] in 1]! by ( ::Jx E §a) [ (Vx E §a), resp.] , and (then) each occurrence of P, (i) by i (x) . We prove that for all a and all x E §a , (§a F= 1Ji (x)) ......, w(a l (x) ......, 1J! (x), by induction on complexity of W. If w(x) is Pi (x), the claim follows from the definitions and remarks above. If W (x) is "xi E x/' or "x� = x/' the assertion is trivial, as are the induction steps corresponding to logical connectives. Assume that, for all (3 and x, y E §;3 , (§;3 F= 1]! ( x, y ) ) ......, 1]! (;3) ( i, y ) ......, Pro of.

w (x, y ) .

Let x E §a ; we then have (§a F= (Vy ) W (x, y ) ) ......, ( Vy E §a) (§a F= 1Ji (x, y )) ......, (Vy E §a)w( a l (x, y ) ......, [(Vy )w (x, y ) J (a l . By transfer of the penultimate statement to level (3 we get its equiva lence with (V(J :::J a) (Vy E §;3) [w (a l (x, y )] J3 . It is easily seen that, for a c;;; (3, [w (a)] ;3 ......, 1]! (;3 ) ; hence the last statement is further equivalent to (\1(3 :::J a ) (Vy E §;3 )w(J3l (x, y ) ......, (\1(3 :::J a) (Vy E §;3 ) 1J! (x , y ) ......, (Vy ) 1J! (x, y ). (For the last step, note that for every y there is (3 :::J a such that y E §;3 ; e.g. (3 (a, y ) . ) D =

Corollary 4.3.2

For any A, p, {x E A : 1J! (x, A, p) } is a set in §( A, p) ·

Fix a so that A, p E §a . Standardization into §a yields a set B E §a such that §a F= (Vx) (.T E B X E A (\ w(x, A, p)) . By global transfer, X E A (\ 1]! ( X ' A. p) ) . D ( VX ) ( X E B

Pro of.

.......

.......

4. Stratified analysis?

60

This corollary shows that we can use fully relativized predicates in def initions of sets without fear of encountering external sets. For example, f' is a standard function whenever f is, even though we use infinitesimals to define f' (x) . The consistency of FRIST is established by the following theorem, whose proof can be found in [15] , Theorems 4.7, 5.1, 5.2. Theorem 4.3.3 FRIST is

a conservative extension of ZFC. In fact, FRIST has an interpretation in ZFC, in which the class of stan dard sets is ( definably) isomorphic to the class of all ( ZFC) sets.

An interpretation of FRIST in ZFC involves a new method of iterating ultrapowers, where the stages of the iteration used to obtain the final universe *V are not indexed by any a priori given linear ordering (A, :S: ) , but by (* A, :S: * ) , a linear ordering constructed from (A, :S:) simultaneously with *V. FRIST differs from the theory of the same name presented in [15] in two ways. 1 . Idealization has been weakened to "bounded idealization" (see FRBST of [15] ) . Foundational reasons for this move arc discussed at length in [16] in the context of 1ST vs. BST. Without some such weakening of ideal ization, the second part of Theorem 4.3.3 does not hold. 2. The postulate of density of levels has been added. This property holds in models from [15] based on A = Q or any other dense total ordering. There are foundational reasons that seem to favor this assumption, and it also has practical uses (sec the proof of l'Hopital rule') . Like 1ST, FRIST is a theory of internal sets. Nelson [23] exhibited a 1ST ; it takes every (bounded) E-st-formula to an E formula that is equivalent to it for all standard values of the free variables. The question arises, whether such algorithm is also available for FRIST. On one level, the answer is YES; such algorithm is provided by the interpretation of FRIST in ZFC from the proof of Theorem 4.3.3. On the other hand, E formulas yielded by this algorithm are far too complicated to be helpful with practical work. It is possible that a simpler, more natural reduction algorithm can be given, but this matter is still under investigation. As demonstrated by Nelson and other adherents of 1ST, "internal" meth ods, cleverly used, suffice for a large area of "nonstandard" applications. Nev ertheless, I have always maintained [13, 14, 16] that a truly comprehensive nonstandard set theory has to incorporate external sets as well. There are nu merous constructs (nonstandard hulls, Loeb measures, . . . ) where external sets are not just a convenient bookkeeping device but the object of interest per se; a foundational framework that docs not allow these constructs can hardly be

reduction algorithm for

References

61

universally acceptable to practitioners of "nonstandard" methods. The foun dational aspirations of nonstandard set theory also require external sets; "they are there," and have to be accounted for. The guiding "maxim" of this paper, to wit, that all principles should apply uniformly at all levels and to all formulas, gives yet another reason why external sets are necessary. In FRIST , transfer, standardization and idealization do indeed satisfy it , but the axioms of ZFC do not! In particular, separation applies in FRIST only to E-formulas. As soon as we attempt to extend it to all formulas, we introduce external sets (e.g. the set of standard integers { n E N : n c;;; 0}). A thorough discussion of various axiomatic nonstandard set theories for ex ternal sets and of the difficulties they face can be found in Kanovei and Reeken's monograph [19] and in the survey article [16] . Perhaps the only researcher who considered an extension of the idea of relativization of standardness to external sets in an axiomatic framework was David Ballard. In [3] , Ballard proposed an axiomatic system EST, where external sets are allowed and any set can be regarded as "standard." EST was inspired by Fletcher ' s [9] , and employs neither the binary standardness predicate nor general transfer. After the pa per [15] was written, Ballard and I concurrently started to develop an extension of its ideas to external sets. David Ballard died unexpectedly in May 2004 in the midst of his work. My research on this topic is still in progress [17] ; re sults obtained so far indicate that an extension of FRIST to a theory that incorporates external sets is both possible and natural.

References [1] P . V . ANDREEV, The notion of relative standardness in axiomatic systems of nonstandard analysis, Thesis, Dept. 1Iath. Mech., State University of N.I. Lobachevskij in Nizhnij Novgorod, 2002, 96 pp. (Russian ) . [2] P . V . ANDREEV. "On definable standardness predicates in internal set theory", Mathematical Notes, 66 ( 1999) 803 - 809 (Russian ) . [3] D . BALLARD, Foundational Aspects of "Non" standard Mathematics, Con temporary l\Iathematics, vol. 176, American Mathematical Society, Prov idence. R.I., 1994. [4] V. BENCI and M . Dr NASSO, "Alpha-theory: an elementary axiomatics for nonstandard analysis", Expositiones Math., 21 (2003) 355-386. [5] B . BENNINGHO FEN and M . M . RICHTER, ··A general theory of superin finitesimals", Fund. Math., 123 (1987) 199-215. [6] C.C. CHANG and H . .J . KEISLER, Model Theory, 3rd edition, North Holland Publ. Co., 1990.

4. Stratified analysis?

62

[7] G . CHERLIN and J . HIRSCHFELD, Ultrafilters and ultraproducts in non standard analysis, in Contributions to Non-standard Analysis, ed. by W.A.J. Luxemburg and A. Robinson, North Holland, Amsterdam 1972. [8] N . J CUTLAND, "Transfer theorems for Jr-monads". Ann. Pure Appl. Logic, 44 ( 1989) 53-62. .

.

[9] P. FLETCHER, "Nonstandard set theory", J. Symbolic Logic, 54 ( 1989) 1000-1008. [10] R. GoLDBLATT, Lectures on the Hyperreals: An Introduction to Nonstan dard Analysis, Graduate Texts in Math. 188. Springer-Verlag, New York, 1998. [ 1 1] E.I. GORDON, "Relatively nonstandard elements in the theory of inter nal sets of E. Nelson", Siberian Mathematical Journal, 30 (1989) 89-95 ( Russian ) . [12] E.I. GORDON, Nonstandard Methods in Commutative Harmonic Analysis, American Mathematical Society, Providence, Rhode Island, 1997. [13] K. HRBACEK , ·· Axiomatic foundations for nonstandard analysis·· , Funda menta Mathematicae, 98 (1978) 1 - 19; abstract in .J. Symbolic Logic, 4 1 (1976) 285. [14] K . HRBACEK, "Nonstandard set theory", Amer. Math. Monthly, 86 ( 1979) 1 - 19. [15] K. HRBACEK , Internally iterated ultrapowers, in Nonstandard Models of Arithmetic and Set Theory, ed. by A. Enayat and R. Kossak, Contempo rary Math. 36 1, American Mathematical Society, Providence, R.I.. 2004. [16] K . HRBACEK, Nonstandard objects in set theory. in Nonstandard Methods and Applications in Mathematics, ed. by N.J. Cutland, M. Di Nasso and D .A. Ross, Lecture Notes in Logic 25, Association for Symbolic Logic, Pasadena, CA. , 2005, 41 pp. [17] K . HRBACEK , Relative Set Theory, work in progress. [18] V. KANOVEI and l\ I . REEKEN, "Internal approach to external sets and universes", Studia Logica, Part I, 5 5 (1995) 227-235; Part II, 55 ( 1995) 347-376: Part III, 56 (1996) 293-322. [19] V. KANOVEI and M. REEKEN, Nonstandard Analysis, Axiomatically, Springer-Verlag, Berlin, Heidelberg, New York, 2004.

References

63

[20] H . J . KEISLER, Calculus: An Infinitesimal Approach, Prindle, Weber and Scmidt, 1976, 1986. ,

[21] W . A . .J . L UXEMBURG A general theory of monads, in: W.A.J. Luxem burg, ed., Applications of Model Theory to Algebra, Analysis and Proba bility, Holt, Rinehart and Winston 1969. [22] V.A. M oLCHANOV, "On applications of double nonstandard enlargements to topology", Sibirsk. l\Iat. Zh. , 30 (1989) 64-71. [23] E. NELSON, "Internal set theory: a new approach to Nonstandard Anal ysis", Bull. Amer. Math. Soc., 83 ( 1977) 1 165- 1 198. [24] Y. PERAIRE and G. WALLET, "Une theorie relative des ensembles in ternes", C.R. Acad. Sci. Paris, Ser. I, 308 (1989) 301 -304. [25] Y . P ERAIRE, "Theorie relative des ensembles internes", Osaka Journ. Math., 29 ( 1992) 267-297. [26] Y. PERAIRE, "Some extensions of the principles of idealization transfer and choice in the relative internal set theory", Arch. Math. Logic, 34 (1995) 269-277. [27] Y. P ERAIRE, "Formules absolues dans la theorie relative des ensembles internes", Rivista di l\Iatematica Pura ed Applicata, 19 (1996) 27-56. [28] Y. P ERAIRE, "Infinitesimal approach of almost-automorphic functions", Annals of Pure and Applied Logic, 63 ( 1993) 283-297. [29] A. RoBINSON, Non-standard Analysis, Studies in Logic and the Founda tions of Mathematics, North-Holland. Amsterdam, 1966. [30] A. RoBINSON and E. ZAKON, A set-theoretical characterization of en largements, in W.A . .J. Luxemburg, ed . . Apphcations of Model Theory to Algebra, Analysis and Probability, Holt, Rinehart and Winston 1969. [31] K . D . STROYAN, Foundations of Infinitesimal Calculus, 2nd ed. , Academic Press 1997. [32] K . D . STROYAN, B . BENNINGHOFEN and M . l\1 . RICHTER, "Superin finitesimals in topology and functional analysis". Proc. London Math. Soc., 5 9 ( 1989) 153-181. [33] K . D . STROYAN , Superinfinitesimals and inductive limits, in Nonstandard Analysis and its Applications, ed. by N. Cutland, Cambridge University Press, New York, 1988.

ERNA at work * ** C. Impens and S. Sanders

Abstract

Elementary Recursive N on standard Analysis . i n short ER.NA , is a con structive system of nonstandard analysis proposed around 1995 by Chuaqui, Suppes and Sommer. It has been shown to be con sistent and. without stamlard part ftmctiou or continuum, it allows major parts of analysis to be develop ed in an applicable form. vVe briefly discuss ER.N A's foundations and use them to prove a supremum principl e and provide a square root fuuctiou, b oth up t;o iufinit;esimals.

5.1

Introduction

Hilber·t 's Program, proposed in 1921, called for an axiomatic formalization of mathematics, together with a proof that this axiomatization is consistent. The consistency proof itself was to be carried out using only what Hilbert called finitary methods. The special chamct:er of finitru·y reasoning then would justify classical mathematics. In due time, many chmadcrized Hilbert's infor mal notion of '.finitru-y' as that which can be formalized in Primitive Recursive Arithmetic (PRA) , proposed in 1923 by Skolem. In PR.A one finds (a) an absence of explicit quantification, (b) an ability to define primitive recursive functions, (c) a few rules for handling equality, e.g. , substitution of equals for equals, (d) a rnle of instantiation, and ( e) a simple induction principle.

Godel's second incompleteness theorem (1931) it became evident that 7Jar·tial realizations of Hilbert's prognun are possi ble. The system pro posed by Chuaqui and Suppes is sueh a partial realization, in that it pro vide� ru1 a.xiomatic foundation for basic analysis, with a PRA consistency proof ([1], p. 123 and p. 130) . Sommer and Suppes's improved system al lows definition by recursion (which does away with a lot of explicit axioms) By

only

•Department of Pure Mathcmatk-'3 and Computer Algebra, Universi l.y of Gent, Delgiw11. ··

ci@cage . ugent . ac .be sasander@cage . ugent . be

5.2. The system

65

and still has a PRA proof of consistency ([2] , p. 21 ) . This system is called Elementary Recursive Nonstandard Analysis, in short ERNA. Its consistency is proved via Herbrand's Theorem ( 1930) , which is restricted to quantifier-free formulas Q(x1 , . . . , xn ) , usually containing free variables. Alternatively, one might say it is restricted to universal sentences (\ix l ) . . . (\ixn )Q(x l , . . . , Xn ) ,

obtained by closing the open quantifier-free formulas by means of universal quantifiers. Herbrand's theorem states that, if a collection of such formulas resp. sentences is consistent, it has a simple 'Herbrand' model and, if it is not, its inconsistency will show up in some finite procedure. Herbrand's theorem requires that ERNA's axioms be written in a quantifier free form. As a result, some axioms definitely look artificial; fortunately, the orems don't suffer from the quantifier-free restriction. Calculus applications of ERNA have been, so far, scarce and sketchy. Thus, [3] contains an outline of an existence theorem for first-order ordinary differential equations, relying on the property, stated without proof, that a continuous function on a compact interval is bounded. As part of a less anec dotical approach we will provide an ERNA version of the supremum principle and deduce from it a square root function. Both results hold up to infinitesi mals; as ERNA has no standard part function, it is intrinsically impossible to do better. 5.2

The system

The system we are about to describe was first presented in [2] , and all our undocumented results are quoted from that paper. The foundations are also exposed, in a more informal manner, in [ 3] . Notation 5 . 2 . 1 N

=

{ 0, 1 , 2, . . . } consists of the (finite) integers.

Notation 5.2.2 x stands for some finite

(possibly empty) sequence (x 1 , . . . , xk ) .

denotes a term in which x = (x 1 . . . . , xk) is the list of the distinct free variables.

Notation 5.2.3 T(x)

5. 2 . 1

The language

•

connectives:

•

quantifiers: \i , 3

/\,

'

, V, ---+ , --+

5. ERNA at work

66 • •

an infinite set of variables relation symbols: 1 =

binary x y binary x � y unary I(x) , read as 'x is infinitesimal', also written 'x � 0 ' unary N(x) , read as 'x is hypernatural ' . •

individual constant symbols:

0 1 c (The Axiom 3 (6) of 5.2.2 shows that c denotes a positive infinites

imal.) - w (The axioms 3 (7) and 2 (4) of 5.2.2 show that w = 1/c denotes an infinite hypernatural.) - I , read as 'undefined'. Notation 5.2.4

1;o =n •

'x is defined' stands for 'x -/=- I '. (Examples: 1/0 is undefined,

function symbols: 2 (unary) 'absolute value' lxl, 'ceiling ' Ixl , 'weight ' llxll · (For the meaning of llxll , sec Theorem 5.2.3.) (binary) x + y , x - y, x · y, xjy, xAy. (Axiom set 6 and Axiom 12 ( 4) of 5.2.2 show that x A n = xn for hypernatural n , else undefined.) for each k E N, k k-ary function symbols Kk.i (i 1 , . . . , k) . (The Axiom schema 7 of 5.2.2 shows that Kk, i (x) are the projections of the k-tuple x. ) for each formula rp with m + 1 free variables, without quantifiers or terms involving min. an m-ary function symbol minp. (For the meaning of which, see Theorem 5.2.6 and Theorem 5.2.7.) for each triple (k, a(x 1 , . . . , Xm), T(x l , . . . , Xm + 2 )) with 0 < k E N, a and T terms not involving min, an (m+ 1 )-ary function symbol rec�T ' (Axiom schema 9 of 5.2.2 shows that this is the term obtained from a and T by recursion, after the model f(O, x) a(x) , f(n + 1 , x) T(j(n, x), n , x) , if terms are defined and don't weigh too much.) =

=

1 2

For better readibility we expre�s the relations in We denote the values as computed in

x

or

(x, y)

x or

in

(x, y).

according to arity.

according to the arity.

=

5.2. The system 5.2.2

67

The axioms

Axiom set 1

(Logic) . Axioms of first-order logic.

Axiom set 2

(Hypernaturals).

1.

0 is hypeTTwtuml; 2. if x is hypernatuml, so is x + 1; 3 . if x is hypernatuml, then x � 0; 4 - w is hypernatural. 'x is infinite ' stands for 'x -/= 0 1\ 1/x � 0 '; 'x is fi nite ' stands for 'x is not infinite '; 'x is natural ' stands for 'x is hypernatural and finite '.

Definition 5.2.1

Axiom set 3

(Infinitesimals) .

1.

if x and y are infinitesimal, so is x + y; 2 . if x is infinitesimal and y is finite, xy is infinitesimal; 3. an infinitesimal is finite; 4 - if x is infinitesimal and I Y I :::; x, then y is infinitesimal; 5. if x and y are finite, so is x + y; 6. is infinitesimal; 7. 1/ w . c;

c =

(Ordered field) . Axioms expressing that the elements, with I excluded, constitute an ordered field of characteristic zero with absolute value function. These include (quantifier-free)

Axiom set 4

•

• •

=

=

if x is defined, then x + 0 0 + x x; if x is defined, then x + ( 0 - x ) = ( 0 - x ) + x = 0; if x is defined and x -/= 0, then x (1/x) (1/x) x ·

=

·

=

1.

(Archimedean). If x is defined, Ixl is a hypernatural and 1 < x :::.: lxl x l l-

Axiom set 5

Theorem 5 . 2 . 1

If x is defined, then Ix l is the least hypernatural � x.

Theorem 5.2.2

x is finite iff there is a natural n such that l x l

:::;

n.

5. ERNA at work

68 =

Proof. The statement is trivial for x 0. If x -/= 0 is finite, so is lxl because, assuming the opposite, 1/ l x l would be infinitesimal and so would 1/x be by axiom ( 4) of set 3. By axiom (5) of the same set, the hypernatural llxll < l xl + 1 is then also finite. Conversely, let n be natural and lxl :::; n. If 1/ l x l were infinitesimal, so would 1/n be by axiom (4) of set 3 , and this contradicts the assumption that n is finite. D Corollary 5 . 2 . 1 Axiom set 6 1.

2.

x � 0 iff l x l < 1/n for all natural n ? 1 .

(Power) .

if x -/= I, then xA O = 1 ; if x -/= I and n is hypernatural, then xA( n + 1) = (x n) x . A

Axiom schema 7

·

(Projection) .

If x 1 , . . . , Xn are defined, then 1fn , � (x l , . . . , Xn ) = X; for i = 1 , . . . , n. Axiom set 8 1. 2. 3.

(Weight) .

If ll x ll is defined, then ll x ll is a nonzero hypernatural. If l x l = m/n :::; 1 (m and n -/= 0 hypernaturals), then ll x ll is defined, ll xll - lxl is hypernatural and ll x ll :::; n. If lxl = m/n ? 1 (m and n -/= 0 hypenwturals), then ll xll is defined, llxll/lxl is hypernatural and ll x ll :S m .

Definition 5.2.2

hypernatural.

A hyperrational is of the form ±pjq, with p and q -/= 0

Theorem 5.2.3 1. 2.

If x is not a hyperrational, then llxll =1. If x is a hyperrational, say x = ±p/ q with p and q -/= 0 relatively prime hypernat11rals, then II ± p/qll

=

max{ I P I , l q l }.

Remark. In both statements of this theorem, the antecedent can be expressed in a quantifier-free way, but the whole sentence cannot. (This explains why it is a theorem and not part of the axioms.) For instance, N(p) ---+ -.N(plxl) expresses 'x is not hyperrational'. Theorem 5.2.4 1 . II O II

=

1;

5. 2 . The system 2. 3.

4-

69 =

if n ? 1 is hypernatural, li n II n; if llxll is defined, then 11 1/x ll ll x ll and ll lxl l l l lxl l S ll x l l ; if ll x ll and II Y II defined, ll x + Y ll , ll x - Y ll , ll xy ll and ll x/y l l are at most equal to (1 + ll x ll ) ( 1 + II Y II) , and ll xAy ll is at most (1 + ll x ll t ( l + II Y II ) . =

=

Notation 5.2.5

For any 0 < n E N we write

Notation 5.2.6

For any 0 < n E N we write

2's Theorem 5 . 2 . 5 If T(x) is a term not involving w, exists a 0 < k E N such that n

Axiom schema 9

c:,

rec or min, then there

(Recursion) For 0 < k E N, a and T not involving min:

if this is defined, and has weight ::; 2 �:1'11 , if a (x) =I, otherwise. rec�An + 1 , x)

=

{

T(rec�An, x) , n, x) if defined, with weight ::; 2 �x, n+ l ll , if T(rec�T (n. X) , n, X) = I , 1

0

otherwise.

If a is constant , the list x is empty, and the weight requirements mentioned in this axiom schema are void. A few words concerning the restrictions included in this axiom schema. One of ERNA 's main advantages over the Chuaqui-Suppes system is, that it allows some form of recursion while preserving a finitary consistency proof. In achieving this, a crucial role is played by the weight function, introduced axiomatically but given explicitly in theorem 5.2.3. Recursion is an essential feature of PRA, and it is therefore impossible to prove inside PRA the con sistency of a system that has unrestricted recursion. ERNA's axiom schema 9 restricts recursion by truncating objects outgrowing the preset weight stan dard. In view of the huge bounds allowed, it seems unlikely that access to calculus applications will suffer from this restriction; computing weights is the price to be paid in practice.

5. ERNA at work

70

(Internal minimum) . For any quantifier-free formula rp( y , x) not involving min or I we have

Axiom schema 10 1.

min-�' ( x) is a hypematural number; 2. if min-P (x) > 0, then rp(min-P (x) , x) ; 3. if n is a hypematural and 'P ( x) ) then n,

min(£) ::; n and rp(min(x) , x) . -P -P Theorem 5.2.6 If the quantifier-free formula rp( y x) does not involve I or min, and if there are hypematural n 's such that rp( n , x) , then min-P (x) is the least of these. If there are none, min-P (x) 0. ,

=

Corollary 5.2.2

Proofs by hypematural induction.

Example 5 .2 . 1

The sum of two hypematurals is a hypematural.

Fix any hypernatural x. If the theorem is wrong, there exists at least one y with N(y) /v-.N(x + y). By Theorem 5.2.6, there is a least number with these properties, say yo. Then yo -/= 0 since x + 0 x (field axiom) and N( x) (assumption) . From yo -/= 0 , N(yo - 1) (hypernatural axiom) . By leastness, N(x + (yo - 1)). Hence (field axiom) N((x + yo) - 1) and finally N(x + yo) (hypernatural axiom) . This contradiction proves the theorem. D Proof.

=

(External minimum) . For any quantifier-free formula rp(y, x) not involving min, or E We have 1 . min 'I' ( x) is a hypematural number; 2. if min"' (x) > 0, then rp(min"' (x) . x); 3. if n is a natural numbeT, ll x ll is finite and rp( n , x) , then min'l'(x) ::; n and rp(min'l' (x), x) .

Axiom schema 1 1

Remark. I is

W

allowed in rp.

Let rp( y , x) a quantifier-free formula not involving min, w or If ll x ll is fimte and if there are natural n 's such that rp( n , x), then min'l' (x) is the least of these. If there are none, min'l' (x) 0.

Theorem 5.2.7

c.

=

Corollary 5.2.3

Proofs by natural induction.

Axiom schema 12 1.

0, 1,

w,

c

( (Un)defined terms) .

are defined;

5.2. The system

71

lxl, l x l , llx ll are defined iff x is; 3. x + y, x - y, xy are defined iff x and y are; x / y is defined iff x and y are and y -/= 0 ; 4. x A y is defined iff x and y are and y is hypernatural; 5. Kk � ( x l . . . . , x k ) is defined iff x1 , . . . , Xk are; , 6. if x is not a hypernatural, rec�A x, ff) is undefined; 7. min 'P (x1 , . . . , x k ) is defined iff x1 , . . . , Xk are. 2.

Theorem 5 . 2 . 8 (Hypernatural induction)

formula not involving min or I, such that

Let y(x) be a quantifier-free

1.

y(O) holds,

2.

the implication (N(n) 1\ y(n)) ---+ y(n + 1) holds.

Then y(n) holds for all hypernatural n. Suppose, on the contrary, that there is a hypernatural n such that -.y(n) . By Theorem 5.2.6, there is a least hypernatural no such that -.y(no). By our assumption (1) , no > 0. Consequently; y(no - 1) does hold. But then , by our assumption (2) , so would y(no). This contradiction proves the theorem. D Proof.

Example 5.2.2

The sum of two hypernaturals is a hypernatural.

Proof. Fix any hypernatural N and consider the formula N(N + x). Both N(N + 0) and N(N + n) ---+ N(N + n + 1) arc included in axiom set 5.2.2. Hence N(N + n) for every hypernatural n. D Example 5.2.3 Let y(n) be If no < n1 are hypernatu.rals then y(no) = y(n l ) .

a quantifier-free formula not involving min or I . such that no ::; n ::; n1 - 1 ---+ y(n) y(n + 1), =

The formula y(no) = y(no + x ) holds for x = 0. If y(no) = y(no + n) for any hypernatural no + n ::; n1 - 1, then also y(no) y(no + n + 1) by D assumption. Hence y(no) = y(no + n) for 0 ::; n ::; n1 - no. Proof.

=

For further usc we collect here some definable functions, being terms of the language that ( provably in ERNA) have the properties of the function.

1. The identity function id(x)

=

x is 1f l , l ·

72

5.

ERNA at work

2. For each closed term T and each arity k, the constant function

3. The hypersequence r(n) is rec�T with k =

L

•

{�

=

if n 0 if n ::::> 1

a = 0 , T = c2, 1 ·

4. The function

if X = 0 otherwise

is 1 + x - r ( I I x l l ) . 5.

The functions

h(x) are

{�

x + l x l an d 1 + 2 2( (x)

if X > 0 otherwise

and H(x)

( ( l x l) , respect1ve . · 1y. 2( (x )

6. The function

{�

if X 2;, 0 otherwise

if a < x :::; b otherwise

is h(x - a)H(b - x) . Likewise for the characteristic function of any other interval. 7.

For constants a, b and terms p, a, T, the function

d(x)

=

{

a(x) T(x)

if a < x :::; b and p(x) > 0 otherwise

is 1 ( a . b] (x) (h(p(x)) a(x) + ( 1 - h(p(x))) T(x) ) . Likewise for any other type of interval in a < x :::; b and /or any other inequality in p(x) > 0. Any such construction will be called a definition by cases. If no interval is specified, the terms p, a, T and the resulting function can have more than one free variable. The next theorem is to be considered as an ERNA version of the supremum principle for a set of type {x I f(x) < 0}.

5.2. The system Notation 5.2.7

73

We write a « b if a < b and a � b.

Let b < c be constants such that d : = c - b is finite. Further, let f ( x) be a term not involving I or min, such that f ( x) is never undefined for b 5:. x 5:. c. If f(c) ? 0, f(b) < O, then there is a constant 1 with the following properties: f(r) ? 0, iv. for every natural number n ? 1 there are x > 1 - 1/ n such that f(x) < 0. If f ( x) has the extra property Theorem 5.2.9

z.

zz.

zzz.

(f(x) < 0 1\ b < y < x)

�

f(y) < 0

(5. 1)

then 1 is, up to infinitesimals, the only constant > b with the properties (iii) and (iv). In order to apply recursion, we choose c as our term a and use definition by cases to obtain the term

Proof.

T (t ' n)

=

{

t - dj2 n t

if f(t - dj2n ) ? 0 otherwise.

Note that 'otherwise' is equivalent here to 'if f(t - dj2n ) < 0 ' , because we have excluded undefined values for f(x) . ERNA ' s unary function symbol rec�T for this particular a and T will be shortened to rec. Its properties can be stated simply as rec ( O ) c and rec ( n + 1) T ( rec ( n), n ) because undefined terms cannot occur, and there are no weight requirements because T has arity two. If we prove that for any hypernatural n the two properties =

=

f ( rec ( n )) ? 0 f(rec(n ) - d/2n - l ) < 0 ( n ? 1) hold, we are done. It suffices to take

d/2w - l � 0,

1

1 =

(5.2) (5.3)

rec ( w) and to note that, because

rec(w ) - - < rec ( w) - d/2w - l n

5. ERNA at work

74

for any natural number n ? 1 . We prove ( 5 . 2 ) by hypernatural induction. For n = 0 the requirement ( 5 . 2 ) is identical with the assumption (i) . Now let n be a hypernatural for which ( 5 . 2 ) holds. If f(rec(n) - d/2n ) ? 0, the definition of T implies that rec(n + 1) T(rec(n) , n) rec(n) - d/2 n , which translates the assumption into f(rec(n + 1)) ? 0. Otherwise, rec(n + 1) = T(rec(n) , n) = rec(n) , making the induction hypothesis identical with the re quirement f (rec(n + 1)) ? 0. Next we consider (5.3). Our proof demands that n = 1 be treated sep arately. We have rec(1) T(rec(O) , 0) T(c, 0), and this is simply b since f(c - d) f( b ) < 0. Therefore, the property (5.3) is identical with the as sumption (ii) . Now the proof for any hypernatural N ? 2. We consider the formula =

=

=

=

=

N(n) 1\ n :S: N - 2 1\ rec (N - n) =/= rec(N - n - 1) - d/2 N- n - l

(5.4)

and consider two possibilities. First possibility: there are no hypernaturals n satisfying (5.4) . This means that rec(N - n) - d/2N- n - l

=

rec(N - n - 1) - d/2N- n - 2

for 0 :S: n :S: N - 2 , and by example 5.2.3 it follows that rec(N) - d/2 N -l = rec(1) - d = c.

( 5.5 )

As f(c) < 0, we conclude that (5.3 ) holds for our N. Second possibility: there are hypernaturals n satisfying (5.4) . If so, let no be the smallest one, as provided by theorem 5.2.6. Then no :S: N - 2 and rec(N - no) =/= rec(N - no - 1) - d/2 N -no-l l.e. T(rec(N - no - 1), N - no - 1) =/= rec(N - no - 1) - d/2 N -no- l . The definition of T(t, n) shows that then, inevitably, T(rec(N - no - 1), N - no - 1)

=

rec(N - no - 1),

meaning that f(rec(N - no - 1) - d/2 N - no- l )

=

f(rec(N - no) - d/2 N- no- l ) < 0. ( 5.6 )

By the leastness of no, rec(N - n) - d/2N- n- l = rec(N - n - 1) - d/2 N - n - 2

References

75

for 0 <::.: n <::.: no - 1. Hence rec(N) - d/2 N - l = rec(N - no) - d/2 N - no - l by example 5.2.3. Substituting in (5.6 ) gives f(rec(N) - d/2 N - l ) < 0, as was to be proved. Finally. assume the extra property (5. 1). If 1' > b is another constant with the properties (iii) and (iv), we cannot have 1' « 1, as property (iv) for 1 would imply that there are x > 1' satisfying f(x) < 0, which by (5. 1) leads to f( r ') < 0 and contradicts the property (iii) for 1' · Likewise for the possibility 1 « 1' · Therefore 1' � 1. D This theorem allows us to equip ERNA with a square root up to infinitesi mals function. Example 5.2.4 For every finite constant p > 0, ERNA provides a constant

1 > 0, unique up to infinitesirnals, such that 1 2 � p.

=

It follows from the properties of an ordered field that the term f ( x) 2 x - p and the constants b = 0, c = 1 + p satisfy the requirements of theo rem 5.2.9 , including the extra requirement (5. 1). If 1 is the constant resulting from the theorem, then 1 2 � p and for every natural n � 1 there are x > 1- 1/ n with x 2 < p. Moreover, x < 1 + p by the properties of the ordered field. Hence 1 2 < x 2 + 2x/n + 1/n 2 < p + 2(1 +p)/n+ 1/n 2 . By corollary 5.2.1 , we conclude that 0 <::.: 1 2 - p � 0. D Proof.

References

[1] R.

and P. S UPPES "Free-Variable Axiomatic Foundations of Infinitesimal Anaysis: A Fragment with Finitary Consistency Proof", J. Symb. Logic, 6 0 (1995) 122-159. CHUAQUI

[2] R.

,

S oMMER and P. S UPPES "Finite Models of Elementary Recursive Non standard Analysis", Notas de la Sociedad Matematica de Chile, 15 (1996) ,

73-95.

[3] R.

S OMMER and P. S uPPES , "Dispensing with the Continuum", J. Math. Psychology, 41 ( 1997) 3- 10.

[4] P.

S UPPES and R. CHUAQCI, A finitarily consistent free-variable positive fragment of Infinitesimal Anaysis, Proceedings of the IXth Latin Ameri can Symposium on Mathematical Logic, Notas de Logica Mathematica, 3 8 ( 1993) 1 -59 , Universidad Nacional del Sur, Bahia Blanca, Argentina.

The Sousa Pinto approach to nonstandard generalised functions R. F. Hoskins*

Abstract Nonstandard Analysis suggests several ways in which the standard the ori es of distributions and other generalised functions co uld be reformu lated . This paper rcvit>ws t.he contributions of .Jose Sousa Pinto to t;his area up to his untimely death four years ago. Following the original presentation of nonstandard models for the Seba.stiao e Silva axiomatic treatment of distrilmt.ions and ultradistribut;ions he worked on a nonstan dard theory of Sato hyper:functions, using a simplC' ultrapower model of the hyp cr eals. ( Tllis in particular allows noustandru:d representations for gen eralised distributions , such as those of Rounlieu , B em·ling , and so 011.) He also considered a nonstandard theory for (;he generalised func tions of Colombeau, and finally tmned his attention to the hyperfinite repres entation of generali sed functions, following the work of Kinoshita. r

6.1

Introduction

Jose Sousa Pinto of the University of Aveiro, Portugal, died in A ugust

2000

after a prolonged and debilitating illness. His interest in nonstandard methods, particularly in their application to the s tudy of generalised functions, was of long st and ing and he will be especially remembered for his part. in t he organi

International Colloquium of Nonstandard Mathematics h eld at Aveiro 121 iu 1994. A most modest. aud unassuming mathematician, his contributions to NSA are less \vell known than their value

sation of the highly successful

deserves and this paper is concerned to report his work and to stand as some tribute to his memory. From a personal point of view

I would also wish

to take

· Department. of ElecLTonic and Electrical Engineering, Loughborough Uni versity, Lough boTough, U K .

royhosk@aol . com

6.2.

77

Distributions, ultradistributions and hyperfunctions

this opportunity to acknowledge the value and pleasure I have had in working

with him over many years.

6.1.1

Generalised functions and N.S.A.

The theory o f generalised functions is a subject o f major importance in

modern analysis a.nd one that inal presentation

! lSI

has gone

through many changes since the orig

of the theory of distributions in tho forrn given to it. uy

Laurent Schwartz in the

1950s.

Various alternative approaches to the theory

have been explored over the years, and the subject, has been expanded (and complicated) by the introduction of generalised distributions of several types,

ultradistributions, hyperfunctions and so on. In recent years the development of a unifying and simplifying treatment of the whole subject area has become

possible through the use of Nonstandard Analysis (N.S.A.) The applic a ti on of nonstandard methods to distributions and other generalised ftmctions was aheady considered by Abraham

1966, then. a

Robinson

in his classic text

[15!

on N.S.A. in

and various workers have extended and developed this approach since It

was

Sousa Pinto who first considered the possibility of developing

nonstandard realisation of the S e bas t i ao e Silva axioms for Schwartz dis

tributions

!6] ,

and later for nltradistributions

[7] .

His further work on Sato

hyperfunctions remained unpublished at the time of his death and an outline

of this forms the

main part of the present paper.

The first section of the paper briefly reviews standard material on distribu tions, ultradistributions and Sato hyp<:rfunctions, and summarises the earlier nonstandard re-formulation of that material on which the subsequent develop ment is based.

6.2 6.2.1

Distributions, ultradistributions and hyperfunctions Schwartz distributions

vVe recall first some basic facts about distributions. A distribution, in the

functional on the space TJ = TJ(JR) of compact support, equipped with an

sense of Schwartz, is a continuous linear of all infini tely diffenmt.iable functions

appropriate topology. That is to say, a distribution is simply a member

t.h<'

topological dual TJ' (lR) of that space. The

fL E TJ' is the distribution Dp defined by

fL of

distributional derivative

< DfL , ¢ >=< J.t, - ¢' >, V¢ E TJ.

of

6.

78

The Sousa Pinto approach to nonstandard generalised functions

It follows from this definition that all distributions are infinitely differentiable in this sense. Moreover, it can be shown that every distribution is locally a finite-order derivative of a continuous function. Those distributions which are globally representable as finite-order derivatives of continuous functions are called, not unnaturally, finite-order distributions. The space of all such finite-order distributions is denoted by Dfin (I�) . Every locally integrable function f defines a so-called regular distribution p,f according to < f-LJ · ¢

>=

-1+= f(x)cjJ(x ) dx, =

for all ¢ E D.

D' contains elements other than such regular distributions, so that D' is a proper extension of the space Ll oc (�) of all locally integrable functions. In this sense distributions may be legitimately described as generalised functions. However there is no direct sense in which a distribution can be said to have a value at a point. This becomes particularly clear in the case of those distribu tions which are not regular. The delta function is the prime example of such a singular distribution, being defined simply as that functional 6 (obviously linear and continuous) which maps each function ¢ E D into the number ¢(0) . It is a finite-order distribution since it is the second derivative of the continuous function ( t) = tH ( t), where H denotes the Heaviside unit step.

6.2.2

x+

The Silva axioms

The definition of distributions as equivalence classes of nonstandard inter nal functions in a nonstandard universe was already made explicit in Abraham Robinson's original text [15] on N.S.A. Several other nonstandard models for D' (�) have since appeared. In particular such a model was presented by Hoskins and Pinto [6] in 1991, based on the axiomatic treatment of distribu tions given by Sebastiao e Silva [19] in 1 956. The Silva axioms for finite-order distributions on an interval I C � can be stated as follows: Silva axioms for finite order distributions Distributions are elements of a linear space £ (!) for which two linear maps are defined: L : C (I) ---+ £ (!) and D : £(!) ---+ £(!), such that

f E C is a distribution) . corresponds Dv E £ such that, if v = ( j )

Sl L is the injective identity, (every S2 To each v E £ there then Dv = L ( j ' ) S3 For

L

v E £ there exists f E C and r E No such that v = Dr ( f ) . L

E C 1 (I)

6.2.

79

Distributions, ultradistributions and hyperfunctions

S4 Given j, g E C and r E No, the equality Dr L(j) only if ( ! g) is a polynomial of degree < r. -

=

Dr L(g)

holds if and

Silva gives an abstract model for this set of axioms as follows : define an equivalence relation o on No x C by ( r, f ) o ( s , g)

+--+ 3m E No { m ? r, 1\ (I;:' - r f - r;:- s g ) E II m } s

where II m is the set of all complex-valued polynomials of degree less than m and I� is the kth iterated indefinite integral operator with origin at a E I . Now write Then C= is a model for the Silva axioms S l-S4, and every model for the Silva axioms is isomorphic to C= . In particular Dfin (l�) is isomorphic to C= (�) . The extension to global distributions of arbitrary order is straightforward. See, for example, the exposition of the Silva approach to distribution theory given in Campos Ferreira, [3] . A nonstandard model for Coo A nonstandard model for these axioms was presented by Hoskins and Pinto [6] , using a simple ultrapower model *� = �N / for the hyperreals. It may be summarised as follows: The internal set *C= (�) is the nonstandard extension of the standard set coc (�) of all infinitely differentiable functions on �' rv

*Coc (�) = {F = [ ( !77 ) n EN] : fn E c = (�)

for nearly all

n E N}.

This set is a differential algebra. We denote by 5C(�) the (external) set of all functions F E *C= (�) which are finite-valued and S-continuous at each point of *�b· An internal function F E *C = (�) is then said to be a predis tribution if it is a finite-order *derivative of a function in 5C(�). The set of all such pre-distributions is given by,

r 2:0 r = * * (�) : F for some E 5C(�) and some r E N0 }. = {F E c = D We then have the following (strict) inclusions: The members of *D={5C(�) } arc the nonstandard representatives of finite or der distributions on �. Given two such pre-distributions F and G, we say that

80

6. The Sousa Pinto approach to nonstandard generalised functions

they are distributionally equivalent, and write F3 G, if and only if there exists an integer m E No and a polynomial Pm of degree m (with coefficients in *JR) such that

*T;:(F - G) � Pm

where *I;;' denotes the mth-order *indefinite integral operator from a E *Rb· Then for any F E *Doc{5C(IR) } we denote by /-LF = 3 [ F] the equivalence class containing F and call it a finite order Bdistribution. The set of all such equivalence classes is denoted by

3CDO (IR) is a nonstandard model for the Silva axioms and is isomorphic with Djm(IR). 6.2.3

Fourier transforms and ultradistributions

For the classical Fourier transform of sufficiently well-behaved functions we have

j+= j+= -ixy -= f(y)e dy f(y) = -217r ](x)eixy dx and the Parseval relation j+= j+= - = f(x)g (x)dx = DO ](y)g(y)dy. ](x) =

- CXl

To extend the definition of Fourier transform to distributions Schwartz used a generalised form of Parseval relation to define jl as the functional satisfying

jl , cjJ >=< P,, cjJ > . The difficulty here is that if cjJ E D then its Fourier transform J; belongs not to D but to another space Z Z (IR) which comprises all those functions 'ljJ such that '1/J(z) is defined on CC as an entire function satisfying an inequality <

=

of the form

i z k 'ljJ (z) i

0, k = 0, 1, 2, . . . 0 it follows that although jl is well defined as a linear con �

ck exp ( a ly l ) ,

a>

.

Since D n Z = tinuous functional on the space Z it is not necessarily defined on D and may therefore not be a distribution. The members of Z' (IR) are called ultradistri butions, and constitute another class of generalised functions. Although there are functionals which are both distributions and ultradistributions there exist distributions which arc not ultradistributions and ultradistributions which arc not distributions.

6.2. Distributions, ultradistributions and hyperfunctions

81

Nonstandard representation of ultradistributions

In [7] a slight modification of the argument presented in [6] showed that every 3distribution may be represented by an internal function in *D(�) . Ac cordingly we can redefine 3CCXl (�) as follows:

s cCXl (�) = *D CXl ( SD) /2 = U { *Dr ( S D)} /3 r :;;> O where 5D is the *CCb submodule of 5C comprising all infinitely *differentiable

functions of hypercompact support which are finite-valued and S-continuous on *�b· If F = [(Jn ) n EN] is any internal function in *D then its inverse Fourier transform F = *F-1{F} is defined in the obvious way as F = [( fn ) n EN] = [(F-1{fn} ) n EN] and it follows readily that F E *D if and only if F = .r-1{ F} E *Z. Not every internal function in *D represents a 3distribution and similarly not every internal function in *Z represents an ultradistribution. However the following result was established in [7] . Let H(CC) denote the space of all standard complex-valued functions which can be extended into the complex plane as entire functions. For each entire function A(z) = L �=O an zn in H(CC) define the oo-order operator A : Z ---+ Z by setting, for each ¢ E Z,

CXl

CXl

n=O

n= O

Then:

The inverse Fourier transform of a 3 distribution in 3C= (�) is representable as a finite sum of (standard) x-order derivatives of internal func tions in *Z whose standard parts are continuous functions of polynomial growth.

Theorem 1

6.2.4

Sato hyperfunctions

Another approach to the required generalisation of the Fourier transform stems from the work of Carlemann [4] . He observed that if a function f, not necessarily in £1 (�), satisfies a condition of the form

fox l f(y) l dy = O( l x i K )

for some natural number "" '

and if we write

and

82

6. The Sousa Pinto approach to nonstandard generalised functions

then g 1 ( z ) is analytic for all "-S(z) > 0 and g 2 ( z ) is analytic for all "-S(z) < 0. Moreover, for (3 > 0, the function

g (x) ::::: gl (x + i(J) - g 2 (x - i(J) is the classical Fourier transform of the function e -f31 t l f ( t). The original func tion f can be recovered by taking the inverse Fourier transform of g and multiplying by ef31 t 1 . This suggested that a route to a generalisation of the

Fourier transform could be found by associating with f a pair of functions h (z) and h (z) analytic in the upper and lower half-planes respectively. This idea forms the basis of the theory of hyperfunctions developed by M. Sato [17] in 1959/60, (although it was anticipated by several other mathematicians) . In order to give a brief sketch of this theory it is convenient to introduce the following notation: H(C\�) = the space of all functions analytic outside the real axis. H(C) subspace of all functions in H(C\�) which are entire. Hp , l oc (C\�) space of all functions e in H(C\�) which are of arbitrary growth to infinity but locally of polynomial growth to the real axis (that is, such that for each compact K C � there exists CK > 0 and rx E No such that =

=

for all z E C with �(z) E K and sufficiently small "-S(z) i= 0) .

The hyperfunctions of Sato are the members of the quotient space Hs (�) H(C\�) /H(C), that is, the set of all equivalence classes [B] , where B(z) is defined and analytic on C\� and e1 B2 iff B 1 - B2 is entire.

Definition 2

=

rv

Sato hyperfunctions constitute a genuine extension of Schwartz distributions. This is shown by the following crucial result established by Brcmermann [1] , in 1965:

If p, is any distribution in D' then there e.rists a function p,0 (z) defined and analytic in C\� such that

Theorem 3 (Bremmerman)

Conversely, if B E Hp , l oc (C\�) then there exists p, E D' such that B(z) p,0 (z) . Identifying p, E D' with [B] E Hs (�) gives an embedding of D' such that the mapping S D' Hp ,l oc (C\�) is a topological isomorphism. =

:

Remark 4

itly by

--)

Note that if p, has compact support then p,0 (z) is given explic

6.2. Distributions, ultradistributions and hyperfunctions

p,o (z)

=

1 2 .

-

Jr Z

< p,, ( x - z)-

83

1>.

For example, 6° (z) = 2;; < 6, ( x - z) >= - 2;iz and we have < 6, ¢

6.2.5

E

> = lim -

c:--;.0

7r

1+= - CXl

X2

1 ¢ ( x ) dx = ¢(0) . + E2

Harmonic representation o f hyperfunctions

For each Sato hypcrfunction [e] in Hs (�) we can choose some specific function e 0 E [e] as the defining function of the hyperfunction and write X

Then e1[ maps the half-plane rr+ = � �+ into c, and is harmonic on rr+ . Define H (II + ) to be the linear space of all (real or complex-valued) functions defined and harmonic on rr+ , and let r H(C\�) ---+ H (II+) denote the map given by :

Then we have the result

r is an onto map and if r(e 0 ) = r(v0 ) then 0 0 e - v is a complex constant. Now suppose that [e] is the null hyperfunction, so that e0 belongs to H(C) . It is easily shown that e1[ ( x, y) is an entire function in both variables (that is,

Theorem 5 (Li Bang-He, [13] )

can be extended into C x C as an entire function) , is odd in the variable y and such that On the other hand we have immediately from the above theorem,

Let e1[ E H(II+) be an harmonic function entire in both variables and odd in the variable y. Then there exists an entire function e E H(C) such that e1[ ( x , y) = e ( x + i y) - e ( x - iy) for all ( x. y) E rr+ . Corollary 6

Denote by H o ( II+ ) the linear subspace of H(II+) comprising all functions which extend into C x C as entire functions in both variables and which arc odd in the second variable. Then we have Hs (�)

rv

H (II + ) / H o (II+ )

84

6. The Sousa Pinto approach to nonstandard generalised functions

and every equivalence class [B1r (x, y)] E H (II + )/Ho (II + ) is a representation by harmonic functions of the corresponding hyperfunction [B] E Hs (�) . In the sequel we will use analytic or harmonic representation for hyperfunctions as occasion demands.

6.3

Prehyperfunctions and predistributions

Nonstandard representation of hyperfunctions In the present context w E * � denotes the infinite

hypernatural number defined by [(n) n EN] in *Noc . Then to each harmonic function V1r E H(II + ) there corresponds an internal function F{ v } : * � ----+ * C defined by where c; = w - 1 E mon(O) and *v1f *(v1f) is the nonstandard extension of v1r · The internal function F{ v } clearly belongs to *C = (�) . If we define a map w : H(II + ) *C = (�) by setting w(v1f) = F{ v } then we have =

____,

where the inclusion is strict. The set w 'Hs = w (H (II + )) is an external subset of *C = (�) ; however it can be embedded into an internal subset as follows: Consider the nonstandard extension *H(II + ) of H(II + ) comprising all *harmonic functions on *II + and then define,

Then we have =

where w H( �) contains elements which are infinitely close to members of w 'Hs w (H (II + )) and also elements which are far from any internal function in that space. The members of w H ( �) will generally be called prehyperfunctions.

Prehyperfunctions which are near some element of w 'Hs (�) are said to be nearstandard prehyperfunctions, and the others may be called remote prehyperfunctions . The set of all such nearstandard prehyperfunctions will be denoted by The clements of w H( � ) enjoy an important property which is not shared by all internal functions in *C = (�) , namely:

6.3. Prehyperfunctions and predistributions

85

Every prehyperfunction in w H(l�) on the line may be extended into the hypercomplex plane as a * analytic function in the infinitesimal strip D10 = {z E * CC : l lm(z) l < c } .

Theorem 7

Any function is analytic at the centre of an open disc on which it is harmonic. Hence any internal function F(x) , x E *R in w H(l�) extends into the hypercomplex plane z = � + iT) and is *analytic on the disc Proof.

2

T) 2 < 62 . For � = x we have < T) < c and since x may be any hypcrrcal it follows that the internal function F{ v} ( x) extends as a *analytic function into the (�

_

X)

+

-c:

infinitesimal strip in *CC defined by I Im(z ) l < c. D The converse docs not hold: not every *analytic internal function in the infinitesimal strip 010 is a prehyperfunction on the line. The product of two prehyperfunctions, for example, is a *analytic function on the strip nc but need not itself be a prehyperfunction: w H (l�) is a linear space over *CC but not an algebra. Now let A(D10) denote the set of all internal functions in *CDO (�) which may be extended *analytically into the strip nc. A(Dc) is a differential subalgebra of *CDO (�) with respect to the usual operations of addition, multiplication and *differentiation. Moreover we have, The product of any two prehyperfunctions makes sense within the algebra

A(D10) although such a product will not generally be a prehyperfunction. In view of the above inclusions it seems appropriate to call the members of A(D10) generalised prehyperfunctions.

Finally let 0(010) be the subset of all internal functions 8 E A(D10) such that *D k 8(x) � 0 for all x E *�b and for all k E No . 0(010) is a linear *CCb-submodule (but not an ideal) of A( 010) and therefore is a module over *CCb: its elements might conveniently be called generalised Every such generalised 3 -hyperfunction which may be represented by an internal function in w H ns (�) is called a standard 8-hyperfunction or simply a 8 -hyperfunction. The set of all 3-hyperfunctions on the line is defined by 8-hyperfunctions.

and we have the isomorphism

6. The Sousa Pinto approach to nonstandard generalised functions

86 6.4 6.4. 1

The differential algebra A(fl�:: ) Predistributions of finite order

Although not every continuous function f on � may be continued ana lytically into the complex plane, every such function does admit an analytic representation in the sense that there exists a unique hyperfunction [f ( x, y)] in Hs (�) such that f1f (x, y) ---+ f (x) as y l 0 uniformly on compacts. Then the internal function F(x) *f1f (x, ) belongs to wHs (�) C w Hns ( �) C A(Slc:) and is such that for all x E * �b· F(x) � *f(x) , F(x) is S-continuous at every point (standard and nonstandard) of *�b; that is, Vx, y E * �b [x � y =? F(x) � F(y)]. Reciprocally, every internal function F E A(010) which is finite and S continuous at every point x E *�b is infinitely close to a (standard) continuous function f defined on �. We denote by 5C (0c:) the subalgebra of all functions in A(010) which are finite and S-continuous on *�b· An internal function F *� ---+ CC is said to be finitely *differentiable at x E *� if it is *differentiable at x and, in addition, *DF(x) is a bounded number. 7r

=

c

:

8 An internal function F E A( 010) which is finitely * differentiable on *�b belongs to 5C (S1c:) .

Theorem

For any internal functwn F in 5C(S1c:) the standard function f = st (F) will be called the shadow of F while the regular distribution vp = stv (F) E D' (�) generated by F will be called the D- shadow of F.

Definition 9

Now let be an arbitrary function in 5C(010) . Since 5C(0c:) is not a differential algebra the derivative, *D, will not necessarily belong to 5C (S110). However it is easy to see that the functional /-L*Dif> D(�) ---+ CC defined by :

< /-L *Dif> , cjJ > = st( < , - *¢' >) is a well defined distribution. Hence *D has a well-defined D-shadow. If, in addition, *D itself belongs to 5C (Slc:) then V*Dif> is the (regular) distribution generated by the standard function f' = st (*D) which is the standard deriva tive of the function f st () . Let *D0{5C (010) } denote the set of derivatives of functions in 5C (0c:) and for each k E No define =

6.4.

The differential algebra A(Dc:)

87

5 C (De:) . We can now define the following (external) 00 5 Coo (D c: ) U *D k { 5 C(D s ) } . k=O Then for each F E 5C00 (Dc:) there exists an S-continuous function E 5C(Dc:) and an integer r E No such that F *Dr. The functional /-LF D(IR) , defined by where *D0{8C (D10)} subset of *H (D10) :

=

=

:

is a distribution. We call /-LF the D-shadow of the internal function F and write f-LF = stv (F) . That is to say, we extend stv into 8Coo (Dc:) as a map ping with values in D' (IR) . An internal function F E *H(Dc:) is said to be *Svdifferentiable if, for every ¢ E D there exists a standard number b¢ such that < T - 1 {F(x + T) - F (x) }, *¢ >� b¢ for all T � 0, T # 0. As is easily confirmed, every function F in 5C00 (De:) is *Svdifferentiable, and we define the 5 differential order of F to be the number 5o( F) defined by 5 o(F)

=

min{j E No : F

=

*DJ , for some E 5C (D10)}.

Accordingly we call 5C00 (Dc:) the set of all predistributions o f finite

8 differential

order.

Replacing the (standard) concept of distribution of finite order by the (nonstandard) concept of predistribution of finite 5 differential order it is clear that 5Coo (Dc:) with the *D operator constitutes a natural (nonstan dard) model for the axiomatic definition of distributions of finite order given by J .S. Silva. Distributional equivalence

We may now glue together all internal functions in 5C00 (De:) which have the same distributional shadow. That is to say, we define the following equivalence relation on 5 Coo (De:) : P G E 5C00 (D10 ) are distributionally equivalent , written F3G, if and only if they have the same distributional shadow. The quotient space is a *Cb-module which is isomorphic to the space C00 (IR) of .J .S. Silva distribu tions of finite order.

88

6.4.2

6.

The Sousa Pinto approach to nonstandard generalised functions

Predistributions of local finite order

Finite order predistributions in 5C= (�) are not the only prehyperfunc tions which have a distributional shadow. Let F E w1is (�) \5Coc (D s ) be a prehyperfunction such that for every compact K C � there exists an integer rx E No and an internal function W K E w1is (�) which is S-continuous on some *-neighbourhood of *K so that

for all x E *K C �b · The smallest such rx will be called the 5differential order of F on *K, and denoted 5oK (F) . If ¢ E D (R) has support contained in K then < F, * 7r >=< K, ( - l rK *c/J (r K ) > is a bounded number and so F has a distributional shadow in DK (�) C D (�) . Since K may be any compact in R it follows that F has a shadow in D (R) and so p, = stD (F) will be a well defined (standard) distribution in D' (R) . Denote by 5C1f (�) the subset of all prehypcrfunctions which have a distri butional shadow. Then we have the inclusion

}

{

Moreover 5C1f (�)\5Coc (�) is not empty since it contains, for example, the internal function F : *� *C defined by ____,

(

F x)

=

+=

� *D"

1 w 1r l + w2 (x - i) 2

The members of 5C1f (�) will be called predistributions of local finite order or, more simply, predistributions. 6.4.3

Predistributions of infinite order

Let be any internal function in 5C (S110) and suppose that there exists an harmonic function g E H (II + ) such that (x) = *g ( x , ) Let also r = [rn] be an arbitrary infinite hypernatural number. For every n E N the function c:

arn g aXrn (x, y),

.

( x, y) E rr +

is again harmonic on rr + , and so *Dr is a generalised prehyperfunction in *1i(Sl10) . The internal function *Dr is locally bounded by r! wr ; that is to say,

6.4.

The differential algebra A(Dc:)

89

for each compact K C � there exists a bounded constant CK such that for all x E *K we have In general the infinite order derivative *Dr may have neither an ordinary shadow nor a distributional shadow, and stv (*Dr ) may have no meaning. On the other hand, for any ¢ E D(�), we have

< *Dr , *cjJ > = (- 1Y < , *cjJr > . But, since there exist test functions in D(�) whose derivatives may grow arbi

trarily with the order, then

will not in general be a bounded hypercomplex number. Hence stv (*Dr ) may have no meaning. However we can define a family of standard part maps which allow us to attach a type of shadow to derivatives of the form *Dr for infinite r E *N= and internal functions in 5C(D10). These standard part maps are defined on certain subspaces of D(�) , for example on those of so-called Roumieu type which we recall very briefly as follows. Spaces of Roumieu type [16]

Let M denote the set of all positive real sequences (Mp)pENo such that (a) (Mv) 2 ::; Mv- l Mp+ l , p = 0, L . . . ,

(b) Mp ::; AhP min o ::;q::;p{MpMp- q } , p = 0, 1 , . . . , for some positive constants A and h, += (c) L (Mp) - l /p < + oo. Further, let 1J(Mp) (�) be the subset of D(�) comp=O prising all functions ¢ whose derivatives satisfy

p = 0, 1, . . .

'

for some sequence ( lvfp) E M, where A and h are positive constants (generally dependent on ¢) . It can be shown that 1J(Mp) (�) is not empty for every sequence {Mp }v ENo E M . In particular, if there exists p E No such that Mp = + oo for all p � Po then {Mp}vENo belongs to M and 1J(Mp) (�) D(�). The space 1J(Mp) (�) is the union of the family of spaces =

6. The Sousa Pinto approach to nonstandard generalised functions

90

where K runs over the set of all compact subsets of � and h runs over all positive numbers. For each real number h > 0 and compact K C � ' vV(r, ) (�) , contains all functions ¢ E D ( Mp ) (�) with support contained in K and satisfying the above inequality for that particular value of h > 0. Each space vV(r, ' l is a Banach space with respect to the norm ll c/J I I v(Mp ) K,h

= sup p 2' 0

{ h M1 :rEsupK ¢ l (x) } , P

p

I (P

l

and D ( Mp ) ( �) is provided with the inductive limit topology. D' ( Mp ) denotes the topological dual of D ( Mp ) (�) and its elements are some times called generalised distributions in the sense of Roumieu. A linear functional is in D ( Mp ) (�) if and only if it is continuous on that space for every h > 0 and compact K C �.

6.5

Conclusion

The above outline of Sousa Pinto ' s nonstandard treatment of hyperfunc tions is reported in greater detail in [10] . Sousa Pinto ' s later work, developing the hyperfinite approach to distributions initiated by Kinoshita, is given in [8] and [9] , but most comprehensively in his last publication, the book [20] , which is now available in an English translation.

References

[1] [2] [3] [4] [5]

Distributions, Complex Variables and Fourier Tmnsforms, Massachusetts, 1965. N . J . CUTLAND ET AL ., Developments Nonstandard Mathematics, Longman, Essex, 1995. J. CAMPOS FERREIRA, Introdur;ao a Teoria das Distribuir;oes, Calouste Gulbenkian, Lisboa 1993. T. C ARLEJ\IANN, L 'integmle de Four-ier et questions qui s 'y mtachent, Uppsala, 1944. B . FISHER, "The product of the distributions x -r and c) ( r- l ) ( x ) ", Proc. Camb. Phil. Soc., 72 ( 1972) 201 -204. H . J . B REMMERMANN,

m

[6] R . F .

HoSKINS and J. SousA PINTO, "A nonstandard realisation of the .J .S. Silva axiomatic theory of distributions", Portugaliae Mathematica, 48

(1991) 195-216.

References

91

[7] R.F. HoSKINS and

J . SousA PINTO, "A nonstandard definition of finite order ultradistributions", Proc. Indian Acad. Sci. ( Math. Sci. ) , 109 (1999)

389-395.

[8]

R .F.

HoSKINS and J. SousA PINTO, "Hyperfinite representation of dis tributions", Proc. Indian Acad. Sci. ( Math. Sci. ) , 1 1 0 (2000) 363-377.

[9] R . F . HoSKINS and .J . SousA PINTO, "Sampling and IT-sampling expan sions", Proc. Indian Acad. Sci. ( Math. Sci. ) , 110 (2000) 379-392. [10] R . F . HoSKINS and J . SousA PINTO. Theories of Generalised Functions, Horwood Publishing Ltd., 2005. [11] KINOSHITA MoTo-o, "Nonstandard representations of distributions, I", Osaka J. l\Iath., 25 ( 1988) , 805-824. [12] KINOSHITA l\IoT0-0, "Nonstandard representations of distributions, II", Osaka J. l\Iath., 27 ( 1990) 843-861 . [13] L I BANG-HE, "Non-standard Analysis and Multiplication of Distribu tions", Scientia Sinica, XXI-5 ( 1978) 561-585. [14] LI BANG-HE, "On the harmonic and analytic representation of distribu tions", Scientia Sinica ( Series A ) , XXVIII-9 (1985) 923-937. [15] A. RoBINSON, Nonstandar-d Analysis, North Holland, 1966. [16] C . RouMIEU, "Sur quelques extensions de la notion de distribution", Ann. Scient. Ec. Norm. Sup., 3e scrie (1960) 41-121 [17] M. SATO, "Theory of Hyperfunctions", Part I , J. Fac. Sci. Univ. Tokyo, Sect. I, 8 (1959) 139-193 ; Part II, Sect. I, 8 (1960) 387-437. [18] L. ScHWARTZ, Theorie des Distributions I, II, Hermann, 1957/59. [19] J . S . SILVA, Sur l ' axiomatique des distributions et ses possible modeles, Obras de Jose Sebastiao e Szlva, vol III, I.N.I.C., Portugal. [20]

J. SocsA PINTO, Metodos lnfinitesimais de Analise Matematica, Lis boa, 2000. English edition: Infiniteszmal Methods of Mathematical Analysis, Horwood Publishing Ltd., 2004

N eut rices in more dimensions Imme van den Berg*

Abstract

the nonst.andm·d real number system, external sets. They may also he viewed a.s modules over the external set of all limited numbers, as such non-noetherian. B
Neutrices are convex subgroups of most of them

cause of

are

the convexity aJHl the invariauce nuder some t.nwslationA and

mult.iplicatious, the external nentrices are appropriate models of orders

a cal e:ct.eTnal nnmbers has been developped, which includes solving of equatious , and even an analysis, for the structure of external num bers has a property of completeness. This paper contains a further step, of magnitude of numbers. Using their strong algebraic structure culus of

towa1·ds linem· algebra and geometry. trix is the direct

smn

We show that in

of two neutrices of

chosen orthogonal.

�.

�2

every neu

The components may he

Introduction

7.1 7. 1 . 1

Motivation and objective

Consider t.he problem to specify a mathematical model for the intuitive no tion of "order of magnitude". Orders of magnitude have some intrinsic vague ness, haziness or superficiality. They arc bounded and invariant under at least

some additions, or altern a ti v ely, translations. Classically, orden; of magnitude

are modelled with the 0- or a-notation, which can be seen as additive groups of real functions in one variable

[19].

One may ru·gue that there is some friction

between the intuitive notion of order of magnitude, which is about numbers,

and

its model, which concerns functions. If one wishes to preserve the prop

erties of boundcdneAs and invru·iance under some additions in the real number system, one enters into conflict wit.h the ru-chimedian property, a conflict also

·university of Evora, Portugal. :i.vdb
7. 1 . Introduction

93

known as the Sorites paradox [28, 35] . However, this difficulty can be circum vented, if one models within the real number system of nonstandard analysis. Nonstandard analysis disposes of so-called external sets, which do not cor respond to sets of classical analysis. External sets of real numbers may be convex and bounded without having an infimum and supremum, and they are invariant under at least some additions. It may be shown that such an ex ternal set E has a group property: There exists c; > 0 such that whenever x E E one has x + lc: E E for limited (i.e. , bounded by a standard integer) real numbers l. So it is natural to consider convex (external) additive subgroups of �. which have been called neutrices in [29] and [30] . The term is borrowed from Van der Corput [20] , who uses it to designate groups of functions, which may be more general than Oh's and oh's. Within the nonstandard real num ber system there exists a rich variety of neutrices; simple examples are 0, the set of all infinitcsimals, and £, the set of all limited real numbers, for more intricate examples see [29, 30] and also section 7.3.2. External numbers are the sum of a (nonstandard) real number and a neutrix. One could define an order of magnitude simply to be a convex (external) subset of the nonstandard reals; then it is in fact an external interval, bounded by two external numbers (see [5] for a proof) . The external numbers satisfy a calculus, which is rich enough to include addition, subtraction, multiplication, division, order, solution of equations and calculation of integrals. All in all, this calculus of orders of magnitude resembles closely the calculus of the reals. We refer to [29, 30] for definitions, results and notations. We recall here a notation for two orders of magnitudes which are not neutrices. The set of positive appreciable numbers @ corresponds to the external interval (0, £ ] and the set cj:J of positive unlimited numbers corresponds to the external interval ( £, �] . The neutrices and external numbers have been applied in various settings (the terminology not being explicit in earlier papers) : singular perturbation theory ([2, this paper describes the discovery of the "canard" phenomenon, the set of parameters for which it occurs is an external interval)] , [22] , [10] , [11, papers on exponentially small thicknesses of boundary layers] , [12, thick ness of transitions of boundary layers] ), asymptotics [5, sets of numbers hav ing the same (nonstandard) asymptotic expansion, and domains of validity of asymptotic approximations] , probability theory [6, modelling of mass and queue of probability distributions] and psychology [7, inperfect knowledge of maximal utility] . Special mention has to be made of the work of Bosgiraud, who applies the external numbers nontrivially to problems of modelling and calcu lation of insecure statistical events in a series of papers [13, 14, 15, 16, 17, 18] . This paper makes a step towards an external calculus in more variables. The main result (7.2.2, decomposition theorem) states that every neutrix in

94

7.

Neutrices in more dimensions

the two-dimensional real space is the direct sum of two neutrices in the one dimensional nonstandard real line. The neutrices are uniquely determined, and correspond to orthogonal directions. So in a sense the neutrices in the plane possess a dimension, too, which may be interpreted as "length" times "width". In a second paper [8] we extend the decomposition theorem to �k for arbitrary standard k, and also show that the decomposition fails if k is unlimited. One reason for dividing the publication of the result into two separate papers is that the proofs of the two-dimensional case and the k-dimcnsional case arc both rather lengthy. It is to be noted that the proof of the k-dimensional case uses in an essential way the result in two dimensions, but is by no means an extension of its proof by some form of external induction. For standard dimension, the result answers in part a conjecture by Georges Reeb. who suggested that one should be able to recognize the dimension of a space on its external subsets. He also conjectured that there should be a relatively easy nonstandard proof of the topological dimension theorem, or the invariance of domain theorem, but this remains unsettled (some progress has been made by Reveilles [34] ) . One may define an external point to be the sum of a (nonstandard) vector and a neutrix. On the basis of the decomposition theorem it could be inter esting to develop fragments of linear algebra, matrix-calculus, geometry and multivariate analysis and statistics, in order to model, in the case of more vari ables, approximate qualifications ("small", "enormous", "somewhat", "good") , approximate phenomena ("mistbanks", "stains", "spheres of influence", "super ficiality") , and approximate calculus and reasoning. 7. 1 . 2

Setting

Roughly spoken, an internal set is a set defined by a formula of classi cal analysis (which may contain parameters) , and an external set cannot be defined this way. In k-dimensional space, except for its linear subspaces, every neutrix is an external set. The class of external sets differs from one nonstandard model to another, or alternatively, from one nonstandard axiomatics to another. How ever, the particular external sets met in applications are to a large extent the same. Indeed, usually they reduce either to 0 or to £; these two sets have essentially the same properties through all common nonstandard models and axiomatics. The setting of this paper is the axiomatic system Internal Set Theory (IST) of Nelson [32, 33] , and we refer to [24] and [23] for up-to-date presentations and terminology. The language of this system contains two primitive symbols E and st (standard) . It differs from model-theoretic approaches in the way that

7.2. The decomposition theorem

95

nonstandard elements are already contained in infinite standard sets, instead of extensions of such sets. It has the advantage of simplicity, of saturation in all (standard) cardinals, and of full validity of the "Fehrele principle" [5] : no galactic formula, i.e. a I; 1 -formula, starting with the "external quantifier" 3x ( st x 1\ ) is equivalent to a halic formula, i.e. a II 1 -formula, starting with ) unless they arc equivalent to an the "external quantifier" Vx ( st x -----+ internal formula. These are exactly the two characteristics that enable to prove the classification theorem of halftincs (theorem 7.3.23) , which has as a direct consequence that every order of magnitude is an external interval, and constitutes a crucial step in the proof of the decomposition theorem for neutrices in two dimensions (theorem 7.3.42) . Formally, Nelson's axiomatics does not regard external sets, but there are no major problems if we consider "external sets" which have only internal ele ments, and are defined by an external formula in which all "external quantifiers" range over standard sets. This is supposedly the case for neutrices, and for sets reduced to such. Axiomatics which enable to deal with this kind of external sets are given by [31 , 27, 1 , 25, 26] . · · ·

7.1.3

· · ·

Structure of this article

In section 7.2 we define formally the notion of neutrix and state the de composition theorem for neutrices in 2-dimensional space. The decomposition theorem is proved in section 7.3. The actual proof, which is contained in section 7.3.3, needs some elementary external geometry (section 7.3.1) and algebra (section 7.3.2). In particular we study two kinds of division for neutrices in one dimension, and their relation to certain geometric properties of neutrices in two dimensions. We give special attention to practical aspects, like the calculation of the divisions, and present many examples. In general, the proof is neither fully algebraic, nor fully analytic. Instead, it tends to be a mixture of algebraic and analytic arguments, where typically algebraic operations arc adapted to the order of magnitude of the quantities involved. 7.2

The decomposition theorem

Definition 7.2.1

subgroup of �k .

Let k E N, k � 1 be standard. A neutrix is a convex additive

A convex (external) subset N of � is a subgroup if and only if it is sym metric with respect to 0 and whenever x E N one has 2 x E N. Then by external induction nx E N for all standard n E N; by convexity, one has lx E

96

7.

Neutrices in more dimensions

N for all limited l E R This indicates an alternative way to define neutrices: they should be modules over £, the external ring of all limited real numbers: ·

a subset N of � is a neutrix if and only if £ N = N . The only internal neutrices of �k are its linear subspaces. Thus the only internal neutrices of � are {0} and � itself. Two obvious external neutrices in � arc £ itself and 0, the external set of all infinitcsimals. The ncutriccs of the form c£ for some positive c E � ("c-galaxies") are isomorphic to £ and the ncutriccs of the form c0 for some positive c E � ("c-halos") arc isomorphic to 0 . Every neutrix N f. {0} is non-noetherian in an external sense: there exists always a strictly ascending chain of subneutrices (Nn ) n EN with N1 � N2 � for standard indices n. Indeed, let w E � � Nn � Nn + 1 � be positive unlimited, and put Nn = wn £. Then Nn ct Nn+ 1 for all indices n. Consider ww which is not an element of Nn for all standard indices n. Let c E N be sufficiently small such that also cww E N . Then (cNn ) st n is a strictly ascending chain of £-submodules of N . It may be proved [5] that for any strictly ascending chain ( Nn ) st n of neutrices, the union U st n Nn is neither isomorphic (for internal homomorphisms) to £ nor to 0. This suggests that there is a rich variety of external neutrices in �' non-isomorphic with respect to internal homomorphisms. Still, as it is tacitly understood that a neutrix is defined by a bounded formula ¢ of Internal Set Theory, a neutrix has a simple logical form, for ¢ may be supposed to be internal, galactic or halic [5] . The main theorem of this paper asserts that in a sense augmenting the dimension to two docs not generate entirely new types of ncutriccs, for any neutrix in �2 may be decomposed into two neutrices of R We adopt the notation Nx { nx I n E N} for the ncutrix of all multiples of some vector x with coefficients in some neutrix N C �. Theorem 7.2.2 (Decomposition theorem) Let N C �2 be a neutrix. Then there are ne1drices N1 ::) N2 in � and orthonormal vectors u 1 , u 2 such that · · ·

· · ·

=

Moreover, if lvf1 ::) lvh in � are nev.trices and 111 , v 2 are O'rthonormal vectors with N M1 v1 EB M2 v2 , it holds that lvh N1 , M2 N2 . =

=

7.3

=

Geometry of neutrices in IR 2 and proof of the decomposition theorem

The present section is divided into various subsections, conform the various stages of the proof of the decomposition theorem.

7.3. Proof of the decomposition theorem

97

The first subsection contains the definitions of the notions of thickness, width (smallest thickness) and length (largest thickness) of neutrices, which are the basic ingredients of the proof. We prove an important theorem, called the "sector-theorem", expressing convexity of the thicknesses of neutrices in two dimensions on not too large sectors. One of the consequences is that the thicknesses in most directions arc minimal. This implies that. the width of a neutrix is realized in some direction, so in the decomposition represented in the main theorem we obtained already the ncutrix N2 . Other important notions used in the proof are near-orthogonality and near-parallelness. The most important step in the proof of the decomposition theorem in JR2 consists in establishing a direction, which realizes the length of the neutrix; then up to a rotation, the neutrix will simply be "length times width". This part of the proof uses a form of euclidean geometry in which the points and lines may have non-zero thickness, in fact such a thickness takes the form of a neutrix. We have to adapt some definitions, operations and theorems of ordinary, exact euclidean geometry to this geometry of "clouds" or "mistbanks". This is done in the first subsection, while the second subsection contains an algebraic tool: the "division" of a neutrix by another, the result of which may be calculated through a "subtraction", after taking logarithms. The final subsection establishes the existence of a direction which realizes the length. It uses an argument of "external analysis": every (external) lower halftine has a supremum, which is an external number, i.e. , the sum of an ordinary real number and a ncutrix (theorem 7.3.23) . In a sense, the external set of directions which realize the length is the supremum of an external set of directions which do not realize the length. 7. 3. 1

Thickness, width and length of neutrices

Let N c IR2 be a neutrix. We call N square in case there exists a neutr"ix lvl C lR such that N = M x AI . Definition 7.3.2 Let N C IR2 be a neutrix and r E IR2 be a non-zero vector. The thickness of N in the direction of r is the neutrix Tr of lR defined by Tr E JR EN . Definition 7.3. 1

=

{X I X nfu ·

The width W of N is defined by

w

and its length L by

=

n

rES1

Tr ,

}

98

7.

Neutrices in more dimensions

In an obvious way, if X is a subset of �2 one may define II X II = So, if r is unitary, we may write the thickness in the di rection of r alternatively as Tr = ± li N n �r l l · As an example, consider the neutrix £ X 0 C � 2. Let r ( ��: ) . Then Tr £ if cjJ 0 (mod ) otherwise Tr 0 . Hence W 0 and L £. It will be shown later on that for every ncutrix the width is assumed in some direction (in fact most of the directions) and the same holds for the length (in only few directions) . For square neutrices N lvf2 all thicknesses arc equal to M, hence their width and length arc also equal to M. { llxll l x E X }. =

=

=

=

=

"':'

1r

,

=

Definition 7 . 3 . 3

Let N

C �2 =

be a ne'Utrix and x, y =

E �2 .

We call x and y

nearly orthonormal if l l x l l II Y II 1 and < x, y >"'=' 0. We call x and y nearly orthogonal if l � l and l l � l l are nearly orthonormal. Let 0. Then the vectors ( 6 ) and ( v'l':_c:2 ) are nearly orthonormal. The 0 1 1 vectors ( 6 ) and ( i ) are nearly orthogonal. We introduce now a notion of near-orthogonality with respect to neutrices. c "'='

Definition 7.3.4 Let N C �2 be a non-sq'Uare ne'Utrix and x, y E �2 be non zero vectors. We call a line �y nearly parallel to N if Ty > W. We call x a near-normal vector- of N if X is neaTly or-thogonal to all y E S 1 S'Uch that y is nearly parallel to N.

As an example, consider the neutrix N £ x 0 C �2 . The vector ( � ) is nearly orthogonal to N . Indeed unitary vectors Yc: such that y is nearly parallel, i.e. Ty" £ > 0 , are all of the form ( v'l':_c:2 ) with "'=' 0. Then ( ( n . ( v'l; c:2 ) ) 0. Note that all vectors of the form ( 1 .+ a ) with a , (3 0 are also nearly orthogonal to N . The notions of near-parallelness and near-normality will be justified for neutrices N in two dimensions by theorem 7.3.6. In fact it is a consequence of the next simple, but important theorem, based on the convexity of N, that most directions have the same thickness. This proves a fortiori the existence of a direction which realizes the width. �

c

=

"'='

"'='

(Sector-theorem) Let N C �2 be a ne'Utrix and a and b two 'Unitary vectors which make an angle e with 0 ::; e :ii Let c be a 'Unitary vector, which makes an angle I with a S'Uch that 0 ::; I ::; e. Then

Theorem 7.3.5

1r.

Proof. Without restriction of generality, we may assume that Tb ? Ta > 0 and that a = ( 6 ) . Let x E �a n N . Consider the change of scale M = Nj l l x ll ·

7.3. Proof of the decomposition theorem

99

The image of x is a E M. Notice that also b E M, for Tb ? Ta . Let ( be the intersection of the lines ab and �c. Then ( E M by convexity and 1 1 ( 11 is appreciable. Because £( E M it holds that (/ 11(11 E lv1. Put z = ll x ll · ( / 11(1 1 . D Then z E N. Because ll z ll = ll x ll we conclude that Tc ? Ta . Theorem 7.3.6 Let N C �2 be a neu.trix. Then theTe e:r:ists an unitary vector u such that Tu = W. In fact the set of directions corresponding to an unitary vector- r with T,. = W contains an inter-val of the fonn ( a + 0 , a + 1r + 0 ) .

(

)

Let M = min Tm , Tm . By the sector-theorem Tu ::;:,. M for all unitary u in the first quadrant. But Tm = T( -;} ) and Tm = T( �1 ) hence Tu ? M for all unitary u in every quadrant. Hence !vi = W. If N is square, one has T,. = W for all unitary vectors r. If not, we may rotate N in order to obtain that TW = T(-01 ) > W. Let r be an unitary vector which makes an angle e with the horizontal axis such that 0 � e � If T,. > W, one should have Tm > W, by the sector-theorem applied to [B, 1r] or [0, B] , depending to whether e � � or e ? � . This implies a contradiction, hence T,. = W. Thus we proved the second assertion,with a = 0. D Proof.

Jr .

Let N C �2 be a neutrix with width W. Let u, v be two orthonormal vectors. Then Tu = W or Tv = W. The proof of the decomposition theorem for a neutrix of �2 is easy, once

Corollary 7.3. 7

we know that its length is realized in some direction.

Let N C �2 be a neutrix with length L and width W . Assume there exists a unitary vector u such that N n �u = Lu. Let v be unitary such that u _l v . Then N = L u EB W v . Moreover, if u', v' are orthonormal and N1 , N2 C � are neutrices with N1 ::) N2 such that N = N1 u' EB N2 v', one has N1 = L and N2 = W . Proof. By corollary 7.3.7 it holds that Tv = W. Because N is a neutrix, we have Lu EB Wv C N. Conversely, let n E N. Let p be the orthogonal projection of n on �u. Because II n il E L and II P II � II n i l one has II P II E L, so p E N. Then n - p E N, and because Tv = W, it follows that n - p E Wv. Hence n = p + (n - p) E L u EB W v and N C L u EB W v . We conclude that N = L u EB Wv. We prove now the uniqueness part. The neutrix N cannot contain vectors larger than its length, so N1 C L. If there exists A E L \ N1 , all vectors n in N satisfy ll n ll < 1 >- 1 , so L cannot be the length of N. So L ::) N1 , from which we conclude that L = N1 . By corollary 7.3.7, one has N2 = W. D Theorem 7 .3.8

100

Neutrices in more dimensions

7.

It is now almost straightforward to prove the decomposition theorem for neutrices in �2 if their length is of the form A£. =

Let A E � ' A > 0. Let N be a neutrix with length L A£ and width W. Then there are orthonormal vectors u and v such that N = Lu EB Wv .

Proposition 7.3.9

=

By rescaling if necessary we may assume that A 1 . Let u E N be unitary. Then £ u E N because N is a ncutrix, and N n �u C £ u because the length of N is £ . Hence N n �u = £ u = Lu . Let v be unitary such that u _l v. Then N Lu EB W v by theorem 7.3.8.0 The decomposition theorem is also easily proved for subneutriccs of a given neutrix with length less than the length of this neutrix. Indeed, we have the following definition and proposition.

Proof.

=

Definition 7. 3 . 1 0 Let N C �2 be a neutrix with M C � be a neutrix with W C M C L . We define

length L and width W. Let

NM = {n E N l ll n ll E M } . Clearly NM is a neutrix with length M. Its length is realized in any direc tion for which N contains a vector u with Tu ;:::: M. Then the next proposition is a direct consequence of theorem 7.3.8. Proposition 7.3. 1 1 Let N C �2 be a neutrix with length L and width W . Let A1 C � be a neutrix with W C lvl C L . Assume there is a unitary vector u such that Tu ;:::: M . Let v be a u.nitary vector v such that ( u, v) is orthonormal.

Then NM

=

Mu EB Wv .

Let N C �2 be a neutrix. We call N lengthy if it is not square, and if its length is not of the form L = s£ for some E �� > 0.

Definition 7. 3 . 1 2

c

c

So the two-dimensional decomposition theorem will follow once we have proved that lengthy neutrices assume there length. Note that the proof of proposition 7.3.9 does not work, because for every A E L there exist n E N such that II n il = cj:J · A. In order to prove that a lengthy neutrix assumes its length too, we work "from outside in". We consider lines with unitary directions u such that Tu < L, with u in an appropriate segment of the unit circle, in such a way that all lines "leave to the right". We divide the segment into two (external) classes: directory vectors for lines which are "leaving upside" and directory vectors for lines which arc "leaving downside". It follows from the completion argument mentioned earlier that the two classes do not entirely fill

7.3. Proof of the decomposition theorem

101

up the segment, just as two disjoint open intervals within some closed interval omit at least one point. The unitary vectors left out are then exactly those directions v such that Tv = L. We introduce an appropriate scaling and orientation for lengthy neutrices, and make also precise what we understand by "leaving to the right upside" and "leaving to the right downside".

Let N C �2 be a lengthy neutrix. We call N appropriately scaled if W � 0 and L � £. Proposition 7.3. 14 A lengthy neutrix N C �2 is homothetic to an appropri ately scaled neutrix.

Definition 7.3. 13

Let A, w E L \ W be positive such that Ajw ::: 0. Consider M = Njw. Its width is Wj w C W/A C 0 and its length is Lj w which contains at least some unlimited elements. If Wj A � 0 we are done. If Wj A 0 we have D Wjw � W/ A, so clearly Wjw � 0. Hence M is appropriately scaled. Proof.

=

Let N C �2 be a lengthy appropriately scaled neutrix with length L . We call N appropriately oriented if

Definition 7.3. 1 5

Let N C �2 be a lengthy appropriately scaled neutrix. There exists a rotation p of the plane such that p(N) is appropriately oriented.

Lemma 7.3 . 16

Let W � 0 be the width of N. Then by proposition 7.3.11 there are orthonormal vectors u and v such that N£ = £u EB W v. So let p be a rotation D such that p(u) = ( 6 ) . Then p(N)£ = £ ( 6 ) EB W ( � ) .

Proof.

Let N C �2 be a lengthy appropriately scaled and oriented neutrix with length L and width W . Let A E L, A � 0. Then there exist orthonormal vectors u ::: ( 6 ) and v ::: ( � ) such that N£ .-\ = £Au EB W v.

Lemma 7. 3 . 1 7

I f A is limited, the property follows from definition 7.3. 15. I f A is unlimited, any element n E N with ll n ll = A is of the form n = ( 6� ) with � ::: 00 and E ::: 0. Take u = vl�c2 ( � ) and v = vl�c2 ( 7 ) . Then the proposition follows from lemma 7.3.16. D In the final part of this section we consider points and lines close to a lengthy appropriately scaled and oriented neutrix. Proof.

102

7.

Neutrices in more dimensions

Let N C �2 be a lengthy appropriately scaled and oriented neutrix with and length L and width W . Let q E �2 be such that llqll E L. Then q is said to be infinitely close to N if q -::::: n for some element n E N. Let u -::::: ( 6 ) , v -::::: ( ? ) be orthonormal vectors such that N£>.. = £Au EB W v for some unlimited A with A � ll qll . Then q E,u + T)V with T) -::::: 0. It is called a lower point if T) < W and an upper point if T) > W. Let x be a unitary vector, with x -::::: ( 6 ) . We say that the nearly parallel line �x is downward if it contains an infinitely close lower point, and upward if it contains an infinitely close upper point.

Definition 7.3.18

=

As an example, consider N w0 x �£. w The line containing the vector l( fw ) is nearly parallel upward, and the line containing the vector ( -l;w ) is =

nearly parallel downward. By convexity, a nearly parallel line cannot be both downward and upward with respect to a neutrix N. By the next proposition, if its intersection with N is not maximal, it should be either one.

Let N C �2 be a lengthy appropriately scaled and ori ented neutrix with length L. Let x -::::: ( 6 ) be a unitary vector. Assume that Tx < L. Then the nearly parallel line x is either downward or upward with respect to N. Proposition 7.3. 19

Proof. Let W be the width of N. Let A be unlimited such that Tx < A < L. By proposition 7.3.9 there exist orthonormal vectors u -::::: ( 6 ) and v -::::: ( ? ) such that N£l = £A EB Wv. By continuity x n £A EB 0 v contains some point y �u + T)V with � E L + and ITJI > W. If T) < W the line x is downward, and T) > W the line x is upward. D =

Let N c �2 be a lengthy appropriately scaled and oriented neutrix with length L and width W. Let tf. N be an infinitely close lower point and y tf. N be an infinitely close upper point, with llxll = II Y II · Let u -::::: ( 6 ) and v -::::: ( ? ) be orthonormal vectors such that u _l xy. If for some unlimited A with IAI � ll x ll it holds that Nn = £Au 8 Wv, the points x and y are called opposite with respect to N. Also, the lines �x and �y are called opposite with respect to N.

Definition 7.3.20

x

The final proposition of this section states that opposite lines generate the same thicknesses.

Let N C �2 be a lengthy appropriately scaled and ori ented neutrix. Let a, b -::::: ( 6 ) be unitary such that �a is a neaTly parallel downward line, and �b an opposite nearly parallel upward line. Then Ta Tb .

Proposition 7.3.21

=

7.3. Proof of the decomposition theorem

103

Proof. Let L be the length of N and W be its width. Let x E �a be an infinitely close lower point and y be its opposite infinitely close upper point on �b. Let A E L + , 1 >- 1 ::::> llxll be such that N£>.. = £>-u EB W where u ( 6 ) and v -::::: ( � ) are orthonormal vectors such that u _l xy. Let a > 0 be such that Ta < a < llxll . Then aa t/. N, so there exist � ::; ll x ll , T) > W such that aa = � u T)V . Because N£>.. = £>-u EB W it holds that ab = � u + T)V tf. N, so a t/. Tb. Hence Tb C Ta . In a symmetric manner we prove that Ta C Tb . We conclude that Ta Tb . D v,

-:::::

v,

-

=

On the division of neutrices

7.3.2

Let N C �2 be a neutrix with width W and length L. We assume that N is of the form N L ( 6 ) EB W ( � ) . Consider all lines �x with x of the form

x = (�), y E R

=

The following sets appear to be of interest:

{ y I � ( � ) n L ( 6 ) EB we ( � ) 0 } S = { y l �( � ) n Le ( 6 ) EB W ( � ) # 0 }

1. R 2.

=

=

(L ¥ �) .

Clearly, i f y E R or y E S the neutrix realizes its length in the direction x, i.e. one has Tx = L. If y E S. the line � ( � ) leaves N on its "small" side. On the other hand, if y E Re, the line �( � ) leaves N on its "large" side. But N may have "corners", i.e. , there may exist lines �x which after leaving N enter into the set Le ( 6 ) EB we ( � ) . Then S � R. An example of such a neutrix is N = £( 6 ) EB c:£ ( � ) , with 0, > 0. An example of a neutrix without such "corners" is given by N = £ ( 6 ) EB 0 ( n , see also [5] . If W ¥ L, the sets R and S are neutrices, in fact they result from an algebraic operation applied to W and L. The first one is the well-known division operator on ideals or modules, commonly written :" [36] . We recall here its definition, in the context of neutrices. c -:::::

c

"

Definition 7.3.22

Let M, N C � two neutrices. We write M:N

=

{ x E � I (Vn E N) ( nx E M) } .

One can also abbreviate by

M : N = {X E � I NX c M)} . Since N( £x) = (£N)x = Nx C M, the set M : N is a neutrix. Notice that

N(M : N)

c

M

104

7. Neutrices in more dimensions

and that lvl : N is the maximal set X which satisfies the property N · X C M.

(7. 1)

A s such, we call M : N the solution o f the equation (7. 1 ) . The second one is also a sort of division, that we note M/N . This division is based on a sort of inverse. Its definition needs more knowledge on neutrices, and will be postponed. The study of divisions is highly related, but distinct from earlier work on the division of ncutriccs by Koudjcti [29] (sec also [30] ) , mainly in the sense that our definitions are of algebraic or analytic nature, instead of set-theoretic. We usc many of his tools and results. sometimes in a slightly modified form. Following Koudjeti, the argumentation becomes more simple and intrinsic, if instead of multiplications of neutrices we study additions of lower halftines. The transformation from neutrices N to lower halftines is done by the symmetrical logarithm log8 (N) = log(N + \ {0} ) ; formally we define log8 {0} = 0 . We transform halftines G back by the symmetrical exponential exp8( G ) = [- exp( G ), exp( G ) ] ; formally we define exp8 0 = {0}. Below we recall some fundamental properties of halftincs and ncutriccs. A neutrix I is idempotent if I · I = I. A lower halftine H is idempotent if H + H = H. A lower halftinc is idempotent if and only if it is of the form ( oc N) or ( -oo, N] , where N is a neutrix. A lower halftine H is idempotent if and only if exp8 H is an idempotent neutrix. A neutrix N is idempotent if and only if log8 N is an idempotent lower halftine. Let c; ::: 0, c; > 0. Examples of idempotent neutrices are, in increasing order {0} ' £ e - @/c ' £ c; cfJ ' 0 ' £ ' £e ( l/c) ©J and � . The neutrices £ · e - ( l/c) 2 - @ /c ' ( 1 /c: ) 1/c . £ . c;ciJ , ( 1 / c: ) · 0 , c:£ and £er ( l/c ) + ( l/c) @ are not idempotent . They are all of the form a I where I is an idempotent neutrix, and a is a real number. In fact every neutrix can be written is this form. This is a consequence of the following classification theorem of lower half-lines, which with successive generalizations has been proved in [9, 3, 5] : -

,

·

·

·

Theorem 7.3.23 Every lower half-line H C � has a representation either of the form H = ( - oo , r + N) or H = ( - oo , r + N] , where N is a neutri:r:, which is unique, and r is a real number, determined up to the neutrix N . We see that a lower half-line H may b e written i n the form H = r + K , where K i s of the form ( - x , N ) o r ( - oo , N] , i.e. the lower halftine K is idempotent. Then exp8 H = e r exp8 K, and we obtained as a consequence that every neutrix is the product of a real number and an idempotent neutrix. With respect to the above theorem we recall some notation. Halftincs of the form ( - oo , r + N) may be called open, and halftines of the form ( - oo , r + N]

7.3. Proof of the decomposition theorem

105

closed. The external set r + N is called an external number; it has been shown [29] , [30] that many algebraic laws valid for the real number system continue to be valid for the external numbers. Certain analytic laws, too, on behalf of the above theorem. For instance, it may be justified to call the external number r + N the supremum sup H of H. We call H ( - oo , sup H] the closur-e of H, and H ( -oo. sup H) the interior of H. The pointwise addition of lower halflines H and K, satisfies =

=

sup H � sup K, or sup H = sup K and K is open (7.2) sup H � sup K, or sup H = sup K and K is closed.

Similar definitions and rules hold for upper halflines, working with infimums instead of supremums. Since lower halflines may be translated into idempo tent lower halflines, and neutrices may be rescaled to idempotent neutrices, in defining algebraic operations we may restrict ourselves to the idempotent cases. The extension to the general case is straightforward and will be briefly addressed to at the end of this section. We turn first to the problem of defining subtractions. The first opera tion --;-- will correspond to the division : , and the second operation . . , which will correspond to the division / , is defined through inverses. Definition 7.3.24

K by

Let H, K be idempotent lower halfiines. We define H -:- H -:-- K = {x I (Vk E K)(k + x E H) } .

Notice that K + (H --;-- K) C H and that we have a maximality property similar to the division :, i.e. H --;-- K may be called the (maximal) solution of the equation K + X H. =

Definition 7.3.25 Let H be an idempotent lower halfiine. We define the sym metrical inverse ( -H) s of H by

(-H) s =

{ HH

H open H closed

If H is open ("boundary to the left of zero") , its symmetric reciprocal is a idempotent halfline, which is closed ("boundary to the right of zero") ; in a sense the "distance" of the "boundaries" of H and (- H8) to zero is equal. So there is some geometric justification in calling the reciprocal symmetric. Formally it holds that ( -0) s = � and ( -�) s = 0 . Definition 7.3.26

Let H, K be idempotent lower halfiines. We define H . . K by H . . K = H + (-K) s ·

106

7.

Neutrices in more dimensions

Notice that H . . K is an idempotent lower halftine, too. Proposition 7.3.27

sup K. Then 1 . ( -H) s

Let H, K be lower halfiines. Let S = sup H and T =

= -H0 .

2.

H . . K = H - K0 .

3.

H ·· K �

{

S � T, or S = T and H is closed S � T, or S = T and H is open.

:

( - )s

The proofs are straightforward. using formula (7.2) in 3.

Let H, K be two idempotent lower- half-lmes of lRL Let S = sup H and T = sup K. Then

Proposition 7.3.28

2.

H : K··

Proof.

{

S � T, or S = T and K is open S � T, or S = T and K is closed.

( - K) s H

1. Let x E H -;- K. Suppose x E H0 - K. Then there exist y > H, k E K such that x = y - k. Thus x + k > H, so x tf. H --:-- K, a contradiction.

c,

c.

Then x E ( H0 - K) hence H --:-- K C ( H0 - K) Conversely, let x E (H0 - K) 0 . Suppose x tf. H --:-- K. Then there exists k E K, z E H0 such that k + x = z . Thus x E H0 - K, a contradiction. So x E (H0 - K) 0 and (H0 - K) 0 c H -;- K. We conclude that H --:-- K = (H0 - K ( . 2. Straightforward, from 1 and formula (7.2).

D

The following theorem is a direct consequence of propositions 7.3.27.3 and 7.3.28.2. Theorem 7 .3.29

Let H, K be two idempotent lower half-lines oflRL Whenever

H i= K, it holds that

but

H . . K = H --:-- K,

7.3. Proof of the decomposition theorem

107

So, generically, the subtraction .. , defined through reciprocals, and the sub traction -:-- , defined through solutions, yields the same result; thus the equation K + X = H can be solved through reciprocals. In the exceptional case of the subtraction of two identical halflines the outcomes are strongly interrelated, for H H is the interior of H --;- H, and H --;- H the closure of H H. We will now consider divisions. We start by defining the symmetric inverse. We relate this notion to the symmetric reciprocal. ··

· ·

Definition 7.3.30

Let N

C

inverse (N - 1 ) 8 is defined by

� be an idempotent neutrix. The symmetric

Notice that (N - 1 ) is an idempotent neutrix, for it is the exponential of an idempotent lower halfline. Below we calculate the symmetric inverse for some familiar neutrices. We have always c 0, c > 0. s

:::

{0}

£e - @/c £c: cfJ 0

0

�

( - oo , 0/c:) ( - oo , £ log c:) ( - oo , £) ( -oo, £] ( - oo , £/c:]

( - oo , 0/c:] ( -oo, £ log 1/c:] ( -oo, £] ( - oo , £) ( - oo , £jc:)

£ £e (l /c) Col � �

0

� £e 0/c

£ ( 1 /c:) @

£ 0

£e - cfJ/c

{0}

Table 7. 1 : Inverses of some idempotent neutrices. In [29] the symmetric inverse of a ( convex) set N is defined to be tfc U {0}. The next theorem states that, if N is a neutrix, the two definitions are equivalent. Proposition 7.3.31

Let N C � be a neutrix. Then ( N- 1 ) s

=

1

Nc

U

{0}.

Proof. The equality holds formally for N = {0} and N = R Let 0 � N � R We prove only that (N - 1 ) 8 C tfc U {0}. Clearly 0 E tfc U {0}. Now assume

108

7.

Neutrices in more dimensions

that x is nonzero element of (N - 1 ) s, that we may suppose to be positive by reasons of symmetry. Then a

log x E ( - logs N) s - (logs N)0 - (log ( I N I \ {0})0 - log ( � + \ N) log IR+\N . So x E 1 IN ° . Hence (N- 1 ) s C Jc U {0}. D We define the symmetric division through multiplication by the symmetric inverse, and relate it to the symmetric subtraction. Definition 7.3.32 Let lvf, N C � be two idempotent neutrices. The symmet ric division MIN of M and N is defined by

Proposition 7.3.33

Let M, N be idempotent neutrices. Then

We have, using the algebraic relations M = exps logs M and exps H exps K = exps(H + K) , MIN = M (N- 1 ) s Proof.

·

·

= exps logs M exps( - logs N) s = exps (logs M + ( - logs N) s ) = exps (logs M . . logs N) . ·

D

We refrain from giving a general formula for MIN and point out that it can be calculated with the help of the propositions 7.3.27 and 7.3.33. However in some special cases MIN may be readily calculated. 1. M ¥ N c 0 : MIN = M. Nllv1 = (Ar 1 ) s . 2. £ c M ¥ N : MIN = (N- 1 ) s , NIM = N. 3.

M c 0 : MIM

4. M ::) £ :

=

M.

MIM = (M - 1 ) s .

7.3. Proof of the decomposition theorem

109

Observe that always MjM C 0. We consider now the division M N, of definition 7.3.22. It bears the following relation to the subtraction -:-- : Proposition 7.3.34

Let M, N be idempotent neutrices. Then :

We prove only that M N C exp s (logs M -;- logs N) . Formally, one has 0 E exps (logs AI -:- logs N) . Let x E .M : N, x # 0. For reasons of symmetry, we may suppose that x > 0. Because xN C lvl one has log x + logs N C logs M . S o log x E logs lvl -;- logs N, hence x E exps(logs M -;- logs N) . We conclude that M : N C exps (logs M -:-- logs N) . D Proof.

As a consequence of theorem 7.3.29 and propositions 7.3.33 and 7.3.34 we obtain that M/N M N whenever M # N. If M N we have =

:

=

Mc0 M ::) £.

Notice that always M : M ::) £. Because Mjlvl C 0, we obtain MjM � M : M.

The division has the following set-theoretic characterization. Proposition 7.3.35

Let M, N C � be two neutrices. Then M:N=

(

Me

N \ {0}

)c

The proof is very similar to the proof of proposition 7. 3. 28 . 1 . D It follows from the results above that, analogously to the subtraction -:-- , the division fiv1 : N can in practice be calculated through inverses. Sec also the table and the special cases we presented earlier. We indicate briefly how subtractions of non-idempotent halftines can be reduced to subtractions of idempotentent halftines, and consider also the anal ogous reduction for divisions of neutrices. Let F, G be two lower halftines. By theorem 7.3.23 there exist real numbers f and g and idempotent halftines H and K such that Proof.

F=f+H

G = g + K.

110

7.

Neutrices in more dimensions

We define

F . . G = f - g + H .. K

F -:-- G = f - g + H -:-- K.

In the same manner, let AI, N be neutrices. Let potent neutrices such that

M = mi

m,

n E � and I, J be idem

N = nJ.

We define

MjN = rn n IjJ. As regards to the operation M : N, the relation m M:N=( I J) n :

(7.3)

readily follows from definition 7.3.22. It is a matter of straightforward verifi cation to show that the above formulae do not depend on the choice of f, g, m and n, and that, mutatis mutandis, the properties considered earlier in this section continue to hold. We state three useful properties of the division : . We recall that a neutrix N is linear if there exists c; > 0 such that N = 0c: or N = £c:, else N is nonlinear. Nonlinear neutrices have the property that N = w N for at least some w "'=' +oo ( see [5] ) . ·

Let AI, M1 , l'vi2 , N, N1 , N2 C � be neutrices. If l'vh c M2 , it holds that M1 : N c M2 : N. If N1 c N2 , it holds that M : N1 ::) M : N2 . If M � N, one has M : N = 0 if and only if there exists c > 0 such that M 0c and N £c:, otherwise M : N � 0 .

Proposition 7.3.36 1. 2. 3.

=

=

We only prove 3. Let c:, T) > 0. As regards to linear neutrices we have the following table

Proof.

0c: £c: 0T) £c:!TJ 0c/TJ £T) £TJ/c £T)jc So the only possibility for such neutrices !vi, N to obtain M : N 0, is when M = 0c and N = £T), with c/T) appreciable, i.e., when N = £c:. Else the condition AI � N ensures that c/TJ 0, respectively TJ/c 0, which implies that M : N � 0. =

"'='

"'='

7.3. Proof of the decomposition theorem

111

Assume M is nonlinear. Let w ::: +oo be such that

M:N

=

A1 -:N w

1 ( M : N)

= -

w

1

c -0

w

Mlw =

A1 .

Then

� 0.

The case that N is nonlinear is similar. This concludes the proof. D Finally we establish the relation between the two divisions and the families of directions in the plane R and S. Theorem 7.3.37 Let N c form N = Lu EB W v, where

�2 be a neutrix with width W and length L, of the u and v are orthonormal vectors. Let

and, if L ¥ �� let Then R = W : L and S = WIL . Without restriction of generality, we may assume that u ( 6 ) and v = ( � ) . First, let y E R. If y = 0, clearly, y E W : L. Assume y i= 0. Then there exist A E L, A f. 0 such that ( }y ) tt L ( 6 ) EB we ( � ) . so =

Proof.

(

)

AY we e = W : L. y=E L \ {0} A =

Hence R C W : L. Conversely, let y E W : L. If y 0, clearly y E R. Assume y f. 0. Suppose there is A E L such that ( .-\.-\y ) E L ( 6 ) EB we ( � ) . Then y E L\{�} , which means that y tt W : L, a contradiction. So y E R, hence W : L C R. We conclude that R W : L. Second, let y E S. Then there exists p, E Le such that ( /}y ) E Le ( 6 ) EB W ( � ) . So p,y E W and y E � = WI L. Hence S C WI L. Conversely, let y E WI L. Then there is p, E Le and T) E W such that y = '111 . So ( /}y ) E Le( b ) EB W 0 ) , which means that y E S. Hence WIL c S. We conclude that S = WI L. D =

7.3.3

Proof of the decomposition theorem

Let us consider a neutrix in �2 . We know already that, if it is square or not lengthy, it can be decomposed into two neutrices of R For lengthy neutrices, we sketch here the remaining part of the proof of the decomposition theorem.

112

7.

Neutrices in more dimensions

A lengthy neutrix may be assumed to be appropriately scaled and oriented. By theorem 7.3.8 it suffices to look for a direction in the plane with maximal thickness. The set of directions with maximal thickness will be obtained as the complement of the directions with nonmaxirnal thickness, where by the sector-theorem we may restrain ourselves to directions nearly parallel to the ncutrix. By proposition 7. 3 . 1 9 and 7.3.21 they arc divided into two "equal" opposite parts. The set of directions with maximal thicknesses will then be the supremum, in the sense of theorem 7.3.23, of the set of downward nearly parallel directions, or alternatively, the infimum of the set of upward nearly parallel directions. In fact, the "gap" between the two opposite families of directions is of the form ( x +�:L ) , where W is the width of the neutrix, and L its length. We turn now to the proper proof, and start with some terminology. Definition 7.3.38

neu.trix. We write D U SD Su For y ::: 0 we write

Let N C �2 be a lengthy appropriately scaled and oriented :::

{ y 0 I ( � ) is nearly parallel downward } { y ::: 0 I ( � ) is nearly parallel upward } { x E �I D + x = D } { x E �1 U + x = U } .

and we define I=

n Iy.

yeo: O

Let N C �2 be a lengthy appropriately scaled and oriented neutrix with length L and width W . Then

Theorem 7.3 .39

I = W : L � 0.

It follows from proposition 7.3.36.3 that W : L � 0. Let x E I. Suppose X tt w : L. By proposition 7. 3.35 there are T) E w e and A E L such that x = TJ/ A. By proposition 7.3.9 there arc orthonormal vectors u ::: ( 6 ) and v ::: ( � ) such that N£>.. = £Au EB W v; up to a rotation we may assume that u ( 6 ) and v ( � ) . So ( � ) tf. N . Hence TG ) c T( )' / >. ) ¥ £A c T( 6 ) . Hence x t/. Iu ::) I, a contradiction. This implies that x E W : L. which means that I c W : L. Conversely, let x E W : L. Let y ::: 0. By proposition 7. 3 . 1 1 there are orthonormal vectors u , v such that Nr(t) = T ( � ) u EB Wv. It follows from Proof.

=

=

7.3. Proof of the decomposition theorem

113

theorems 7.3.37 and 7.3.8 that T( y �x) T( t ) for all x E W : T( ; ) ::) W : L, so x E Iy . Because y is arbitrary, it holds that x E I. Hence W : L C I. We conclude that I = W : L. D =

Let N C �2 be a lengthy appropriately scaled and oriented neutrix with length L and width W. Then 1 . For all y, z E D such that y ::; z one has T(�) ::; TG) .

Theorem 7.3 .40

2.

For all y, z E U such that y ::; z one has TG ) z

:2

TG) .

For- all c > W : L ther-e is y E D, E U such that z - y ::; c. 4. SD = Su = W : L. 5. There exists x ::: 0 such that D = [0, x+ W : L ) and U = (x + W : L, 0] . 3.

Proof.

1 . Suppose T(t) > TG) . By proposition 7.3.21 there is x E U such that T( ! ) = T( � ) . So y < z < x, y ::: x , while T( � ) > T( ! ) < T( ; ) . This contradicts the sector-theorem. Hence TG ) ::; TG) .

2. Analogous to 1 . 3 . Because W : L is a neutrix, one has c/2 > W : L. By proposition 7.3.35 there exist A E L, T) > W such that TJ/ A ::; c/2. Let v be orthonormal such that N£>.. £Au EB W . We see that AU + T)V, AU - T)V tf. N, so - TJ/A E D and TJ/A E U, while TJ/A - ( - TJ/A) ::; c. 4. From 3 we derive that SD C W : L. Conversely, let y E D. Then it follows from theorem 7.3.39 that T( y+W1 ) = T(1) y < L. This implies that y + x E D for all x E W : L. Hence W : L C SD . We conclude that SD = W : L. The proof that Su = W : L is analogous. 5. By theorem 7.3.23 and 4 the set D is either of the form D = [0 , x+W : L ) or D [0, x + W : L ] . We show that the second possibility is absurd. Then the only way to satisfy 3 is when U (x + W : L, 0] . By 7.3.39 and 1 one has T( � ) ::; T( ! ) for all y E D. Then also TG) ::; T(�) for all z E U. by proposition 7. 3.21. Because D U U = 0, one has TC ) ::; TG) for all y ::: 0. By theorem 7.3.6 all other thicknesses are equal to W. Hence Tr ::; T(;) < L for all unitary vectors r. Then L cannot be the length of N, a contradiction. Hence D = [0 , x + W : L ) . The proof that U (x + W : L, 0] is analogous. D =

11,

v

L

=

=

=

114

7.

Neutrices in more dimensions

Let N C �2 be a lengthy appropriately scaled and oriented neutrix with length and width W. Then there exists x � 0 such that y I T( �) = = X + w

Theorem 7.3.41

L}

{

L

L.

:

By theorem 7.3.40.5 there exists x � 0 such that 0 \ (D U U) = By theorem 7.3.39 we have T( �) = TG ) for all y E x + W x+W Suppose TG ) < By proposition 7.3. 19 either x E D or x E U, a contradic tion. Hence T( ! ) ::::> in fact T(;) for all y E x + W D

Proof.

:

L.

L.

:

L,

L

=

Theorem 7.3 .42 ( Two-dimensional

a neutrix. Then there are neutrices vectors u, v such that N

=

:

L.

�2 be C �� and orthonormal

decomposition theorem) Let N

L, W with W C L Lu Wv .

L.

C

EB

L is the length of N, and W is its width. Proof. Let L be the length of N and W its width. If L W one has N = L( 6) W ( � ) . If L = £>. for some A E �' the theorem follows from Moreover, the neutrix

=

EB

proposition 7.3.9. In the remaining cases we may assume that N is appropri ately scaled and oriented. By theorem 7.3.41 there exist a unitary vector u such that Tu Let v be unitary such that u, v are orthonormal. By the orem 7.3.8 we have N = u EB Wv . The last part of the theorem also follows from theorem 7.3.8. D =

L.

L

References

[1]

D. BALLARD, Foundational aspects of "non" standard mathematics, Con temporary Mathematics 176, American Mathematical Society, Provi dence, RI, 1994.

[2]

.J .-L. CALLOT, F. D IENER, and M . DIENER, " Chasse au Canard", Collectanea Mathematica, 31 1-3 (1983) 37- 119. E. BENOIT,

[3] I . P .

110

VAN DEN BERG,

(1983) 193-208.

"Un principe de permanence general", Asterisque,

[4] I . P .

VAN DEN BERG, "Un point-de-vue nonstandard sur les dcvcloppe ments en serie de Taylor", Asterisque, 1 1 0 ( 1983) 209-223.

[5] I . P .

VAN DEN BERG,

Nonstandard Asymptotic Analysis, Lecture Notes in

Mathematics 1249 , Springer-Verlag, 1987.

References

115

[ 6] I. P.

VAN DEN BERG, An external probability order theorem with some applications, in F. and M. Diener eds., Nonstandard Analysis in Practice, Universitext, Springer-Verlag, 1995 .

[7] I . P .

VAN DEN BERG , An external utility function, with an application to mathematical finance, in H. Bacelar-Nicolau, ed., Proceedings 32nd European Conference on Mathematical Psychology, National Institute of Statistics INE. Portugal (2001).

[8] I.P. VAN DEN BERG , A decomposition theorem for neutrices, IWI preprint 2004-2-01 , Univ. of Groningen, submitted. [9] I . P .

VAN DEN BERG

293

( 1981) 385-388.

and M . DIENER, "Diverses applications du lemme de Robinson en analyse nonstandard", C. R. Acad. Sc. Paris, Scr. I, Math.,

[10] A .

BOHE, " Free layers in a singularly perturbed boundary value problem", SIAM J. Math. Anal., 21 ( 1990) 1264- 1280.

[11] A.

BoHE, "The existence of supersensitive boundary-value problems", Methods Appl. Anal., 3 ( 1996) 318-334.

[12] A .

"The shock location for a class of sensitive boundary value problems", J. Math. Anal. Appl., 235 ( 1999) 295-314. B oHE,

[13] J.

BoSGIRAUD, "Exemple de test statistique non standard", Ann. Math. Blaise Pascal, 4 ( 1997) 9-13.

[14]

J BoSGIRAUD, "Exernple de test statistique non standard, risques ex ternes", Publ. Inst. Statist. Univ. Paris, 41 (1997) 85-95. .

.

[15] J.

BOSGIRAUD, "Tests statistiques non standard sur des proportions", Ann. I.S.U.P. , 44 (2000) 3- 12. BoSGIRAUD, "Tests statistiques non standard pour les modeles a rap port de vraisemblance monotone I", Ann. I.S.U.P., 44 (2000) 87-101.

[16] J.

"Tests statistiques non standard pour les modeles a rap port de vraisemblance monotone II", Ann. I.S .U.P., 45 (2001) 61-78.

[17] J . [18]

B oSGIRAUD,

.J . BoSGIRAUD,

volume.

[19] N . G. [20]

Nonstandard likely ratio test in exponential families, this

DE B RUIJN,

Asymptotic Analysis, North Holland, 1961.

Neutrix calculus, neutrices and distributions, MRC Tecnical Summary Report, University of Wisconsin ( 1960) .

.J . G . VAN D E R CORPUT,

116

7.

Neutrices in more dimensions

DIENER, "Sauts des solutions des equations EX = f(t, x, x)", SIAM J. Math. Anal., 17 ( 1986) 533-559.

[21] F . [22] F.

"Equations surquadratiques et disparition des sauts", SIAM J. Math. AnaL 19 (1988) 1 127-1 134. DIENER,

DIENER . M . DIENER ( cds.), Nonstandard Analysis in Practice, Uni versitext, Springer-Verlag, 1995.

[23] F.

[24] F.

DIENER

and G. REEB, Analyse Non Standard, Hermann, Paris, 1989.

[25] V.

KANOVEI and l\1 . REEKEN, "Internal approach to external sets and universes I. Bounded set theory", Studia Logica, 55 ( 1995) 229-257.

[26] V.

KANOVEI and l\1. REEKEN, "Internal approach to external sets and universes II. External universes over the universe of bounded set theory", Studia Logica. 55 (1995) 347- 376.

[27] T.

KAWAI, An axiom system for nonstandard set theory, Rep. Fac. Sci. Kagoshima Univ., 12 (1979 ) 37-42.

[28] R. KEEFE, Theories of Vagueness, Cambridge University Press, 2000. [29] F. KouDJETI, Elements of External Calculus with an aplication to Math ematical Finance. thesis, Labyrinth publications, Capelle ajd I.Jssel, The Netherlands, 1995. [30] F.

KouDJETI and I. P. VAN DEN BERG, N eutrices, external numbers and external calculus, in F. and M. Diener eds . . Nonstandard Analysis in Prac tice, Universitext, Springer-Verlag, 1995.

[31] R. LUTZ and M. G ozE , Nonstandard Analysis: a practical guide with applications, Lecture Notes in Mathematics 881, Springer-Verlag, 1981. [32] E. NELSON, "Internal Set Theory, an axiomatric approach to nonstandard analysis", Bull. Am. Math. Soc., 83 (1977 ) 1 165 - 1 198. [33] E. NELSON, "The syntax of nonstandard analysis", Ann. Pure Appl. Logic, 38 (1988) 123- 134. [34] .J . P. REVEILLES, " Une definition externe de la dimension topologique", C. R. Acad. Sci. Paris, Ser. I, l\Iath. , 299 ( 1984) 707- 710. [35] R . M . SAINSBURY, Paradoxes, Cambridge University Press, 2nd ed. , 1995. [36] B . L . VAN DER WAERDEN, Moderne Algebra 11, Springer-Verlag, 1931.

Nonstandard metho ds for additive and combinatorial number theory. A survey Renling Jin *

8.1

The beginning

In this article my research on the subject described in the title is summa rized. I am not. the only person who has worked on this subject. For example, several interesting articles by Steve Leth [21 , 22, 23] were published around 1988. I would like to apologize to the reader that no efforts have been made by the author to include other people's Tesearch. My research on nonstandard analysis started when I was a graduate student in the University of ·wisconsin. A large part of my thesis was devoted towards solving the problems posed in [1 9] . By the time when my thesis was finished, many of the problems had been solved. However, some of them were still open including [19, Problem 9 .1 3 [ . It took me another three years to find a solution to [ 19, Problem 9. 13] . Before this my research on nonstandard analysis was mainly focused on foundational issues conceming the structures of nonstandaJ:d universes. After I told Steve about my solution to [ 1 9 , Problem 9 . 1 3] , he immediately informed me how it could be applied to obtain interesting results in combinatorial number theory. This opened a stargate in front of me and lead me into a new and interesting field. For nonstandard analysis we use a superstructme approach. We fix an �1saturated nonstandard universe * V . For each standard set A we write *A for t.he nonstandard version of A in *V. *

Department of JV[athematics, College of Charleston, Charleston, SC 2942<1.

[email protected]

120

8.2

8. Additive and combinatorial number theory Duality between null ideal and meager ideal

Given an ordered measure space n such as the Lebesgue measure space on the real line with the natural order, one can discuss the relationship between measurable sets and open sets 1 . The null ideal on n is the collection of null sets. i.e. the sets with measure zero, and the meager ideal is the collection of all meager sets 2 . The sets in an ideal are often considered to be small. The duality between null ideal and meager ideal means that there exists a meager set with positive measure, i.e. the smallness in terms of null ideal is incomparable with the smallness in terms of meager ideal. However, it is usually true that if the space also has an additive structure, then the sum of two set with positive measure may not be meager. See Corollary 8.2.2 for example. What can we say about a Loeb space? Let H be a hyperfinite integer and let [0, H] be an interval of integers. The term [a, b] in this article always means the interval of integers between a and b including a and b if they are also integers. On [0 , H] one can construct a Loeb measure generated by the normalized counting measure. By a Loeb space we always mean the hyperfinite set [0, H] with the Loeb measure generated by the normalized counting measure3 . On [0, H] there is also a natural order and an additive structure. However , the order topology on [0, H] is discrete, therefore uninteresting. In [19] a U-topology is introduced for each cut U <:;-; [0, H] that gives meaningful analogy of the order topology on the real line. An infinite initial segment U of non-negative integers is called a cut if it is closed under addition. For example N, the set of all standard non-negative integers, is the smallest cut. We often write x > U for a positive integer x and a cut U if x tt U. Let U <:;-; [0, H] be a cut. A set A <:;-; [0, H] is called U-open if for every x E A , there exists a positive integer y > U such that [x - y, x + y] n [0, H] <:;-; A . A U-topology is the collection of all U-open sets and a U-meager set is a meager set in terms of U-topology. In [19] it was proven that for any cut U <:;-; [0, H] there is always a U-meager set of Loeb measure one in [0, H] . The question 9.13 in [19] asked whether the sum (modulo H + 1) of two sets in [0, H] with positive Loeb measure can be U-meager for some cut U <:;-; [0, H] . For two sets A and B and a binary operation o between A and B, we write A o B for the set {a o b : a E A and b E B}. For a number k, we write kA for the set { ka : a E A} . In [ 1 1] we prove the following theorem.

1 A n order on a space can generate a topology called order topology on the space so that

a set is open if it is the union of open intervals.

2 A set A in a topological space is nowhere dense if every non-empty open set 0 contains

another non-empty open set R disjoint from A. A meager set is the union of finitely many or countably many nowhere dense sets. A meager set is also called a set of the first category. 3 The Loeb space here is often called the hyperfinite uniform Loeb space.

8.3.

121

Buy-one-get-one-free scheme

Theorem 8.2.1 Let H be a hyperfinite integer and U <:;-; [0, H] A, B <:;-; [0 . H] are two internal sets with positive Loeb measure,

be any cut. If then A 8 H B is not U -nowhere dense, where EB H is the usual addition modulo H + 1 .

Note that Theorem 8.2. 1 yields a negative answer to [19, Problem 9.13] . Also note that the theorem is still true if EB H is replaced by the usual addition + and the sumset A + B is considered to live in [0, 2H] . Theorem 8.2. 1 has several corollaries in the standard world. If one lets U be the cut nn EN [O, �], then Theorem 8.2.1 implies the following well known non-trivial fact. Corollary 8.2.2 If A and B are two sets of reals with positive Lebesgue mea sure, then A + B must contain a non-empty open interval of reals. Corollary 8.2.2 was credited to Steinhaus in [21] . Let A <:;-; N be infinite. The upper Banach density BD ( A ) of A is defined by BD ( A ) = lim sup

k-+ = nEN

l A n [n , n + kJ I _ k+1

A set C <:;-; N is called piecewise syndetic if there is a positive integer k such that C + [0, k] contains arbitrarily long sequence of consecutive numbers. The definition of upper Banach density, syndeticity, and piecewise syndeticity can be found in [1, 7] . If one let U = N, then Theorem 8.2.1 implies the follow ing result . Corollary 8.2.3 Let A, B is piecewise syndetic.

<:;-;

N. If BD ( A )

>

0 and BD ( B ) > 0, then A + B

Corollary 8.2.3 was pointed out to me by Steve Leth. By choosing other cuts U one can have more corollaries. These corollaries also have their own corollaries. The reader can find more of them in [11, 18] . Corollary 8.2.2 and Corollary 8.2.3 can also be proven using standard meth ods [13] . However, Theorem 8.2.1 does not have a standard version. The gen erality of Theorem 8.2. 1 shows the advantages of the nonstandard methods. Theorem 8.2.1 reveals a universal phenomenon, which says that if two sets are large in terms of "measure", then A + B must not be small in terms of "order-topology".

8.3

Buy-one-get-one-free scheme

Excited by the results such as Corollary 8.2.3, I was eager to let people know what I had obtained. After a talk I gave at a meeting in 1997, I was informed

8.

122

Additive and combinatorial number theory

by a member in the audience that Corollary 8.2.3 had probably already been proven in [1] or in [7] . This made me rush to the library to check out the book and the paper; I was anxious to see whether my efforts were a waste of time. Fortunately, they weren't; in fact, Corollary 8.2.3 complemented a theorem in [7] which says that if a set A � N has positive upper Banach density, then A - A is syndctic. From [1, 7] I also learned of terms such as upper- Banach density, syndeticity, piecewise syndeticity, etc. the first time. One thing which caught my eye when I read [1, 7] was the usc of Birkhoff Ergodic Theorem. It is natural for a nonstandard analyst to think what one can achieve if Birkhoff Ergodic Theorem is applied to some problems in a Loeb measure space setting. With that in mind, I derived Theorem 8.3.1 and Theorem 8.3.2 as lemmas in [12] . Given a set A � N, the lower asymptotic density g(A) of A is defined by

l A n [1, n ] l g(A) = lim n ----+inf oc n

'

where l X I means the cardinality of X when X is finite. Later, we will use l X I representing the internal cardinality of X when X is a hyperfinite set . For a set A and a number x we often write A ± x for A ± { x} and write x ± A for {x} ± A.

Suppose A � N with BD(A) = a . Then ther-e is an inter-val of hyperfinite length [H, K] such that for almost all x E [H, K] in terms of the Loeb measure on [H, K] , we have 4((*A - x) n N) a. On the other hand, if A � N and there is a positive integer x such that 4( (*A - x) n N) ? a , then BD(A) ? a .

Theorem 8.3 . 1

=

Given a set

A � N, the Shnirel'man density a-(A) o f A i s defined by a-( A)

= ninf ;?l

l A n [1, n] l _ n

Suppose A � N with BD(A) integer x such that J((*A - x) n N) = a .

Theorem 8.3.2

=

a.

Then there is a positive

Theorem 8.3.1 is [12, Lemma 2] and Theorem 8.3.2 is the combination of [12, Lemma 3, Lemma 4 and Lemma 5] . It is often the case that a result involving Shnirel'man density is obtained first. Then people explore possi ble generalizations to some results involving lower asymptotic density. The behaviors of these two densities are quite similar. From Theorem 8.3.1 and Theorem 8.3.2 we can sec that the behavior of upper Banach density is also similar to the behavior of lower asymptotic density or Shnirel'man density. We

8.3.

1 23

Buy-one-get-one-free scheme

can now claim that there is a theorem involving upper Banach density parallel to each existing theorem involving lower asymptotic density or Shnirel'man density. This is the scheme that can be called buy-one-get-one-free because we can get a parallel theorem involving upper Banach density for free as soon as a theorem involving lower asymptotic density or Shnirel'man density is obtained. I would now like to briefly describe how this works. Given a set A � N with BD(A) = a , there is a positive integer x (may be nonstandard) such that d( (*A - x ) n N) = a (or a-( (*A - x ) n N) = a ) . This means that in x + N, a copy of N above x, the set *A has lower asymptotic density (or Shnirel'man density) a. Now apply the existing theorem involving g (or O" ) to the set *An ( x + N) to obtain a result about *A. Finally, pushing down the result to the standard world, one can obtain a parallel theorem involving upper Banach dmsity. Corollary 8.3.3 and Corollary 8.3.4 below arc the results obtained using this scheme. The first one is a corollary parallel to Mann's Theorem. Mann's Theorem says that if two sets A, B � N both contain 0 , then J(A + B) ;:?- min { a-( A) + a-( B), 1 } . Mann's Theorem is an important theorem; in [20] it is referred as one of three pearls in number theory. We can now easily prove the following corollary of Theorem 8.3.2. ,

Corollary 8.3.3 BD(A + B + { 0 I}) A, B � N.

?

min{BD(A)

+

BD(B), 1} for all

The addition o f { 0 , 1 } , which substitutes the condition 0 E AnB i n Mann ' s Theorem, is necessary because without it, Corollary 8.3.3 is no longer true. For example if A and B both are the set of all even numbers, then BD(A) =

BD(B) = BD(A + B) = �-

The second corollary is parallel to Plunnecke's Theorem [24, p. 225] which says that if B � N is a basis of order h. then a-(A + B) ? J(A)1-± for every set A � N. A set B � N is called a basis of order h if every non-negative integer n is the sum of at most h non-negative integers (repetition is allowed) from B. In the upper Banach density setting we can define piecewise basis of order h. A set B � N is called a piecewise basis of order h if there is a sequence of intervals [a k , bk ] � N with limk-Hxl(b k - a k ) = oo such that every integer n E [0, bk - a k ] is the sum of at most h integers (repetition allowed) from (B - a k ) n [0, bk - a k ] . If B is a basis of order h, then it must also be a piecewise basis of order h because one can take bk = k and a k = 0.

Corollary 8.3.4 If B is a piecewise basis of order h, then BD(A + B) BD(A)1-i for every set A � N.

?

124

8. Additive and combinatorial number theory

Corollary 8.3.3 and Corollary 8.3.4 can also be proven using the standard methods [13] from Ergodic Theory. For more results similar to Corollary 8.3.3 and Corollary 8.3.4, see [12] .

8.4

From Kneser to Banach

In the last section we didn't use the full power of Theorem 8.3. 1. Suppose A � N has upper Banach density a . We use only one x such that 4( (*A x ) n N) = a while there are almost all x in an interval [H, K] of hyperfinite length such that 4( (*A - x) n N) = a. We can take this advantage and prove a theorem involving upper Banach density parallel to Kneser's Theorem [9] . For two sets A, B � N we write A B if (A "'- B) U (B "'- A) is a finite set. Kneser's Theorem4 says that for all A, B � N, if 4(A + B) < 4(A) + 4(B), then there is a positive integer g and a set G � [0, g - 1] such that A + B � G + gN, A + B G + gN, and 4(A + B) = ?: 4(A) + 4(B) rv

1 �1

rv

�-

Kneser's Theorem was motivated by Mann's Theorem. Can l\Iann's The orem be true if one replaces Shnirel'man density with lower asymptotic den sity? There are obvious counterexamples. Let d, k, and k' be positive integers. Suppose G = {O, d, 2d, . . . , (k - 1)d} and G' = {O, d, 2d, . . . , (k' - 1 )d} . Sup pose also that g > (k + k' - 2)d, A = G + gN, and B = G' + gN. Then 4(A + B) = Roughly speaking, Kncscr's Theorem = 4(A) + 4(B) says that the only kind of counterexamples which make the inequality false in Mann's Theorem with J replaced by 4 arc similar to the one just described. In [13] a parallel theorem [13, Theorem 3.8] was obtained. Let A, B � N with BD (A) = a and BD (B) = (3. Then there are intervals [a n , bn ] and [en , dn] " n---> oc " n-HXl (bn - a n ) " n--->CXl (d n - Cn ) sue·h th at l lm - oo, l lm - oo , l lm

k+;-l

�-

n[aan,n bnll l -IAbn+

��_[c�!n}l

and limn___, oc = (3. We hope to characterize the structure of A + B inside the intervals [an + en , bn + dn] . However, [13, Theorem 3.8] when restricted to the addition of two sets, only characterized the structure of A + B on a very small part of N. The reason for this is because we used only one x and one y with 4((*A - x ) n N) = a and 4((*B - y) n N) = (3 in the proof. During the summer of 2003 my undergraduate research partner Prerna Bi hani and I conducted an undergraduate research project funded by the College of Charleston to work on theorems parallel to Kneser's Theorem. The work done during the summer and the following year produced the paper [2] , which contains Theorem 8.4. 1 . To avoid some technical difficulties we considered only the sum of two copies of the same set in [2] . a,

4

The version of Kncser's Theorem in

[9]

is about the addition of multiple sets. We stated

the version here for the addition of two sets just for simplicity.

8.5.

Inverse problem for upper asymptotic density

1 25

Theorem 8.4. 1 Let A be a set of non-negative integers such that BD (A) = a and BD (A + A) < 2 a . Let {[an, bn ] : n E N} be a sequence of intervals such . ,b, ] l �T " that llmn_, = (b n - an ) -- 00 an d l lmn_, = IAn[a, bn - an+ l - a . Th en th ere are g E n , G c;;;; [0, g - 1], and [en , dn ] c;;;; [an , bn ] for each n E N such that . (1) llmn-> CXJ dn-Cn bn - a n - 1 _

'

(2) A + A c;;;; G + gN, (3) (A + A) n

[2en , 2dn ]

(4) BD (A + A) = JQl g

= (G + gN) n

[2cn, 2dn ] for all n E N,

a

? 2 - lg .

Nate that ( 1) above shows that the structure of A + A is characterized on a large portion of [2a n , 2bn ] . Note also that we cannot replace [2cn , 2dn] with [2an , 2bn ] in (3) because all conditions for A still hold if we delete any elements from A n ( [an, bn ] "'- [en, dn ] ) . The proof of Theorem 8.4. 1 can b e described i n several steps. Given a hyperfinite integer N. we know that for almost all x, y E [a N , b N ] we have g((*A -x) nN) = g( (*A - y) nN) = a . We can also assume that g((x - *A) nN) = g((y - *A) n N) = a . Step one: characterize the structure of *(A + A) in x + y + Z using Kneser's Theorem, where Z is the set of all standard integers. Step two: show that the structures of *(A+A) n (x +y+Z) for almost all x, y E [a N , bN ] are consistent with one another so that these structures can be combined into one structure. Hence we can characterize the structure of *(A + A) in [2cN , 2dN] , � 1 . Step three: prove that for different hyperfinite integers N where and N', the structure of *(A + A) in [2a N, 2bN ] and the structure of *(A + A) in [2a N' ' 2bN' ] are consistent so that these structures of *(A + A) in [2a N , 2b N ] for all hyperfinite integers N can be combined into one structure of *(A + A) in U{[a N , bN ] : N is hyperfinite } . Step five : pushing down the structure of *(A + A) to the standard world results Theorem 8.4. 1 . The methods developed in [13] do not seem to b e enough for proving The orem 8.4. 1. So it is interesting to see whether one can produce a reasonably nice and short standard proof of the theorem. In [3] the structure of A was characterized when g(A) is very small and g(A + A) "( cg(A) for some constant c ? 2. It is also interesting to see how one can characterize the structure of A + A when BD (A + A) "( cBD (A) for some constant c ? 2.

�z=��

8.5

Inverse problem for upper asymptotic density

In January of 2000, I was invited to give a talk at the DIJ\IACS workshop "Unusual Applications of Number Theory". One of the workshop organizers

8.

126

Additive and combinatorial number theory

was Melvyn Nathanson to whom I am grateful for being the first number theorist to express an interest in my research on number theory not to mention his continued encouragement. During the workshop I had a chance to meet another number theorist G. A. Freiman who is well-known for his work on inverse problems in additive and combinatorial number theory. He gave me a preprint of his list of open problems [5] . This list and the book [24] have since gotten me interested in the inverse problems. Inverse problems study the properties of A when A+ A satisfies certain con ditions. Freiman discovered a phenomenon that if A + A is small, then A must have some arithmetic structure. In fact Kneser's Theorem and Theorem 8.4.1 can b e viewed a s two examples o f the phenomenon. One can characterize the arithmetic structure of A from the structure of A + A in Theorem 8.4. 1 and characterize the structure of A and the structure of B from the structure of A + B in Kneser's Theorem (see [2] for details ) . In this section we characterize the structure of A when the upper asymptotic density of A + A is small. Given A � N, the upper asymptotic density d(A) of A is defined by A n [1, n] l d( A ) = lim sup l _ n ----':' <Xl

n

Without loss of generality we always assume 0 E A in this section. We can also assume that gcd(A) = 1 because if gcd(A) = d > I , then we can recover the structure of A from the structure of A', where A' = { a/ d : a E A } . When 0 E A and gcd(A) = I , one can easily prove, using Freiman's result ( 1 ) at the beginning of the next section, that d( A + A) ? � d( A) if d( A) :'( � and

d(A + A) ? l +�( A ) if d(A) ? � - The following two examples show that the

lower bounds above are optimal. Example 8 . 5 . 1

For every real number 0 :'( a :'( 1, let 00

{0} U U [1(1 - a)22nl , 2 2n ] . n= l Then d(A) = a, d(A + A) = 11" if a ? �� and d(A + A) = �a if a :'( � A=

Let k , m E N be such that k ? 4 and 0 , m , 2 m are pairwise distinct modulo k. Let A = kN U (m + kN) . Then d(A) = � = a :'( � and d(A + A) = t = �a . It is easy to choose k, m such that gcd (A) = 1 . S o we can say that d(A + A) is small whm d(A + A) = min g J(A), l+�( A ) }

Example 8.5.2

and we need to characterize the structure of A when d( A + A) is small. Clearly the characterization of the structure of A should cover the cases in both Example 8.5.1 and Example 8.5.2. We hope to show that A must have

8.5. Inverse problem for upper asymptotic density

127

the structure described in one of the examples above when d(A + A) is small. However, some variations of the examples are unavoidable. If the set A is replaced by A' <:;-; A with d(A') = a in both Example 8.5. 1 and Example 8.5.2, then d(A' + A') is also small. Furthermore, if A = A' U A" where A' =

and

{0}

00 22n l 2 22n ] u U [ i(I - a)2 , n =l

A" is an arbitrary subset of 00

n

n

u [ 1 ( 1 - a)222 + l l ' 222 + \

n =l

then again d( A + A) is small. This example shows that we can only hope to characterize the structure of A along the increasing sequence hn such that . llllln-->00 I Anh[l,hn] l - a. ' It was a long journey for me to arrive at the most recent result in [ 1 6] due to the technical difficulties of the proof. First the structure of A was characterized in [ 1 4] when d(A + A + {0, I } ) is small. Later the structure of A was characterized in [15] when d(A + A) is small and A contains two consecutive numbers. Finally in [16] the following theorem was proven. _

Theorem 8.5.3

Let d(A) = a > 0.

Part 1: Assume a > � - Then d(A + A) = 11a implies that for every increasing I An[O,+hn] l = a, we h ave · sequence { h n : n E nR-T } wz·th 1 1m17_,00 hn l nlim ->oo

I (A + A) n [0, h n ] l hn + 1

= a.

Par-t 11: Assume a < � and gcd(A) = 1 . Then d(A + A) = �a implies that either (a) there exist k > 4 and c E [ 1 , k - 1] such that a = f and A <:;-; kN U ( c + kN) or {b) for every increasing sequence { h n : n E N } with lim17_, 00 IA�lo+�n ] l = a, there exist two sequences 0 � Cn � bn � h n such that . Cn = 0, lin nl->oo hn and [ n + 1 , bn - 1] n A = 0 for every n E N. -

e

8. Additive and combinatorial number theory

128

Part III: Assume a = � and gcd (A) = 1 . Then d( A + A ) = � a implies that ei ther (a) there exists c E { 1 , 3} such that A � 4NU (c+4N) or (b) for every increasing sequence { hn E N } with limn ->exl I A�lo+�"l l = a , we have :

n

nlim -> oo

I (A + A ) n

hn

[0, h n ] l = +1

a.

I would like to make some remarks here on Theorem 8.5.3. First, the proof of Part I is easy; the most difficult part is Part II. Second, Part I and (b) of Part III cannot be improved so that set A has the structure similar to the structure described in (b) of Part II. For example, if one lets A = {0} -

U

-

00

u ( [3 l

n=

•

22n- 3 ' 4 22n- 3 ] U [5 22n- 3 ' 22n ]) ' •

•

l +d(A)

then d ( A ) = 21 and d ( A + A ) = -- . Clearly A does not have the structure 2 described in (b) of Part II. The main ingredient of the proof of Theorem 8.5.3 is the following lemma in nonstandard analysis. For an internal set A � [0, and a cut U � [0, we define the lower U-density du (A) by

{ { CA�lO�n] l ) : n

du (A) = sup inf st where

g(A)

H]

E U " [O, m]

}

}

:mEU ,

means the standard part map. Note that if U N and A � N, then du (*A). A set I = { a , a + d, a + 2d, . . . } is called an arithmetic progres

st

=

H]

=

sion with difference d. An arithmetic progression can be finite (hyperfinite) or infinite. If an arithmetic progression is finite (hyperfinite) , then its cardinality (internal cardinality) is its length. A set I U J is called a hi-arithmetic pro gression if both I and J are arithmetic progressions with the same difference d and I + I, I + J, and J + J are pairwise disjoint. A finite (hyperfinite) hi-arithmetic progression I U J has its length I I I + I J I . Let U be a cut. A hi-arithmetic progression B � U is called U-unbounded if both I and J are upper unbounded in U.

H

O *].

H]

Lemma 8.5.4 Let be hyperfinite and U = nn E N [ , Suppose A � [0, be such that 0 < du (A) = a < �. If A n U is neither a subset of an arithmetic progression of difference greater than 1 nor a subset of a U -unbounded bi arithmetic progression, then there is a standard positive real number 1 > 0 such that for every N > U, there is a K E A, U < K < N, such that

I (A + A ) n [o, 2K J I 2K + 1

[o, K

lA n JI 3 � 2K + 1

+

I·

8.6. Freiman's

3k - 3 + b conjecture

1 29

Lemma 8.5.4 is motivated by Kneser ' s Theorem. It basically says that either A + A is large in an interval [0, 2K] with K > {f for some standard n or A has desired arithmetic structure in an interval [0, K] with K > {f for some standard n. The proof uses the fact that U is an additive semi-group. This can be done only in a nonstandard setting. It is interesting to see whether this lemma can be replaced by a standard argument with a reasonable length. Recently G. Bordes [4] generalized Part II of Theorem 8.5.3 for sets A with small upper asymptotic density. He characterized the structure of A when d(A) :( ao for some small positive number ao and d(A + A) < i J(A). It is interesting to see whether one can replace ao by a relatively large value, say � , in Bordes' Theorem.

Freiman's 3k - 3 + b conjecture

8.6

After Theorem 8.5.3 was proven, I realized that the same methods used there could also be used to advance the existing results towards the solution of Freiman's 3k - 3 + b conjecture [5] . This is important because the conjecture is about the inverse problem for the addition of finite sets. Let A be a finite set of integers with cardinality k > 0. It is easy to see that lA + AI � 2k - 1. On the other hand, if lA + A I = 2 k - 1 , then A must be an arithmetic progression. In the early 1960s, Freiman obtained the following generalizations [6] . ( 1 ) Let A <:;-; N. Suppose k = I A I , 0 = min A. and n = max A. Suppose also gcd(A) = 1 . Then l A + A I � 3k - 3 if n � 2k - 3 and l A + AI � k + n if

n

:(

2k - 3.

(2) If k > 3 and I2AI = 2 k - 1 + b < 3k arithmetic progression of length at most

(3)

- 3, then k+b

A is a subset of an

If k > 6 and I2AI = 3k - 3, then either A is a subset of an arithmetic progression of length at most 2k - 1 or A is a hi-arithmetic progression.

In [6] a result was also mentioned without proofs for characterizing the structure of A when k > 10 and lA + AI = 3k - 2. In [10] an interesting generalization of (3) above was obtained by Hamidoune and Plagne, where the condition I2AI = 3k - 3 is replaced by lA + tAl = 3k - 3 for every integer t. However, no further progress of this kind had been made for a larger value of lA + A I before my recent work. In fact, Freiman made the following conjecture in [5] five years ago.

Conjecture 8.6.1 There set of integers A with I A I

exists a nat·ural rwmber K S1L Ch that for any finite k > K and l A + A I 3k - 3 + b < 13° k - 5 for

=

=

8. Additive and combinatorial number theory

130

some b ? 0, A is either a subset of an arithmetic progression of length at most 2k - 1 + 2b or a subset of a bi-arithmetic progression of length at most k + b. Note that the conclusion of Conjecture 8.6. 1 could be false if one allows 1 l A + A I = 3° k - 5. Simply let A be the union of three intervals [O, a - 1 ] , [b, b+ a - 1] , and [2b, 2b+a - 1 ] , where k = 3 a and b is a sufficiently large integer. 1 Clearly lA + A I = 3° k - 5. Since b can be as large as we want , we can choose a b so that set A is neither a subset of an arithmetic progression of a restricted length nor a subset of a hi-arithmetic progression of a restricted length. Using nonstandard methods such as Lemma 8.5.4, I was able to prove the following theorem in [ 1 7] .

Suppose f N N is a function with limn_,oc f�n) = 0. There exists a natural number K such that for any finite set of integers A with I A I = k, if k > K and lA + AI = 3k - 3 + b for some 0 :'( b :'( f(k), then A is either a subset of an arithmetic progression of length at most 2k - 1 + 2b or a subset of a bi-arithmetic progression of length at most k + b. Theorem 8.6.2

:

c-+

Theorem 8.6.2 gives a new result even for f(x) = 2. However, we still have a long way before solving Conjecture 8 . 6 . 1 . It is already interesting to sec whether we can obtain the same result with f(x) = ax for some positive real number a. The ideas for proving Theorem 8.6.2 are similar to the proof of Theo rem 8.5.3, but much more technical. Suppose Theorem 8.6.2 is not true: then one can find a sequence of counterexamples An such that I An I c-+ oo . Given a hyperfinite integer N, let A = AN . Without loss of generality, we can assume that 0 = min A, H = max A, gcd (A) = 1. and a � 1 » 0. Note that

J!

I A+N�/'A I+3 � 0.

Hence l A + AI is almost the same as 3 I A I - 3 from the non standard point of view. Using the case-by-case argument, we can show that if 1 « � , then A is a subset of a hi-arithmetic progression. If 1 � � and b = l A + A I - 3 I A I + 3, then we can show that H + 1 :'( 2 I A I - 1 + 2b when A is not a subset of a hi-arithmetic progression. The proof for the case 1�!

J!

J!

J!

J!

is much harder than the proof for the case 1 « � although the former depends on the latter. In both cases Lemma 8.5.4 was used to get structural information of A on an interval with length longer than !i n for some standard positive integer n. There are some similarities between our methods and analytic methods. In order to detect some structural properties of A <:;-; [0, n] , one may need to show that either A is uniformly distributed on [0, n ] or A has a greater density on a well formed subset of [0, n ] . The analytic methods usually look for a large Fourier coefficient A(r) (cf. [8, Corollary 2.5] ) or a large exponential sum :z::=�==-01 A( i)e 2;,' ( cf. [2 4, Theorem 2.9] ) to detect the greater density on a well

References

131

formed subset o f [0, n ] when n i s a prime number. When n is not a prime number then one needs to replace it with a prime number p > n and consider A in [0, p] instead. This replacement may not work well for Conjecture 8 . 6 . 1 as the structure o f A needs t o b e very precise. I n our methods we look for the greater density of A � [0. H] on an interval [0, K] for some K > U by checking the value of .tiu (A) . where U = n n EN [O, * ] . If .tiu (A) � � ' then the density of A on [0, K] for some K > U is significantly greater than I A I / H , which will lead t o a contradiction that l A + AI i s almost the same as 3k - 3. If du (A) = 0, then the density of A on [K, H] is significantly greater than I A I / H , which will again lead t o a contradiction. Otherwise either I (A + A) n [ 0 , 2K] I � � a, or A has very nice is large, which is impossible by the fact that structural properties on [0, K] following Lemma 8.5.4, which will force A to have the structure we hope for.

�'jt:11

References [1] V. BERGELSON, Ergodic Ramsey theory-an update, in Ergodic theory of zd actions (Warwick, 1993-1994), London Mathematical Society Lecture Note Ser. 228, Cambridge University Press, Cambridge, 1996. [2] P. BIHANI and R. JIN, Kneser's Theorem for upper Banach density, sub mitted. [3]

BI LU , "Addition of sets of integers of positive density", Journal of Number Theory, 64 (1997) 233-275.

Y.

[4] G. BORDES , "Sum-sets of small upper density", Acta Arithmetica, to appear. [5] G . A . F REIMAN, Structure theory of set addition. II. Results and prob lems, in Paul Erdos and his mathematics, I (Budapest, 1999 ) , Bolyai Soc. Math. Stud . , 1 1 , Janos Bolyai Math. Soc . , Budapest , 2002 . [6] G . A . FREIMAN, Foundations of a structural theory of set addition, Trans lated from the Russian. Translations of Mathematical Monographs, Vol. 37, American Mathematical Society, Providence, R. I., 1973. [7] H. FURSTENBERG , Recurrence in Ergodic Theory Nu.mber Theory, Princeton University Press, 198 1 .

and Combinatorial

[8] T . C oWERS, "A new proof o f Szemercdi 's theorem", Geometric and Func tional Analysis, 1 1 (2001) 465-588. [9] H. H ALBERSTAM and K . F . ROT H , 1966.

Sequences,

Oxford University Press,

8. Additive and combinatorial number theory

132

[10] Y. 0 . HAJ\IIDOUNE and A . PLAGNE, "A generalization of Freiman's 3k - 3 theorem", Acta Arith. , 103 (200 2 ) 147-156. [ 1 1] R . .TIN, "Sumset phenomenon", Proceedings of American Mathematical Society. 130 (2002) 855-861 . [12] R . JIN, "Nonstandard methods for upper Banach density problems·· , The Journal of Number Theory, 91 (2001) 20-38. [13] R. JIN, Standardizing nonstandard methods for upper Banach density problems, in the DIMACS series, Unusual Applications of Number Theory, edited by M . Nathanson, Vol. 64, 2004. [14] R. JIN, "Inverse problem for upper asymptotic density", Transactions of American Mathematical Society, 355 (2003) 57-78. [ 1 5] R. JIN, "Inverse problem for upper asymptotic density II", to appear. [16] R . .TIN. "Solution to the inverse problem for upper asymptotic density", to appear. [ 1 7] R. JIN, Freiman's 3k-3+b conj ecture and nonstandard methods, preprint. [18] R. JIN and H . J . KEISLER, "Abelian group with layered tiles and the sumset phenomenon", Transactions of American Mathematical Society, 355 (2003) 79-97. [19] H . J . KEISLER and S . LETH, "Meager sets on the hyperfinite time line", Journal of Symbolic Logic, 56 ( 1 99 1 ) 71-102. [20] A. I . K H INC H IN , Three pearls of number theory, Translated from the 2d ( 1 948) rev. Russian ed. by F. Bagemihl, H. Komm, and W. Seidel, Rochester, N.Y. , Graylock Press, 1952. [2 1] S . C. LETH . "Applications of nonstandard models and Lebesgue measure to sequences of natural numbers", Transactions of American Mathematical Society. 307 ( 1988) 457-468. [22] S . C . LETH, "Sequences in countable nonstandard models of the natural numbers", Studia Logica, 47 ( 1988) 243-263. [23] S . C . LETH , "Some nonstandard methods in combinatorial number the ory", Studia Logica. 47 ( 1 988) 265-278.

Additive Number Theory-Inverse Problems and the Geometry of Sumsets, Springer-Verlag, 1996.

[24] M . B . NATHANSON,

Nonstandard methods and the Erdos- Turan conj ecture Steven C. Leth *

9.1

Introduction

A major open question

in combinatorial

conj ecture which states that if with the property that arithmetic progressions

A

2:: �=1 1/an

[1[.

=

number theory is the Erdos-Tunin

(a71 ) is a sequence of natural numbers

diverges then

A

contains arbitrarily long

The difficulty of this problem is underscored by the

fact that a positive ans·wer would generalize Szcmer6di's theorem which says that if a sequence

A

C

N has positive upper Banach Density then

A

contains

arbitrarily long arithmetic prog;:ressions. Szemeredi's theorem itself has been

object of int en se int;crcst since first, conject.urocl, al so by Erdos and Turan , 1936. First proved by Szomerf)di in 1974 [91 , the theorem has been re-proved using completely different approaches by Fnrstcnherg in 1977 [2, 3[ and Gowers in 1999 [4[ , with each proof introducing powerful new methods. the

in

The Erdos-Turan conjecture immediately implies that the primes contain

arbitrarily long arithmetic progressions, and it was thought by many that a

successful proof for the pri mes would be the result of either a proof of the conj ectme itself re

c en tly

Green

or

significant progress toward the conj ecture. However, very

and Tao were able to solve the question for the primes without

generalizing Szemeredi's result in terms of providing weaker density conditions on -a sequence guaranteeing that it contain arithmetic progressions.

In this paper we outline some possible ways in which nonstandm·d methods

might be able to provide new approaches to attacking the Erdos- Tun'tn conjec ture, or at least other questions about the existence of arithmet.ic progressions. Heavy reference will be made to result:s iu

[7[ �:tnd [8[ ,

and the proofs for all

results quoted but not proved here appear in those two sources. *

Department of Mathematical Sciences,

co 80639.

steven . l eth@unc o . edu

University of Northern Colorado,

Greeley,

9.

134

9.2

Nonstandard methods and the Erdos-Tun1n conjecture

Near arithmetic progressions

We begin with some definitions that first appear in

[8] .

Let A C N, and let I = [a, b] be an interval in N. We will write l (I) for the length of I (i. e. l (I) = b - a + 1) and we will write 8 (A, I) or 8(A, [a, b] ) for the density of the set A on the interval I. Th1LS 6(A, I) = ��{1 .

Definition 9 . 2 . 1

Definition 9.2.2 Let t, d and w be in N , and let a E � with 0 < a < 1. For A C N and I an interval in N of length l (I) we say that A contains a t-termed a-homogeneous cell of distance d and width w in I or simply a < t, d, > cell in I iff there exists b E I with b + (t - 1)d + w also in I such that for each v, � = 0, 1 , 2, . . . ' t - 1 : 6 (A, [b +�· d, b +�· d + w l) :::0: ( 1 -a) c5 ( A , [b + v · d, b + v · d + w l) :::0: (1 -a) 2 c5 (A, I). o:,

w

If each 6 (A, [b + .; · d, b + � · d + w] ) is simply nonzero, i. e. the intervals are nonempty, then we say that A contains a < t, d, > cell. For (3 > 0 and 0 ::; u ::; w we will say that a < t, a, d, w > cell is u, (3 uniform if for each v = 0. 1 , 2, ... , t - 1, and all x such that u ::; x ::; w : w

( 1 - {3 )6(A, Jv ) ::; cl (A, [b + V · d , b + V · d + x] ) ::; ( 1 + {3 )6(A , Jv ) · where Jv denotes the interval [b + v · d, b + v · d + w] . It is clear that an actual arithmetic progression of length t and distance is an example of a < t, a, w, d > cell with w = 0 and a any non-negative number. Furthermore, this cell is u, (3 uniform for u = 0 and any non-negative (3. We could view the existence of a < t, a, d, w > cell inside a sequence A as a weak form of an actual arithmetic progression inside A . These cells arc "ncar" arithmetic progressions in some (perhaps rather weak) sense, and intuitively arc "nearer" to arithmetic progressions as the size of w decreases. In some of the results that we look at w will be "small" in the sense that the ratio of w to d will be small compared to the ratio of d to the length of the interval I. In other results w will be "small" by actually being bounded by a finite number while d gets arbitrarily large.

d

Definition 9.2.3 Let I be an interval in N, and A C I, with r > 1 E � and m E N. We say that A has the m, r density property on I iff for any interval

J C I, if l(J)

:::0:

l�) ,

then 6(A, J) ::; r6(A, I) .

Theorem 9.2.1 below gives a condition for the existence of "ncar" arithmetic progressions for any sequence on any interval I in which the density does not

9.2.

Near arithmetic progressions

135

drastically increase as the size of the subinterval decreases. More specifically, it provides an absolute constant such that whenever the density of a sequence does not increase beyond a fixed ratio for any subinterval of size greater than the length of I divided by that fixed constant, then the sequence will contain a < t, a, w. d > cell with some relative "smallness" conditions for w. A complete proof o f this theorem appears i n [8] , but we will outline the proof here. as it provides the clearest illustration of how the use of the non standard model provides us with a new set of tools for questions of this type.

Theorem 9 . 2 . 1 Let h(x) be any increasing h(x) > 0 whenever x > 0, and let g(x) be

real valued function such that any real valued function which approaches infinity as x approaches infinity. For all real a > 0, r > 1 and j, t E N there exists a standard natural number m such that for all n > m , whenever- I is an inter-val of length n and any nonempty set A C I has the m, r density property on I then A contains a u, (3 uniform < t, a, d, w > cell with {1; < h('!f), '![ < h( � ), (3 < h( � ) and g(::r,) < d < y · Furthermore, we may take w and d to be powers of 2. Proof. ( Sketch only ) . Suppose h (x ) , g(x ) , a, j, r, t arc given as in the statement above and that no such m exists. By "overspill" there exists an M, N in *N - N with lvl < N and a hypcrfinitc internal set A such that A has the M, r density property on an interval of length N but A contains no < t, a, d. w > cell on this interval with the required properties. Since the conditions are translation invariant we may assume that the interval is [0, N - 1] . We now define a standard function f : [0, 1] -----+ [0, 1] by:

f(x) = s t

(

l A n [O, x NJ I I A n [O, NJ I I

)

.

Using the fact that A has the M, r density property on [0, N] it is not diffi cult to show that f(x) satisfies a Lipschitz condition with constant r. Thus, the function f is absolutely continuous, differentiable almost everywhere and equal to the integral of its derivative. Since f(1) = 1, f(O) = 0 and f is the integral of its derivative, it must be that the Lebesgue measure of { x : f' ( x) ::0: ( 1 is nonzero. Thus, there exists a real number c ::0: 1 such that the Lebesgue measure of the set

�)}

E

=

{x : c - �

:S:

j'(x) :S: c

}

1s nonzero. By using the Lebesgue density theorem it is straightforward to show that any set of positive measure contains arbitrarily long arithmetic progressions, and that, in fact, these progressions may have arbitrarily small differences

9. Nonstandard methods and the Erdos-Tun1n conjecture

136

between elements. This allows us to obtain a < t, a, D, W > cell, with *N - N with the property that there exists B E *N - N such that

st

B D B 2D st st � ) ' � , ) ) ( ( (�

,

D, W E

( � )

. . . , st B + ( - 1)D

forms an arithmetic progression in E. The a homogeneity follows from the definition of E. The fact that f is differentiable at each point in E allows us to obtain the uniformity condition, and allows us to take U, D and W arbitrarily small but not infinitesimal to N. This, in turn, allows us the freedom to make those quantities powers of 2. We are thus able to obtain a U, uniform < t, a , D, W > cell for A in [0 , N - 1 ] with all the properties required in the theorem, contradicting our assumption. D

(3

Definition 9 . 2.4

For A a sequence of positive integers we define the

Banach Density of A or BD(A) by:

ED (A)

=

.

1n f. n1ax x EN-{0} a EN

upper

I A n [a + 1, a + x J l . X

Upper Banach density is often simply called Banach density, and is some times referred to in the literature as strong upper density, with notation d* ( A ) in place of B D (A) . That notation is used in [7] . The theorem allows us to obtain some results about the existence of uniform < t, a , d. w > cells in sequences that are relatively sparse (certainly too sparse to necessarily contain actual arithmetic progressions) . The theorem below, also proved in [8] , is of this type.

Let a > 0 and t > 2 E N be given, h ( x ) be any continuous real valued function such that h ( x) > 0 whenever x > 0 and let A be a sequence in N with the property that for all c > 0, l A n [0 , n - 1 ] 1 > n l - c: for sufficiently large n. Then for sufficiently large n, A contains a u, uniform < t, a , d, w > cell on [0, n - 1 ] with w and d powers of 2 and such that: log d u w w log d
Theorem 9.2.2

( )

( )

(i�:

(3

(3 ( )

The condition that 'J < h �) is not very strong, and it would be de sirable to improve this weak "smallness" condition. The theorem below shows that, even for t = 3, if we keep the density condition on A as above then we cannot improve this smallness condition to 'J < (�)" for any a > 0, even if we do not insist on any homogeneity or uniformity conditions.

9. 2 . Near arithmetic progressions

137

Let a > 0. There exist constants r > 0 such that for arbi trarily large n there are subsets A of [0 , n - 1] such that n l A n [O, n - 1 ] 1 > -r l-og_l_ o_ g -v�= lo=g= '

Theorem 9.2.3

2

and yet A contains no w and d a power of 2.

< 3,

n

n

d, w > cell in [0 , n - 1] satisfying 'J

Here we recall that a < t , d , w > cell is merely a collection of arithmetic progression on which A is nonempty.

<

( � ) , with "

t intervals in

Proof. Since the statement is strictly stronger as a decreases, we will assume that a ::; 1 . In [5, p. 98] it is shown that there exists a constant c > 0 and a sequence A satisfying

n

I A n [ 0, n - 1J I > e cvlog for sufficiently large n n that contains no 3-term arithmetic progression. This result is due to Behrend. For convenience we will adjust the constant and use log base 2 here, and also replace e with 2. By adjusting the constant if necessary (and using 2A) we may assume that A contains no two consecutive numbers. We may also translate so that 0 E A without changing the density condition. Thus, we begin with a sequence A which contains 0 and no two consecutive numbers and satisfies

n

. I A n [0, n - 1J I > c .JjOgi1: for sufficiently large n. 2 Let N E * N - N and let (3 =

( 1/ 2 ) 2 1°; mk+ l

mo = N; =

m 1 = the largest power of 2 less than (J( l + o/ 2 ) N

the smallest power of

L = the

2 greater than

smallest number such that

(rr;; )

m£ + 1

::;

0

mk

1.

We now wish t o show by induction that

( 9.2. 1) To see this we note that for k the construction we have

=

1 the definition of m 1 guarantees this. By

9. Nonstandard methods and the Erdos-Tun1n conjecture

138

so that, assuming the induction hypothesis,

(rr;; ) a mk a 2 (r( l +a/ 2 )k ) p( l +a/ 2 )k N 2 (ra( l +a/ 2 )k ) p( l +a/2)k N 2 (ra/ 2 ( 1 +a/ 2 )k ) (ra/ 2 ( 1 +a/2)k ) p( l +a/ 2 )k N

mk +l < 2 <

2 pa/2 pa/2(1+a/2)k+(1+a/2)k N < pa/ 2 ( l +a/2)k+( l +a/2)k N p( l +a/2)k+l N, <

completing the induction step, and establishing 9.2. 1 above. We will define a subset B of [0, N - 1 ] with the property that it contains no < 3, d, w > cell in [0, N - 1 ] satisfying '![ < (-� t, with w and d a power of 2 , and such that

N

I E n [o ' N - 1J I > cLylog 2 N

by essentially using block copies of initial segments of *A. More specifically we let and, for

i

1

:::; k

<

E Bk + 1 iff

L

l-i_'-j mk + 1

E *A, where

i'

i

is the remainder of mod

mk ·

When k is finite, the fact that 0 E *A means that Bk intersected with B1 n . . . n Bk_1 has cardinality at least a noninfinitesimal multiple of that of B1 n . . . n Bk_1 . When k is in *N - N the density condition on A guarantees that at least a Cb portion of Bk intersects with B1 n . . . n Bk- 1, where 2 n < N. Thus

N

B1 n . . . n Bk has cardinality at least -----= == 2 ck ylog N · We now let

B = B1 n . . . n B£.

( 9.2.2)

Now suppose that B contains a < 3, d, w > cell on [0, N - 1] with '![ < (-� t where both w and d are powers of 2 . We show that this forces an actual arith metic progression of length 3 in *A and thus in A (by transfer) , contradicting our assumption about A.

9.2. Near arithmetic progressions

139

To see this, we let i be such that mi+ l :S: d < mi . Then since *A contains no two consecutive numbers, the < 3, d, w > cell on [0, N - 1] must be completely contained inside one of the blocks of length mi, i.e. inside some [vmi, (v+ 1 ) mi] · But a

w <-

and since

w, d and the

( -d ) N

d<

( mN" )

a

m

< "·

m +1 " ·

m k 's are powers of 2

.

so that there exist 3 intervals of length m2+ 1 inside [vm2 , (v + 1 )mi] which contain elements of B and are in arithmetic progression. By the construction this means that *A contains an arithmetic progression of length 3, and then by transfer, so does A. It remains to estimate L in terms of N. From 9 . 2 . 1 we see that

so that

log N :S: ( 1

or

+ a j 2) k log ( 1 / p)

log log N :S: k log( 1

Thus

mk :S:

1 wl1enever k �

+ a/2) + log log ( 1 /p) . log log N - log log ( 1 /p) . log ( 1 + a /2)

This and the definition of L now yield L+ 1 < -

log log N - log log( 1/ p) . log ( 1 + a/2)

The above inequality, the definition of cj log ( 1 + a/2)

For such an

r

I B n [O, N - 1] 1 >

B

and 9.2.2 imply that for any

r

>

N 2 r loglog N y'IOg)V .

the result now follows by transfer.

D

We note that the density condition given in the theorem above is stronger than simply being greater than n 1 -c for sufficiently large n, since n 1 -f; is of the form n n 2c log N < 21og log n-Jlog n for large n.

---

9.

140

Nonstandard methods and the Erdos-Tun1n conjecture

Thus, weak as theorem 9.2.2 is in its smallness conditions, there are clear limits to how much it can be strengthened for sets of this relative sparseness. It appears to be more promising to look at denser sequences in the hope of maintaining a stronger "smallness" condition. The theorem below is proved in [8] , and provides just one possible example of conditions like this that might provide a means for approaching deep questions about arithmetic progressions. The proof of theorem 9.2.4 below is similar to that of the proof of theorem 3 given above. In particular these proofs illustrate how "smallness" conditions that may not seem very strong can be used to show that somewhat denser sets contain actual arithmetic progressions.

Theorem 9 . 2 .4 The Erdos- Tunin conjecture follows if we can show that for fi:r:ed t and constant c > 0, there exists no S1LCh that for all n > no, whenever the sequence A satisfies n I A n [O , n - lJ I > ) 2 log log --�,-----n

(c log n

then A contains a < t, d, w > cell on [0, n 1] with '![ < � where both w and d are powers of 2. -

9.3

The interval-measure property

The conditions given below are natural from the nonstandard perspective and not at all so from the standard perspective. They might provide another means of attacking questions about arithmetic progressions. These definitions first appear in [7] .

Let A be an internal subset of *N, y, z E *N with z - y E *N - N. Then we say that A has the IM (interval-measure) property on [y, z] iff for every real, standard (3 > 0 there is a real, standard a > 0 such that whenever [u, v] C [y, z] with v - u E *N - N and the largest gap of A on [u, v] is ::; a ( v - u) then

Definition 9.3. 1

If A is a standard subset of N, we say that A has the SIM (standard interval-measure) property iff *A has the IM property on every interval [y, z] C *N with z - y E *N - N, and

( { ( : = �)

A st

:

a E *A n [y, z]

}) > 0 on some such interval [y, z] .

Here A is used to denote Lebesgue measure.

References

141

A somewhat cumbersome but standard equivalent definition for the SIM property is given in [7] . Through the use of the standard equivalent it is easy to see that if A has the SIM property then given (3 > 0 there exists a fixed a > 0 that works for every infinite interval . The theorem below follows immediately from theorem 3.2 in that same work.

Let t E N, 0 < (3 < 1/t, and suppose that A C N has the SIM property, with a corresponding to the given (3. Then there exists a constant j E N such that whenever A contains a < t, d, > cell consisting of the intervals [b + � · d, b + � · d + w] for 0 ::; � ::; t - 1 in which the largest gap of A on each of these intervals is ::=:; aw then A contains < t, d, j > subcell, i. e. consisting of intervals of the form [b' + � · d, b' + � · d + j] C [b + � · d, b + � · d + w] . Theorem 9.3.1

w

This result is significant in that a fixed constant size to the intervals 1s a much stronger "smallness" condition than was achieved in previous results. However, the assumption that A is a SIM set is a strong condition. It is shown in [7] that any sequence A = (a n ) in which limn -Hxl (an + l - a n ) = oo does not have the SIM property, so no pure density condition weaker than positive Banach density can imply that A contains a SIM set . On the other hand, sets may certainly have the SIM property without having positive Banach density. Even the question of whether or not positive Banach density is sufficient for a sequence to contain a SIM set is still open. A positive solution to either of the conj ectures below would be a major step toward establishing the SIM condition as a useful tool for questions of this type. Since the two conjectures together imply Szemeredi's theorem, at least one of them is certain to be quite difficult .

Let A C N have positive Banach density. Then A contains a subset B with the SIM property.

Conjecture 9 . 3 . 1

Every set A C N with the SIM property contains arbitrarily long arithmetic progressions. Conjecture 9.3.2

References

[1]

P . ERDOS and P. TeRAN " On some sequences of Integers", J. London Math. Soc., 11 ( 1936) 261 -264.

[2]

H. FURSTENBERG , "Ergodic behavior of diagonal measures and a theorem of Szemeredi on arithmetic progressions", J. Anal. Math. Soc., 31 (1977)

204-256. [3]

H . FuRSTENBERG, ReC'urrence in Ergodic Theory ber Theory, Princeton University Press, 1981.

and Combinatorial Num

142

[4]

9.

Nonstandard methods and the Erdos-Tun1n conjecture

T . CoWERS , "A new proof of Szemeredi's theorem". GAFA, 1 1

465-588.

(2001)

Ramsey Theory, 2 nd ed.

[5]

R. GRAHAM. B . RoTHSCHILD and .T . S PENCER , Wiley, 1990.

[6]

B . G REEN and T. TAo , The Primes Contain Arbitrarily Long Arithmetic Progressions. Preprint, 8 Apr 2004. http : //arxiv . org/abs/math . NT/0404188

[7]

S . L ETH , ··Some nonstandard methods in combinatorial number theory", Studia Logica, XLVII (1988) 85-98.

[8]

S . L ETH , " Ncar arithmetic progressions in sparse sets", Proc. of the Amer. Math. Soc . , 134 (2005) 1579-1589.

[9]

E. SzEMEREDI , " On sets of integers containing no k clements in arithmetic progression". Acta Arith. , 27 (1975) 199-245.

Nonstandard likelihoo d ratio t est In exponential families

Jacques Bosgiraud *

Let

(po ) oEe

be an exp ou enti al

Abstract family in JRk . After estahlishiug noust<m

dard results about large deviations of the sample mean X, this paper de

fines the nonstandard likelihood ratio test of the null hypo thesis Ho : fJ E hal(El0 ) , where El0 is a standard s ubset of 8 and hal(Elo) its halo [f a is the level of the test, depending on whether 1�.� is infini tesimal or not we obtain different rej ection criteria. vVe calculat e risks of the first ancl second kinch; (cxtenwl prohahiliti.es) and prove that, this test is more powerful than any "regular'' nonstandard test based on X . .

10. 1 10. 1 . 1

Introduction A

most powerful nonstandard test

In a preceding paper [9] , we proved that the nonstandard likelihood ratio test (NSLRT) is more powerful than the nonstandard chi-squared test. Our purpose now is to generalize this result to classical exponential families in JRk (wh ere k is standard) : the NSLRT is more powerful than any nom;tandard test issuing from a (family of) standard test(s) with rejection crit erion X E R, where X is the sample mean and R a sufficiently regular set. The standa,rd likelihood ratio test is not so powerful (see §1 0.5) . We hope that viewing the problem from a general perspective will lead to a clearer understanding of its strn<:tnrc and Rimplcr and bett.er proofs. In §10.3, we establh;h some results about large deviations of the mean, more or less similar to classical results, before studying the N S LRT in §10.4 and comparing iL to nonstandard "regular" tests based on X in §10.5. " Universfte Paris VIII, Paris. j acques . bosgiraud@univ-paris8 . fr

10. Nonstandard likelihood ratio test in exponential families

146

This paper follows also papers about nonstandard tests for monotone like lihood ratio families ( [5] , [6] , [8] ) and we shall maintain the same defini tion for a nonstandard test in this paper. For notation and definitions of nonstandard analysis, external calculus , and external probabilities we refer to [ 1 5] , [ 1 1 ] , [14] , [4] .

10. 2 10.2.1

Some basic concepts of statistics Main definitions

A statistical family is a triplet (Sl , F, P) where Sl is a set, F a O"-field of subsets of Sl , P a family of probabilities on (Sl , F) : P := { Pe : e E 8}. In the following, we suppose that Sl is a subset of �d (where d is a standard integer) , that 8 i s a subset of �k (where k i s a standard integer) and that there exists a positive mesure p, defined on (Sl, F) such that for each e E 8, Pe is absolutely continuous with respect to p,: we note Pe := PetL where pe is a F-measurable function defined on Sl. So, X : Sl ---+ �d , x ---+ x is a random variable with distribution Pe . Let n be a (standard or nonstandard) integer; denoting X : = Sl n , Pl) : = P!fn is a probability defined on (X , ;::em ) and we can write Pl) = Pe p,n , where p,n : = p, 0n and p(f (x 1 , . . . , xn) = TIZ= pe (xz). A n-sample (x 1 , . . . , xn) is an 1 element of X. For i = 1 , . . . ' n we define xi : X ---+ �d by Xi(X1 , . . . ' Xn ) = Xi; then x1 ' . . . ' Xn are independent identically distributed (i.i.d.) random variables with distribution Pe . (X1 , . . . , Xn ) is a n-sampling of X. Note that some authors do not distinguish Xi and Xi, sample and sampling. A statistic is a measurable function For example, the sample mean X := � I:Z= 1 X2 is a statistic (here We note EeT the expectation of T for the probability Pl) .

10.2.2

m

=

d) .

Tests

Let 8o a nonempty proper subset of 8, 81 := 8� its complement and a E ] 0 , 1 [. A level-a test of Ho : e E 8o against H1 : e E 81 is a statistic rp : X ---+ [0, 1] such that W E 8o, Eerp :S: a. Ho : e E 8o is called the null hypothesis; 1 - 'P l S the probability of accepting Ho. H1 : e E 81 is the alternative hypothesis and rp is the probability of rejecting Ho.

10.3.

Exponential families

147

Eerp is called the size of the test . { (x 1 , . . . , xn) E X , 1} is called the rejection set and { (x 1 , . . . , xn) E X , 0} is called the acceptation set; if rp(x 1 , . . . , X n ) E ]0, 1 [ the

Also, supli E Bo =

rp(x 1 , . . . , xn ) rp(x 1 , . . . , Xn )

=

test is said to be randomized. Very often, rp is defined through a statistic T and rp(x1, . . . , xn) = 1 -¢? T(x 1 , . . . , Xn ) E R (in fact, we shall write rp = 1 -¢? T E R) where R is a subset of �m. R is also called the rejection set and "T E R " is the rejection criterium (e.g. if m = 1 , T > to is a rejection criterium: to is a constant depending on 8o and a) . For e E 8o, Eerp is the risk of the first kind (at B); for e E 8 1 , Ee (l - rp) is the risk of the second kind (at B); for e E 8, Eerp is the power of rp (at B) . For testing a given null hypothesis, there arc generally a lot of level-a tests (for example the constant test rp := a) . A level-a test ¢ is said uniformly the most powerful (U.M.P.) if for any level-a test rp, ¢ is uniformly more powerful than '1/J , i.e. VB E 8 1 , Ee¢ � Eerp (in fact "more powerful" means "at least as powerful") . U .M .P. tests only exist in particular cases: for example for ! dimensional exponential families if Ho : e :::_: Bo (where Bo E 8, an interval of �) . So, some more sophisticated notions are used to compare the power of two tests: for exemple the relative efficiency ( cf. § 10.5) . It is not possible to summarize this notion in some words; so we suggest to refer to [1] . [2] or [12] .

10.3 10.3.1

Exponential families Basic concepts

The following classical results and more information about exponential fam ilies can be found in [10] or [13] . Let k be a standard integer. We denote by (x I y) = 2::]= 1 Xj YJ the scalar product of x and y, vectors of �k . Let p, be a probability mesure on Sl : = �k (the O"-field is the field of borelian sets) and let e

The set

:=

{

eE

�k :

J exp (e I x)p,(dx) < 00} .

8 is convex and for e E 8, let '1/J ( e) = ln

j exp( e 1 x)p,( dx ) .

The function 'ljJ is convex and continuous on 8 ° (the interior of 8). The statistical family { Pe : e E 8} defined by Pe : = PetL where

Pe (x) = exp ( (e I x) - ?jJ(B))

10. Nonstandard likelihood ratio test in exponential families

148

is the (full) exponential family associated to p,. A lot of classical statisti cal families (e.g. multinomial distributions, multidimensional normal distri butions, . . . ) are exponential families, generally after reparametrization. This reparametrization can be chosen such that e0 is nonempty. Let e' := {e E e : Ee i iX II < oo} where 11 1 1 denotes the Euclidean norm. For e E e' we define .>.. ( e) : = EeX ; this mapping is 1-1 from e' onto A := .>.. ( e') : e' contains e0 and .A is a 1-1 diffeomorphism from e0 onto A0 (cf. [10, pp.74,75] ) . To see this, notice that for e E e 0 , EeX = \l 'ljJ (e) and all derivatives of 'ljJ exist at e. For e 1 , e2 E e0, e 1 -/=- e2 , 'lj J is strictly convex on the line joining e l and e2 and then ( e l - e 2 I .>.. ( e l ) - .>.. ( e2 ) ) > 0. 10.3.2

Kullback-Leibler information number

For eo E e and e E e', the Kullback-Lciblcr information number is given by 'ljJ (eo) - 'ljJ (e) + ( e - eo I .>.. ( e) ) . If eo is a proper subset of e ' let I(e, eo)

=

J(e, eo) := inf {J(e, eo) : eo E eo} . For (�, �o) E A2 , we set and for A C A and � E A, let J ( �, A) : = inf {J ( �, a) : a E A} , J(A, �) := inf {J(a, �) : a E A} . The classical likelihood ratio test of He0 : e E eo against e E e \ eo is based on the statistic

where ( Xi h < i < n is a n-sampling of X (sec § 10.2.1). Here. denoting X n

� L Xi i= l

Re0

sup { (e I X) - 'ljJ (e) } - sup { (eo I X) - 'ljJ (eo) } ()o E 8o () E 8 sup (e - eo I X) - 'ljJ (e) + 'ljJ (eo) } . eoinf E 8o () E 8 {

=

10.3. Exponential families

149

If X E A, then >, - 1 (X) is the maximum likelihood estimator of e and so

where Ao := >.(eo). A (the closure of A) is the closure of the convex hull of the support of p, (cf. [10] ) ; so, in any case, X E A. As sup {(e I �) - '1/J(B) : e E e} is lower semi continuous with respect to � ' we define (as in [10] ) , for � E 8A (the boundary of A) and �o E A, J(�, �o) by J(�, �o) : = lim inf J( ( , .;o) . e_,�, e EA

So, if X tf. A (then X E 8A) it remains possible to write

Re0 = J(X, Ao). In the following, we shall suppose that, for each �o E A, J(·, �o) is continu ous on A. This assumption is verified by classical exponential families, but it is possible to build counter-examples (cf. exercise 7.5.6 in [10] ) . Note that if �o E A\ hal(8A) is limited, this assumption implies the S-continuity of J(·, �o) on any limited subset of A. Indeed, if �o is standard, it is obvious. If �o is non standard, and if � is limited, let Bo := >, - 1 (.;o); as >. is a diffeomorphism from e 0 onto A0, then oeo = >, - 1 (0�o) and so, as 'ljJ is continuous on e0, we can write The S-continuity of J(·, �o) is then deduced from the S-continuity of J(·, 0�o). In the following. if A is a subset of � k , A0 will denote its interior, A its closure, aA its boundary for the euclidian topology of �k ; its shadow 0A is a closed standard set (cf. [11, p. 63] ) . If A is a subset of A, §A will denote its boundary for the induced topology of A . 10.3.3

The nonstandard test

In this paper n will be supposed unlimited , so we shall use classical asymptotic properties of the likelihood ratio test ( cf. §10.4) . Let S o be a standard subset of e, such that hal( S o) - its halo - is included in a standard compact K included in e0. Let F : = { eo internal : S o C eo C hal( S o) } and let e 0 be the likelihood ratio test of He0 : e E eo against e tf. eo of size a. Recall that e 0 is defined in the following way: there exists a number d d( a , eo) such that :=

sup Pl) (Re 0

liE8o

> d) :::;

a

:::;

sup Pl) (Re0

liE8o

? d) .

10. Nonstandard likelihood ratio test in exponential families

150

Consequently, if Re0 < d then e0 randomizes if Re 0 = d.

0, if Re 0

>

d then e 0

1 and one

The nonstandard likelihood ratio test (NSLRT) of the non standard null hypothesis (Ho) : e E hal(Go) of level a is defined by
e0 : 8o E F} .

Definition 10.3.1

=

We prove in this paper that, except for a case studied in § 10.4.3 where

either \i8o E F, e0 = 1 and then
•

or 38o E F, e 0 = 0 and then

=

0 (if He0 is accepted for at least

At the beginning of § 10.4 we shall explain that a has t o b e infinitesimal. We shall prove in § 10.5 that this NSLRT is uniformly more powerful than any "regular" (this term will be defined later) nonstandard test based on X. Convention: if E is an external event, and P R(E) its external probability and if T) is an external number, "the probability of E is equal to T)" means P R(E) = T) and "the probability of E is exactly equal to TJ" means P R(E) TJ · =

10.3.4

Large deviations for

X

In exponential families, J( · , >.(Bo)) can be regarded as the Cramer transform of Pe0 • rviore generally, classical literature establishes that

1

-

n nlim - ln P (X n E A) = - xinf C(x) -+ oo n

EA

where A is a Borel set of �k such that A C A0 and X n the mean of n i.i.d. ran dom variables taking values in � k , C the Cramer transform of their distribution (see [16] , section 4: the proof is based on the minimax theorem for compact convex sets) . Even more generally, if the i.i.d. random variables take values in some topological vector space E , classical literature ( cf. [3] ) establishes the same result if A is a finite union of convex Borel sets of E. For exponential families, it is possible to establish specific proofs by using half-spaces (cf. [10] , chap. 7 and exercises 7.5.1 to 7.5.6). In a more general setting, half-spaces are also used in [3] . section 2. Our propositions 10.3.2 and 10.3.3 give nonstandard results which imply (by transfer) the classical results. If the first part of the proofs is based on the law of large numbers as in the classical literature, the second part, based on infinitesimal pavings, seems to be original.

10.3. Exponential families 10.3.5

151

n-regular sets

Definition 10.3.2 Let A is n-regular in a iff 3b E

be a subset of A and a E 8A. We shall say that A hal(a), 3p E J Jn, 0] such that P l l >- - 1 (b) I I 0 and B(b, p) C A (where B(b, p) is the open ball of centrum b and radius p) .

Proposition 10.3.1 Let A hal(8A) = hal( a 0A); then A

=-:

be a limited subset of A such that 0A = ( 0A )0 and is n-regular in any point a E a A.

We first claim that A \ hal(8A) = 0A \ hal(8 °A) . Indeed, if x E A \ hal(8A) then °x E 0A and 0x tf. hal(8A) hal( a 0A) and so 0x E 0A \ hal(8 °A); consequently, 0x E (0A)0 and so x E (0A)0; finally, as x tf. hal(8A) = hal( a 0A) , x E 0A \ hal( a 0A) . Conversely, if x E 0A \ hal( a 0A), there exists y E A such that x y; then x E A for if not x E hal(8A) = hal(8 °A) . If a tf. hal(8A), choose an infinitesimal p such that p > Jn · Then for each b E A \ hal(8A) , B (b, p) C A: it is obvious if b is standard and if b is nonstan dard, with 0b its shadow, there exists a standard po such that B(0b, po) C A and then B(b, p) C B(0b, Po) C A. So Proof.

=

=-:

Vc ?!:, 0, 3b EA, d(a, b) < c and B (b, p) C A. Then the internal set { c E � : 3b EA, d( a, b) < c 1\ B(b, p) C A} contains all appreciable c. Cauchy ' s principle yields that ::lc

=-:

0, 3b E A, d(a, b) < c and B(b, p) C A;

as a tf. hal(8A) then b tf. hal(8A) and so >. - 1 (b) is limited (because A is a standard diffeomorphism from 8 ° onto A0). Finally p 1 1 >- - 1 (b) l l ::: 0. If a E hal(8A) , fix p = n -1 14 ; if b E A\ (hal( 8A)Uhal( 8A)) then B (b, p) c A and, as >. - 1 (b) is limited p 1 1 >- - 1 (b) l l < n- 1 18 (for example). So the internal set

{ c E � : 3b EA. d(a., b) < c 1\ B(b, p) C A 1\ P l l >-- 1 (b) ll < n- 1 } 18

contains all appreciable c. Cauchy 's principle yields that

::lc ::: 0, 3b E A, d(a, b) < c 1\ B(b, p) C A 1\ p I I >- - 1 (b) I I < n- 1 18 .

D

For example, a standard set A such that A c A0 (e.g. an open standard set) is n-regular in any boundary point. It is possible to prove that a limited convex set A such that 0A has a nonempty interior is also n-regular in any boundary point (we shall not use this result in the following) .

10. Nonstandard likelihood ratio test in exponential families

152 Lemma 10.3.1

(i) Let (�o , 6) E A2 and let � E ]�o , 6 [ (the segment between �o and 6). Then J ( ( 6 ) < J ( �o , 6 ) . (ii) Let A be a nonempty subset of A and 6 E A0 \ A. Then J(A, 6) J(A, 6 ) = J (aA, 6 ) . (iii) Let E and F be subsets of A such that E is a compact set included in F 0 ; let � E A0 \ E. Then J(F, �) < J(E, 0 and J(�, F) < J (�, E) . =

Proof. (i) As A is convex, � E A. We set �o - � =: a(� - 6 ) where a > 0; denoting e = >, - 1 (�) and e 1 := >,- 1 (6 ) , we have ( using corollary 2.5 in [10]) (ii) Let �o E A. If �o tf. aA, let � E aA n ].;o , 6 [. Using (i), we can write J(�, 6 ) :::; J(�o, 6 ) . So J(aA, 6 ) :::; J(A, 6 ) . Conversely, using the continuity of J ( , 6), we have ·

J (aA, 6 )

�

J (A, 6 ) = J(A, 6 ) .

(iii) If � E F, then J ( P �) = J (�, F) = 0 and J (E, �) -/= 0 , J(�, E) -/= 0 because � t/. E. We suppose now that � tf. F. Let �E E §E be such that J(E, �) = J (�E , �) ; there exists at least one �F E aF n ]�E , �[ and then, using (i), J (�F, �) < J(�E, �) and so J(F, �) < J(E, �) . Let now �k E §E be such that J (�, E) J (�, �k) ; there exists at least one �� E §F n J �k , �[. We set B E := A - 1 (�k) and Bp := >. - 1 (�� ). Then =

J (�, �k) - J (�, �� ) = =

1/J (BE) - 1/J (B) + (e - e E 1 �) - 1/J (BF) + 1/J (B) - (e - eF 1 �) = '1/J (e E ) - 1/J (eF) + ( eF - e E 1 �) = 1/J (BE) - 1/J (BF) + (eF - e E 1 �� ) + (eF - eE 1 � - �� ) = 1 ( eF , e E) + ( eF - e E 1 .; - �� ) .

Then setting .; - �� = : (3( �� - �k) where (3 > 0, we can write according to corollary 2.5 in [10] .

D

10.3.

Exponential families

153

Proposition 10.3.2 Let A be a limited subset of A \ hal (8A) , 6 = >. (B1) be a limited point of A \ hal (8A) and �o E aA such that J(�o , 6 ) :-: J(A, 6 ) . If A is n-regular in �o , then

1

-

- ln Pl) (X E A) :-: -J(A, 6 ) , 1

n and if furthermore d(6 , A) 7: 0, then

1

-

-- ln Pl) (X E A) 1

n

=

J (A, 6) ( 1 + 0)

=

@.

Proof. For all (el , e2) E 8 2 ' denoting X (xl , . . . ' Xn ) and X (where Xj = (xj ,i hs i s k E � k ) we can write =

(

=

� 'L.j= l Xj

)

PlJ; ( dx) = exp n ( ( el - e2 I x) - 1/J ( el) + 1/J ( e2)) p� ( dx) .

Let eo := >. - 1 ( �o) and let 6 =: >. (e2 ) and p > Jn be such that 6 E hal ( �o ) and B( 6 , p) c A. As >. is a standard diffeomorphism from 8° onto A0 , Bo , B1 , B2 are limited. Then

PlJ; ( X E A) > PlJ; (X E B ( 6 , p)) > c exp n ((el - e2 I X) - 1/J(Bl ) + 1/J(B2)) dP� jXEB(b,p) > exp n c ((el - e2 I X) - '1/J (Bl ) + 1/J (B2)) dPlJ2 JXEB(b,p)

(

(

)

)

by Jensen ' s inequality. P� (Xi E 6 , - p/ 2, 6,; + p/2]) = + 0 for According to (8.4) in each i = . . . k. Then the nonstandard law of large numbers yields P� (X E B ( 6 , p)) = + 0 and we can write ( denoting by 1:k the external set of limited vectors of �k )

[ 4] ,

1

1 1 - ln P(j ( X E A) n

1

>

[

1

,

.!. ln( 1 + 0 )

(B 1 - e2 1 6 + J:k p ) - 1/J(BI ) + 1jJ(B2) + n 0 ( e l - e2 1 6) + J:p - 1/J(e 1 ) + 1/J(e2) + n - 1 (e2 , e1 ) + 0.

As

I (B2 , B1 ) - I(eo , B 1 ) = (eo - e1 1 6 - �o ) + I (B2 , Bo), and I (B2 , Bo) = @ I I B2 - Bo ll = @ 11 6 - �o il with I I Bo - B 1 ll limited, 11 6 - �o il we have ( cf. lemma 3. 2 . 2 in

[ 13] ),

:-:

0

- I (B2 , e1 ) = - I(eo , el ) + 0 = - J(A, 6) + 0 ,

10. Nonstandard likelihood ratio test in exponential families

154

and so

-

1 n

- ln P� (X E A)

2'

- J (A, 6 ) + 0.

Conversely, the limited set A is included in a hypercube [ -p, p] k where p is a standard integer; pave it with n k (2p) k hypercubes ( T1 ) with side 6 = � · Among these hypercubes, eliminate the ones which do not intersect A and for the others choose t1 E A n Tt . In the aim of simplicity, we shall denote again the selected hypercubes by ( Tt h< t < N · Let Bt : = >.- 1 ( tt) , Bt is limited because t1 is limited and t 1 tf. hal(8A) . F�om the relation N < nk (2p) k we deduce In N = In n £ . Then we can write ' n n N dPenl p� ( X E A ) :::: L r l=l Jp<E Tl } N L r X- E T1 } exp n( ( el - el I X) - 1/J (Bl ) + 1/J (Bt)) dP� l=l }{

(

< <

Thus

1

)

f:l=l exp (n ( (el - el I tt + �k ) - 1/J (Bl) + 1/J (Bt) )) J: k ax exp { n ( ( el - el I t l + ) - ?jJ(Bl ) + 1/J (Bt) ) } . N l ,!?l...N n

-

{(

1

k

)

J: - ln Pe (X E A) < - ln N + :m ax el - el I tt + - - 1/J (Bl ) + 1/J (Bt) l - l...N n n n J: 1 < - ln N + max { (Bl - Bt l tt) - 1/J (BI ) + ?/J (Bt) } + l=l...N n n 1

< <

Finally,

1

}

J:

max - I (Bt , Bl ) + -n ln N + l=l...N n

-J(A. 6 ) + 0. 1 n

-

- ln Pe (X E A ) 1

=

- J ( A, 6 ) + 0.

As we said previously, according to lemma 3.2.2 in [ 13] , if a and 6 are limited elements of A \ hal(A) , then J(a , 6 ) = @d(a, 6 ) . Consequently, if d ( A, 6 ) = @ then J ( A , 6) = @. Thus

1

-

- -:;:;, ln P� (X E A) = J(A, 6 ) ( 1 + 0) = @.

D

In fact. the continuity of J(·, 6) allows us to generalize this result to limited subsets of A:

10.3. Exponential families

155

Proposition 10.3.3 Let A be a limited subset of A, 6 be a limited point of A \ hal(8A) and let �o E aA such that J(�o, 6 ) J(A, 6 ) . ff A is n-regular in �o then 1 -n ln Pl)1 (X E A) :-: - J (A, 6). :-:

Proof. Let 6 =: >.(B2 ) E hal(�o) and p > Jn be such that B ( 6 , P) C A and p IIB2 II :-: 0. As in the proof of proposition 10.3.2, we can write 1 -n ln Pl)1 (X E A) > (B 1 - e2 1 6 + £kp) - 1jJ (B 1 ) + 'ljJ (B 2 ) + _!_n ln (1 + 0 ) (e l - e2 1 6 ) + £p + 0 - 1/J (B 1 ) + 1/J (B2 ) + 0 -.J( 6 , 6 ) + 0 = -J( �o , 6) + 0 = - J(A, 6 ) + 0 because 116 - �o il :-: 0 and .J( · , 6 ) is S-continuous. Conversely, as in the proof of proposition 10.3.2 , use a paving (T1h < l < N with side 6 = � where each T1 is closed and such that T1 n A -/=- 0 , but cho;se t1 in T1 (maybe outside A) such that: if B1 , z - el. z > 0 then t l , z is maximal in T1 if B1 , z - el. i � 0 then tl , i is minimal in T1 So, for X E Tl we can write (e l - el I X) :::.: (e l - el I tl) and then

Pe� ( X E A)

N

�

L Jrp<E Tz } dr'e 1

<

L Jr{x- E T1 } exp ( n((el - el I X) - 1/J (Bl ) + 1/J (Bl )) ) dP�

<

l=l N

l=l N :L: exp n((e l - Bl l tl ) - 1/J (B I ) + 'l/J (B I )) . l=l

(

Thus

1 -n ln Pl)1 (X E A)

)

1

<

- ln N + =maxN {(el - el I t l ) - 1/J (B l ) + 1/J ( Bl ) } l l... n

<

- ln N + =max l...N {-.J(t1 , 6 )} .

1 n

l As .J(· , 6 ) is S-continuous, J(tl , 6 ) = J (0tl , 6) + 0 and then , as 1 - ln Pe: (X E A) � - J( 0 A, 6 ) + 0. n

0tl

E

0A

10. Nonstandard likelihood ratio test in exponential families

156

We now claim that J(0A, 6) ::: J(A, 6). Indeed, if a E A, then °a E 0A and J(a, 6 ) ::: J(0a, 6); so J(0A, 6 ) :;, J(A, 6). Conversely, if a E 0A, there exists a' E A such that a ::: a' and thus such that J (a, 6 ) ::: J (a', 6); so J(A, 6) ;::_ J( 0A, 6 ) . D Finally � ln PJ: ( X E A ) :::; -J(A, 6) + 0. 10.3.6

n-regular sets defined by Kullback-Leibler information

Let 8o be an internal subset of 8, included in a standard compact K included in the interior of e. Let co := co (8o) := sup {I(B, 8o) : e E 8} :::; oo . For c E ]O, co[, let Ac : = {� E A : J(�, Ao) :::; c} where Ao : = >.(8o). We shall see in lemma 10.3.2 that J(· , Ao) is continuous; then if c < c' < co , Ac is a proper subset of Ac' (for if not, { � E A : J(�, Ao) :::; c1c' } = { � E A : J(�, Ao) < c1c' } is a connected component of A which is a connected set.) Lemma 10.3.2

If c :::; co is limited, then aAc is a limited compact set.

= { � E A : J ( �, Ao ) = c} is closed in A since the function � J(�, Ao) is continuous on A as we prove now. By transfer, we just have to prove this continuity for a standard Ao: if 6 and 6 are such that 11 6 - 6 11 ::: 0, if �o E Ao is such that J(6 , �o) ::: J(6 , Ao) then, as J(-, �o) is S-continuous, J( 6 , �o) ::: J(6 , �o) and so J( 6 , Ao);;,J(6 , Ao) ; similarly J(6 , Ao);;,J( 6 , Ao). We prove now that if � is unlimited, then J(�, A0 ) is unlimited. For each �o E Ao , �o belongs to the standard compact K; then II � - �o il ::: oo and so J(�, �o) ::: oo (cf. [10, p. 177]). Thus J(�, Ao) is unlimited. Therefore { � E A : J(�, Ao) = c} is limited. D ____,

Proof. a Ac

Proposition 10.3.4 Let c :::; co be limited and B1 be a limited point of 8' such that 6 := >.(B 1 ) tf. hal (8A) . (i) If 6 tf. Ac, then � ln PJ: (X E Ac) = � ln PJ: (X E A�) ::: - J( aAc, 6 ) .

(ii) If 6 E Ac, then � ln PJ: ( X E Ac)

:::

(iii) If 6 tf. Af , then � ln PJ: ( X E Af)

0.

=

� ln PJ: (X E Af) ::: -J( aA c , 6 ) .

(iv) If 6 E Af , then � ln PJ: ( X E Af) ::: 0.

10.3. Exponential families

157

Proof. We first prove the both similar results (i) and (iii), and then the both similar results (ii) and (iv) which are obtained in a same way. (i) If 8o and c are standard, Ac is a standard compact set such that Ac A2 . Indeed, A� {� E A : J (�, Ao) < c}. If � is such that J (�, Ao) c , let >.. o E Ao (a compact set) be such that J(�, Ao) = J (�, Ao) = J (�, >..o ) = c. According to lemma 10.3.1 (i) , for any ( E ] ( >.. o [, J ((, >..o ) < J(�, >..o ) and so J((. Ao) < c. Thus ]�, >..o [ E A� and then � E A2. So, according to proposition 10.3. 1 , Ac and A� arc n-regular in any point of 8Ac and then, ac cording to proposition 10.3.3, we can write � ln PlJ; (X E Ac) -::::: - J (Ac, 6 ) and � ln PlJ; (X E A�) -::::: -J(A� , 6 ) . Finally, lemma 10.3.1 (ii) yields J(Ac, 6 ) = J (A�. 6 ) J(aAc. 6 ) . Now, if c or 8o are not standard, the continuity of J (�, ) on >.. ( K) implies 0Ac(8o) Aoc(08o) and the hypothesis of proposition 10.3.1 is verified since =

=

=

=

·

=

hal (8Ac)

=

hal (aAc) U { � E 8A : J (�, Ao) � c}

and

hal( a 0Ac) = hal (a0Ac) u { � E aA : J (�, 0Ao) � °C} . Indeed, on one hand, the continuity of J (�, ) implies ·

and on the other hand, this continuity also implies hal( 8Ac)

{ � E A : J (�, >.. ( 8o)) -::::: c} { � E A : J(�, >.. ( 08o)) -::::: 0c}

=

hal (8 °Ac)

·

Then we can conclude as before, using proposition 10.3.1 , proposition 10.3.3 and lemma 10.3. 1. (iii) In the same way, Af is n-regular in any point of aAf = aAc, but Af is not always limited. Meanwhile, aAc is limited and so the proof (in proposition 10.3.3) of � ln PlJ; (X E Af ) ? - J (8Ac, 6 ) + 0 remains valid. Suppose now that 6 E Ac. Ac is included in a standard hypercube Up : = {x E �k : Xi > p} [ - p , p] k (where p is a standard integer) . If we set Sp,i and s�, 2 {X E �k : Xi < - p } then u:; u7= 1 (Sp,i u s�, i ) . I t i s clear that :=

=

:=

On one hand, as Up \ A c is limited and n-regular in any boundary point, we can write

10. Nonstandard likelihood ratio test in exponential families

158

Using lemma 10.3.1 (iii) with E = §uP and F = A( , we have J(aUPu aAc, 6 ) = � � 6). J(8Ac, 6) and so n1 ln Pe 1 ( X E Up \ Ac) -::::: -J(8Ac, O n the other hand, n

P8; ( X E Upc ) �

k

2._) P8; ( X E Sp,i) + P8; ( X E S�,i )) . i =1

We know that � ln P8; (X E Sp ,i) -::::: -J(Sp,2 , 6 ) and � ln P8; (X E 3� ,2) -::::: - J(S�,i ' 6) (it is a classical result concerning half-spaces: cf. chap. 7 in [10]); so 1

1 - E U C ) � max max( - J( SP 6 ) , -J( S , 6)) + 0 - ln P8; (X ·"' P p,; n 2=l... k which is less than - J(aAc, 6 ) according to lemma 10.3. 1(i). Then � ln P8; (X E � -J(8Ac, 6 ) + 0. 1 C � · Fmally, r;, ln Pe 1 ( X E Ac ) -::::: -J(8A c, 6 ). (ii) If 6 E hal(aAc) , then exists �o E §Ac such that J (�o , 6 ) -::::: J(A, 6 ) -::::: 0 and it remains possible to use proposition 10.3. 1 , proposition 10.3.3 and lemma 10.3.1 as for the proof of (i). If 6 tf. hal(aAc) , then 6 E 0Ac \ hal(aAc) according to the proof of (i) and so there is a standard number p such that the ball B (�, p) is included in 0Ac \ hal(§Ac) and consequently included in Ac. Then the nonstandard law of large numbers yields P8; ( X E Ac) = 1 + 0 and finally ln P8; ( X E Ac) -::::: 0. (iv) is obtained in a similar way. D A

A cc )

n

Definition 10.3.3 Let e E 8'. Let he be the function defined on [0, co [ by h e( c ) := J(A(, >.(B)) and ge the function defined on [0, co [ by ge ( c) : = J(Ac, >.(B) ) . ___,

As c Ac is increasing, ge is a non-increasing function and he is a non decreasing function. Let r(B) : = I(e, 8 0 ); >. ( e ) E Ac iff c ? r(B) and so ge ( c ) = 0 if and only if c E [r(B) , co [ and h e( c) = 0 if and only if c E ]o, ,(B)].

Proposition

10.3.5 ___,

___,

h e (c) are S-continuous on (i) The functions (e, c) ge (c) and ( e, c) {e E 8 \ hal(88) : e limited } x ( [O , co [ n £). (ii)

The function ge is decreasing on [0 , /( B)] and he is increasing on [r(B) , co] .

10.4. The nonstandard likelihood ratio test

159

Proof. Let e be standard and e ' ::: e , then .>.. ( e ) ::: .A ( e ' ) ; let c and c' be such that c' ::: c. Then °Ac' = 0Ac and Similarly, J(Ac' ' .>.. ( e' )) ::: J(0Ac' ' .>.. ( e)). The monotonicity of ge is deduced from lemma 10.3.1 (iii) which shows that if c' < c and .A ( e ) tf. Ac' then J(Ac' , .>.. ( e)) > J(Ac, .>.. ( e)). D The continuity and monotonicity of he are proved in the same way. 10.4

The nonstandard likelihood ratio test

Recall that F := { 8o internal : So c 8o c hal(So) } , that the NSLRT is defined by e 0 : 8o E F}, that e 0 is defined through Re 0 and that, for exponential families, Re 0 = J(X, Ao) where Ao : = .A(8o). So, the relation sup Pl) (Re0 ? d) sup Pl)(Re 0 > d) :S: a :S: eEe eEe o o (see §10.2.1) can be written : ==:

sup Pl) ( X E Ar (8o)) eEe o

:S:

a :S: sup Pl) ( X E Ar (8o)). eEeo

Consequently, Ar is the rejection set (relatively to X ) , A� the acceptation set and the test e 0 is randomized if X E aAc . As n is illimited, if d < co (8o) and if d is limited then, for e E 8o, proposition 10.3.4 yields

As supeE e o -J(aAd, .>.. ( e ))

=

-d by definition of Ad, we have d(a, 8o) ::: ln a . - -

n

This is the nonstandard form of a classical result of Bahadur ( [1] , [2] ) . We shall study the NSLRT according as these d( a, 8o) are infinitesimal or appre ciable. In this latter case, we shall have to distinguish the cases d( a, 8o) < co (8o) and d(a, 8o) ? co (8o). In the following, Zo stands for co (So), Ao for ).. (So) and Ro J (X, Ao) for R e 0 ; Ac will denote Ac ( Ao); the notations he (c) and ge (c) will also refer =

10. Nonstandard likelihood ratio test in exponential families

160

to these Ac implies

:=

K

Ac (Ao). Note that if

8o E

F, then the S-continuity of A on

�

Re0 Ro. The following lemmas will be used for the calculation of the risks of the first and the second kind. They are valid not only for exponential families, but also for any statistical family. Here 8" is a standard subset of 8. Lemma 10.4.1 Let T be a statistic, S a (perhaps external) subset of 811 and F : D ____, � + , ( e , c) ____, Fe (c) a standard continuous function defined on a domain D = {(B, c) E 8" x �+ : de � c � d� } such that e • •

the functions e ----+ de and e ----+ d� aTe continuous, for each e E S, Fe is decreasing on [ de , d�] , for each e in s and c nearly standard in l de ' d� [ ' 1 -n ln Pe(T � c) � Fe ( c) .

Then, for each e nearly standard in S and Co standard in [de, d� [ Proof. PRIJ (T � Co) = supw =@ Pe (T > Co + w). Now write 1 - ln Pe (T � Co + w) � Fe (Co + w) � o Fe (Co + w). n

So, as oFe = Fe0 ( where Bo we have

:=

oe) is a standard continuous decreasing function,

� ln PR(j (T� C0 ) = J -oo, �� Fe (Co + w) [ = ] -oo, Fe (Co) + 0[ .

Similarly, 1 -n ln PR e (T � C0 ) = ] -oo, ° Fe (Co) + 0[ = ] -oo, Fe (Co) + 0 [ , -n

for if not, there exists T) < 0,

T)

�

0 such that

1

Vr:: E hal + ( o ) , -n ln Pe(T � Co + r:: ) > o Fe (Co) + TJ

where hal + ( o )

:=

{c: E hal ( O ) : � 0} . r::

10.4. The nonstandard likelihood ratio test

161

Then { E � + : � ln PlJ (T ? Co + ) > ° Fe (Co) + 7]} is an internal set in cluding hal + ( o ) and consequently ( by Cauchy ' s principle ) including an interval [0, wo] where wo is standard. But c:

c

� ln Pe (T ? Co + wo) :-: Fe (Co + wo) :-: ° Fe (Co + wo) n with °Fe(Co + wo) < °Fe (Co) which is a contradiction, because both these numbers are standard. Finally D Lemma 10.4.2 Let T be a statistic, S a (perhaps external) subset of 811 and G : E ----+ � + , ( e , c) ----+ Ge (c) a standard continuous function defined on a domain E = { (B , c) E 8" x � + : e e ::; c ::; e� } such that •

the functions e

____,

ee

and e

____,

e� are continuous,

•

for each e E S, Ge is increasing on [ee , e�] ,

•

for each e in s and c nearly standard in l ee ' e� [ ' 1 - ln PlJ (T ::; c) :-: Ge (c) . n

Then, for each e nearly standard in S and Co standard in [e e , e � [ PRIJ (T � Co) = L: e nGe ( Co ) + n0 . Proof. PR� (T � Co)

=

inf t =©: PlJ (T ::; Co + t). Now write

� ln PlJ (T ::; Co + t) :-: Ge (Co + t) :-: 0Ge(Co + t). n So, as 0Ge is a standard continuous increasing function, we have

1 -n ln -n PRe (T � Co) =

]

- oo ,

]

. Ge (Co + t) = ] - oo , Ge (Co) + ] . 0 tmf =@

As in lemma 10.4.1 , one establishes that PR� (T � Co) finally PRIJ (T � Co) = L:enGe ( Co ) + n0 .

=

PRIJ (T � Co) and D

10. Nonstandard likelihood ratio test in exponential families

162

10.4.1

In a

-n

Proposition (i) if Ro

�

infinitesimal

10.4.1

For l�a

0 then
=

�

0,

0 otherwise
=

1;

(ii) for e E hal(Go ) , the risk of the first kind is exactly equal to £e -n @ -nhe ( 0l ; (iii) for a limited e E 8 \ (hal( 88) u hal( Go))! the risk of the second kind is exactly equal to £ e - nge ( O ) + n° . Proof. (i) If Ro 7: 0 then \i8o E F, Re0 7: 0 and so \i8o E F, e0 = 1 because d( a , 8o) - 1�a 0 and finally .-1 (X) E Go U { >.-1 (X) } E F. Then, e0 0 for this 8o Go U p-1 (X) } . Finally,

�

�

=

=

=

=

:=

�

____,

:

p Rlf ( Ro � 0 )

=

£e -n@ -nhe ( O ) .

(iii) According to proposition 10.3.4 (i) and (ii) , for a limited c :::; co , we have � ln Pl) (Ro :S: c) -ge (c) . For any limited e E 8 \ (hal(88) U hal(Go)) , there exists a standard compact set 811 C 8 \ hal(88). such that e E 8". According to proposition 10.3.5, (e, c) -ge (c) is continuous on 8" [O, co (8")] and for e E 8", - ge is increasing on [o, , (e)] . We can now apply lemma 10.4.2 with D { (e, c) E 8" x � + : 0 :::; c :::; I ( e) } , S = 8" \ hal( Go) , T = Ro , Ge = -ge , Co = 0, t o obtain P RIJ (Ro � 0) �

____,

X

=

£e-nge ( O ) + n0 .

0

Remark 10.4. 1 Let e E hal( Go); the nonstandard version of the law of large numbers yields PRIJ (X E hal(>.(e)))

=

1 + 0 ==? PRIJ (X E hal(Ao))

=

1 + 0.

As the reject criterion is X tf. hal ( Ao), it is not logical to take a appreciable, but infinitesimal.

10.4. The nonstandard likelihood ratio test 10.4.2 0 �

1

n

163

l in a l � co

If co oo , we shall suppose that I !:a l is limited because some preliminary results (in § 10.3) are no longer available (for example Ac is no longer limited if c 00 ) . =

:::

Proposition 10.4.2 For o ; l l:a l ; zo and l l:a l limited, p1Lt C 0 ( - 1�a ) . (i) If Ro � C then
=

=

Proof. (i) If Ro � C then V8o E F, R e0 � - 1�a , so Re0 > d(a, 8o) and e 0 = 1 . Finally, e 0 = 0. If not, the external set F is included in the internal set { 8o : So C 8o C K 1\ e0 -/=- 0 } , and using the Cauchy ' s principle, there exists an internal set 8o � hal( So) such that e 0 -/=- 0, i.e. Re 0 2: d(a, 8o) ::: C. Denoting by So the shadow of this 8o we have R e0 ::: Re 0 � C. Go and So are two distinct relatively compact standard sets such that aso c eg and then such that 8Ao c :Ag (where Ao : = >.(So)) . According to lemma 10.3. 1 (iii) , J(�, Ao) < J(�, Ao) for each � tf. Ao and then Re = J(X, Ao) ::: J ( 0 X, Ao) < J( 0 X, Ao) ::: J (X, Ao) = Ro . o =

�

As J ( 0 X, Ao) and J ( 0 X, Ao) are both standard numbers, this contradicts R e � C and Ro :;, C. o (ii) The proof is similar to the proof of proposition 10.4. 1 (ii) , with C0 = C instead of Co = 0. We note that the risk of the first kind calculated in §10.4.1 is a particular case of this result. (iii) For the first result, the proof is similar to the proof of proposition 10.4.1 (iii) , with Co = C instead of Co = 0 and S = { e E 8" \ hal( So) : I ( e) � C} instead of S = 811 \ hal(So ) . For the second, we choose a standard compact set 8" c 8 \ hal( 8 8) . such that e E 8". In fact, the proof of (ii) is valid with S { e E 8" : I ( e) :;, C} and one obtains =

p R{f ( Ro

� C) = Le-n@-nhe ( C)

D

10. Nonstandard likelihood ratio test in exponential families

164

10.4.3

1 - l ln a l � co n

The number co is standard because 8o is standard and co -/=- oo ; A is a limited set for if not J(�, Ao) is unlimited because � is unlimited (see the proof of lemma 10.3.2) and then co (So ) � sup�E A J(�. Ao) is unlimited. We shall suppose here that for each subset 8o of e, the function e J(e, 8o) is continuous on 8. This assumption is verified by classical exponential families. Then for any 8o , co ( 8 o) = sup�E A J(�, A( 8 o)) . For any 8 o C K, if co ( 8 o) � � lln a l then d(a, 8 o) = co( 8 o) and so e0 < 1 because A d = A. If X E A0 ' then eo = 0 and if X E a A, either eo = 0, or 0 < e0 < 1 (there is a randomization) ; the problem is not simple, as the example of multinomial distributions shows (cf. [9]). We note that Kallenberg avoids this case in theorem 3.3.2 of [13] . ____,

Proposition

10.4.3

For � lln al ?; co,

(i) o < 1 , (ii) if X E A0 then o 0 . =

Proof. As
e0 < 1. If not, \i8 o E F, max e0 = 1 (which means that e0 takes value 1), and by Cauchy's principle 3So ::) hal(So), such that max S o = 1 . According to lemma 10.3. 1 (iii) , for any � E A , J(0�,0Ao) < J(0�,0Ao) because hal(Ao) C A o := >. (So). Both these numbers being standard, the relation implies J(�, Ao) � J(�, Ao) and then co (So) � co(So) . So co(So) � l lnna l which contradicts max e�- o = 1 . (ii) Let X E A0 ; if \i8 o E F, e0 -/=- 0, by Cauchy's principle 3So ::) hal( So) l l n l a such that s 0 -/=- 0. But co(8o) � co ::: ----;;;:- and so, as X E A0 , s 0 = 0, which D contradicts So -/=- 0. �

Remark 10.4.2 Let e be a limited element of e \ (hal(88) u (So)); the non standard version of the law of large numbers yields P RIJ ( X E hal( A( B))) = 1 +0 and thus PRIJ ( X E A0 ) = 1 + 0. So, for this e, the risk of the second kind is equal to 1 + 0: if � lln al � co, the NSLRT is not consistent (in the classical

10.5. Comparison with nonstandard tests based on X

165

theory, a sequence of tests ( (f?n)n ::>no {where n is the size of the sample} is con sistent if for any e E 81 , lim17_,= Ee ( l - (f?17 ) = 0: then, for n "" oo, the risk of the second kind is infinitesimal). As the example of multinomial distributions shows (cf. {9}), it seems impossible to give general results about the value of
Remark 10.4.3

=

10. 5

Comparison with nonstandard tests based on X

In the classical theory, for testing e E 8o against e E 81 , one fixes the level a and one tries to find a test that has maximum power in any e E 8 1 . However, such uniformly most powerful (U.M.P.) tests exist only in a few exceptional cases: for example for !-dimensional exponential families, if 8o : = { e E 8 : e :S: Bo } . If no U .M.P. exists, one studies an asymptotic approach, for sample sizes tending to infinity. One of the tools to compare asymptotically the power of two tests of the same hypothesis is the Bahadur relative efficiency. In a more general framework (not only for exponential families) Bahadur ([1] or [2] , see also [12]) has proved that the likelihood ratio test is at last as "efficient" as any test based on a rejection criterion of the form T > c (with randomization if T = c if necessary) , where T is a statistic built with the n-sampling. However, it may be that to get the same power in a point e E 8 \ 8o the size of the sampling for the likelihood ratio test has to be larger than for the test based on T: thus was introduced the notion of "Bahadur deficiency" (cf. [13]). We shall prove that the NSLRT is uniformly the most powerful in a large family of nonstandard tests if a � 1 and I�a � co . The standard likelihood ratio test is not as powerful in the corresponding standard family. Let ((f?e0 ) be a family of tests of level a (where (f?e 0 tests He0 : e E 8o against e tt 8o) . The corresponding level-a nonstandard test (f?O is defined by (f?O

=

: inf { (f?e 0

: 8o E F }

and when we shall refer to a "nonstandard test", it will be defined thusly. 10.5.1 R

Regular nonstandard tests

Suppose that for each 8o , R (a, 8o) such that

:=

y? e 0

is a test of size a defined by a rejection set

sup Pl) ( X E R) :::_: a :::_: sup Pl)( X E R u aR) .

liE8o

liE8o

10. Nonstandard likelihood ratio test in exponential families

166

If R is n-regular in any point of aR and if aR is limited , rpe0 will be called "regular". The most famous example of such a test is the chi-squared test since the relative frequency F is in fact the sample mean for the multivariate distribution. In order to get R -/=- 0 we shall suppose that � lln a l � co and so � lln a l � co(8o) for 8o E F. If 'P8 o is regular for any 8o E F, 'PO will be said to be "regular". We prove now that and rp, it is clear that if � rp, then is more powerful than rp -in the classical sense- because Ee ? Eerp for any e.)

Proposition 10.5.1 For a � 1 and � lln a l � co , if 'P O is a regular level-a non standard test, then .(e)) , so infeE8o J( [JR, >. (e)) J( [JR, >. ( 8o)) verifies J( fJR, >. (8o)) :-: _ I�a . Put c := o ( l !r�a l ) . The region where 'PO -/=- 0 is included in the external subset R'PO R(a, 8o) . We prove that R'P O c { � E A, J(�. Ao)� c} 8 oE.F {. -1 (�o) . If �0 E hal(Ao), then eo E So since A is a standard diffeomorphism from 8° onto A0 . L Suppose Po := d(�o , 8R(a, 8o)) > vn and so B(�o, Po) n A c R( a , 8o). As A contains the support of Pe0 , the nonstandard law of large numbers yields PlJo (X E R( a, So)) = 1 +0 which is impossible because PlJo ( X E R( a, So)) � a. Suppose then Po � Jn · Let 6 E fJR( a , Go) n hal(�o); the n-regularity at �1 implies that 3 6 E hal(�I ) , ::Jp E ] Jn 0] such that B(6 , p) C R( a, Go) . As d(�o , 6) :-: 0, 6 E >.(So) and e2 := A-1 (6) E So. As previously, Pl); ( X E R( a, So)) = 1 + 0 which is impossible because PlJ; ( X E R( a, So)) � a. If �o t/. hal(Ao), we can write J (�o, >.( 0So)) :-: J(0�o , >.(0So)) J(�o , >.(So)) < J(0�o , >. (So)) :-: J( .;o, >. (So)) < c. �

-

=:

:=

n

=

,

•

A

�

-

,

•

�

10.5. Comparison with nonstandard tests based on X

167

But J( 0 �o , >. ( 0S o)) and J( 0 �o , >.(So)) -/=- 0 are both standard numbers; as >. ( So) c ( >. ( 0S o ) ) 0 we have J( 0.;o , >.( 0S o)) < J ( 0 �o , >.(So)) and so J (�o , >. ( Go)) ;:, c. 1 c: this contradicts On the other hand J(R(a, G 0 ) , >. (Go)) : ::o

�o E R(a, Go) .

10.5.2

Case when

80

a

-

::o

D

is convex

For any 8o E F and any e 1 E 8 \ hal ( 8o) , there is a level-a test \[!de o which is the most powerful at e 1 (for any level-a test cp testing e E 8o against e E 8 1 , Ee 1 cp ::; Ee 1 W�0 ) . This test is defined in the following way (see [13, p. 49]) let �

is the least favorable distribution. Then \[!�0 = 1 if f(X) < C and W�0 = 0 if f (X) > C where C is a constant depending on a, 8o , e1 . So, the rejection set (relatively to X) is { � E A : f ( �) < C}.

where

T

Definition

10.5.1

For e 1

E8

\ hal( So ) , let W� 1

:=

inf { W�0

: 8o E

F} .

We might think that W� 1 is "the most powerful at e1 nonstandard level-a test of Ho"; but in fact, in some cases,
Proposition

10.5.2 Jf Go is

convex, then for any e1

Proof. Let e1

E 8 \ hal( So) .

We prove that the rejection sets

A := { � E A :

!(�)

E 8 \hal (So) ,
W�0 .

< C}

are n-regular and then the proposition 10.5. 1 yields the result. The set So is convex and d ( e1 , Go) rj: 0 and so exists a standard cone � C �k and a standard positive number > 0 such that if u E � and ll u l l = 1 then ( eo e1 I u) < for any eo E So . Therefore, for any 8o E F, for any e E 8o and for any u E � \ {0} , (eo - e1 I u) < 0. Choose 8o E F. If u E � \ { 0} , f(x + u)

c

-c

-

le

:= .

o

ex

p ( n ( eo e l I u)) ·

. exp

-

(

n

( ( eo

-

e1 I x ) ?jJ ( eo) + 1jJ (e1 )) -

)

T

( deo)

< f(x).

168

10. Nonstandard likelihood ratio test in exponential families =

Let now a E 8A, then f(a) C. For any u E � \ {0}, f(a + u) < C, so a + (� \ {0}) C A . It is now possible to choose a point b E a + � such that d(a, b) = � ( for example) and such that B(b, nf12 ) C a + � - If a t/. hal ( 8A ) , this proves that A is n-regular in a. If E hal ( 8A) , we have to use Cauchy's principle like at the end of the proof of proposition 10.3. 1 D a,

References

[1] R. R. BAHADUR, An optimal property of likelihood ratio statistic, in Pmc. Fifth Berkeley Symp. Math. Statist. Prnb. , Vol. 1 , 1965. [2] R. R. BAHADUR Some limit theorems in statistics, SIAM, Philadelphia, 1971. [3] R.R. BAHADUR and S . L . ZABELL, "Large deviations of the sample mean in general vector spaces", Annals of Probability, 31 (1979) 587-62 1. [4] I . VAN DEN BERG, An external probability order theorem with applica tions, in F. and M. Diener editors, Non Standard Analysis in Practice, Springer-Verlag, Universitext, 1995. [5] J . B oSGIRAUD, "Exemple de test statistique non standard, risque ex terne", Pub. Inst. Stat. Paris, 41 (1997) 85-95. [6] .J . BoSGIRAUD, "Tests statistiques non standard sur des proportions", Pub. Inst. Stat. Paris, 44 (2000) 3-12. [7] .J . BoSGIRAUD , "Tests statistiques non standard pour les modeles a rap port de vraisemblance monotone ( premiere partie ) ", Pub. Inst. Stat . Paris, 44 (2000) 87-101 [8] J . BoSGIRAUD, "Tests statistiques non standard pour les modeles a rap port de vraisemblance monotone ( seconde partie ) ", Pub. Inst. Stat. Paris, 45 (200 1) 61-78. [9] .J . BoSGIRAUD , "Nonstandard chi-squared test", Journal of Information & Optimization Sciences, 26 (2005) 443-470 . [10] L . D . B ROWN, Fundamentals of Statistical Exponential Families, Institute of Mathematical Statistics, Hayward, California, 1986. [11] F. DIENER and G. REEB, Analyse non standard, Hermann, Paris, 1989. [12] P. G ROENEBOOM and .J . OoSTERHOFF, "Bahadur efficiency and proba bilities of large deviations", Statistica N eerlandica, 3 1 ( 1977) 1-24.

References

169

[13] W . C . M . KALLENBERG , Asymptotic Optimality of likehood ratio tests in exponential families, Mathematisch Centrum, Amsterdam, 1978. [14]

F. KouDJETI and I. VAN DEN B ERG, N eutrices, external numbers and external calculus, in F. et M. Diener editors: Non Standard Analysis in Practice, Springer-Verlag, Universitext, 1995.

[15]

Radically Elementary Pr-obability Theory, Princeton Univer sity Press, 1987. E . NELSON ,

[16] S . R . S . VARADHAN , Large Deviations and Applications, SIAM, Philadel phia, 1984.

A finitary approach for the representation of the infinitesimal generator of a markovian sem1group Scherazade Benhabib *

Abstract Ill, where the central question was: under suitab le regularity condit-ions, what is t he form of the infinitesimal generator of a Markov semigroup?

This work is based on Nelson's paper

the elementary approach using 1ST [2] , the idea is to replace the ![{ with a finite state space X possibly containing an unlimited number of points. The topology on X arises

In

continuous state space, such as naturally

from

the

probability theory.

For :1: E X, let. 'I.r.

be

the set

of all

h E M vanishing at x where M is the multiplier algebra of the doma.iu 'D of the iufiui tcsimal generator. To desaihe the st.rneLm:e of the scmigroup

l::y EX\{x} a(a; , y) h(y) so that of the points far from x appears separately. A defini tion of the quantity O:ah ( x ) = l:yEF a.(;t, y) h(y) is given using the least upper bo und of the stun..<; on all intemal sets T-V included in the external set F. Tl:ris leads to the characterization of the

generator A, we want

to

split Ah(a;)

the contribution of the external set Fx

global part of

11.1

=

the infiuitesimal generator.

Introduction

Let X be a finite state space possibly containing an uulimit.ed unrnbcr N of points. For all x in X, let a(x, y ) be a real number for each y on X such that a(x, y) is posi tive if y -=/=- x and L y a(x, y) = 0. Then A = (a(x, y)) is a finite N x N matrix, with positive off- diagonal elements and row sums 0. L n t'��" exists and is a markovian semigroup with infinitesimal Thus p t generator A. =

•EIGSI, 17041 La Rochelle, France. scherazade .benhabib@eigs i . fr

11.1. Introduction

171

Then we can define externally the domain of definition of A and its multi plier algebra as follows. Definition 1 1 . 1 . 1

A is the external set

The domain D of definition of the infinitesimal generator

D = { f E � X / f limited and Af limited}

(11.1.1)

And its multiplier algebra is the external set M = { f E D/ Vg E D fg E D}

(11. 1.2)

The topology on X arises naturally from the probability theory and a proximity relation is defined on X, more particulary from the way the functions in the multiplier algebra M of the domain D of the infinitesimal generator act on X. Definition 1 1 . 1 .2

and only if

Let u and v be in X, we say that u is close to v (u Vh E M h(u)

�

�

v) if

h(v)

Thus, the elements of M have a macroscopic property of continuity for mally analogous to the S-continuity. With this relation, we define the external sets :

�

Cx

=

X}

(11. 1.3)

Fx

= {y E X y rf. X}

( 11.1.4)

{y E X y

of points close to x and :

of those far from x. For each x in X, (a(x, y)) are positive real numbers if x -/=- y and can take unlimited values. Let Ix be the set of all h in M vanishing at x, and for h in Ix let us define ( 11.1.5 ) Ah(x) = L a(x, y) h (y) yE X \{x}

To describe the structure of the semigroup generator, we want to split Ah(x) so that the contribution of the points far from x in X. appears separately. One is tempted to define directly the quantity

aa h (x) = L a(x, y)h (y) yEFx

( 11.1.6 )

11. The infinitesimal generator of a markovian semigroup

172

which alas has no meaning because of the external character of the set of indices F:r Nevertheless, we will show in Section 1 1.2 how we can attach a definite meaning to it. That leads in Section 1 1 .3 to a definition for an ax-integrable function. Theorem 11.3.1 establishes the representation of Ah(x) for functions in M which have a zero at x of the third order. This is the global or integral part of A. Theorem 11.3.2 and its corollary extend the notion of ax-integrable to functions which have a zero of the first or the second order. 11.2

Construction of the least upper bound of sums in 1ST

Let E be an external subset of X and f a positive function defined on X. If there exists a standard natural number no such that for all internal sets W contained in E we have

Lemma 1 1 . 2 . 1

L f(y) :S: no

(11.2.1)

yEW

then there exists a unique up to an infinitesimal least number a( ! ) such that

L f(y) � a( ! )

(11.2.2)

yEW

for all internal sets W contained in E. Proof. Let no be a standard natural number such that for all internal sets W contained in E we have ( 11.2.1). By external induction, there is a least no such that (1 1.2.1) holds. Now for each standard natural k there is a largest natural number j with 0 :S: j :S: 2 k such that for all such W (11.2.3) L f(y) :S: no - k yEW

1

Let a(k) no - ik . Then for each standard k we have a standard a ( k ) , so there is a standard sequence k -+ a( k) . This sequence is decreasing, therefore it has a standard limit a (f) . Moreover, for all standard k and all such W , D ( a (f) - Ly Ew f(y)) :S: 21k · This proof uses very elementary tools. It can easily be formulated in the minimal non-standard analysis introduced by Nelson in [3] in which "standard" =

1 1.2. Construction of the least upper bound of sums in 1ST

1 73

a(O)

Step 0:

no - 1 Step 1:

no - 1

/,� i)\ : : J)\ n

(2

Step 2:

no - 1

no a(1) no a(2) no

Figure 11.1 applies only to natural numbers. On the contrary, if all the force of 1ST is used, one could check that s a (f) sup x E �+ ; 3 intw C E L f(y) :S x

=

{

yEW

}·

Figure 11.1 exhibits the way the sequence is built. Definition 1 1 . 2 . 1 1.

If f is a positive function defined on X, and if the conditions required for Lemma 1 1 . 2. 1 are verified the quantity (11.2.4) af = L f(y) yEE

is defined to be the standard limit a (f) . 2. If the conditions for Lemma 1 1 . 2. 1 are not verified, let af be the formal symbol x and say that (1 1 . 2. 2) is not defined. 3. In the general case, let f f + - f - be the decomposition of f into the difference of two positive functions. Say that af is defined in case both aJ + and a1- are defined, and let a f be their difference:

=

Notation 1 1 . 2 . 1

of af .

For each x in X, if f depends on x, write aJ (x) instead

11. The infinitesimal generator of a markovian semigroup

174

11.3

The global part of the infinitesimal generator

For x E X and h E M , let f = a(x, ·)h, then h is said to be ax-integrable if and only if aJ (x) is defined.

Definition 1 1 . 3 . 1

Notation 1 1 . 3 . 1

In that case write aa h (x) instead of aJ (x) .

Definition 1 1 .3.2 1.

h is a function in Ix if and only if h E M and h(x)

2. h is a function in I� if and only if h E M and h limited and eb gk ·in Ix .

= 0.

=

I:f=l e k gk with p

3. h is a function in I� if and only if h E M and is of the form h Ll e k fk gk with e k , fk , gk E Ix and p limited. 4. We say that h has a zero at x of the first order when h E Ix, of the second order when h E I�, of the third order when h E I� . If h has a zero at x of the third order, then h is ax-integrable and Ah ( x) is infinitely close to aah ( x) .

Theorem 1 1 . 3 . 1

Proof.

The proof goes through the following three steps:

1 . Let h be in I� , therefore it is of the form h and fk , gk in Ix .

=

I:i f� gk with q limited

,

2. I f f g E I.:r then P g is ax-integrable and so is h. These steps are proved as follows:

1. Let h E I�, as 1 1 1 efg = 4 (e + f) 2 g - 2 e 2 g - 2 f 2g, therefore h is of the mentioned form.

I L a(x, y)f2 (y)g+ (y) I w

:::;

l l g i i A(f2 ) (x)

1 1.3. The global part of the infinitesimal generator

175

As l l g l l and A(f 2 ) are limited, they are both infinitely close to a standard number. Taking the integer part of o (l lgi iA(f2 ) (x)) plus 1, we get the existence of the natural number no required by Lemma 11.2.1. The same argument applies to f 2g - . Thus aa( J2 g + ) (x ) and aa(j 2 g-) (x ) exist and as O!a( J 2 g) (x ) = aa( J 2 g + ) (x ) - aa(j 2g-) (x ) , then f2 g is ax integrable, and so is h. 3. Let (3 » 0. Since g E Ix there is an internal set W of Fx such that g + (y) :::; (3 for y E W 0 . We have

I A(f2 g + ) (x) - aa( J 2g+) I

=

IL

yE X \{x}

a (x , y) f2 (y)g + (y) - aa( J 2g + )

I

L a (x , y)f2 (y)g+ (y) - L a (x , y)f2 (y)g+ (y) L L a(x, y)f2 (y) :::; (3 L a (x , y) f 2 (y)

<

yEW

yE X \{x}

yEWr"\{x}

:::; {3

=

yEW0\{x} yE X \{x}

(3A(f2 ) (x)

Since A(f2 ) is limited and (3 » 0 is arbitrary, A(f 2 g + ) aa( J 2 g + ) (x ) . The same applies to f 2 g - . Then A(f 2 g - ) (x) ::: aa( J2 g-) (x) and aa( J2 g) = aa( J2 g + ) (x) + aa( J2 g-) (x) . D Thus A(f2 g) (x) ::: aa( J 2g) (x ) . Therefore A(h) (x) ::: a(ah) (x ) . :::

If h zs positive and has a zero at x of the first order, then h is ax-integrable and ah is less or infinitely close to Ah(x) . Theorem 1 1 .3.2 Proof.

Let W be any internal subset of Fx and h a positive function in I� , so

0 :::;

L a (x, y)h (y) :::; A(h (x w

)

)

As A (h) (x) is limited, we argue as in Theorem 11.3.1 and apply Lemma 1 1.2.1. One concludes that h is ax-integrable. As the inequality above holds for all internal subset W of Fx , we get that aah (x ) is less or infinitely close to A(h) (x) . D

176

11. The infinitesimal generator of a markovian semigroup

Corollary 1 1 . 3 . 1

If h has a zero at

integrable.

x

of the second order, then h is ax

Let h be a function in I� , as fg = -41 (e + g) 2 - -41 (e - g) 2 h is also of the form h = Lk = l Pk ff with Pk = ±1, q limited and f'f in I"} . Consequently we can apply Theorem 1 1 .3.2 and h is ax-integrable. D

Proof.

1 1 .4

Remarks

The next goal will be to find an 1ST construction of the pure diffusion part of the generator of the semi-group and to complete a work already in progress concerning the rescaling of the time parameter for the markovian semigroup pt .

I would like to thank Professor E. Nelson for the initial suggestion of this problem and numerous enlightening discussions. I would also like to thank Professor G. Wallet of the University of la Rochelle for his support.

Acknowledgment

References

[1]

E. NELSON, " Representation of a Markovian Semi-group and Its Infinitesi mal Generator", Journal of Mathematics and Mechanics, 7 ( 1958) 997-988.

[2]

E. NELSON, "Internal Set Theory: a new approach to nonstandard analy sis", Bulletin Amer. Math. Soc., 83 ( 1977) 1165-1 198.

[3]

E. NELSON,

[4]

Press, 1987 E.

B.

Radically Elementary Probability Theory, Princeton University

DYNKIN,

Markov Processes, Springer-Verlag, 2 vols .. 1965.

[5] W. FELLER, An Introduction to Pmbability Theory and Its Applications, John Wiley & Sons. [6] I . VAN DEN BERG, Non Standard Asymptotic Analysis, Lectures Notes in Mathematics, 1249 , Springer-Verlag, 1987.

On two recent applications of nonstandard analysis to the theory of financial markets * Frederik S. Herzberg

Abstract Suitable notions of "unfairness" that measure how far an empirical dis counted asset price process is fi·om being a martingale arc introduced for complete and incomplete-market settings. Several limit processes are i nvolved each tirue, prompting a nonfltandard approach to the this concept. This leads to

( rather

traded

than a

011

martingale

several stock

a

analysi

s of

existence result for a "fairest price measure"

measure ) for an

asset

exchanges. This approach

describing the impact of

12 . 1

an

that is simultaneously also proves useful when

currency tr ansaction tax.

Introduction

Nonstandard analysis is most successful when it concerns itself with

ematical concepts that. i nvolve several limit

math p ro ecss cs in a non-trivial way. One

such exampl e is the quantification of a stochastic process's distance from being a martingale.

between the existence of a martingale mea..sure for a discounted asset price process and the non-existence of atbitrage opportunities, there is a natural economic interpretation to the compru·ison of two positive stochastic processes in regard to "how fru·" they actually ru·e from being a martingale. Vole will l'encler two economic questions related to such conceived "unfairness" mathematical : (1) Is there a "fairest price measure" (rather than a m�u·tingale measure ) for an asset that is sinmltaneously traded at mul t iple stock exchanges? (2) Does a currency transaction tax imposed on a currency that is sub j ect to herd behaviour among traders have a fairnessGiven the well-known correspondence

· Mathematical Institute, University of Oxford, Oxford OXl :3LB, United Kingdom. herzberg@maths . ox . ac . uk / herzberg@wiener . iam . uni -bonn . de

12.

178

Applications to the theory of financial markets

enhancing impact on the currency exchange rates preceding a financial crash? Both questions will be shown to allow for an affirmative answer. Techniques from nonstandard analysis have already been successfully ap plied to mathematical finance in the work of Cutland, Kopp and Willinger [5] . This contribution is based on two more recent papers by the author [8, 9] . 12.2

A fair price for a multiply traded asset

In this Section, we will en pass ant motivate notions of "unfairness" for both complete-market as well as incomplete-market settings. Theorem 12.2.1 Let n E N, p > 0 and c, N > 0 . Consider an adapted probability space (r, Q, IP') and B[O, 1 ] ® Y1-measurable geometric Brownian mo tions with constant multiplicative drift g� : r x [0, 1 ] �d·n , i E { 1 , . . . , n}. Then there exist processes gi , i E { 1, . . . , n} on an adapted probability space (D, F, CQ) , equ.ivalent to the processes g� on r (in the sense of adapted equiv alence [7}), such that there is a probability measure Mm on n in the class of measur-e s Q probability measure, VA E F1 iJ · CQ(A) :S: Q(A) :S: N CQ(A) , > vz .../.r. J. E { 1 , . . . , n } 0 I( ) I s c minimising ____,

1 1 CovQ((g,)s, (gJ)Jd .t.Q 9t9J s

}

·

w·

_

�

in the class of Loeb extensions of finitely additive measures from (*C) (D, G) (G being an arbitrary lifting of g). Analogously, there is a probability measure Mn minimising

(where nn ( Q, g) is defined to be +oo if the derivative in 0 in this definition does not a.s. exist as a continuous function in t) in the class of Loeb extensions of finitely additive measures from (*C) (D, G) . Moreover, inf, mn (·, g) :S: inf mr ( , g) C(D g )

as well as

C(r,g)

·

inf nr (·, g ) . inf nn (-, g) :S: C(r,g)

C(D,g)

12.2. A fair price for a multiply traded asset

1 79

The original paper [8} does not state the Theorem and the subs equent Lemmas as precisely as it is done here, although the exact result becomes clear from the proofs in that paper.

Remark 12.2. 1

The proof for this Theorem can be split into the following Lemmas which might also be interesting in their own right.

Using the notation of the previous Theorem hyperfinite adapted space n [12},

Lemma 12.2.1

12. 2. 1,

for any

inf mn (· , g) :::; inf mr (·. g) . C (r,g)

C(D.g)

Under the asswnptions of Theorem 12.2. 1 and choosing n to be any hyperfinite adapted space, the infimum of mn (- , g) in the class of Loeb extensions of finitely additive measures from (*C) (D, G) is attained by some measure Mm E C(D. g) . Proofs for both of these Lemmas can be found in a recent paper by the author [8] . Although they look very similar, it is technically slightly more demanding to prove the following two Lemmas ( which in turn obviously entail the second half of the Theorem, i.e. the assertions concerned with the map n) . Lemma 12.2.2

Using the notation of the previous Theorem hyperfinite adapted space n [12},

Lemma 12.2.3

12. 2. 1 ,

for any

nr (· , g ) . inf nn (· , g) :::; C inf (r,g)

C(D.g)

Under the assumptions of Theorem 12.2. 1 and choosing n to be any hyperfinite adapted space, the infimum of nn (·, g) on C(D, g) is attained by some measure Mn E C(D, g) .

Lemma 12.2.4

Easy results are Lemma 12.2.5 A semimartingale X is a ?-martingale on n if and only if 1 mn (P, x ) = 0 . The function mn (P, · ) v on the space of measurable processes of n satisfies the triangle inequality and is 1-homogeneous. For p = 2, it defines

an inner product on the space [.

:=

{x : n

x

[0, 1]

--+

�d

:

x

measurable, rn(P, x )

which becomes a HilbeTt space by this constr-uction.

<

+oo }

12. Applications to the theory of financial markets

180

A semimartingale X is a ?-martingale on n if and only if The function nn (P, ·) on the space of measurable processes of n remains unchanged when multiplying the argument by constants (scaling invariance) .

Lemma 12.2.6 nn (P, x) = 0.

We can generalise this to the following Definition 12.2.1

Let D be an adapted probability space. Define

£ (Sl, �d ) :

:=

{ X : Sl

X

[0, 1] --+ �d :

X measurable } .

A function Y £ ( n, �d ) --+ [0, +oo] is an incomplete-market notion of un fairness if and only if it satisfies the triangle inequality, is 1-homogeneous, and assigns 0 to a semimartingale y if and only if y is a martingale. Y is said to be a complete-market notion of unfairness if and only if it remains unchanged under multiplication by constants and Y vanishes exactly for those semimartin gales that are in fact martingales. 1

By Lemmas 12.2.5 and 12.2.6 respectively, the function mn (Q, - ) P (for a fixed probability measure Q) is an example for an incomplete-market notion of unfairness, whereas similarly nn (Q: - ) , again for a fixed probability measure Q, is an example for a complete-market notion of unfairness.

The distinction between notions of unfairness for complete and incomplete mar-kets can be justified by the following reasoning: If it is, under assumption of completeness, possible to buy as much of an asset as one intends to, multiplication of the discounted price process by a constant does not enhance the arising arbitrage opportunities at all; therefore, for complete markets, a suitable notion of unfairness should be scaling invariant in that it does not change under multiplication of the argument - which is conceived as being a discounted price process - by constants.

Remark 12.2.2

12.3

Fairness-enhancing effects o f a currency transaction tax

In this Section, we shall analyse the empirical price of an asset that is subject to herd-behaviour preceding a financial crash and how the resulting distortion can be mitigated through imposing a transaction tax. This is of particular relevance if the crashing asset in question is a currency, given the massive (usually destabilising and therefore adverse) macroeconomic conse quences of such financial crises.

12.4. How to minimize "unfairness"

181

We shall restrict our attention to a model of currency prices where extrap olation from the observed behaviour of other traders, up to white noise and constant inflation, completely accounts for the evolution of the currency price. This is to say that, in analogy to a Nash equilibrium, we assume that every agent is acting in such a manner that he gains most if all other agents follow his pattern. In our model this pattern will consist in using some average value of past currency prices as a "sunspot", that is a proxy for a general perception that depreciation or appreciation of the particular currency in question is due. In order to introduce the model for the pre-crash discounted currency price, consider a � ----+ � ' a piecewise constant function, such that a = a(1) > 0 on (O, + x) a = a(- 1) < 0 on (-oo, O). We assume that the logarithmic discounted price process x ( v ) is governed, given some initial condition that without loss of generality can be taken to be x ( v ) 0, by the stochastic differential equation :

=

(

x �v ) = a x �v ) - L Pu . x ((�2 u ) vo uEI

d

J2

dt + J . dbt - 2 '

)

X{

(v ) _ '>'

I xt

. (v)

dt I } (12.3. 1)

L.. n E I Pn x ( t- n)VO ;:o:v

where r > 0 is the logarithmic discount rate (assumed to be constant) , b is the one-dimensional Wiener process, (Pu)uEI is a convex combination - i.e. I C (0, + oo) is finite, Vu E I Pu > 0 and L uEI Pu = 1. The parameter v depends very much on the tax rate we assume. If p is the logarithmic tax rate and T the expected time during which one will hold the asset (that is the expected duration of the upward or downward tendency of the stock price) , one can compute v as follows: v = T · p. The equation (12.3.1), is of course first of all only a formal equation that thanks to the boundedness of a can be made rigorous using the theory of stochastic differential equations developed by Hoover and Perkins [11] or Albeverio et al. [2] for instance. We will introduce the following abbreviation: n;,(v) - X{ 1 · 1 2: v } a . 'f/ -

·-

.

12.4

,

How to minimize "unfairness"

Intuitively, the map nn (P, · ) (for a fixed probability measure P on a proba bility space D) when applied to a discounted asset price process measures how

12. Applications to the theory of financial markets

182

often ( in terms of time and probability ) and how much it will be the case that one may expect to obtain a multiple ( or a fraction) of one's portfolio simply by selling or buying the stock under consideration. For the following, we will drop the first component of n ( indicating the probability measure ) if no ambiguity can arise; for internal hyperfinite adapted spaces and Loeb hyperfinite adapted spaces, we will assume that the canon ical measure ( the internal uniform counting measure and its Loeb measure, respectively) on the space is referred to. First of all we derive a formula that will make explicitly computing n easier in our specific setting: Lemma 12.4.1 If x ( v ) satisfies (12. 3. 1) for some v > 0 on an adapted prob ability space (r, g , CQ), then the discounted price process exp x� v) : t ::0: 0)

( ( )

is of finite unfairness. More specifically,

( ( ))

nr exp xl"l

[ IE [ 1 ,;,1" (x;"1 fr p;x:;1 i)vo) I] dt [ IE [ I d: l .,�o IE [ xi�¥,] + �' I ] dt I

�

-

The proof is more or less a formal calculation, provided one is aware of the path-continuity of our process and the fact that the filtrations generated by b and x(v ) are identical. For this implies that, given t > 0, the value Pro of.

'1/J ( v )

(

)

( ) ( v)u - ""' xt+ 6 Pi X ( t+ u -� ) vo ( w ) i EI v

does not change within sufficiently small times u - almost surely for all those paths w where x�v ) (w) - L i E I p� x/�� ) vo(w) tf. { ±v }, this condition itself being satisfied with probability 1 . Now, using this result and the martingale property of the quotient of the exponential Brownian motion and its exponential bracket, we can deduce that for all t > 0 almost surely:

How to minimize "unfairness"

12.4.

183

Analogously, one may prove the second equation in the Lemma: for, one readily has almost surely

In order to proceed from these pointwise almost sure equations to the assertion of the Theorem, one will apply Lebesgue's Dominated Convergence Theorem, yielding

For finite hyperfinite adapted probability spaces, an elementary proof for the main Theorem of this Section can be contrived: Lemma 12.4.2 For any hyperfinite number H we will let for all v > 0 denote the solution to the hyperfinite inztial value problem

x6v) { x ( v ) - x (v ) - � } (x(v) - "" . x(v) ) X{

\it E

=

0, 0, . . . , 1 t

t+ J, -

x (v)

a

!

-

t

+ J . 1ft+ J,

6 Pu

uEI .

( t-u) VO

2 1 J 1 (2H ! ) 1/'2 2 H !

I

(v )

xt

-

(u)

L Pu ·X (t- u )V O

uEI

I

?: v

}

.� H!

(1 2.4. 1)

12.

184

Applications to the theory of financial markets

(where 1'1£jH ! : n { ± 1 } H ' --+ { ± 1 } is for all hyperfinite e <::.:: H! the projection to the f!-th coordinate) which is just the hyperfinite analogue to (1 2. 3. 1) . Using this notation, and considering a hyperfinite adapted probability space of mesh size H!, one has for all k < H!, =

""' IE Xk(v)/H ! - IE [X (v ) I FkjH !] - 2J2 1 H! -"-±f

<::; 6

If

k
for all 2: v . As a consequence, m (xO! n) is monotonely decreasing for all hyperfinite adapted probability spaces Sl = { ±1 } H ! . Pro of of Lemma 12.4.2. It suffices to prove the result for finite ( rather than merely hypcrfinitc) adapted spaces. By transfer to the nonstandard universe, we will obtain the same result for infinite hyperfinite H as well. The proof of this Lemma for finite H relics on exploiting the assumption that a is piecewise constant, since a = a ( 1 )x(O,+=) + a ( - 1 )x(-=.O) + a (O)x{o} yields Vk < H! Vv > 0 T

[ (v) I J v) (

J2 1

IE X kJ} Fk/ H ! - X J'r, + 2 H!

v

(

- �a x( l - 6 ""' pu kj H ! H!

_

1

_

- H!

(

·

u EI

a(1)x

{

(v

X ( �,l -u)v o

(c, · J

c'·l

) {l

xk/H ' - L Pu·X .J£. uEI HI

+ a( - 1 ) x

{

C uJ

xk/H' - L Pu ·X uEl

u

x

1t

) VO

(C kJ - u ) JTf

(v) xk/H'

;:::v }

VO

::; -v

" p " L-

uEl

)} '

.x (v )

k - u) V O (w

l>-v}

which immediately follows from the construction of Anderson's random walk [3] B. = v�Jr. H, and the recursive difference equation defining the process x(v) . However,2 this last equation implies Vk < H! Vv > 0

I

[ (v) I ] (v)

J2 1

IE IE X k}i,1 Fk/ H ! - X J'r, + 2 H! 1

= -

H!

a ( 1)X

IE

{

(v )

I

((v)

:.:C: v } s-v } )

xk/ lf ' - L Pu ·X k m u) V O uEl

+ a(-1)x

{

X

�� , - uLE I Pu X ((vm� - u

VO

12.4. How to minimize "unfairness"

185

} "'"' IP' "'"' { 6 XkjH! - 6 Pu · X �, -u)vo ? v

Now all that remains to be shown is that (v)

k< H! and

(v)

(

uEI

"'"' IP' { XkjH! - "'"' 6 Pu . X );, -u)vo 6 (v)

k< H!

(v)

(

uEI

<::;

-v

}

are monotonely decreasing in v. The former statement is a consequence of the following assertion:

Vv' ::; v \If! < H!

IP' { xt�! - L Pu . xt�, -u)vo v } L k�R uEI "'"' "'"' Pu . X --", -u)vo 6 1P' { Xk/ H ' - 6 '2

H

<::::

(v' )

k�R

(v')

(H

uEI

2::

v

1

}

(12.4. 2)

One can prove this estimate by considering the minimal f! such that the point wise inequality

L k
(v)

(v)

xk/H' - L Pu ·X ( _k_ ;:o: v uEI HI u V O

)

} ::; L k
(v')

(v')

::?: v xk f H' - L Pu · X ( _b_ HI u VO uE I

)

fails to hold. Then one has an w E S1 such that Vk < f! (w ) (vJ CvJ

X{

xk/H ' - L Pu · X ( k uEI HT - u

= X{

1 0

(v')

)

vo

::?:v

}

(v' )

xk/H' - L Pu · X ( k ::?: w u VO v uEI

X{ = X{

(v )

(v)

(v' )

(v')

)

' } (w) ,

)

::?:v xf! H ' - L Pu · X ( £ uEI HI u VO

)

'}

::?: v XR f H' - L Pu ·X ( £ uEI HI u VO

(12.4.4 )

} (w) ,

( 12.4.5 ) ( 12.4.6)

) ' } (w .

But equation ( 12.4.4 ) implies, via the difference equation for inductively in k the relation Vk < f! Xf(w ) = X;;:' (w ) .

(12.4.3)

x ()

(12.4. 1),

12.

186

Applications to the theory of financial markets

If one combines this with

(which is equation (12.4.5)) and v ::; v' , one can derive - again via the recur sive difference equation (12.4. 1) - that x , :::: x ! applied in t = �! as well as the estimates

i;2

X { XR/H' - L (v ' )

uEI

Pu · X

(v ' ) k

( H!

-u

) VO

2:v ' } (w)

> >

X{ X{ X{

i;1

(v ' )

xf/H' - L Pu · X uEI

(v )

xf/ H ' - L Pu X uEI

(v )

·

xt/ H ' - L Pu · X uE l

1.

This contradicts equation (12.4.6) . Hence, the estimate tablished for all k < H!, leading to (12.4.2). S imilarly, one can prove

(v )

(t

w -u

(v )

(f

w-u

(v )

w_ - u

(t

)VO ) VO ) VO

(12.4.3)

2:v ' } (w) 2:v' } (w) 2:v } (w)

has been es

(v) - L Pu · X (v ), - O ::; -v } L IP' { Xk/H' ( H u)V k�R uEI "" IP' { Xk/H' "" Pu . X((v')l,) -u)vo ::; ' } (v' ) - L._., ::; L._., k�R uEI which entails that Lk < H! IP' { Xk/�! - 'I:, Pu · X (v1 _ u)vo ::; - v } must be monouEI ( H ! Vv' ::; v W <

H!

·

k

-v

tonely decreasing in v.

D

Using nonstandard analysis and the model theory of stochastic processes as developed by Keisler and others [10, 12, 7] , we can prove the following result:

Suppose (y(v) : v > 0 ) is a family of stochastic processes on an adapted probability space r such that y(v) solves the stochastic differential equation (12. 3. 1) formulated above for all v > 0. Then the function J f---+ nr (y(" l ) attains its minim·um on [0, S] in S.

Theorem 12.4.1

Pro of. B y the previous Lemmas 12.4.2 and the formula for n from Lemma 12.4. 1, the assertion of the Theorem holds true internally for hyperfi nite adapted spaces r, if we replace y(o-) by a lifting y(o-) and if we let nn when applied to internal processes denote the hyperfinite analogue of the standard nn . Now, according to results by Hoover and Perkins [11] as well as Albeverio

187

References

et al. [2] , the solution x(v ) of the hyperfinite initial value problem (12.4.1) is a lifting for the solution x(v ) of ( 12.3.1) on a hyperfinite adapted space for any v 2: 0. Now, y c---+ nn ( y ) is the expectation of a conditional process in the sense of Fajardo and Keisler [7] . Therefore, due to the Adapted Lifting Theorem [7] , we must have Vv 2: 0 °nn x(v ) = nn ( x(v ) )

( )

(where we identify n with its internal analogue when applied to internal pro cesses) . Since the internal equivalent of the Theorem's assertion holds for internal hyperfinite adapted space, the previous equation implies that it is also true for Loeb hyperfinite adapted spaces. Now let (y (v ) : v > 0) be a family of processes on some (not necessarily hy perfinite) adapted probability space r with the properties as in the Theorem . Because o f the universality o f hyperfinite adapted spaces [7, 12] , we will find a process x (v ) on any hyperfinite adapted space n such that x and y are automor phic to each other. This implies [2] that x ( v ) satisfies ( 12.3.1) as well. Further more, as one can easily see using Lebesgue's Dominated Convergence Theorem, Vv

>

0 nn ( x ( v) )

=

nr ( y ( v ) ) .

Due to our previous remarks on the solutions of (12.3.1) on hyperfinite adapted spaces, this suffices to prove the Theorem. D Acknowledgements. The author gratefully acknowledges funding from the German National Academic Foundation ( Studienstiftung des deutschen Volkes ) and a pre-doctoral research grant of the German Academic Exchange Service ( Doktorandenstipendi1Lm des Dwtschen Akademischen A 1Lstauschdienstes ) . He is also indebted to his advisor, Professor Sergio Albeverio, as well as an anony mous referee for helpful comments on the first version of this paper.

References

[1]

S . ALBEVERIO, Some personal remarks on nonstandard analysis in prob ability theory and mathematical physics, in Prnceedings of the V!Ilth In ternational Congress on Mathematical Physics - Marseille 1986, eds. M . Mebkhout , R. Seneor, World Scientific, Singapore, 1987.

[2]

S . ALBEVERIO , J. FENSTAD , R. H0EGH-KROHN and T . LINDSTR0M ,

Nonstandard methods in stochastic analysis and mathematical physics, 1986.

Academic Press, Orlando,

[3]

R. M . ANDERSON, "A nonstandard representation for Brownian motion and Ito integration", Israel .Journal of Mathematics. 25 (1976) 15-46.

188

12. Applications to the theory of financial markets

[4] A. CoRCOS ET AL . , "Imitation and contrarian behaviour: hyperbolic bubbles, crashes and chaos", Quantitative Finance, 2 (2002) 264-281. [5] N. CUTLAND, E . KoPP and W . WILLINGER, "A nonstandard treatment of options driven by Poisson processes", Stochastics and Stochastics Re ports, 42 ( 1 993) 1 1 5- 133. [6] l\1 . D REHMANN, .J . 0ECHSSLER and A . RoiDER, Herding with and with out payoff externalities - an internet experiment , mimoe 2004. [7] S . FAJARDO and H. J. KEISLER, Model theory of stochastic Lecture Notes in Logic 14, A. K Peters, Natick (Mass . ) , 2002.

processes,

[8] F . S . HERZBERG, "The fairest price of an asset in an environment of tem porary arbitrage", International Journal of Pure and Applied Mathemat ics, 18 (2005) 121 - 131. [9] F . S . HERZBERG, On measures of unfairness and an optimal currency transaction tax, arXiv Preprint math. PR/0410543. http : //www . arxiv . org/pdf/math . PR/04 10543 [10] D . N . HOOVER and H . .J . KEISLER, "Adapted probability distributions", Transactions of the American Mathematical Society, 286 ( 1 984) 1 59-201. [ 1 1] D . N . HooVER and E . PERKINS, " Nonstandard construction of the stochastic integral and applications to stochastic differential equations I, II", Transactions of the American Mathematical Society, 275 ( 1 983) 1-58. [12] H. J. KEISLER, lnfinitesimals in probability theory, in Nonstandard anal ysis and its applications, ed. N. Cutland, London 1Iathematical Society Student Texts 10, Cambridge University Press, Cambridge, 1988. [13] P . A . LOEB , " Conversion from nonstandard to standard measure and ap plications in probablility theory", Transactions of the American Mathe matical Society, 2 1 1 ( 1 974) 1 13-122 . [14] E . NELSON, Radically elementary probability theory, Annals of Mathemat ics Studies 1 17. Princeton University Press, Princeton (NJ) , 1987. [15] E. PERKINS, Stochastic processes and nonstandard analysis , in Nonstan dard analysis-recent developments, ed. A.E. Hurd, Lecture Notes in Math ematics 983, Springer-Verlag, Berlin, 1983. [16] K . D . STROYAN and .J . M . B AYOD, Foundations of infinitesimal stochastic analysis, Studies in Logic and the Foundations of Mathematics 1 1 9, North Holland, Amsterdam, 1986.

Quantun1 Bernoulli exp eriments and quantum stochastic processes Manfred

\iVolff*

Based on a

W ...-algebraic

Abstract

approach to quantmn probability theory we

construct basic discret e in terna l quantum stochastic pro cesses with in

d ep eiHlent increments. We ob tain Bernoulli

exp eriments

a one-parameter family of (classical)

as linear combinations of these basi c processes .

Then we use the nonstandard hull of the internal GN S-Hilbert space corresponding to the chosen state

T

( the

?-{,.

un derlying quantum probabil

i ty measure) in o rder to derive nonstandard hulls of our internal pro cesses.

Finally continuity requirements lead to the specification of a

certain flUhf;pace £ of

nal pro cesses to

the

can

Hr

to which the nonstandard hulls of our inter

be restricted and

wh ch

i

turns out to be isomorphic

Loeb-Guicltardet space intro d uced by Lci tz-Martini

space of £ the.n ifl shown to he isom orphic to

F+

(L2 ([0, 1] , >.))

1 101.

A sub space

flymmetric Fock

and om· basic processes agree with the processes of

Hudson and Pa.rtltasarathy on this

13. 1

t.he

subspace.

Introduction

In the early 1980's Hudson and Parthasarathy [5], Hudson and Lindsay [4] . and Hudson and Streater [6] started the theory of quantum stochastic pro cesses, in pcu·ticular of quantum Brownian motion. In [12[ Parthao:;arathy showed one way how to make the passage from quantum random walk to diffu sion. All theS(' topics are extensively treated in P.A. Meyer's compendium [ 1 1 [ . For another general introduction to this field see fl3] where an extensive mo tivation ti·om qua.ntum physics may be found. The Hudson-Part.hasarathy ap proach is based on t.he classical symmetric Fock space over L2 (R+ , >.) where ·Mathematisches Institut, Eberhard-Karls-Univ. Tiibingen, manfred . wolff�uni-tuebingen . de

190

13.

Quantum Bernoulli Experiments

A denotes the Lebesgue measure (see section 13.6 ) . Guichardet [3] gave an interesting representation of this Fock space as an L 2 -direct sum of spaces L� = L 2 (Xn) where Xn is the space of all subsets of the unit interval [0, 1] having exactly n elements. The measure on Xn is defined via the conesponding Lebesgue measure on �f- , This representation was extensively used by Maassen developing his kernel approach to general quantum stochastic processes. Journe [7] invented the so called toy Fock spaces (bebe Fock in French) as discrete approximations of the symmetric Fock space. Let Do = { 0, 1} be equipped with the uniform distribution f-LO · Then the toy Fock space of order n is the space L 2 (D0, p, � n) . A rigorous discrete approximation of the Guichardet space and the basic quantum stochastic processes defined on it which uses the N space L2 (D � , p,� ) as a toy Fock space is given by Attal [2] . There is another approach based also on the Guichardet space to prove such approximation theorems using nonstandard analysis which was developed by Leitz-Martini [10] . He discretized the Guichardet space in the following manner: let T = { � : k E *Z , 0 :::; k :::; N } be the hyperfinite time line where N E *N is infinitely large. Let r n be the set of all internal subsets of T of internal cardinality n and set r = u�=O rk . Now let ]1.1 be an internal subset of r and set where IAI denotes the internal cardinality of A. Then m is an internal measure on r with m(f) � e. The space L 2 (r, Lm) , where Lm denotes the Loeb measure associated to m, contains the Guichardet space as a Banach sublattice. In fact L 2 (r, Lm) is the nonstandard version of the Guichardct space, so to speak the Loeb- Guichardet space over the time interval [0, 1] , Among other things Leitz-Martini constructed all the relevant discrete quantum stochastic processes and showed that their nonstandard hulls exist in a precise manner and form the basic quantum stochastic processes: the time process, the creation and annihilation processes, and the number process. All approaches mentioned so far start with some kind of discrete approxi mation of the symmetric Fock space or the isomorphic Guichardet space over �+ , [0 , 1] respectively. In contrast to these approaches we present here a new one based on the more natural W* -algebraic foundation of quantum stochastic as described e.g. in [8] , In [13] the author sketched a different though similar approach in the finite-dimensional setting but he was not able to make the transition from the finite-dimensional to the continuous infinite-dimensional case. Our access is exactly the noncommutative version of Anderson's way [1] to Brownian mo tion. In particular we arc able to ju.s tify the usc of the symmetric Fock space (or equivalently the Guichardet space) in the context of quantum stochastic

191

13.2. Abstract quantum probability spaces

processes. We believe that by our approach the applications of nonstandard analysis as given by Leitz-Martini [ 10] are also better understandable. We begin with the discrete quantum Bernoulli experiment modelled in the von Neumann algebra of all 2 n x 2 n-matrices over CC. Then we prove a l\loivre-Laplace type theorem for quantum Bernoulli experiments, i.e. we show that these discrete stochastic processes converge to quantum Brownian motion in the same sense in which the classical approximation of the usual Brownian motion is treated by Anderson [ 1] . Furthermore we construct the basic quantum stochastic processes out of the corresponding discrete versions in the case of a vacuum state. Finally we show how the symmetric Fock space approach fits into our setting. Applications to the theory of stochastic processes are under preparation .

13 . 2

Abstract quantum probability spaces

Recall that almost all information about a given probability space (D, I:, JL ) is to be found in the pair (A, JL) with A = L= (n, I:, JL). Let us denote by [! ] the equivalence class modulo negligible functions in which the essentially f( w ) dJL( w ) = : bounded measurable function f is contained. Then [f] ----+ JL(f) defines a positive, linear, order continuous normalized functional on A. Analogously a quantum probability space is a pair (A, T) where A denotes a W*-algebra and T is a positive, linear order continuous normalized functional on A, or a normal state for short. The reader not familiar with the abstract theory of W*-algebras may look at them as subalgebras of the algebra of all bounded operators on an appropriate Hilbert space which arc closed under involution * : a ----+ a* and which are also closed with respect to the strong operator topology. In this context order continuity means the following: whenever a downward directed net (aa) of nonnegative selfadjoint operators from A converges to the operator 0 with respect to the strong operator topology then lima T(aa) = 0 holds. In particular one may interpret L= (n, l::JL ) as the W*-algebra of all multiplication operators MJ : g ----+ MJ (g) = fg (! E L= (n, I:, JL)) on the Hilbert space L 2 (D, I:, JL ) . Let (A, T) be a quantum probability space. Let A1 , A2 be W*-subalgebras generating the W*-subalgebra A3; moreover let Tk be the restriction of T to Ak (k = 1 , 2, 3). Then A1 and A2 are called stochastically independent if (A3 , T3) is isomorphic to (A1 ®A2 , Tl ®T2) where ® denotes the W*-tensor product .

fn

Remark: Let us point out that this notion of independence agrees with the usual one in the classical (commutative) case A = L= (n, I:, JL) but that

13. Quantum Bernoulli Experiments

192

there are other notions of independence in the quantum stochastic setting also generalizing the classical one (e.g. free independence) . Two probability spaces (A. T) and (A', T1) are called equivalent if there exists an order continuous algebraic isomorphism rp : A --+ A' with T1 = T o rp . Now let us construct the L 2 -space corresponding to the probability space (A, T). It is nothing else than the so called GNS-space (after Gelfand, Naimark, and Segal) . To this end we introduce the C-valued mapping (al b) = T(a*b) on A x A. It is scsquilincar, that means, it is linear in the second argument, antilinear in the first argument, and it satisfies (a l a) 2 0 for all a E A. The space LT = {a : (a I a) = 0} is a left ideal, and on A/ LT there is well-defined a scalar product by (a l b) = ( a l b) where a = a + LT denotes the equivalence class in which a is contained. 7-{T is the completion of A/ LT with respect to the associated scalar product norm 11 a11 = � = JT(a* a) . The representation JrT : A --+ J:(HT ) is given by JrT (a)(b + LT) = ab + LT which is well-defined since LT is a left ideal. The representation is injective whenever A is simple (i.e. has no nontrivial closed ideals) , e.g. A = Mn (C), the algebra of all n x n-matrices. In case A = L = (n, p,) the space 7-{T turns out indeed to be the space 2 L (D, p,) and in this way L = (n, p,) is represented as the algebra of operators of multiplication by L =-functions (see above) . 13 . 3

Quantum Bernoulli experiments

Let us begin with the easiest example of a quantum probability space: Let

A = M2 (C) be the W*-algcbra of all complex 2 x 2 matrices a = (a;k ) and choose A E] 1/2, 1] . Then T.A (a) = >-au + ( 1 - >.)a22 defines a normal state on A. As in the case of classical probability we will often write IE ,\ (a) in place of a.

T.A (a) , and we call it the expectation of Now consider the following model of tossing a fair coin: choose Ao as the commutative W*-algebra C { O . l } and let p,o : Ao --+ C be given by p,o(f) = � (! ( 0) + f ( 1)). The underlying classical probability space is to be rediscovered in Ao as the set of two idempotents 1 { o } and 1 { 1 } where 1 A denotes the indicator function of a subset A of a given set. Ao is spanned also by the elements 1r

We now construct all injective *-algebra homomorphisms : Ao --+ A satisfy ing T,\ o = f-LO · In the sense of section 13.2 this means that we will find all abstract probability spaces (B, T,\ 1 13 ) which arc equivalent to (Ao, p,o). An easy calculation shows that they are given by := with 1r

1r

1rz

-

z

1

) = : Qz

13.3.

Quantum Bernoulli experiments

and linear extension, where Pz - Q z =

( � 0 ) = : Xz .

z E lC

193

is a parameter with l z l

= 1. n2 (X) =

In summary we have seen that the simplest quantum probability space (A. T.\) contains a one-parameter family (Az , T.\ IAJ of models of classical coin tossing. For w = eis, z = e it ( -n < s, t :<:: n) the commutator [Xw , X2] . = XwXz X2Xw is easily determined: [Xw , Xz] = 2 i

sin(s - t)

( � -� ) .

So Xw and X2 commute if w = z or w = -z. We obtain JE.\ ( [Xw , X2] ) = 2 i (2>. - 1) sin(s - t). The absolute value of it attains its maximum at - t = ±n /2. The pair (Xw , X2) with z = - iw is called a quantum coin tossing. It is unique up to automorphisms. More precisely this means the following: choose w = e it (t E] - n, n] ) , and consider the automorphism 1.{) on A given by I.{)( a) = u*au where 0 e- it/2 u. 0 e it/2 Then i.p(Xl ) = Xw ' i.p(X- i ) = x- iw and T.\ 1.{) = T.\ . So in the following we need only to consider the quantum coin tossing (X1 , X_ i ) . Notice that X1 = ax , X_i = ay and [Xw , X_,w]x = 2ia2 , where ax , a y, az denote the Pauli matrices. The following important matrices can be constructed by (ax , ay) : 1 0 1 . (13.3. 1 ) (ax + wy ) , b2 0 0 1 0 0 . (13.3. 2) b+ 2 (ax - wy ) , 1 0 1 0 0 ( 13.3.3) 1 - [ax , ay] · bo 0 1 2i Obviously the following formulae hold: s

_

(

( ( (

0

)

) ) )

ax = b+ + b - , ay = i · (b+ - b - ) , [b - , b+] = a2 = 1 - 2b0 , Xw = wb+ + wb - .

These formulae show that the random variables b+ and b- are in a certain sense more basic than Xw though they are not selfadjoint. As in the classical case we describe the experiment of n ( E N) independent coin tossings by the n-fold tensor product (A®n , T�n) . The algebra correspond ing to the k-th trial is described by the elements 1 129 · · • 129 1 ®x 129 1 129 · · · 129 1 = : X k '-v-" (k- 1 ) times ( n -k) times

'-v-"

194

13.

Quantum Bernoulli Experiments

where x E A is arbitrary. Describing k times repetition of our quantum coin tossing we obtain Sw. k := L;= l Xw1 . We have k [Sw ,k, s- iw, k] = 2i k·

La

J =l

z,

The pair (Sw, k l S- iw, k ) is called a quantum Bernoulli experiment. Setting Ak := A0 k ® 1 ® · · · ® 1 = {a ® 1 ® · · · ® 1 : a E A0 k } '-v-" '-v-" n - k times n - k times we obtain a natural filtering lC · 1 C A1 C · · · C An = A0n . Moreover there also exists a conditional expectation !Ek from An onto Ak which is given by IEk (a ® b) = T; ( n- k ) (b) · a , where b E A0 ( n- k ) , and linear extension.

13.4

The internal quantum processes

Now we consider a polysaturated model *V(JR:.) of the full structure V(JR:.) over JR:.. As usual we replace the standard integer above by an infinitely large integer N. Then all considerations in the previous section remain true for internal objects. We use the discrete time interval T = { � : 0 <:.:: k < N } denoting its elements by §., t, JJ . . . . Moreover we rescale also the family (Ak) by setting At- := ANt · Set p � JN · Then we introduce n

a±t_ a'l_ •

at

pbf , b'j , p

2

(0 �). 1

( 13.4. 1 ) ( 13.4.2) (1 3.4.3)

These elements are the internal increments by which we construct the basic internal stochastic processes A£ = L: o :S;�
( A i ) -t is called the internal creation process, (Ai )t the annihilation pro ( A! ) t the time process and finally (Ai)t th� �umber process. The internal Brownian motion corresponding to the parameter w E lC then is given by Bw,t wAt + wA_t ( l w l = 1 ) . It satisfies the internal difference equation dBw,t = w at+ + w at .

ce ss ,

13.4. The internal quantum processes

195

Like in the previous section we consider the state Tf: N obtain

T.\ ,N and we

JE.\,N ( Bw,t_ ) : = T,\,N ( Bw,t_ ) = 0, JE.\ ,N(B�,t. ) = t_.

(13.4.4)

The second equation follows from the facts that

((wa; wa-:;- ) 2 )

for §. # 11:, but T>-,N - + The commutator is

= T>-,N (1/N) =

1/N.

The so-called internal Ito-table for the increments is easily computed:

a• a+o a§. a;,

pa-2:a: I a0-; I a0-� I p2a-a:;; p2 a; 0 0 p2 a� 0 a+§. a§.o 0 0 a• a;, 0

p-tv =

A-� + lvi B-"-,t. + l v l 2 A-; lvl

s

s

§.

Let 0 # v E *IC be arbitrary. Then the process (P.�_) given by is called the Poisson Process with parameter v . Now we compute the characteristic operator functions of these pro cesses. Whenever V = (ifi)t_ is a stochastic process its characteristic oper ator function is defined by fv (u, t) = exp( i uvt), where u E *IF!:.. Obviously A ( t ) l u= O = iifi and fv ( u, t) is unitary iff Vi is selfadjoint . The processes V = (Vi) we considered so far are all of the form u,

Vi =

I: d§.,

O:S;§.
where

d8 = 1 Q9 · · • Q9 1 Q9 d Q9 1 Q9 · · · Q9 1 '-v--"' '-v--"' ' N13- 1 times N-N13 times and d E A is appropriately chosen. Since d'IJ,, dy_ commute for 11: # 1!. we obtain

fv (u, t)

=

@ (exp( i ud) ) 13• O:S:i3<.1.

13. Quantum Bernoulli Experiments

196

The most important formulas are the following ones:

@ (cos( pu)! + i sin(u)Xw).e_ O <;.s
13.5

From the internal to the standard world

The operator norm topology on A is too fine in order to obtain reason able nonstandard hulls. Moreover the processes discussed so far are internally bounded but not necessarily S-bounded. For example Bw, kjN is selfadjoint, hence its operator norm is equal to its spectral radius. This number is easily computed as )N, or in other words I I Bw.t. ll = VN · t In the following we only consider T;.. for >. = 1 . The other case will be treated elsewhere. We set T1 = : T and we construct the Hilbert space 7-{T corresponding to (A, T) (A the algebra of 2 2-matrices, see section 13.2) . To this end we denote the columns of the 2 2-matrix a by a = (ai , a�) . Then T(a* b) = (ai I bi ) where the latter is the canonical scalar product on IC 2 . Hence LT = {a : a � = 0} and nT (a) (b + LT) = abi + LT. In particular 7-{T S=' IC2 , where the isometric isomorphism is given by a + LT ---+ ai. We have previously introduced the random variable X1 ax . Its equiva lence class in 7-{T is denoted by xl = x l + LT We set eo = b�1 = ( 6 ) = xp and e 1 = b11 = ( ? ) = X1 . Now we consider TN = T® N Obviously 7-{TN S=' (*IC2)® N holds. Let w E {0, 1 }T be an arbitrary internal function. Then we define ew by e w = @�=[} e w ( k/N ) · The set {e w : w E {0, 1 V } forms an orthonormal basis in 7-{TN · 7-{TN is an internal Hilbert space, so its nonstandard hull if.TN is a Hilbert space. The formulae (13.3.1) up to ( 13.4.3) above lead to the following equations, which result from the fact that 7-{TN is the GNS-Hilbert space of (A N , TN ) · In order to make the formulae as simple as possible we denote the addition mod 2 by EB, i.e. we consider D = {0, 1 V as the cartesian product of the group (Z2 , EB). Then we set w + = 1r EB w and w - = w for w E {0, 1 }T . Moreover we write 1.t. in place of 1 {.1.} . x

x

=

13.5. From the internal to the standard world

197

With all theses conventions we obtain

w± (.t.) e

(13.5. 1)

w+ (.t.) ew - a - a + ew,

(13.5. 2)

VFJ wEB1t '

t t ----_;::;w (.t.) ew = N · ata-; ew. -

13. 5 . 1

(13.5.3)

Brownian motion

First of all we show that our internal Brownian motion ( Bw,t) leads to the standard Brownian motion on a subspace of HTN . To this end we recall that the standard Brownian motion (f3t) on L 2 ( D , L'") (with D = {0, 1}r, f-L f-LN f-L� N , L the corresponding Loeb measure) can be viewed as a family of multiplication operators f {3tf on L2 ( D L'") which are selfadjoint but unbounded (cf. the end of section 13.2) . From equation (13.4.4) it follows that Bw,t is S-bounded in 7-{N . We claim that classical Brownian motion (or better to say: Anderson ' s Brownian motion) is equivalent (in the sense of section 13.2) to our CBw , ot) where (Bw.ot) denotes the family of selfadjoint operators on a subspace of HTN coming from the family (Bw,t_). To this end let l w l = 1 and =

=

'"

f---+

,

Po = � 2

( -w

-w 1

l

).

For w E n we set Pw = @o-::;. k < N pw ( kjN) · It follows TN (Pw) = 2 - N = !-L({w}) . We denote the internal W*-algebra generated by {Pw : w E {0, 1}T} by B ( w ) . The formula p bw ,t .· - - - (P1 - - p1 ) w at+ + w at t - 2 t , shows that ( Bw t) L o < s < t bw.t is contained in B ( w). The internal L= -sp;-e -L= (n, f-L) is nothing else than L=({0, 1}, f-Lo) 0 N . Thus it is not very hard to see that 11N := 11� N (see section 13.2) maps L= (n, f-L) onto B(w) with TN 11N IB (w ) = f-L. Now let D = {f E L= (n, f-L) : f is S-bounded} . Then -

=

+

c

o

11N ( D ) C {a E AN : llall is S-bounded} C { x E 7-{TN : llxll 2 = VT ( x* x ) is S-bounded}.

Moreover for j, g E D we have

13.

198

Quantum Bernoulli Experiments

because of 1rN (Jg) = 1rN (j)*nN (g) . Since L 2 (D, L'") is the completion of ef : f E D} the mapping 0/ ---+ ;;;(}) can be extended to a linear isometry U from L2 (D, L'") onto a subspace M , say, of 7-{TN · For t E [0, 1] set t min{§. E T : §. > t}. We show that U(f3t ) Bw, t · Defi ne =

Yt

(w)

=

{ -11

=

w(.t.) w(.t.)

=

1 0 =

and l'i = }N L O :S.£oc f3t li\1n = f3t holds in L2 (D, L'") . Therefore for every standard E > 0 there exists a standard no with llf3t(1D - 1 Mn ) ll 2 < E for n 2 no standard. But this in turn implies 11Yt(1D - 1 MJ ll 2 < E for n standard, n 2 no . Since 7rN is an isometry also for the internal L 2 -norm on 7-{TN we obtain IIBw, t7rN(1 D - 1 Mn ) ll 2 < E for these n 2 no. I I Bw. t 7rN (1 Mn ) l l 2 = I I Yt1 Mn ll 2 � n implies that Bw, t 7rN ( 1 M, ) E 7-{TN ,fin and U(f3t 1 MJ Bw ,t--;;;{i AJJ . But this in turn gives U(f3t) = Bw, t · That the operator of multiplication by f3t is mapped onto the operator of multiplication by Bw ,t follows from an easy additional argument. =

13.5 . 2

The nonstandard hulls of the basic internal processes

The deeper problem is to find a joint subspace JC of HTN such that all pro cesses have standard parts as closable operators densely defined in JC. We begin the construction with some more notations: For w E { 0 , 1}T =: D we set Mw = {.t. : w(.t.) = 1 }, and lwl := I Mw l , that is the (internal) car dinality of lvlw . Using these notations we obtain the following formulae by means of ( 1 3.4.4)-(13.5.2) . Let y LwED Yw ew be arbitrary. Then =

A_t y Aiy Aty Az y

L �L O:S,s.
) ( L( L ) ) L (� L ) L (� L w

+

( •)

(13.5.4)

Yw'w

wED

wED

tpED

O:S,s.
w (')

O:S,s.
(13.5.5)

Yw'w

11,an,'P- ( •)

1!,an,'P

+

(,)

'"

(1 3 . 5 . 6 )

'"

(13.5. 7)

13.5. From the internal to the standard world

199

Consider now the projection IE.t. from 7-{ onto 1-l.t. which is the Hilbert space coming from A..t. (see section 13.3 ) . IE..t. is given by lEt- ( ew ) Then we obtain IE_.t.A�

=

{ ew 0

w (�) = 0 for all � 2> t

else

A�IE..t. for each rt E family (A�h< 1 is ada�ted t� (H..t.h < 1 ·

{ • , o, + , - }

=

or in other words the

We prove now some continuity properties. To this end we introduce 7-{ ( n)

:=

{ L Yw ew : Yw E *lC} l w l= n

and we set 7-{ : = 7-{TN for short. Then 7-{ = 1_;{= 0 7-{ (n) where _i denotes the orthogonal direct sum. More over A; (H( n) ) C 7-{ ( n) Ai (H( n) ) C 7-{ ( n) Ai (H( n ) ) C 7-{n + 1 , and finally Ai- (H( ;;:) ) C 7-{ ( n-1 ) where 7-f ( -1 ) := {0 } = : 7-{ ( N+ 1 ) . For y = L l w l =n Yw ew and 0 :S §. < t we obtain ,

,

L I� L

II A� y - A� y ll 2

lw l =n

§."5cy
w+ ( s ) I 2 1Yw l 2

1 1 2 1 Yw l 2 L L I� "5cy <.t. w

<

=

( t - �) 2 II Y II 2 . (13.5. 8)

l l=n §. It follows that II A; - A: ll :S (t - §.) , in particular the mapping t S-continuous from -T to L(H) with respect to the operator norm. Let Y = L l w l = n Yw E 7-{ ( n) , and 0 :S � < t

II Ai y - A� y ll 2

=

L(L

l w l=n §."5cY.<.t.

f---+

A� is

w( �) r1Yw l 2 ·

It follows that II A� IJt(n) I :S but the mapping t A�y is not S-continuous in general. So finally we have to choose an appropriate subspace in order to get S-continuity of t --+ A�y. Next we show the following inequality: n,

Proposition 1

holds.

For- 0 :S � < t

f---+

(13.5.9 )

13. Quantum Bernoulli Experiments

200 Proof.

Let y = L l w l = n Ywew E Ji( n ) be arbitrary. Then

holds. If ip(u)

=

0 then

l iP

EB

t,,,l

=

<

n + 1 hence Y'Pffil:Yc

=

0. It follows

n+1 N

Let w E D be arbitrary with lwl = n . Then there exist at most N(t. - ..s.) - lw · 1 [� ,![ 1 different ip with ip EB 11' = w and .§. :S 1J < t_. This gives I (Ai - A! ) Y II2 :S D N (t;�) ( n + 1 ) II Y II2, and the asserted inequality follows. Corollary 1

( 13.5. 1 0)

Ai is the adjoint of Ai. By co�struction we have 'H( n) l_Ji( m ) for n # So the subspace JC= of H , generated by all the spaces i{( n ) ( n E N) is nothing else than the orthogonal direct sum JC= = l_ n ENo i{(n ) . Recall the definition t = min{JJ E T : > t.}. Proof.

m.

Then from the inequality (13.5.8) it follows that by

1J

A;,n Y (A[,ny ) /\ there is uniquely defined an operator on i{( n) bounded by t. So we obtain the operator A� = l_ n ENo At . n on JCoc which is selfadjoint and bounded by t. Moreover I I A� - A: II :S t - holds. D In order to define the nonstandard hull of one of the other operators A-� (� E { +, - , 0}) we choose the following joint domain of definition: JC� : = { Y E }(00 : Y = l_n ENo Yn, Yn E i{(n ) , � nll :iJn ll § < 00} =

s

which is dense in JC=. From the inequalities (13.5.9) to (13.5.10) we obtain that by

13.5.

201

From the internal to the standard world

:

the operators A� K0 --+ K 00 are uniquely defined. Moreover the following proposition holds: Proposition 2

A�

is selfadjoint and Ai and Ai are adjoint to each other.

Proof. Set At n = A� I joint, whereas Atn and is obvious.

H17• These operators are bounded and A�, n is selfad A�( n+ l ) are adjoint to each other. Now the assertion D

The final problem we have to solve is to find an appropriate filtration with the necessary conditions of continuity. To this end we calculate the difference IE.t - IE.e_ for :<:: §. < t For y E 7-{ ( n) we obtain

0

_e.

(IE_t - IE Let t. =

)y =

k/N, = l/N. It follows §.

l

1 Yw 2

l w l = n , w9 [o: ..t[

l

max 1 Yw 2

<

lwl = n

(k n- l)

17 max 1 Yw 2 Nn (t_ - §.)

l

lwl =n

•

n 1 1 (1 - k --l ) · · · ( 1 - k -- l ) . I

So the mapping y --+ IE.tY will be S-continuous if max lwl = n Yw 1 2 N17 will be finite. This consideration leads to the internal measure m on D = given by m ( w } ) = NL . Then (D, m) is the direct sum of (D17, mn)n :5o N with Dn = = and mn = m bn · Notice that m(Dn) = (�) · Jn , so that N m(D) = L�= O ( � ) · = ( + it ) :::::J e. Moreover L�= L m(D k ) :::::J for all L :::::J oo. Let (D, Lm) be the Loeb measure space associated to (D, m) . Then D ) = e and D is an Lm - null set. Finally L 2 (D, Lm) � _l n N L (D17 , LmJ where Lmn is the Loeb measure associated to m17 • Consider the external subspace of all S-bounded internal *complex val ued functions on D . Then for E the complex-valued function : --+ is in L 2 (D. Lm) and the space E is dense in L 2 (D, Lm) . Mor eover = 0fk in the L2 -sense, where = Now we map onto a subspace Y of 7-{ by the mapping V defined as follows: ( ) - I wl / 2 ew , V( ) =

{ {w E D : l w l

n}

Jk

Lm (Uk EN k E 2 0( j (w ))

0f

1

{0 . 1 }T 0

\ (U kE N Dk)

X f X

LkENo

X

ef : f X} fk f Ink .

f Lfw N wED

°f

w

13.

202

Quantum Bernoulli Experiments

and set

W(j) = 0]). Then Jr2 l f l 2 dm II V ( f ) ll 2 . Let .C b e the closure o f {W(j) : f E X} = : .[( b) in JC=. Then the last equation shows L2( D, Lm ) � .C. Moreover Xk = {f E X : f ( De ) = 0 for £ # k} is mapped onto a subspace of Ji ( k) ( k standard) . Let .C k be the closure in JC= of {W(f) : f E Xk } =: .C�b) . Then .C = ..lkENLk and .Ck � L 2( D b Lm k ) . The following facts arc easily seen to hold:

Theorem 1

_c k(b) .Ai• _c k(b) Aio _c k(b) .A �+ _c (b) k .A - _c k(b) �

c JC'ff for k E No c .C k c .C k c _c (b) k+l

c .c�bl_l .

Moreover .C n JC0 is dense in .C and it is the joint domain of definition of the operators A� , � E { +, - } which are closable and essentially selfadjoint in case � = and adjoint to each other in case � = +, - . •, o,

•, o,

Remark: We map an internal set !vi of T onto its indicator function This mapping 7/J is bijective and therefore we obtain an isometric isomorphism U of m ) onto the internal Guichardet space of Leitz-Martini. This shows that our considerations concerning the continuity of the filtering leads in a very natural manner to the Loeb-Guichardet space for the basic quantum stochastic processes cf. section 13. 1).

1M E {0, 1 }T =

D.

L2( D,

(

13.6

The symmetric Fock space and its embedding into .C

Let 'H be a Hilbert space and let 'Hr:om be the nth tensor product of 'H . We denote the full symmetric group of n elements by Then by

§n .

there is uniquely defined an orthogonal projection, the range of which is called the nth symmetric tensor product The symmetric Fock space

JiO n .

13.6.

The symmetric Fock space and its embedding into .C

203

+ ( H ) over H is the orthogonal sum :F+ ( H ) = l_':= o H O n where as usual H O 0 lC. For the special case H = L 2 ( [0, 1], >.) (>.: Lebesgue measure) the tensor n

:F

=

product H® is nothing else than L2 ( [0, 1] 71 , .\71 ) where A 71 denotes the dimensional Lebesgue measure. and the projection Pn is given by

n

Now we give the more or less classical definition of the basis quantum processes. To this end we set

h o · · · fn : = Pn ( h Q9 · · · � fn ) · 0

For

0 :S t

<

1

the definition of the creation operator is given by

Ai (( h · · · fn ) : = 1 [o, t [ h · · · fn 0

0

0

0

0

(13.6. 1)

and linear extension t o the largest possible domain. Similarly the annihilation operator is given by

n Ai ( h o . . . o fn ) = L (1 [o, t [ l fJ ) h o . . . o jj o . . . o fn , J= l

(13.6. 2)

where the expression jj indicates that this factor is omitted. The number operator is given by

A� (fl o · · · o fn ) = 1 [o, t [ · fl o · · · o fn + · · · + fl o f2 o · · · o fn - l o 1 [o, t[ · fn , (13.6. 3) whereas the time-operator is

A� (h o · · · o fn ) = t fl o · · · o fn ·

(13.6.4)

Remark: These operators are those ones introduced by Hudson and Parthasarathy [5] as the basic quantum processes. Consider now the set Xn = {(t 1 , t 2 , . . . , t n ) : 0 :S t 1 < · · · < t71 :S 1}. It is a measurable subset of [0, 1] 71 • Let f be an element of L 2 (X71, .\ 71 ) . Setting

f-( t 1 , · · · , tn ) =

{ OJ (t 1 , . . . , tn)

(t 1 , . . . , t n ) E Xn

otherwise

13 . Quantum Bernoulli Experiments

204

we obtain an element j of L 2 ( [0 , l]n, An) . Then Pn (]) E L 2 ( [0, l] , A) O n and II Pn (]) ll 2 = fxn l l 2 An , i.e. llfll = II Pn (]) ll holds. On the other hand let E L2 ( [0, 1] , A )0 n be arbitrary. Then = Pn (jl xn ) (equality in the L2sense) . It follows that the mapping ---+ Pn (]) is a linear isometry from L2 ( Xn , An) onto L 2 ( [0, 1] , A) O n. Its inverse is given by 91 xn =: Un (g ) . Now we embed L2 (Xn , An) isometrically into the space Ln constructed at the end of the previous section. To this end let w E Dn be arbitrary. Set (w) = min(� : w(�) = 1) and by induction �k+ 1 (w) = min(� > h : w(�) = 1). Then I(w) := (h (w) , . . . , �n (w)) E *Xn n Tn . Moreover if f := (kl /N. . . . , kn/N) E *Xn n Tn then f = f( w) for w = 1 M where M = {h (w) , . . . , �n (w) }. We consider the dense subspace Cn of L2 (Xn , An) consisting of the re strictions to Xn of continuous functions on the closure Xn . For E Cn we set S(f) (w) = * (f(w)) . This is an element of L2 (D n , mn ) satisfying l l f l l 2 :::::J II S(f) ll 2 · It follows that Cn is linearly and isometrically embedded into L2(D n , LmJ , and this embedding can obviously be extended to the whole of L2 (Xn , An) . Since L2 (D n , Lmn ) is linearly and isometrically embedded in Ln , we obtain that the symmetric tensor product L2 ( [0, l] , A) O n can be em bedded in Ln . An obvious extension then yields an embedding denoted by U of the symmetric Fock space F+ ( L2 ( [0 1 ] , A) into £. Call y the image of this embedding and set Yo = G n K0 .

fd

f

f

f n!

g n! --->

t1

f

f

,

2 Yo is dense in y. Moreover the restrictions of A� to Yo yield closed densely defined operators which are selfadjoint for � E { •} and adjoint to each other in the other cases and which moreover satisfy

Theorem

on u- 1 (Qo) where to (13. 6.!J

o,

A�

aTe the operators given by equations

(13. 6. 1)

'UP

The proof is obvious.

References [ 1 ] R. M . ANDERSON, "A nonstandard representation for Brownian motion and Ito integral", Israel .J. Mathern. , 25 15-46. [2] S . ATTAL, Approximating the Fock space with the toy Fockspace, in .J. Azema, I\1 . Emery, M. Ledoux, I\L Yor (eds.), XXXVI, Lecture Notes in Mathematics 180L Springer-Verlag, Berlin New York, 2003.

Seminaire de Probabilites

References

[3] A.

GUICHARDET,

205

Symmetric Hilbert spaces and related topics, Lecture

Notes in Mathematics 26L Springer-Verlag, Berlin, 1972.

[4] R.

L . HUDSON and J . M. LINDSAY, "A noncommutative martingale rep resentation theorem for non-Fock- Brownian motion", .T . Funct. Anal., 6 1

(1985) 202-221 .

[5] R.

R . PARTHASARATHY , "Quantum Ito ' s formula and stochastic evolutions", Comm. Math. Physics, 93 (1 984) 311-323. L. HUDSON and K .

[6] R.

L. HUDSON and R. F. STREATER, "Ito ' s formula is the chain rule with Wick ordering", Phys. Letters, 86 ( 1982) 277-279.

[7]

"Structure des cocycles markoviens sur respace de Fock", Prob. Theory Re. Fields, 75 (1987) 291 -316.

.T . L . .T OURNE.

[8] B .

KDMMERER and H. MAASSEN , "Elements of quantum probability", Quantum Probab. Communications, QP-PQ, 10 (1998) 73-100.

[9]

H . MAASSEN, Quantum Markov processes on Fock space described by integral kernels, in L. Accardi, W. von Waldenfels, Lecture Notes in Math ematics 1136, Springer-Verlag, Berlin. New York, 1985.

Quantum probability and applications II, Proceedings Heidelberg 1984, [10] M. LEITZ-MARTINI. Quantum stochastic calculus using infinitesimals, Ph.-D. Thesis, University of Tubingen, 2001 . [11] P . A . MEYER. Quantum probability for probabilists, Lecture Notes Mathematics 1 538, Springer-Verlag, Berlin, New York, 1993.

[12]

m

K. R. PARTHASARATHY, The passage from random walk to diffusion in quantum probability, (1988) , Ap plied Probability Trust, Sheffield, 231 - 245.

Celebration Volume in Appl. Probability [13] K. R. PARTHASARATHY, An introduction to quantum stochastic calculus, Birkhauser. Basel, 1992.

Applications of rich measure spaces formed from nonstandard models Peter Loeb*

Abstract \i'Ve review some recent work by Yeneng Sun and the author. Sun'i:l work shows that there

are

results, some used for decades without a rigourous

that arc only true for spaces with tho rich structmo of Loeb measure spacel:l. His j oint work with the author uses that structure to extend an important result on the purification of measure valued maps. foundat.ion,

14. 1 In

Introduction 1975 [ 1 1] ,

the author constructed a class of standard measure spaces

fanned on nonstandard models.

These spaces, nmv called "Loeb spaces" in

the literature, arc very close to the underlying internal spaces, and ru·c rich in

s t ruc t ure.

(Sec

1 121 by Yeneng Sun

the author's throe clmptcrs and Osswald's two chapters in

for background. ) 'vVe will briefly review here some recent work

that uses these measure spaces. Sun has shown that there are results, some

used in applications for decades without a rigorous foundation, that are only true for spaces with the rich structure of Loeb measure spaces. We will fol l ow the description of Sun'::; work with a detailed proof of a special case of t.hc recent

result in

[H] by Yeneng Sun and the author on A counterexample shows that the

valued maps.

the purification of measure

result we give is false if the

Loeb space we use is replaced by the unit interval with Lebesgue measure. We

note in passing hero that the result in

[13]

by Osswald, Sun, Zhang and the

author presents yet another example needing rich measure ::;paces.

*

Department

of

Mathematics, 'University

loeb@math. uiuc . edu

of

Illinois, Urbana, IL 61801.

14.2. Recent work of Yeneng Sun

14.2

207

Recent work of Yeneng Sun

We begin with some work in the late 1990's of Yeneng Sun ( [16] , [17] , [18] , and Chapter 7 of [12] ) . That work, Sun's alone, is published elsewhere, so what is said here is an invitation to read, and not a substitute for, the original articles. To set the background we consider the following question: Does it make sense to speak of an infinite number of independent individuals or random variables indexed by points in a uniform probability space? Can one, for example, reasonably consider an infinite number of independent tosses of a fair coin with the tosses indexed by a uniform probability space, and if so, does it make sense to say that half the tosses should be heads? Of course, there is no problem in speaking of independent random variables indexed by the first integers with each integer having probability 1/ An infinite index set with a uniform probability measure, however, must be uncountable; there is no uniform probability measure on the full set of natural numbers. The problem thus evolves to finding the possible meaning of an uncountable family of independent random variables. Whatever way one approaches this problem, the usual measure-theoretic tools fail. A natural attempt to generalize coin tossing replaces the natural numbers with points in the interval [0, 1], and thus replaces sequences of 1 ' s and - 1 ' s with functions from [0, 1] to the two-point set { -1, 1 } . This is a reasonable model for an uncountable family of independent coin tosses. In 1937, Doob [2] exhibited a problem with this and similar spaces of functions when standard techniques arc applied. To sec the problem, let denote the set of { - 1, 1 } valued functions on [0, 1] , and let be the product measure on constructed from the measure taking the values 1 / 2 at 1 and 1/2 at - 1 . In the usual construction of an appropriate a-algebra on measurable sets are formed from the algebra of cylinder sets, with functions in each cylinder set restricted at only a finite number of elements of [0, 1] to take either the value 1 or - 1 . It follows that each measurable set in is determined in a way described below by a countable subset of [0, 1 ] , and so the following result holds.

n

n.

P

D

D

D,

D Proposition 14.2.1 Fix any h E D. Set Mh := {w E D : w(t) = h(t) except for countably many t E [0, 1 ] } . Then Mh has P-outer measure 1 . Proof. For each measurable B � D, there is a countable set C [0, 1] such that for all o:, (3 in D, if o:(t) = (3 (t) for all t E C, then E B if and only if (3 E B. Suppose Mh � B . Given w E D, Let w' agree with w on C and agree with h on [0, 1] \ C. Then w' E Mh, so w' E B. It then follows that w E B . Thus B = D, so the outer measure P*( Mh ) = 1 . a

C

D

14. Applications of rich measure spaces

208

Remark 14.2.2 Note that if w E lvh, then since Lebesgue measure A of a countable set is 0, w = A-a.e. on [0, 1 ] . This is true if is nonmeasurable, or if 1, or if - 1 . Now outer measure when applied to the intersection of measurable sets with Mh is finitely additive, hence countably additive. Since the outer measure of Mh is 1 , one can trivially extend P to a measure P with P(Mh) = 1 . Thus, no matter what might be, one can claim that P almost every function is equal to at A-almost very point of [0, 1]. This, and other highly questionable arguments have been used for decades to work around the measure theoretic problem indicated by Doob's example.

h

=

h

=

h

h

h

h

Another approach to representing a continuum of independent random vari ables is to consider a function w , called a process, where is an index from an uncountable probability space called the parameter space (the probability measure need not be uniform) and w is taken from a second probability space called the sample space. The question then is whether it makes sense to work with the usual product of these two probability spaces. Sun has shown in Proposition 7.33 of [12] that no matter what kind of measure spaces, even Loeb measure spaces, one might take as the parameter space and sample space of a process, independence and joint measurability with respect to the classical measure-theoretic product, i.e. , formed using measurable rectangles as in [15] , are never compatible with each other except for a trivial case. Here is the exact statement of that proposition.

f(i, )

i

Let (I, I, ) and (X, X, v) be any two probability spaces. Form the classical, complete product probability space (I x X, I 129 X, 129 v ) . Let f be a function from I x X to a separable metric space. If f is jointly measurable on the product probability space, and for 11-almost all (i 1 , i 2 ) E !xi, h and fi2 are independent (call this almost sure pairwise in dependence), then, for tt-almost all i E I, f( i. ) is a constant f1m ction on X. Proposition 14.2.3 (Sun)

f1

f1

f1 129

·

Sun notes that Proposition 14.2.3 is still valid when f1 has an atom A . The almost sure pairwise independence condition implies the essential constancy of the random variables fi for almost all i E A. In the articles cited at the beginning of this section, Sun has shown that a construction overcoming these measure-theoretic problems is obtained by forming the internal product of internal factors, and then taking not just the Loeb space of each factor, but also the Loeb space of the internal product. This yields a rich extension of the usual product a-algebra formed from the Loeb factors, one that still has the Fubini Property equating the integral over the product space to the iterated integrals over the factor spaces forming that product. Here in more detail is that construction.

14.2. Recent work of Yeneng Sun

209

Sun's starts with internal spaces ( T, T, A) and ( D, A, P) . The space T may be a hyperfinite set with A given by uniform weights. He then forms the Loeb spaces (T, L>. (T) ) ) and (D, L!L (A) , P ) . He lets A Q9 P denote the internal product measure, while T Q9 A denotes the internal product a-algebra, and L>. (T) Q9 Lp (A) denotes the classical product a-algebra formed from L>. (T) and Lp (A) as in [15] . On the other hand, forming the Loeb space from the internal product T Q9 A produces a larger a-algebra L>.0P (T Q9 A) on T n . ( Sec the following propositions. ) Here arc some properties of what we shall call the big product space (T x D, L>.0p (T Q9 A) , �) . X

The big product space depends only on the Loeb factor spaces (T, L>. (T) , >:) and (D, Lp (A) , P) . Proposition 14.2.5 (Anderson [1] ) If E E L>. (T) Q9 Lp (A) , then E E L>.0p (T � A) and � (E) = ); ® P(E) . Proposition 14.2.6 (Hoover (first example) , Sun [ 1 7] ) The inclusion of L>. (T) Q9 Lp (A) 'in L>.0P (T Q9 A) is strict if and only if both ); and P have non-atomic parts. Proposition 14.2. 7 (Keisler [6] ) A Fubini Theorem holds for the big prod uct space, Proposition 14.2.4 (Keisler-Sun [7] )

In his articles, cited above, Sun shows that to work with independence in a continuum setting, as has been attempted in informal mathematical applica tions without rigor, and for decades, one needs a rich product a-algebra such as the a-algebra of a big product space. Here is that general result as stated in Proposition 7.4.1 of [12] and proved in Theorem 6.2 of [17] .

Let X be a complete, separable and metrizable topological space. Let M (X) be the space of Borel probability measures on X, wher·e M(X) is endowed with the topology of weak convergence of measures, Let be any Borel probability measure on the space M(X) . If both ); and P are atomless, then there is a process f from (T n , L>.0P (T Q9A), GP) to X such that the random variables ft = f ( t, ) are almost surely pairwise independent (i,e,, for �-almost all (t1 , t2 ) E T x T, fh and ft2 are independent ), and the probability measure on M(X) induced by the function Pft- 1 from T to M (X) is the given measure Remark 14.2.9 Here, for any given t E T, P ft- 1 is the probability measure on X induced by the random variable ft D ---+ X. It is the measurable mapping from T to M (X) taking the value P ft- 1 at each t E T that induces Proposition 14.2.8 (Sun) f-L

·

X

f-L .

:

the measure f-L on M (X).

14. Applications of rich measure spaces

210 Remark 14.2.10

(

A consequence of the proposition is that when the space

T. L>.. (T), :\) is a hyperfinite Loeb counting probability space and 'ji is a non

atomic Loeb probability measure, it makes sense to use the big product space as the underlying product space for an infinite number of equally weighted, independent random variables or agents. As part of his work with large product spaces, Sun has extended the law of large numbers. The usual strong law states that if random variables Xi , E N, are independent with the same distribution and finite mean m, then � L:i= l Xi tends almost surely to the constant random variable m. That is, for almost all samples the value of the sequence at tends to the constant m.

i

w, w Theorem 14.2. 1 1 (Sun) Let f be a real-valued integrable process on the big product space. If the random variables ft := f(t, ·) are almost surely pair wise independent, then for almost all samples w E D, the mean of the sample function fw := f(·,w) on the parameter space T is the mean off viewed as a random variable on the big product space. There is no requirement of identi cal distributions. Another facet of this same work deals with independence. It is well known that for a finite collection of random variables, pairwise independence is strictly weaker than mutual independence. Sun has shown that for processes on big product spaces, pairwise independent and mutual independent coalesce. and they coalesce with other notions of independence that are distinct for a finite number of random variables. This implies asymptotic results for finite families of random variables where the families are ordered by containment and have increasing cardinality.

14.3

Purification of measure-valued maps

We now turn to a special case of the joint work of Yeneng Sun with the author in [14] . That special case generalizes the following celebrated theorem of Dvoretzky, Wald and Wolfowitz (see [3] , [4] , [5] ) .

Let A be a finite set and M(A) the space of probability measures on A. Let (T, T) be a measurable space, and I-Lk ! k = 1 , · · · , finite, atomless signed measures on (T, T) . Given f : T ---+ M (A) so that for each a E A, f( ·) ( {a}) is T-measurable, there is aT-measurable map g : T A so that for each a E A, and each k l f(t)({a})df-Lk (t) = f-Lk ({t E T : g(t) = a}). Theorem 14.3.1

m,

---+

:<:: m,

14.3. Purification of measure-valued maps

211

This theorem justifies the elimination, i.e., purification, of randomness in some settings. For example, in games, T represents information available to the players, and A represents the actions players may choose, given E T. Every player�s objective is to maximize her own expected payoff, but that payoff depends on the actions chosen by all the players. For each player, a mapping from T to A is called a pure strategy. A mapping from T to M (A) is called a mixed strategy; in this case, each player chooses a "lottery on A". A Nash equilibrium is achieved if every player is satisfied with her own choice of strategy given the choices of the other players. In quite general settings, such an equilibrium can be achieved using mixed strategies. When results such as Theorem 14.3.1 apply, those mixed strategies can be purified yielding an equilibrium with the same expected payoffs. We re fer the reader to [14] for more information on the game-theoretic consequences of Theorem 14.3.1 and its extension. We will concentrate here on the proof of that extension, a proof simpler than that of the more general result in [14] . We extend the DWW Theorem from a finite set A to a complete, sepa rable metric space. There always exists a Borel bijection from such a space to a compact metric space. That is, if A is uncountable, then it follows from Kuratowski's theorem (see [15] , p. 406) that there is a Borel bijection from A to 1]. On the other hand, for a countable set A, one can use a bijection from A to {0, 1 , 1/2, . . , 1 / , . . . } . Therefore, it suffices to let A be a compact metric space. For simplicity we will work with a finite set of measures, but may extend to a countable collection.

t

[0,

. n

Let K be a finite set, and let A be a compact metric space. For each k E K, let be a nonatomic, finite, signed Loeb measure on a Loeb measurable space (T, T) . If f is a T-measurable mapping from T to M (A), then there is a T -measurable mapping g from T to A such that for each k E K and for all Borel sets B in A, f f (t ) ( B) ( dt) = (g- 1 [Bl) . This is equivalent to the condition that for each k E K and for any bounded Borel measurable function e on A, l l B( a)f(t)(da )f-Lk( dt) l B(g( t )) !-Lk (dt). Theorem 14.3.2

f-Lk

r

f-Lk

f-Lk

=

Example 14.3.3 Sun and the author show by a counter example in [14] that the Loeb measures of the theorem cannot be replaced with measures absolutely continuous with respect to Lebesgue measure on [0, 1 ] . In their example, the compact set A is the interval [ -1, 1], and the measure-valued map is given by E [0, 1 ] . The example uses just two measures, : = (5t + L t )/2 for each /-L l = >. on [0, 1 ] , and f-L 2 = 2 >. on 1]. If there is a function that works, then it is not hard to see using the functions on A given by and

f(t)

t t [0 ,

g B(a) l a l B(a) a2 =

=

14. Applications of rich measure spaces

212

(t l g(t)1 ) 2 >.. ( dt) g

g(t )

that J01 must take the value t or - t >..- a.e. = 0. That is, on [0, 1 ] . It is also not hard to see that this is impossible. The moral is that a Lebesgue measurable can't switch values on [0, 1] fast enough to work.

fik

To set up the proof of the theorem. we let be the signed internal measure generating for each k E K. Also, we set Po = where is the internal total variation of and c = Then Po is an internal probability measure on and for each k E K, < < Po . Let The probability measu � P is the Loeb measure generated P=

f-Lk

fik

T,

st(c) L.kE K l f-Lk 1.

f-Lk f3k

�L.k EK 1/ikl , L.kEK 1/ik I (T). fik

f3k f-Lk

1/ikl

by P0 , and for each k E K, < < P. Let be the internal Radon-Nikodym derivative of with respect to Po . Since < < P, is S-integrable with � respect to Po and := st is the Radon-Nikodym derivative of with respect to P. We let B denote the collection of Borel subsets of The following lemma is the only part of the proof that needs nonstandard analysis.

fik

f3k

{ik

A.

f-Lk

Let { ¢, : i E N} be a countable, dense (with respect to the sup norm topology) subcollection of the continuous real-val1Led functions on A. As sume that there is a sequence ofT-measurable mappings {gn , n E N} from T to A such that for each i E N and k E K, the sequence c/J2 (gn(t)) f3k (t)P(dt) converges; let c,,k E rn:. denote the limit. Then, there is aTfr-measurable mapping g from T to A such that for each i E N and k E K, l c/J, (g(t)) f3k (t ) P (dt) ci k Proof. For each n E N, let hn : T ---+ *A be a To-measurable lifting of gn with respect to the internal measure Po . Then for each and i E N and each k E K, Lemma 14.3.4

=

,

·

n

and so

�i�

n --t CXJ

( l l *¢i(hn(t)){ik (t)Po(dt) - i,k l ) st

c

hn

= 0.

Using N 1 -saturation, we may extend the sequence to an internal sequence and choose an unlimited integer H E *N so that for every E N and each k E K,

The desired function

t E T.

i

g is obtained by setting g(t) := st (hH ( )) E A at each t

D

14.3. Purification of measure-valued maps

213

( )

Now for the proof of Theorem 14.3.2, we note first that since M A is a compact metric space under the Prohorov metric p which induces the topol ogy of weak convergence of measures , there is a sequence of simple functions Un } �= l from T, T to M A such that

(

)

( )

(

)

Vt E T, nlim p (jn (t) , j(t)) = 0. ---t oo Assume we know that for each E N, there is a T-measurable mapping n

9n from T to A such that for each k E K, and any bounded Borel measurable function e on A ,

fr l e(a) fn(t) (da)p,k (dt) = fr e(gn(t))!Lk (dt) .

Let g = { cPi : i E N} be a countable dense subset of the continuous real-valued functions on A with the sup-norm topology. For a given cPi E g and each t E T, A ¢� (a)fn (t) (da) ----+ A cjJ, (a)j (t) (da) . Moreover, each integral is bounded by the maximum value of l¢i l on A , so by the bounded convergence theorem, for each k E K,

f

f

L cPi (gn (t) )fJk (t)P(dt) L cP2 (gn (t))tLk (dt) = L l cPi (a)fn (t) (da)p,k (dt) ----+ L l cPi (a)j (t) (da)p,k (dt) =

By the lemma, there is a T-measurable mapping g from T to A such that for each k E K,

L cPi (g(t) )tLk (dt) = L cPi (g(t) )f3k (t) P(dt) = L l cPi (a) j(t) (da)tLk (dt) .

Since this is true for each cPi in Q, it is true with cPi replaced by an arbi trary bounded Borel measurable function e on A. Therefore, without loss of generality, we may assume that f : T ----+ M A is simple. Next, we fix a sequence of Borel measurable, finite partitions pm { A , . . . , A } of A such that the diameter of each set in Pm is at most 1/2m , and Pm + l is a refinement of Pm . Also for each l, 1 ::; l ::; lm , we pick a point a[ E A . Since f is a simple function from T to M A , there is a T-measurable partition { Sj }f= l of T such that f = "'/j E M A on Sj . It follows that for

( )

1

=

Z:

[

( ) ( )

14. Applications of rich measure spaces

214

each B E B, and each k E K, fr f(t ) ( B ) p,k (dt) = Lf= l "(j ( B ) p,k(Sj ) . By Lyapunov's Theorem, each 51 can be decomposed by a finite, T-measurable partition {Tf' m , . . . , Tz:":'} so that for every l :::.; lm and k E K, f1k (Tf 'm ) = ,.YJ (A[ )p,k (Sj ) , whence

l j(t) (A[ ) p,k (dt) f; "(j (A[n ) !1k ( Sj ) =

N

N

=

( )

L /1k Tf 'm . j= l

(As an alternative to the above use of the Lyapunov theorem, one can with small modifications of the proof here use the author's Lyapunov theorem [10] , but then all of the simple functions must be modified on a P-null set T0 so that for each of them, the corresponding partition sets Sj of T are internal. ) Now for each m 2: 1 , we define a T-measurable mapping 9m : T ----> A so that for each l :::.; lm and each j :::.; N, 9m ( t) = a[ on T/'m . For each continuous, real-valued 7/J on A , for each m _2: 1 and each k E K,

lm N

= L L ( 7/J (a [ )rj (A[ ) ) f1 k (Sj ) l=l J = l

lm N

( )

= L L 7/J (a[ ) !1 k T/'m l=l j = l

approximates as m ----> oo the integral

£ (l >/� (a)f(t) (da) ) Mk (dt)

�

·

·

t, £ (1r t, � 1, (1r

)

>,b (a) j(t) (da) �k (dt)

)

>,b(a) o, (da) Mk (dt ;

215

References so

1/J (a[ ) !Lk (Tzj,m ) ----. L (1 1/J (a)j(t)(da) ) fLk (dt) It follows from the lemma that there is a T-measurable mapping g from T to lm N =LL l=l j = l

A such that for each k E K and for each continuous real-valued function e on A from a countable dense set of such functions, and therefore for each bounded Borel measurable function e on A,

L B(g(t))tLk (dt) L l B(a)j(t)(da) fLk (dt). =

D

References [1] R. M . ANDERSON, "A nonstandard representation of Brownian motion and Ito integration", Israel J . Math . , 25 (1976) 15-46. [2] .T . L . D ooB, " Stochastic processes depending on a continuous parameter", Trans. Amer. Math. Soc., 42 (1937) 107- 140. [3] A. DVORETSKY, A. WALD and .J . WoLFOWITZ, "Elimination of random ization in certain problems of statistics and of the theory of games", Proc. Nat . Acad. Sci. USA, 36 (1950) 256-260.

[4] A. DVORETSKY, A. WALD and .J . WoLFOWITZ, "Relations among certain ranges of vector measures", Pac . .J. Math. , 1 (1951) 59- 74. [5] A . DVORETSKY, A . WALD and .J . WoLFOWITZ, ""Elimination of ran domization in certain statistical decision problems in certain statistical decision procedures and zero-sum two-person games", Ann. Math. Stat . , 22 (1951) 1 -21.

[6] H . .J . KEISLER,

An infinitesimal approach to stochastic analysis, Memoirs

Amer. Math. Soc.

48, 1984.

[7] H .. T . KEISLER and Y . N . SUN, "A metric on probabilities, and products of Loeb spaces", .Jour. London Math. Soc . , 69 (2004) 258-272.

14. Applications of rich measure spaces

216

[8] 1\ I . A . K HAN , K . P . RATH and Y. N. SuN. "The Dvoretzky-Wald Wolfowitz theorem and purification in atomless finite-action games", In ternational Journal of Game Theory, 34 (2006) 91 - 104.

[9] M . A . KHAN and Y. N . Su N , "Non-cooperative games on hyperfinite Loeb spaces", J. Math. Econ. , 31 (1999) 455-492. [10] P . A . LOEB, "A combinatorial analog of Lyapunov's Theorem for infinites imally generated atomic vector measures". Proc. Amer. Math. Soc . , 39

(1973) 585-586. [ 1 1] P . A . L OEB, " Conversion from nonstandard to standard measure spaces and applications in probability theory", Trans. Amcr. Math. Soc., 2 1 1

(1975) 1 1 3- 122.

Nonstandard Analysis for the Working Mathematician. Kluwer Academic Publishers, Amsterdam, 2000.

[12] P . A . LOEB and M. WoLFF, cds.,

[13] P . A . L OEB, H . OsswALD, Y. SuN and Z . ZHANG, "Uncorrclatcdncss and orthogonality for vector-valued processes", Tran. Amer. Math. Soc . , 356

(2004) 3209 -3225. [14] P . A . LOEB and Y. Su N , "Purification of measure-valued maps", Doob Memorial Volume of the Illinois Journal of Mathematics, 50 (2006) 747- 762. [15] H . L . RoYDEN,

Real Analyszs, third edition. Macmillan, New York, 1988.

[16] Y. SuN. "Hyperfinite law of large numbers", Bull. Symbolic Logic, 2 (1996) 189-198. [17] Y. Su N , "A theory of hyperfinite processes: the complete removal of in dividual uncertainty via exact LLN". J. Math. Econ. , 29 (1998) 419-503. [18] Y. SU N , "The almost equivalence of pairwise and mutual independence and the duality with exchangeability", Probab. Theory and Relat. Fields, 1 1 2 (1998) 425-456.

More on S-measures David A. Ross

15.1

*

Introduction

In their important. (but. often overlooked ) paper [1] . C. Ward Henson and Frank \iVattenberg introduced the notion of S-measura.biz.ity, and showed that S-measurable functions are "approximately standarcP' ( in a sense made precise in the next section) . In a recent paper ([3]), the author used this machinery to transform a well known nonstandard plausibility argument for the Radon-Nikodym theorem into a correct and complete nonstandard proof of the theorem. The problem of finding an "essentially nonstandard" proof for Radon-Nikodym had been one of long standing. A ltho ugh Luxemburg gave a nonstandard proof as long ago as 1972 ([2] ) , he obtained the result as a consequence of another equally deep theorem in analysis clue to Riesz. ( Beate Zimmer [6] has recently proved vector-valued extensions of Radon-Nikoclym starting from the same plausibility argument, though using different standardizing machinery than that in [3] or the present paper. ) In this paper I use an S-measure argument very like the one in [3J to give an intuitive nonstandard proof of the Riesz result used by Luxemburg. Along the way [ gi ve a new proof for the main technical result from [1] on S-measures, a uew noustaudard proof for Egoroff's Theorem, and a nonst.
15.2

Loeb measures and S-measures

I will assume that we work in a nonstandard model in the sense of Robin son , and that this model is as saturated as it needs to be to carry out all constructions; in particular, it is au enlargement. *Department of Mathematics, University of Hawaii, Honolulu, HI 96822. ross@math . hawai i . edu

218

15. More on S-measures

Suppose X a set and A is an algebra on X. There are two natural algebras on *X : Ao = {*A : A E A} (the algebra of subsets of *X) , and *A. Note that except in the simplest cases, (i) neither Ao nor *A are a-algebras; (ii) Ao is external; and (iii) *A is vastly larger than Ao . These leads to two distinct a-algebras:

standard

1 . As

=

the smallest a-algebra containing Ao

2. AL

=

the smallest a-algebra containing *A

Recall that if IL is a (finitely- or countably-additive) finite measure on (X. A) then *IL maps *A to * [0 , oo ) , and in particular is not normally a measure, un less the range of IL is finite. However, 0*1L takes its values in [0 , oo ) , and so (*X, *A, 0*1L ) is an external, standard, finitely-additive finite measure space. The following was first noticed by Loeb. It is an immediate consequence of N rsaturation and the Caratheodory Extension Theorem, or can be proved directly; sec [4] for details. Theorem 1 °*1L VE E AL

extends to a a-additive measure ILL on (*X , AL ) . Moreover, ILL(E)

=

=

inf{ILL (A) : E c:;; A E *A} sup{ILL(A) : A c:;; E, A E *A}

( 1 5 . 2. 1 ) ( 1 5 . 2 . 2)

Of course, the measure ILL remains a-additive when restricted to any a algebra contained in AL, in particular As; this corresponds to replacing *A by Ao as the generating algebra. However, it is not so obvious that the approx imation properties ( 1 5 . 2 . 1 ) and ( 15 . 2 . 2) hold with this replacement as well. The next lemma, which asserts that they do, guarantees that a set E c:;; *X has 0" in the sense of Henson and Wattenberg [1] precisely when E c:;; D for some D E As with ILL(D) = 0. For this lemma, and the duration of the paper, it will be convenient to assume that A is a a-algebra.

"S-measure

Lemma 1 VE E As, �LL (E)

=

=

inf{ �L(A) : E c:;; *A, A E A} sup{IL(A) : *A c:;; E, A E A}

Proof. Let

'B

{ E E As : Vc > 0 ::JA, B E A such that *A c:;; E c:;; *B and IL(B \ A) < c} { E E As : Vc > 0 ::JA, B E A such that *A c:;; E c:;; *B and ILL (*B) < ILL(E) + c and ILL (*A) > ILL (E) - c } .

(1 5.2.3) ( 1 5 . 2 . 4)

219

15.2. Loeb measures and S-measures

E

E

Evidently *A 'B for every A A. It suffices to show that 'B is a a-algebra, so that As � 'B. If A � E � B then B e � Ec � A c and f-LL(B c \ A c ) = f-LL (A \ B) , so 'B is closed under complements. It remains to show that 'B is closed under countable unions. Let E = U n En where En 'B ( n N) increases to E. and let c > 0. For some N, ILL( EN ) > f-LL(E) - c/2, and for some A A, *A � EN and f-LL (*A) > f-LL (EN ) - c/2; it follows that *A � E and f-LL(*A) > f-LL(E) - c. For the exterior approximation, there exists Bn A with En � *Bn and f-LL (*Bn \ En ) < c2 - (n + 1 ) . Put B = U n Bn , then E = U n En � U n *Bn � *B ' and

E

E

E

E

f-L(B)

=

f-LL ( *B)

Call a function

(

)

((

) )

/-L L *B \ U *Bn + ILL U *Bn \ E + f-LL(E) n n n 1 < 0 + L c T ( + ) + f-LL(E) S f-LL(E) + c. n S

f : *X ---+ rn:.

D

approximately standard provided:

1. f is As-measurable;

f

2.

the restriction to JR:. ; and

g = f ix

3.

f :::::J *g almost

everywhere (with respect to

of

to X is an A-measurable function from X

f-LL) .

For example, suppose h : X ---+ rn:. i s a bounded A-measurable function. For r rn:. , 0*h - 1 ( - oo, = n nE N * h - 1 ( - oo , + 1/n) , so 0 *h is As-measurable. For X, = = 0*h(x) , so h = (0*h) lx. It follows that 0*h is approximately standard. The main result of this section is the theorem of Henson and Wattenberg, that As-measurable functions are approximately standard. The proof here differs from theirs. and makes it possible to prove some new results about integrability. Denote by the set of all approximately standard functions. The following enumerates some useful propert ies of

r] x EE h(x) *h(x)

r

all

APS

APS.

Lemma 2

APS is closed under finite linear combmations. 2. If f, g E APS then min{!, g} E APS and max{!, g} E APS. 3. If fn E APS, n E N, and fn mono tonically increases pointwise to f, then f E APS . 1.

220

4.

15. More on S-measures

Let h : X rn:. be an A-measurable function, and put E = {x E *X : l *h(x) l < oo}. Then (i) E E As, (ii) f-LL(*X \ E) = 0, and (iii) 0(*hxE ) E APS (where XE is the characteristic function of E). ---+

Proof.

f, g *X rn:. and (af (3f) l x = (af l x f3f l x ) . 2. As in ( 1 ) , note that max{f, g} l x = max{ f l x , g l x } and min{!, g} l x = min{ f l x , g l x }. 3. Let 9n = fn l x , An = { x E *X : *gn (x) ';f:; fn(x)}, and A = U n EN A n. Since f-LL( An) = 0 for each f-LL( A) = 0. Let g = f i x and note g = upn 9n · Fix a standard c 0, and for E N let En = {x E X : 9n (x) g(x) - c }. Since X = Un EN En, f-LL(*En) = f-L(En) increases to f-L(X) = f-LL(*X), so f-LL(A U (*X \ U nE N *En )) = 0 . Suppose x E ( *X \ A) UnEN *En . For some m E N, x E *E and fm(x) f(x) - c. It follows that *g (x) < *g (x) + c ::::::; fn (x) + c f(x) + c < fm(x) + 2c ::::::; *gm (x) + 2c *g(x) + 2r::n. Since c was arbitrary, *g (x) ::::::; f(x). 4. Put En = {x E X : l h (x) l < } , so E = Un EN *En E As . X = U n EN En, so f-LL(*En) f-L(En) increases to f-L(X) f-LL(*X), and f-L L(*X \ E) 0. 1 . Follows immediately from the observation that if E rn:. then + +

a , (3

---+

n,

s

>

n

>

n

n,

:S

:S

n

=

n

>

=

=

The proof of (iii) is now like the example preceding the statement of this lemma. D

Every As-measurable function is approximately standard. Proof. Note that if A E A then X*A = *xA = 0*XA is in APS by Lemma 2 ( 4) .

Theorem 2

By ( 1 ) and (3) of Lemma 2 and a suitable version of the Monotone Class Theorem (for example, Theorem 3 . 1 4 of [5] ) , every bounded As-measurable function is in APS. If 2 is As-measurable then = supn max{f, n so is in APS by (2) and (3) of Lemma 2. Any general As-measurable can be written = max{!, - max{ so is in APS by (1) and (2) of Lemma 2. D

f 0

f

f

f

},

0} f, 0} For example, suppose E E As ; denote by S(E) the set X n E of standard elements of E . If f is the characteristic function of E then g = f i x is the characteristic function of S(E) . By the theorem, S(E) is A-measurable and f-L(S(E)) = f-LL(*S(E)) = f *g df-LL = f f df-LL = IL L( E) . Corollary 1 Let f : *X be approximately standard, and p 0. Put g = f i x . Then f E £P(ttL) if and only if g E £P(tt), in which case J fPdf-LL = J gPdf-L and l f l v = l g l v · ---+ JR:.

>

221

15.3. Egoroff's Theorem

Proof. Without loss of generality, f 2: 0. Let s 71 : *X ---+ rn:., n E N, be simple functions increasing pointwise to fP. Put t71 = s 71 l x, then t71 is a sequence of simple functions on X increasing pointwise to gP. Moreover, if s71 = L7:=1 CYk XEk then t n = L 7:= 1 CYk XS ( Ek ) , and j s n df-LL = L7:=1 CYk f-LL(Ek ) = '5:."7:= 1 CYk f-L(S(Ek )) = J t 71 df-L. Let n ---+ oo, and it follows that J fPdf-L L and J gPdf-L arc either both infinite or arc both finite and equal. D The next result lets us extend the notion of approximate standardness to completion-measurable functions.

Let f : *X ---+ JR:.; the following are equivalent: (i) f is measurable with respect to the ILL-completion of As; (ii) f i x is measurable with respect to the f-L-completion of A, and f :::::J * (f i x) off a f-LL-nullset in As .

Lemma 3

Proof. Recall (without proof) the standard fact that a function is completion measurable if and only if it agrees with a measurable function off a nullset . (i =? ii) There is an As-measurable f' and a f-LL-nullset A E As such that f = f' off A. Let g = f i x and g' = f' l x , then g' is A-measurable, *g' :::::J f' off a f-LL-nullset B E As, and {x E X : g(x) # g'(x)} � S(A) . Since f-LL(A) = 0, f-L(S ( A)) = 0 by the example above, and so g' is measurable with respect to the f-L-completion of A. Note also that *f-L(S(A)) = 0, so f-LL (A U B U *S(A)) = 0. For x E *X , x rJ. A U B U *S(A) , f(x) = f'(x) :::::J g'(x) = g(x) , proving (ii) . (ii =? i) Let A E As be a f-LL-nullset with f :::::J * (fix) of!:" A. Let g = fix, g' A-measurable such that g = g' off a f-L-nullset B E A, E = {x E *X : l *g'(x) l < oo }, and f' = 0(*g' XE ) · By Lemma 2, f' is in APS. For x rJ. E U *B U A, f'(x) :::::J *g ' (x) = *g(x) :::::J f(x) , so in fact f'(x) = f(x) . Since f' is As measurable, f is completion measurable. D

15.3

Egoroff's Theorem

The most striking application [1] made of the S-measure construction was a proof of Egoroff's Theorem. Their proof relied on a nonstandard condition shown by Robinson to be equivalent to approximate uniform convergence. Here I use the machinery above to give an alternate proof of the theorem not depending on Robinson"s condition. :

Let (X, A, f-L) be a finite meas1Lre space, j, fn X ---+ rn:. measur able, fn ---+ f a. e. Then Vc > 0 3A E A f-L(A) > f-L( X ) - c & fn ---+ f uniformly on A.

Theorem 3

222

15. More on S-measures

Proof. For n E N put 9n = inf m> n fm and h n = supm > n fm · Note that for almost all x, hn (x) - gn (x) is nonnegative and nondecreasing in n with limn_, = hn(x) - 9n(x) = 0. Put ?Jn = 0 *gn , h n = 0 *h n . Let E = {x E *X : limn _,= hn(x) - ?Jn (x) c/c 0}. Note E E As, so fLL(E) = p,(S(E)) = p,(0) = 0. It follows that for some A E A, *A c;;;; E C and p,(A) > p,(X) - c . It remains to show that fn converges to f uniformly on A; equivalently, that hn - 9n converges to 0 uniformly on A. Fix 6 > 0. For x E *A there is a k E N such that hk(x) - gk(x) < 6. Note *h k( x ) - *g k (x) < 6 as well. Let ¢(x) be the least k such that *h k (x) - *g k (x) < 6. The function ¢ : *A ---+ N is internal and finite-valued, so has a standard upper bound N E N. Then *h k - *g k < 6 on *A for every k 2 N, so h k - 9k < 6 on A; this completes the proof. D

15.4

A Theorem of Riesz

Let T be a continuous linear real functional on .C 2 (X, A, fL ); then there is a g such that for every f E .C 2 (X, A, p,), T(f) = J fg dp,.

Theorem 4

g is actually in .C 2 (X, A, p,) . By saturation let A be a *-finite algebra with A0 c;;;; A c;;;; *A . There

After the proof we show that the function

Proof. is an internal *-partition II of *X which corresponds to A in the sense that the latter is the internal closure of the former under hyperfinite unions. Now, define

i (x)

:=

{

*T( xv ) *p, (p )

0,

'

x E p E II and *p, (p )

>

otherwise.

Let a : II ---+ *X be an internal choice function, that is, For any bounded, measurable f : X ---+ rn:.,

T (f) = L * T (*f xv ) p p p

0;

ap E p for p E II. ( 1 5.4. 1 ) ( 1 5.4.2) ( 1 5.4.3) (1 5.4.4)

223

1 5 .4. A Theorem of Riesz

1p � L 1 *j(x yY (x) *df1 p p = J *f :Y *d/1 =

L *f(avYf (x) *d/1

(1 5.4. 5)

p

( 1 5.4. 6) ( 1 5.4. 7)

The steps from ( 1 5.4. 1 ) to ( 15.4.2) and (15.4.5) to ( 1 5.4.6) follow from boundedness of f and the definition of A. For (15.4.2) to ( 1 5 . 4.3) note that by continuity of T, if p E II and *!1(P ) = 0 then *T( xv ) = 0. Moreover, if *f is replaced in the above by any internal function h : *X ---+ *JR:. with the property that h is constant on each p E II, then all but the first equality in the above still holds, and we obtain *T(h) = J hi d*f1 . In particular, if h = XA for some A E A then *T( x A ) = fA :Y d*f1. The proof now proceeds as follows : (i) Show :Y is finite almost every where (and therefore 1 =0 :Y exists almost everywhere) . (ii) Show that :Y is S-integrable (and therefore 1 =0 :Y is integrable) . (iii) Put G = IE[riAs] (the conditional expectation of 1 ) . (iv) Let g be the restriction of G to X; by Theorem 2 g is A-measurable. (v) Show that g works. For (i) , write [:Y < n] = {x E *X : :Y(x) < n } ( n E *N) , and [i < oo] = U n EN [:Y < n] . Suppose (for a contradiction) that f1L ( [:Y < oo] ) < 1 - r for some standard r > 0. Then !1 ( [:Y < n ] ) < 1 - r for each standard n E N, so !1 ( [:Y < H] ) < 1 - r for some infinite H. But then oo > T(1) � *J1:Yd*f1 (by the note above) 2 *J [:Y ? H] :Yd*/1 2 H*!1 ( [:Y 2 H] ) > rH, which is infinite, a contradiction. For (ii) , the reader is referred to [4] for a discussion of S-integrability. In particular, to show that :Y is S-integrable, it suffices to show that VH in finite, *J :Y > H] :Yd*/1 � 0. (Indeed, this can be adopted as the definition of I S-integrability. ) So, fix such an H. Note [:Y > H] E A; it follows that * J :Y > H] :Yd*/1 = I *T(X[:Y > HJ ) · Suppose (for a contradiction) that *T(X[:Y > HJ ) > r > 0, r stan dard. By (i) , *!1 ( [:Y > H] ) � 0. It follows from transfer that for every stan dard n E N there is a Bn E A with T(XBn ) > r and f1(Bn) < 1 / 2 -n . Put B n := Um > n Bm . Then 11(B n ) < L m > n 2 - m = 2 - n + l , but T(XBn ) > r. As n ---+ oo, X;;n ---+ 0 in £ 2 but T(XBn ) > -:;., contradicting continuity of T. It follows from S-integrability of i that 1 =0 :Y is integrable, so its con ditional expectation with respect to As, IE[riAs] , exists; put G = IE[riAs] . (Conditional expectation is defined in the next section, where a simple non standard proof for its existence is given. The reader is referred to chapter 9 of [5] for properties of the conditional expectation.)

224

1 5 . More on S-measures

Finally, let g G l x , which ( as noted above ) is A-measurable by Theorem 2. It remains to show that g satisfies the conclusion of Theorem 4. Let f E J:} (X, A, p,) , and suppose first that f is bounded. This ensures that (*!)1 is S-integrable, and (0*j)r E J:} (*X, AL, p,L) . Then =

J *FI d *p, � J (0*!) { dp,L = j IE W *f)r i A s ] dp,L j C *f) IE [riAs] dp,L = J (0*j)G dp,L = J fg dp,

T(j) �

=

( 1 5.4.8) ( 1 5.4.9) ( 15.4. 1 0) ( 15.4. 1 1 ) (15.4. 12) (15.4. 1 3)

The step from ( 15.4.8 ) to ( 15.4.9 ) is by S-integrability, ( 1 5.4. 10 ) to ( 15.4. 1 1 ) is a standard property of conditional expectation ( using the fact that 0*j is As-measurable ) , and (15.4.12 ) to ( 1 5.4. 13 ) is Corollary 1 with p = 1 . For unbounded f E J:} (X, A, p, ) and n E N, let fn = max { -n, min { !, n } } . By the result above , T(jn ) = J fn g dp,. By continuity of T and Lebesgue's Dominated Convergence Theorem ([ 5 ] , Theorem 5.9 ) , T(j) = J fg dp,. D The Riesz Theorem is usually stated in the following nominally stronger form.

Let T be a continuous linear real functional on .C 2 (X, A, p,) ; then there is a g E .C 2 (X, A, p,) such that for every f E .C 2 (X, A, p,), T(j) = J fg dp,.

Corollary 2

Proof. It suffices to show that any g satisfying the conclusion of Theorem 4 is already in .C 2 (X, A, p,). Since T is continuous, it is bounded, so there is a constant C such that for any f E .C 2 ( X, A, p,) , T(j) � CIIJII 2 · Put 9n = min { g, n } . Then ll9n ll� = Jg� dp, S fgn g dp, = T(gn ) S Cll9n l l 2 · Divide both sides of this inequality by ll9n ll 2 and square, obtain J g� dp, ll9nll � S C 2 . The result now follows by Fatou's Lemma ([5] , Lemma 5.4) . D

15.4. A Theorem of Riesz 1 5 .4 . 1

225

Conditional expectation

Theorem 5 Suppose (X, A, f-L) zs a probabdzty measure, that 'B C:: A is another a-algebra, and that j E ,.C} (X, A, tt) . There is a 'B -measurable function g : X ___, ffi. such that for any B E 'B, f df-L 9 df-L.

JB

=

JB

The function 9 is called the conditwnal expectation of f on 'B, and denoted by IE [f i'B] . To avoid circularity, it is necessary to show that IE [f i 'B] exists without use of the Riesz Theorem (or related results, such as the Radon-Nikodym Theorem) . Such a proof appears in [3] . This section presents a modification of that proof which is even more elementary, in that it does not require the Hahn decomposition. Without loss of generality f 2 0. Put 9 {9 E ..CJ (X, 'B, f-L) : VB E 'B , 9 df-L <:.:: f df-L }. Note if 9 1 , 92 E 9 then max{9 1 , 92 , 0} E 9 . Let r = sup{fx gdf-L : g E 9}. and for n E N let 9n E 9 with fx gn df-L > r - l/2 n ; we may assume that 0 <:.:: 9 1 <:.:: 92 <:.:: · · · . Put 9 supn 9n · Note that if B E 'B , 9 df-L = supn 9n df-L <:.:: f df-L, so 9 E 9. Moreover, fx 9 df-L = r . It remains to show that for any B E 'B. f dtt = 9 df-L. Suppose not; then there is a B E 'B and c > 0 with f df-L - 9 df-L = c. For some 6 > 0, c 1 = JB (j - g)df-L > 0 where g = 9 + 6 · g almost witnesses a contradiction; the perturbation by 6 needs to be localized to a slightly smaller set. As in the proof of Theorem 4 let 23 be a * -finite algebra with {*B : B E 'B } C:: 23 C:: *'B, and let II be the internal hyperfinite *-partition of *X corresponding to 23 . Let B + = U{b E II : * (! - g )d *f-L > 0}. Put s = o * (! - g )d *f-L . Note that for every C E 'B , if C C:: B then Jc(f - g) df-L = * (! - g )d*f-L <:.:: s : in particular. c1 <:.:: s. For n E N consider the statement,

JB

JB

JB

=

JB

JB

JB

=

JB

JB

JB+ f.c

Jb

As this holds in the nonstandard model (with B + transfer in the standard model. For m E N put Bm B00 n m B m . Note B= E 'B and B = C:: B, so g)df-L > s - 2 -n , the other hand. since always Ln >m 2 - n = s - 2 -m , therefore . (j - g)df-L 2 s . Put 9 1 = 9 + If C E 'B, =

6XB=· _

for Bn) , it holds by U n >m Bn , and put (j - g)df-L <:.:: s. On

=

JB=

JBJJ JB

JB m (j - g)df-L > s

CX)

Jc\ Bx u

r u 9') df-L = r

Jc

JB XB

_

9) df-L +

.fCnB=u - g) df-L .

15. More on S-measures

226

The first term in this sum is nonnegative since g E 9 . The second is nonnegative since otherwise g')df-L 2 0, so g' E fj )df-L > s. It follows that 9 . Since s > 0, !-L(BcxJ) > 0, so g' df-L g df-L + 6f-L(B00) > r. a contradiction.

JBx\C ( j -

J

=

J

Jc ( J -

References [1] C . WARD HENSON and FRANK WATTENBERG , "Egoroff ' s theorem and the distribution of standard points in a nonstandard model", Proc. Amer. Math. Soc., 81 ( 1981) 455-461 . [2] W . A . J . LUXEMBURG, O n some concurrent binary relations occurring in analysis, in Contributions to non-standard analysis (Sympos., Oberwol fach, 1970), Studies in Logic and Found. Math. , Vol. 69, North-Holland, Amsterdam. 1972. [3] DAVID A. Ross, Nonstandard measure constructions - solutions and problems, in Nonstandard methods and applications in mathematics, Lec ture Notes in Logic, 25, A.K. Peters, 2006. [4] DAVID A . Ross. Loeb measure and probability, in Nonstandard analysis (Edmburgh, 1996), NATO Adv. Sci. Inst. Ser. C. Math. Phys. Sci., vol. 493, Kluwer Acad. Publ., Dordrecht, 1997. [5] DAVID WILLIAMS, Probability with martingales, Cambridge Mathematical Textbooks, Cambridge University Press, Cambridge, 1991. [6] B EATE ZIMMER. "A unifying Radon-Nikodym theorem through nonstan dard hulls", Illinois .Journal of Mathematics, 49 (2005) 873-883.

A Radon-Nikodym theorem for a vector-valued reference measure G. Beate Zimmer*

The conclusion of be represented

a

Abstract

Radon-Nikodym theorem iH that

as �m

that for all measurable sets Lebesgue)

a

measure f-1. can

integral with respect to a reference measure

A, p(A)

such

JA .f1,. (x) d).. with a (Bochner or

=

integrable derivative or density .fw The mcasme ).. is usually a

countably additive <J-finit.e measure on the given measure space and the

mcasmc JL is absolutely continuous with respect to ).. . Different theorems

have different range spaces for p., which could be the real numbers, or Banach s paces with or without the Radou-Nikodym property.

Iu this

paper we ge1ieralize to derivatives of vector valued measures with re spect a vector-valued reference measure. theorem for vector

measures

Vvc present a R.adon-Nikodym

of bounded variation that arc absolutely

continuous with respect to another vector measme of bounded variation. W'hile it is easy in settings sud1 on the interval

[0, 1]

as

Jl < <

,\ , where ,\ is Lebesgue measure nonstan

and Jl· is vector-valued to write down a.

clard Radou-Nikodym deri vative of

Lhe form t.p :

*[0, 1]

-->

fiu ( *E) by

t.p1, (:1:) 2:!1 :��:\ lA, (.1; ) , a vector valued reference measure does uot. =

allow this approach.

as

the quotient of

two vectors

in different B anach

spaces is undefined. Furthermore, generalizing to a vector valued control mcmmre uecessita(;es the usc of a generalization of the Bartle

bilinem· vector integral.

16 . 1

integral , a

Introduction and notation

For noustandard notions and notatious not defined here we refer for exam

ple to the book by Albeverio, Fenstad, 1-Ioegh-Krohn and Lindstrom

t.he survey on nonstandard hulls by Henson and 1\tloore *Department of Tvfathematics and Statistics, Texas Corpus

Christi, TX 781112.

[1]

and

[6] or the introduction

A&M Universit.y -

Corpus Christi,

16. A Radon-Nikodym theorem

228

to nonstandard analysis in [7] . Plenty of information about vector measures can be found in [5] . Throughout, D is a set and I: is a a-algebra of subsets of D . E, F and G denote Banach spaces. f-L : I: ---+ E and v : I: ---+ F are countably additive vector measures of bounded variation. The variation of v is defined as I vI (A) = sup L 1rEII llv(A") II where A E I: and II is a finite partition of A into sets in I: . B y Proposition I . l . 9 i n [5] the variation lvl of a countably additive measure v is also a countably additive measure on I:. We think of the nonstandard model in terms of superstructures V(X) and V(*X) connected by the monomorphism * : V(X) ---+ V(*X) and call an ele ment b E V(*X) internal, if it is an element of a standard entity, i.e. if there is an a E V (X) with b E *a. We assume that the nonstandard model is at least N-saturated, where N is an uncountable cardinal number such that the cardinality of the a-algebra I: of I vi-measurable subsets of D is less than N. Let (*D, Llvi (*I: ) , lvl ) denote the Loeb space constructed from the nonstan dard extension of the measure space (D, I:, I v i ). This measure space is obtained by extending the measure 0*1vl from *I: to the a-algebra generated by *I:. _I he completion of this a-algebra is denoted by Llvi (*I: ) . The Loeb measure l v l is a standard countably additive measure. The functions we work with take their values in a Banach space E or its nonstandard hull E. The nonstandard hull of a Banach space E is defined as E = fin(* E)/ the quotient of the clements of bounded norm by clements of infinitesimal norm. The nonstandard hull is a standard Banach space and contains E as a subspace. We denote the quotient map from fin(*E) onto E by or 'if£ . Whenever we encounter products of Banach spaces, we equip them with the floo norm: l i e J l l = max(ll e l l , I I J I I ) . B y a result of Loeb in [8] , there exists an internal *-finite partition of *D consisting of sets A1 , . . . , AH E *I: (H E *N ) such that the partition is finer than the image under * of any finite partition of D into sets in I:. The proof uses a concurrent relation argument. From this fine partition we discard all partition sets of *!v i -measure zero. The remaining sets still form an internal collection, which we also denote by A1 , . . . , AH . ::::J ,

1r

16.2

X

The exist ing literature

Ross gives a nice detailed nonstandard proof of the Radon-Nikodym the orem for two real-valued a-finite measures f-L < < v on a measurable space in [9] . Previously we have studied nonstandard Radon-Nikodym derivatives of vector measures tt : I: ---+ E with respect to Lebesgue measure on [0, 1 ] . We have shown that through nonstandard analysis a unifying approach to

16.2. The existing literature

229 or

Radon-Nikodym derivatives independent of the Radon-Nikodym property lack thereof of E can be found. The generalized derivatives we constructed are not necessarily essentially separably valued and therefore not always Bochner integrable. However, a generalization of the Bochner integral in [10] for func tions with values in the nonstandard hull of a Banach space allows one to integrate the generalized derivatives. In [ 1 1] with methods similar to those used by Ross [9] or Bliedtner and Loeb in [3] , and with the use of local reflex ivity we "standardize" the generalized derivatives from maps *D ---+ E to maps D ---+ E" with values in the second dual of the original Banach space E. The vector measures book [5] by Diestel and Uhl is still considered as the authoritative work on Radon-Nikodym theorems. The standard literature yields very little on vector-vector derivatives; only Bogdan [4] together with his student Kritt has made an initial foray into this field. In their article they assume that they have two vector measures f-L : I: ---+ E and v : I: ---+ F with f-L < < I v 1 . and they assume that the range space of f-L has the Radon-Nikodym property and the range space of v is uniformly convex. They then define a derivative and integral in the form f-L(A) = fA ��I (w) t l (w) dv, where tl is a function from D into F', the dual space of the range of v . The integrand is then of the form �� ( w) = e . !' ' a function from n into E X F'. A trilinear product on E x F x F' with values in E can be defined as e · J' · f f' (f) · e. This is needed to integrate simple functions of the form L�=1 e, . !I · 1 A , : n ---+ E X F' with respect to the F-valued measure v as follows: ·

=

L t e; · JI · 1A, dv = t e; · JI · v(A; ) = t e, · J; (v(A; )) E E . ,=1

,=1

,=1

The main difficulty is the definition of tl , the derivative of a real-valued measure with respect to a vector-valued measure, the opposite setting from the ordinary Radon-Nikodym theorems. Bogdan uses his assumption that F is uniformly convex to find this derivative. Since v < < l v l and since uniform convex spaces are reflexive and hence have the Radon-Nikodym property, there is a Bochner integrable function d��l : D ---+ F such that for all measurable sets A, the vector measure can be written as a Bochner integral v(A) = fA d�� l dl v l . A standard theorem ( e.g. Theorem 11.2.4 in [5]) about vector measures states that the norm of the Radon-Nikodym derivative is a Radon-Nikodym derivative for the total variation of the vector measure, i.e. for all A E I: l v i (A)

=

i l d�� d l v l .

This implies that I vi-almost everywhere on D, II d��1 1 1 1 . For uniformly convex spaces F there is a continuous map g : Sp ---+ SF' , where Sx denotes =

230

16 .

A Radon-Nikodym theorem

the unit sphere of X, such that ( !, g(f)) = 1 for all f E Sp. In this case, the composition g o d�� l is a map from D into the unit sphere of F'. The derivative of f-L with respect to v is defined as �� · (g o d�� l ) : D ----+ E F', or, if one regards the product e f' as a rank one continuous linear operator from F to E, the derivative can be considered a map: �� : D ----+ L(F, E) . Since d��l and d�� l arc both Bochner integrable and g is continuous and g o d�� l is almost everywhere of norm one, it is easy to show that the result is integrable with respect to the vector measure v with an integral that uses approximation by simple functions. =

x

16.3

x

The nonstandard approach

With nonstandard analysis we can significantly weaken the assumptions on the Banach spaces E and F. We use Bogdan's idea of writing the derivative dlvl -_ djV[ dp · ----;IV dfl . (g o djV[ dv ) . as djV[ The derivatives ��I and d��l are derivatives of a vector measure with respect to a real valued measure. They can be found and integrated with the methods described in [10] , [11] or [12] . In [10] we defined a Banach space M(lvi, E) of e�ended integrable func tions as the set of equivalence classes under equality lvl-almost everywhere of functions f : *D ----+ E for which there� an internal, *-simple, S-integrable f l vl-almost everywhere on *D. Such a cp : *D ----+ *E such that 71£ o cp cp is called a lifting of f. On M ( lvl, E) we defined an integral by setting JA f d!vl 71 ( fA cp d * lvl ) , for all A E *I:, where the integral of the internal simple function cp is defined in the obvious way. This integral can be extended to sets in the Loeb a-algebra and generalizes the Bochner integral in the sense that M ( lvl, E) contains L1 ( lvl, E) and the integrals agree on that subspace. However, .iH(Ivi, E) also contains functions which fail to be essentially separa bly valued and hence fail to be measurable. Lemma 1 in [11] translates to this situation as: =

=

Let E be any Banach space and let f-L : I: ----+ E be a ! v i absolutely continuous countably additive vector measure of bounded variation. Then the internal *-simple function cp 11 : *D ----+ *E defined by

Lemma 16.3 . 1

is S -integrable, where A1 , . . . , AH is the fine partition of *D introduced above.

16.3. The nonstandard approach

231

S-integrability implies that the function is l v l -almost everywhere fin(*E) valucd. This allows us compose an S-intcgrable internal function with the quotient map 71E : fin(*E) ----+ E to make it a nonstandard hull valued function defined on the Loeb space ( *D , L i v i (*L: ) , !vi) . Define f11 E M(lvi, E) by

f!l = 71£ o CfJw

Theorem 4 in [11] asserts that if the vector measure f-L has a Bochner integrable Radon-Nikodym derivative, then the generalized derivative "is" the Radon Nikodym derivative in the following sense:

Let f-L : I: ----+ E be a countably additive vector measure with a Bochner integrable Radon-Nikodym derivative f : D ----+ E. Define the gener alized Radon-Nikodym derivative f11 : *D ----+ E as above. Then 71£ o *f = f11 on each set Ai in the fine partition of *D.

Theorem 16.3.2

Of course, the same arguments also work for finding a F-valued generalized derivative fv E lVI(Ivi, F ) of the vector measure v : I: ----+ F with respect to its total variation I v i . Differentiating v with respect to its total variation gives one extra feature of the derivative:

Let F be a Banach space and let v : I: ----+ F be a countably additive vector measure of bounded variation. Define

Lemma 16.3.3

Then fv = 71F o C{Jv is I vi -almost everywhere of norm one. Since v << lvl , the S-integrability of fv follows from Lemma 16.3.1 . The norm condition follows from an adaptation of a basic standard result (Theorem 11.2.4 in [5] ) : if f is Bochner integrable and a vector measure F is defined by F(A) = fA f dlvl, then I F I (A) = fA ll f l l dl v l . By construction of the generalized derivative, Proof.

for all A E *I:. In this situation it implies that * l v i (A) = * d* l vl and hence * II CfJv (w) ll

=

l I d��

1 for lvl-almost every w E *D.

D

232

16.4

16. A Radon-Nikodym theorem

A nonstandard vector-vector integral

The next theorem allows us to generalize the generalized Bochner integral to a vector-vector integral which for any simple function obviously agrees with the Bartle integral developed in [2] . For the definition of the Bartle integral, E, F and G are Banach spaces equipped with a continuous bilinear multipli cation E x F ----+ G and v : I: ----+ F is a countably additive vector measure of bounded variation. Such a product can arise by taking E F G to be a C*-algebra, or by taking E = F' and G = rn:., or by taking E = L ( F, G ) . A function f : D ----+ E is Bartle integrable if there is a sequence of simple functions sn that converges to f almost everywhere and for which the se quence An ( A ) fA S n dv converges in the norm of G for all A E I:. Then fA f dv = limn_, = >-n (A ) exists in norm uniformly for all A E I:. Essentially bounded measurable functions are Bartle-integrable. The Bartle integral is a countably additive set function. =

=

=

Let E, F and G be Banach spaces equipped with a continu ous bilinear multiplication E x F ----+ G. Assume that v : I: ----+ F is a countably additive vector measure of bmmded variation and f E Af (!vi, E) . The integrand f has an internal lifting i.p J and there is an internal *-simple lifting i.pv of the generalized derivative of v with respect to I v i . Then for all A E *I:, the integral fA f dv defined by

Theorem 16.4 . 1

l f dv

= no

(l

i.pJ · i.pv

d* l v l

)

is a well-defined countably additive G -valued integral. The function f E 1\!I (!vi, E) has a internal, S-integrable and *-simple lifting i.pJ : *D ----+ fin ( *E ) . By Lemma 16.3.3 i.pv : *D ----+ fin ( *F ) is internal, *-simple, S-integrable and of norm one almost everywhere. Continuity of the multiplication E x F ----+ G implies that the product i.p j ' i.pv is an S-integrable and internal *-simple function from *D ----+ fin ( *G) . This means that it is a lifting of an extended Bochner integrable fun�on in M(!vi, G ) and hence the G-valued generalized integral with respect to l v l applies and gives a well-defined vector vector integral. Since the extended Bochner integral is countably additive, so is this integral. D Proof.

The use of nonstandard analysis simplifies standard arguments here, since in the standard setting this argument would only work if the Banach space F had the Radon-Nikodym property.

16.5. Uniform convexity

16.5

233

Uniform convexity

Bogdan needed the assumption that F is uniformly convex to convert the F-valued derivative d�� l into the F'-valued derivative tl . He composed the derivative d�� l with a continuous map g from the unit sphere of F to the unit sphere of its dual space F' such that for all f E Sp, (!, g(j)) 1 . The resulting map g o d�� l : D ----+ F' is lvl measurable and inherits the integrability of d�� · If the Banach space F happens to be uniformly convex, we compose the nonstandard extension of g with VJv. In this case, (!, g(j) ) 1 for all . f E SF t ransl ates . t o /\ **l vv i((AA,, )) , *g *l*vv (i (AA,,)) - 1 for . - 1 , . . . , H 1.e. =

(

m

))

can

-

z -

=

Let E be any Banach space and let F be a uniformly convex vector space. If the vector measure f-L : I: ----+ E is absolutely continuous with respect to the total variation of the vector measure v : I: ----+ F, then there is an extended Bochner integrable function f : *D ----+ ExF' S1Lch that for all A E I:

Theorem 16. 5 . 1

f-L(A) =

J*A f dv.

Define VJ J Yp · (*g o VJ v ) , where Yp is a lifting of the generalized derivative d��l , tpv is a lifting of the generalized derivative d�� l and g is the continuous map from Sp to Sp . This lifting is internal, S-integrable, *-simple and takes its values in fin(*E) *Sp . Define f 71ExF' o :PJ · Then

Proof.

=

fn f d v

x

=

71£ 7IE rrc

(!n tp (w)

=

) (�� *lvi*f-L(Ai)(Ai) . *g ( *lv*v(Ai)i (Ai) ) *v(Ai) ) (f ,;��(�;) 'l vi (A;)) p

· *g ( VJv ( w ))

d *v

·

71£ ( *f-L(*D))

f-L(D) ,

and similarly for f-L(A) , where (Ai) is replaced by (A; n *A) throughout. Since the fine partition refines the standard partition of D into A and its complement, Ai n *A is either the empty set or A,. D

16. A Radon-Nikodym theorem

234

If E has the Radon-Nikodym property, then this lifting coincides with the image of the derivative found by Bogdan in the nonstandard hull of the space of vector-vector integrable functions on D.

16.6

Vector-vector derivatives without uniform convexity

We can weaken the assumption on F: it follows from the Hahn-Banach theorem that for each nonzero f E F there is an f' E F' with 1 1 .!' 1 1 1 and f'(f) = llfll · Without uniform convexity, the choice of f' is not necessarily unique, and the map f f----+ f' need not be continuous on the unit sphere of F. We can forego the use of a function defined on all of Sp. All we need is a derivative on each set A, in the fine partition of *D. =

g

Let F be any Banach space and let v : I: ----+ F be a countably additive vector measure of bounded variation. Then there is a function 9i vl E M(lvi, P) such that Theorem 16.6. 1

1A 9ivl dv = o(*lv i ( *A)) = lvi (A)

for all A E I: .

Proof. As the collection of sets A, is internal, we can use the axiom of choice to get an internal collection of norm one functionals E *Sp with ( *v(A; )) = * lv i (A; ) for i = 1 , . . . , H.

g;

g;

Define an internal *-simple S-integrable function .

1/Ji vl

=

H

L i=l

:

1/J i vl *D ----+ *Sp by

g; . A

1 ,·

This function is a lifting of an extended integrable function 9i vl E M(lvi, P) . We can define a continuous bilinear multiplication F' x F ----+ JR; by f' · f = f' (f) and then use the integral from theorem 16.4. 1 . In this case the quotient map no becomes the standard part map o : fin (*JR;) ----+ R By the choice of the fine partition. A; n *A is either the empty set or equals Ai. d*v dv = o

lA 9ivl

(lA 1/l i vl ) 0(1A t, g; ·1A, d*v)

16.6. Vector-vector derivatives without uniform convexity

(� o(� o

gi · *v(Ai n *A) *lvi (Ai n *A)

o ( * l v i ( *A ))

)

235

) =

for any set A E I:. This finishes the proof, as for all A E I:, lvi (A)

0 ( * 1 v i ( *A )) . D

The last theorem constructed a derivative for any countably additive vector measure of bounded variation with respect to its total variation as an element of M ( lvl, P) . This was the main difficulty in differentiation with respect to a vector measure. Now we can differentiate one vector measure with respect to another vector measure. :

Theorem 16.6.2 Let E and F be any Banach spaces, let f-L I: ----+ E and v I: ----+ F be a countably additive vector measures of bounded variation such that f-L < < lvl . Then there is a function h E M(lvi, ExF' ) such that for all A E I: f-L ( A ) h dv. :

=

j

*A

x

x

The continuous trilinear multiplication E F' F ----+ E defined by f' (f) · e extends to a continuous bilinear multiplication ExF' F ----+ E defined by � f') n(f) 7rE(J'(f) · e). The extended integrable function h *D ----+ E F' is defined through its lifting cp cp11 · 1P1v1 , the product of the lifting cp11 of d��l E M(lvi, E) and the lifting 1/J i v l of t l E M(lvi, P) found in theorem 16.6. 1 . Setting Proof. x

x

e f' f

= :

x

x

=

x

=

*

x

we obtain a simple internal function with values in fin ( *E ) *SF' . The S-integrability of cp follows from the S-integrability of cp11 and Lemma 16.3.3 which asserts that *111/l i v l ll :::::J 1 almost everywhere. Then for all A E I: 1rE ( *f-L ( *A )) f-L (A )

"" (f (�� 1rE

M ( A; n 'A )

'

)

*fJ (Ai n *A ) * l v i ( A n *A) * l vi (Ai n *A ) ·

2

)

16. A Radon-Nikodym theorem

236

:

and h *D . df.lv t 1ve d .

16.7

----+

ExF' is an extended integrable generalized vector-vector derivaD

Remarks

All the constructions above depend on the choice of a fine partition A 1 , . . . , AH of *D that refines all finite standard partitions of D into measurable sets. A different choice of the partition would yield different derivatives, especially in the case of non-uniformly convex Banach space F. However, for the standard analyst there is no noticeable difference between different choices of fine partitions, as the integrals over any standard set will be the same, no matter which fine partition is used. Representing a vector measure v I: F as an integral of its generalized derivative d1� l simplifies some of the results on Loeb completions of internal measures [13] by Zivaljevic. a

:

----+

References [1] S. ALBEVERIO, J . E . FENSTAD, R. H 0EGH-KROHN and T. LINDSTR0M,

Nonstandard methods in stochastic analysis and mathematical physics,

Academic Press, Orlando, F L, 1986.

[2] R. G . BARTLE, "A general bilinear vector integral", Studia Math. , 15 ( 1956) 337-352. [3]

.J . BLIEDTNER

and P . A . LOEB, "The Optimal Differentiation Basis and Liftings of L00", Trans. Amer. Math. Soc., 352 (2000) 4693-4710.

[4] W . M . BOGDANOVICZ and B. KRITT, "Radon-Nikodym Differentiation of one Vector-Valued Volume with Respect to Another", Bulletin De L' Academic Polonaise Des Sciences, Scric des sciences math. astr. ct phys. , XV ( 1967) 479-486.

References

237

[5] J. DIESTEL and J . J . U HL , Vector Measures, Mathematical Surveys 1 5, American Mathematical Society, Providence, RI, 1977. [6] C .W . HENSON and L. C . MOORE, Nonstandard Analysis and the the ory of Banach spaces, in A.E. Hurd (ed.) Nonstandard Analysis-Recent Developments, Springer Lecture Notes in Mathematics 983, 1983. [7] A . E . HURD and P .A. LOEB, An Introduction to Nonstandard Real Anal ysis, Academic Press, Orlando, FL, 1985. [8] P . A . LOEB , "Conversion from nonstandard to standard measure spaces and applications to probability theory", Trans. Amer. Math. Soc., 2 1 1 (1 975) 1 13- 122. [9] D. A. Ross, Nonstandard Measure Constructions - Solutions and Prob lems submitted to NS2002, Nonstandard Methods and Applications in Mathematics, June 1 0-16 2002, Pisa, Italy. [10] G . B . ZI M MER , "An extension of the Bochner integral generalizing the Loeb-Osswald integral", Proc. Cambr. Phil. Soc., 123 ( 1998) 1 1 9- 131. [ 1 1] G . B. Zrl\IMER, "A unifying Radon-Nikodym Theorem through Nonstan dard Hulls", Illinois Journal of Mathematics, 49, (2005) 873-883. [12] G . B. Zil\IMER, Nonstandard Vector Integrals and Vector Measures, Ph.D . Thesis, University of Illinois at Urbana-Champaign, October 1994. [13]

R.

T. Z IVALJEVIC, "Loeb completion of internal vector-valued measures", Math. Scand. , 56 (1985) 276-286.

Differentiability of Loeb measures Eva Aigner*

Abstract We introduce a general definition of S-diffcrcntiability of m1 internal measure and compare different special cases. It will be shown how S-differentiability of an illtcrnal measure yielc Is differeutiability of the associated Loeb measure. vVc give some exmnplcs.

17. 1

Introduction

In this pa per we present some new results about differential properties of Loeb measures. The theory of differentiable measures was sugges ted by Fomin [9] as aJl infinite dimensional substitute for the Sobolev-Schwartz theory of distributions and has extended rapidly to a strong field of research . In partkular during the last ten years it has become the foundation for many applications in different fields such as quantum field t heory ( sec e.g. [10] or [14]) or stochastic analysis ( see e.g. [4] , [5] , [6] and [151 ) . There are many different. notions of measure differentiability. The following two are most, common (see e.g. [3[ , [6[ , [9[ , [13[ and [151). Let v be a Borel measure on a locally convex space E and y an element of E. l) The measure 11 is called Fom·in-d-ifferentiable along y if for all Borel sub sets B C E the limit lim

r�O, rElR

11(B + ·ry) - z; ( B) T

ex i sts . v is called Skorohod-differentiable with respect to y, if there exists another Borel measure v1 on E, such that for all continuous real-

2) The measure •

Ludwig-Maximilians- Universitiit , Miincheu , Germany.

17.1. Introduction

239

valued bounded functions g on E limr

r ---+ 0.

J g(x - ry) dv( x) - J g(x ) dv( x) ER r

=

J g(x ) dv'(x ) .

The relationships between these two approaches were studied by Averbukh, Smolyanov and Fomin [3] and Bogachev [6] . A very general definition of differentiability of a curve of measures is given by Weizsacker in [16] . In this paper we define measure differentiability as follows.

Let (D, :F, v) be a measure space, (vr) - c
=

=

J

Note that Definition 1 7. 1 . 1 covers the cases mentioned above, since for a locally convex space D and a fixed vector y E D a curve (vr ) - c
=

:

1 7. Differentiability of Loeb measures

240

17.2

S-different iability of internal measures

Standing Assumption

Throughout this paper let D be an internal set, A an internal *t'Y-field on D and f-L :2: 0 an internal *t'Y-additive S-bounded measure on A. Let ( !-Lt)tEJ be an internal curve of nonnegative *t'Y-additive S-bounded measures on A. Since we want to obtain an external curve of Loeb measures, we assume that for some c E rn:. + the internal parameter set J is either an interval of *JR containing the standard interval I =] - c; c [ or J is a discrete interval { Jl , -r;-l , . . . , ki/ , fl } with H E * N \ N, k E { 1 , . . . , H } and c :'S: .fl . Moreover, the internal curve ( !-Lt)tEJ shall be S-continuous in the following sense. If t, s E J with t :::::J s, then f-Lt( A) :::::J f-Ls (A) for each A E A. Finally we assume that f-L = f-LO · We now introduce S-differentiability for internal measures.

Suppose we have a (not necessarily internal) set C of in ternal *JR:.-valued functions on D, each being A-measurable and S-bounded. We say that the internal measure f-L is S -differentiable with respect to the set C if there exists an internal S- bounded signed measure f-L1 on A so that for all f E C and for all infinitesimals t E J, t # 0 J f(w) df-Lt (w) - J f(w) df-L(w) :::::J f(w) df-L (w ) .

Definition 17.2.1

We call f-L1 an (internal) derivative of f-L ·

J

'

Note that a derivative is not uniquely determined by the above definition. If f-L is S-differentiable with respect to a set C and if for some infinitesimal t E J the internal measure 11 ' ;11 has limited values, then 11 ' ;11 is a derivative of f-L· In this section we will regard and compare S-differentiability for differ ent sets C of functions. Following the standard literature we define S-Fomin differentiability.

The internal measure f-L is called S -Fomin-differentiable if the differentiability is with respect to C = {1 A A E A} where 1 A is the indicator functwn. Note that in the case of S-Fomin-differentiability each measure 11 ' ;11 with t :::::J 0, t E J \ {0}, is a derivative of f-L · The following proposition shows the

Definition 17.2.2

:

power of S-Fomin-differentiability.

If f-L is S-Fomin-differentiable and f-L1 is a derivative of f-L, then f-L is S-differentiable wzth respect to the set C of all S-bounded *JR:. valued A-measurable functions. The Fomin-derivative 1/ is also a derivative with respect to C . Proposition 17.2.1

17.2. S-differentiability of internal measures

241

Let f-L be S-Fomin-differentiable and let f-L1 be a derivative of f-L · Let t # 0 be an infinitesimal of J and set [L := 11 ' ;'" . Since f-L1 :::::J [L on A we obtain J f(w)df-L'(w) :::::J J f(w)d[L(w) for all S-bounded A-measurable functions f : D ----+ *R D Proof.

When considering measures with Lebesgue densities, then the following lemma is useful. =

Let D *JR:., A the internal field of Borel subsets, y a fixed element of *JR:., f-L an internal S -bounded measure on A. Take J = *JR:. and for all t E J define f-Lt (A) = f-L(A + ty) for all A E A. Assume that f-L has an internal Lebesgue density f satisfying the following three conditions: 1) f 2 0, f = 0 only on a set of internal Lebesgue measure 0. 2) f is * differentiable in the direction ofy (with derivative f� (x) := f'(x) · y) . 3) If t 0, then for all x E *JR:. with f (x) # 0 f� (x) � f(x + ty) _ 1 . t f(x) f(x)

Lemma 1 7. 2 . 1

:::::J

(

)

:::::J

Now if the internal f1mction (3� : *JR:. ----+ *JR:., defined by f� (x) . (3� (x) = f(x) if f(x) # 0 0 if f(x) 0,

{

=

is 511 -integrable, then f-L is S-Fomin-differentiable and if f-L1 is a derivative, then for all A E A f-L1 (A) (3� (x) df-L(x) . :::::J

l

Since (3� is 511 -integrable, A f---+ JA (3� (x)df-L(x) defines an S-bounded measure. Now let N = {x E *JR:. : f(x) = 0}. Then >-(N) = 0, where A is the internal Lebesgue measure. If t 0, t # 0, then

Proof.

:::::J

I

!-Lt (A) � f-L(A)

- l (3� (x)df-L(x) l l l f(x + ty) - f(x) - f3J,(x) . f(x) l d>-(x) r l � ( f(x + ty) _ 1 ) - f� (x) l df-L(x) f(x) f(x) JA \ N t �

= :::::J

0.

Here we have used condition 3 ) and the fact that f-L is S-bounded.

D

1 7. Differentiability of Loeb measures

242

Example 1 7. 2 . 1 Let D = *JR:., A be the internal field of Borel subsets and f-L defined by f-L(A) fA l;x2 d>.. ( x) . Fix an element y E *JR:. and define the curve by f-Lt (A) .JA + ty l ;x2 d>.. ( x) , t E *JR:.. If y is limited. then it ' s easy to see that the assumptions of Lemma 17.2.1 are satisfied. Hence the measure f-L is S-Fomin-differentiable and if f-L1 is a derivative, then f-L'(A) :::::: fA ��¥ df-L(x) for each A E A. If y is an unlimited element of *JR:., then f-L is not S-Fomin differentiable, because for t = � and for the internal interval A = * [ 0 , 1] C *JR:. the value flt ( A);p ( A) is unlimited. =

=

Note that Lemma 17.2.1 is also true for D *JR:.L , L E *N. In the author's thesis [1] this is used to show the S-Fomin-differentiability of a nonstandard representation of the Wiener measure introduced by Cutland and Ng in [8] . We now turn to other forms of S-differentiability. Again. following the standard literature, we define S-Skorohod-differentiability. =

Remark

Let D be a subset of *M where NJ is a metric space and let A be an internal *a-field on D. The measure f-L is called S -Skorohod differentiable if it is S -differentiable with respect to the set of all S -bmmded, A-measurable functions f : D *JR:. that are S-continuous. This means, f ( w) :::::: f ( w) for w , w E D, w :::::: w . Definition 1 7.2.3

----+

I f D is as described in Definition 17.2.3, then, as a consequence of Proposi tion 17.2. 1 , S-Fomin-differentiability implies S-Skorohod-differentiability. The following example shows that the converse is not true. =

Fix a natural number H E *N \ N and let D c *JR:., D * Z } . The measure f-L is the counting measure defined on the field of internal subsets of n, i.e.

Example 1 7.2.2

{ 1-£

·

z : z E

f-L(A)

=

lA

n

[o H - 1 ] I �H .

Here [0, Hif 1 ] {0, 1-£ , . . . , Hii l } and I A I is the internal number of elements. Now let J c D, J = [- Jr , Jr] with l E *N, Jr :::::: � and let (f-Lt)tE J be defined by f-Lt (A) f-L(A + t). Note that for any k E *N with fl E J we obtain: =

=

and

17.3. Differentiability of Loeb measures Now choose k E *N with fl

E

J and fl 0 and

::::::

243

0 and set A = [0, k}/ ] . Then f-L -k - f-L H-k H

(A)

1.

Hence f-L is not S-Fomin-differentiable. It is not hard to check that f-L is S Skorohod-differentiable with internal derivatives 1"1;'-1 for all infinitesimals t E J \ {0} . We will give a standard application of this example in Example 17.3.1 . Finally we consider S-differentiability with respect to *continuous func tions. We will see that this is equivalent to S-Fomin-differentiability. The following proposition can be easily shown by transfer of Urysohn ' s Lemma.

Let D be an internal *normal space, A the internal field of Borel subsets. If ( !-Lt ) t EJ is a curve of *regular measures, then f-L is S differentiable with respect to the set C of all internal * continuous S -bounded functions if and only if f-L is S- Fomin-differentiable. Proposition 1 7.2.2

17.3

Differentiability of Loeb measures

In this section we show how S-differentiability of an internal measure yields differentiability (in the sense of Definition 17.1 .1) of the corresponding Loeb measure. Recall the Standing Assumption on page 240. Now we can define a curve of Loeb measures in a unique way. Let c E rn:.+ as described in the Standing Assumption (page 240) , I =] - r:: , r:: [ . For each r E I choose t E J such that t :::::: r and set f-Lr := f-Lt· Let us denote the associated Loeb spaces by (D, L 11r (A) , (f-Lr )L) · Since the Loeb a-fields L 11r (A) are not necessarily identical we choose a joint a-field :F c n rE I Lllr (A) . We now define the curve ((f-LL)r)rE I of measures on :F by (f-LL)r := (f-Lr)L restricted to :F. Let us mention some obvious connections between S-differentiability and differentiability.

Let f-L be S -differentiable with respect to a set C of internal *JR:. valued, A-measurable and S -bounded functions on D and let 1-L' be an internal derivative of f-L · 1) Then for all f E C

Lemma 1 7. 3 . 1

rlim ---> 0

J_ ) d_ f-Lr_(w_)'--)--_ f (_ 0 (=J-_f_ -_ w_ ( w_)d_f-L_ ( w )) --0 -'(cc_ r

______

The convergence is uniform, if C is internal.

1 7. Differentiability of Loeb measures

244

2) Suppose C

:=

{g : D ----+ ffi. : there is an f E C with 0 ( f (w ) )

and

:F.,, :=

=

g ( w ) for all w E D}

(nr L.,r (A) ) n £.,, (A) . EI

Then the Loeb measure ILL , restricted to :F.,, , is differentiable with respect to the set C and the Loeb extension ( tl ) L of o ( tl ) , restricted to :F1L' , is a derivative (ILL ) ' of ILL . This means that for all g E C : J g (w)d(ILL)r (w) - J g(w )d�LL (w ) g (W )d( 1L' ) L (W ) . l. r1m --->0 r =

J

Note that, if IL' and 1 ' are two internal derivatives of IL, then J g(w)d(IL')L (w) = J g( w )d(r')L for all g E C, but the Loeb measures ( �L ')L and (r')L on A and hence also the a-fields £.,, (A) and £1, (A) may be different. In Example 17.2.2 we defined an internal counting measure IL· Since IL is S-Skorohod-differentiable with internal derivatives T, t E J \ { 0 } , t � 0, we can17apply Lemma 17 17.3 . 1 . But as we have seen in Example 17.2.2, for k E *N with E J and � 0 and A = [0, k}/]

Example 1 7. 3 . 1

( IL Hk 17- IL ) (A) L

=

IL =!5:. - IL 0 and ( HI'l ) (A) L

=

1.

Nevertheless, there exists a a-field, on which the Loeb measures ("1 �" ) L coin cide for all infinitesimals t E J \ { 0 } . Let B(ffi.) be the ( standard ) a-field of all Borel subsets of R For B E B(ffi.) let sC 1 [B] = {w E D : 0W E B} . According to the usual approach to standard Lebesgue measure by a nonstandard count ing measure ( see [ 7] or [ 12] ) , the external set st - 1 [B] is an element of L.,r (A) for all r E I and (ILr ) L ( s C 1 [B ] ) = vr ( B ) , where ur ( B ) = JB +r 1 [o . l j (x ) d>. (x ) (A) with standard Lebesgue measure >.. But st - 1 [B ] is also an element of L for all infinitesimals t E J \ { 0 } and PrP t

( ILt � IL )

L

( sC1 [B ] ) = 1B( 1) - 1 B (O) .

Hence the Loeb measures ( "1�" h , restricted to the a-field {sC1 [B ] : B E B(ffi.)}, coincide for all infinitesimals t E J \ { 0 } . If we define a measure v' on B(ffi.) by v'(B ) = 1 B ( 1) - 1B(O), then the S-differentiability of the internal counting measure tt yields the Skorohod-differentiability ( with respect to 1 E ffi.) of the standard measure v with derivative v'.

17.3. Differentiability of Loeb measures

245

We will now see that in the case of S-Fomin-differentiability the a-field :F doesn ' t depend on the chosen internal derivative and the derivative (ILL)' of ILL is uniquely determined. 1\Ioreover, the differentiability of ILL is true not only with respect to standard parts of internal functions, but also with respect to all :F-measurable S-bounded real-valued functions on D. We gather these facts together into the following, which is the main theorem of this paper.

Let 1L be S-Fomin-differentiable and :F = n rE I L11r (A). Then ILL is differentiable on :F with respect to the set C = { 1 B B E :F} and the differentiability is uniform on C. The derivative (ILL )' is uniquely de termined and is absolutely continuous with respect to ILL . If IL' is an internal derivative of IL, then the Loeb extension (IL')L is defined on :F and coincides with (ILL)' . In particular this is true for all internal measures � , where t E J \ { 0} is infinitesimal. Theorem 1 7. 3 . 1

:

Let IL' be a derivative of IL· We will show the following statements: (A) For all internal sets A E A the limit

Proof.

(ILL) -'--- ( A)--- -' ----�-'--LL (A) ---'-- - --'-ll. m --"-----r---'r--> 0 r exists and is equal to ( �L')L ( A) . The convergence is uniform on the inter nal field A. (B) If N E :F is a ILL-nullset, then (ILL) r (N) - �LL (N) lim = 0. r r--->0 The convergence is uniform for all ILL-nullsets of :F. Any ILL-nullset of :F is also a (IL')L-nullset of :F. (C) If B E :F and if A E A is ILL-equivalent to B, then (ILL) r (A) - ILL ( A) (ILL) r (B) - ILL ( B) = lim lim ' r--->0 r---> 0 r r in particular the left limit exists. The convergence is uniform on :F. (D) The Loeb extension (IL')L is defined on :F and for all B E :F

(A) follows from Lemma 17.3.1. To prove (B) let N b e a ILL-nullset of :F, i.e. there exists a sequence (N� )n EN C A so that for all n E N we have N C N1_ , Nn� c:;; N� and 1

1 7. Differentiability of Loeb measures

246

p,(Nl ) :<:: � · Let N all r nE I

:=

n�=l N� . Then N E :F, p,L (N)

=

0 and we obtain for

Since p,L (N) = 0 we get

Therefore it is sufficient to show that lim0 (!n ); (JV) r---+

=

0. Now since p,L (N)

=

0,

r

limn_, = ( !1L ) r ( N� ) - limn_, = /1L ( N� ) r

( !1L) r (N� ) - !1L (N� )

lim

" It follows from (A), that for each E N the limit limr---+ O '\ exists and is equal to (p,')L (N� ). Since (p,')L is defined on the smallest a-field containing A, the limit limn ---+ = (p,')L (Nl ) (p,')L (N) also exists. Because of the uniform convergence stated in (A) and since (N� )n EN C A we can exchange the limits as follows: (11L)r( N 1 )-pL( N 1 )

n

=

lim nlim --+CXJ r--+ 0

(p,L) r (Nl ) - p,L (N1_ ) n

r

lim lim r--+ 0 n--+CXJ

(p,L) r (N� ) - ttL (N� )

n

So the measure /1L is differentiable at N and the value of the derivative is (p,')L (N) . It remains to show that (p,')L (N) = 0. To this end we usc an argument due to Smolyanov and Weizsacker [15] . Let us consider the function

Then f is differentiable at t = 0 and f'(O) = (p,')L (N) . Since f is nonnegative and j(O) = 0, the first derivative of f in 0 must be 0. Hence (p,')L (N) = 0. The uniformity of the convergence can be seen by using again the uniform convergence on A. Of course (t/)L (N) = 0 since N C N. Hence the P,L nullsets of :F are also (p,')L-nullsets of :F.

17.3. Differentiability of Loeb measures

247

(C) Here we show the differentiability for an arbitrary element of :F. So let B E :F, A E A ILL-equivalent to B and r E I. Since ILL (B ) = ILL (A) , it is sufficient to show that (ILL )r ( B ) - ( ILL)r(A) O. rlim ---> 0 r Since ILL is nonnegative the following estimate is easy to verify. =

where A 6 B denotes the symmetric difference ( A \ B ) U (B \ A ) . Now (B) yields

I

(ILL )r(B) - (ILL)r( A ) rlim ---> 0 r

I

< lim - r---> 0

(ILL)r( A 6 B) I I T

=

O.

The uniform convergence follows from (A) and (B). Hence (C) is proved. (D) follows from (A), (B) and (C).

D

Let IL be a measure with internal Lebesgue density satisfying the assumptions of Lemma 17.2.1 for a fixed y E *R Recall that the internal curve is given by ILt( A ) IL(A + ty) for all Borel subsets A C *R Let c E rn:. + , I =] - c, r:: [ . Since IL is S-Fomiu-differentiable, we can apply Theorem 17.3.1. Hence ILL is differentiable on :F = n r EI L ,lr (A) with respect t o c = { 1B B E :F}. Moreover, for the uniquely determined derivative (ILL)' we obtain: Example 1 7.3.2

=

for all B E :F, where (3� is defined in Lemma 17.2.1 The power of Fomin-differentiability is also shown in the last result. Corollary 1 7. 3 . 1 If IL is S -Fomin-differentiable with an internal derivative IL' and :F is defined as in Theorem 1 7. 8. 1, then the Loeb measure ILL , restricted

to :F, is differentiable with respect to the set C of all :F -measurable real-valued bounded functions on D. The Loeb measure ( �L')L, restricted to :F, is the deriva tive of (IL )L .

Proof.

This is routine integration theory using the uniform differentiability on

:F and the approximation of measurable functions by simple functions. D Remark An application of Fomin-diffcrcntiability is given in [15] by Smolya

nov and Weizsacker. There, a nonnegative measure v on a locally convex space

1 7. Differentiability of Loeb measures

248

E is considered which is Fomin-differentiable along all elements of a Hilbert subspace of E. Using this measure differentiability, Smolyanov and Weizsacker introduce an operator on a subspace of £ 2 (E, v) . In the Gaussian case this operator is the derivative operator of the Malliavin calculus (see [11]). The question of the assumptions that are needed for differential properties of Loeb measures to yield such a derivative operator is the subject of ongo ing investigation. References

[1]

A IGNER , Differentiability of Loeb measures and applications, PhD the sis, Universitiit Munchen, (in prep.) E.

[2] S. ALBEVERIO, .J . E . FENSTAD, R. H 0EGH- KROHN and T. LINDSTR0M, Nonstandard Methods in Stochastic Analysis and Mathematical Physics, Academic Press, New York, 1986. [3] V.I.

AVERBUKH, O . G . S MOLYANOV and S .V. FOl\1IN, "Generalized func tions and differential equations in linear spaces, I. Differentiable mea sures", Trudi of Moscow Math. Soc., 24 (1971) 140-184.

[4] V . I .

" Differential properties of measures on infinite dimen sional spaces and the Malliavin calculus", Acta Univ. Carolinae, Math. Phys., 30 (1989) 9-30. BOGACHEV,

[5] V.I.

BOGACHEV, "Smooth Measures, the Malliavin Calculus and Approx imations in Infinite Dimensional Spaces", Acta Univ. Carolinae, Math. Phys., 3 1 (1990) 9-23.

[6] V . I .

BOGACHEV, "Differentiable Measures and the Malliavin Calculus", Journal of :tvlathematical Sciences, 87 (1997) 3577-3731 .

[7] N .

CuTLAN D,

Loeb Measures in Practice: Recent Advances, Lecture Notes

in Mathematics 1751 , Springer-Verlag, 2000.

[8] N .

CUTLAND and S .-A. N G, A nonstandard approach to the Malliavin calculus, in Applications of Nonstandard-Analysis to Analysis, Functional Analysis Propability Theory and Mathematical Physics (eds. S. Albeverio, W.A . .J. Luxemburg and M. Wolff) , D.Reidcl-Kluwer, Dordrecht. 1995.

[9] S. V. FoMIN, Differential measures in linear spaces, in Proc. Int. Congr. of Mathematicians. Sec. 5, Izd. Mosk. Univ. , Moscow, 1966.

249

References

[10]

A .V. K IRILLOV , "Infinite-dimensional analysis and quantum theory as semimartingale calculus", Russian Math. Surveys, No. 3 (1994) 41-95.

[11]

D. NuALART, The Malliavin Calculus and and its Applications, Springer-Verlag, 1995.

[12]

H. O sswALD, Malliavin calculus tion, Book manuscript, 2001.

[13]

A.V. S KOROHOD, Integration in Hilbert Spaces, Ergebnisse der Mathe matik, Springer-Verlag, Berlin, New-York, 197 4.

[14]

O . G. SMOLYANOV and H.V. WEIZSACKER, "Formulae with logarithmic derivatives of measures related to the quantization of infinite-dimensional Hamilton systems", Russian Math. Surveys, 551 (1996 ) 357-358.

[15]

O . G . S l\!OLYANOV and H . V . WEIZSACKER, " Smooth probability mea sures and associated differential operators", Inf. Dim. Analysis, Quantum Prob. and Rcl. Topics, 2 (1999) 51-78.

[16]

H .V . WEIZSACKER, Differenzierbare Maj3e und Stochastische Analysis, 6 Vortragc am Graduiertenkolleg 'Stochastische Prozesse und probabilistis che Analysis' . Januar/Februar, Berlin, 1998.

Related Topics,

Probability

in abstract Wiener spaces. An introduc

The power of Gateaux differentiability

Vftor

N eves*

Abstract

The search for useful nou standard minimil:ation conditions on C1 func tiona]s defined on Banach spaces lead us to a very shows that

if a C1 function .f

:

E

--+

simple argument

which

F between Banach spaces is actu

finite points along finite vectors, then it iS UJlllOrll!J.y Conti nUOUS On bounded SetS il' and ouJ.y if it is Jipschitziau

ally GatealL'C differentiable on

on bounded sets. The following is a development of these ideas starting from locally convex spaces.

18. 1

Preliminaries

This section consists of

an

informal description of Lools to fi·ame the dis

course and therefore contains only a small number of theorems and few proofs. Ccu·eful foundational treat me nts may be found in [3] , [9[, [14[ or [ 1 [ ; we shall also use Nelson's quantifiers vst an d 3st as in [ 8] . 'vVe assume the existence of two set-theoretical structures A an d f3 and a function *(·) : A -+ B satisfying the properties we proceed to describe. Elements of eit:her of t he structures which are sets shall be called entities 1 •

(Pl)

A is a model of the Televant A nalysis in the sense that any object of Classical IVIathematical Analysis as well as the mathematical structures under s t udy have an interprctat.ion in A and theorems about them are true in A. In particul m the complete ordered field lR of real numbers or any other classical space cu·e elements of A.

•Departamento de Matemat;ica, Universidade de Aveiro. vneves®mat . ua . pt

Work for this article was partially supported by FCT via both the grant POCTI\ MAT\41683\01 and funds from the R&D unit CEOC. 1We shall also arlmit the existence of atoms i.e. objects wit.hout clements which arc not the empty set; for instance numbers are atoms.

18. The power of Gateaux differentiability

254

The image by *(-) of any given element a E A will be denoted *a and be called standard as well as a itself; elements of standard sets in B shall be called internal ; non internal sets in B are called external. A set of sets verifies the Finite Intersection Property, abbreviated f.i.p. , if all its finite subsets have non-empty intersection. Denote the cardinal (num ber of clements) of a set S by card(S) and its power set by P(S) . (P2) Polysaturation 2

Given E E A and C C *P(E), if C verifies the f. i.p. and card( C) card(A) then n C # 0

<

There is an encompassing formal first order language £ with equality = and E (to be read as the membership relation) , with the usual connectives ' for negation, 1\ for conjunction, V for disjunction, =? for implication, ¢? for equiva lence, universal V and existential ::J quantifiers and enough constants, predicate and function symbols to denote any elements, relations and functions under consideration either in A or in B. Say that a formula of £ is bounded if its quantified subformulae, hereby including the formula itself, are of the form

Vx [x E a =? ¢]

or

::Jx [x E a 1\ ¢]

for some constant a and formula ¢; a sentence is a formula without free variables. B contains a formal copy of A in the following sense. (P3) Transfer

If ¢( a1 , , an) is a bounded sentence with occurrences of constants ai ( 1 :<.:: i :<.:: n) and no more constants, then ¢( a 1 , , an) is true in A iff ¢(*a 1 , , *an) is true in B 3 . ·

·

·

·

·

·

·

·

·

It is always useful to keep in mind theorems 18. 1 . 1 and 18. 1.2 below. Say that a formula of £ is standard (resp. internal) if it is bounded and all its constants denote standard (resp. internal) elements of B. Theorem 18. 1 . 1 (Principles of Standard and of Internal Definition) A set B E B is standard (resp. internal) iff there exists a E A and a standard (resp. intemal) formula ¢ such that B = {x E *al ¢(x) } or, more informally,

a set in B is standard or internal iff it is definable by a respectively standard or internal formula. 2 Actually

formulate.

this is stronger than needed for our purposes, but it is powerful and easy to

38 is a kind of elementary extension of A with embedding *( · ) .

18.1.

255

Preliminaries

Call monad any intersection

n cEC *C with 0 # c E A.

Any internal set which contains a monad n cEC *C also contains one of the standard sets *C.

Theorem 1 8 . 1 . 2 (Cauchy's Principle) Hyper-real numbers

A E *JR:. are classified the following way

A is finite . A is

-

infinitesimal . A is infinite .

-

::lr E lR:. lxl .:::; *r Vr E lR:. [r > 0 =? lxl A is not finite.

.:::;

*r]

0 and f-L denote respectively the set of finite and infinitesimal hyper real numbers. For any given ( real ) locally convex space S. let fs denote a gauge, i.e. a directed set, of semi-norms defining the topology of S; fin(*S) and f-L(*S) shall denote respectively the sets of finite and infinitesimal elements in *S, i.e. , for any given x E *S,

V1 E fs r(x ) E 0 V1 E fs r (x ) E f-L

x E fin(*S) XE

f-L(*S)

(18. 1 . 1) (18. 1 . 2)

When rs is unbounded, in the sense that Vx E

S fs (x) .- { r (x ) l r E fs}

is unbounded,

(18.1.3)

it so happens that Vx E

*S [x E f-L(*S)

¢?

vst l E fs r(x)

_:::;

1] .

(18.1 .4)

Condition (18.1 .4) has the syntactical advantage of dispensing with a quanti fier; as we will present strongly syntactical reasoning,

we assume from now on that condition ( 18. 1 .3) is always verified. Denote uc the set of standard elements of *C, i.e.,

uc

:=

{*xl x E C}.4

If E and F are locally convex spaces, L ( k ) (E, F) denotes the space of continuous k-linear maps l : E k ---+ F; a map l E *L ( k ) (E, F) is said to be finite ( resp. infinitesimal) if l(fin(*E) k ) � fin(*F) ( resp. l(fin(*E) k ) � tt(*F) ) . I t is useful to keep in mind the following results ( [4] is a most complete source; also see [14, chap. 10] ) . 4

We sometimes identify C with "C for the s ake of simplifying notation.

18.

256

The power of Gateaux differentiability

In a locally convex space S, 1. x E f-L(*S) if and only if \f).. E 0 Ax E f-L(*S) 2. x E fin (*S) if and only if \f).. E f-L Ax E f-L(*S)

Theorem 1 8 . 1 . 3

3.

(x E *S) .

( x E *S) . Vx E f-L(*S) ::3 ).. E * ffi. \ 0 Ax E f-L(*S), therefore i t is also true that Vx E *S [x E f-L(*S)

¢?

::3 ).. E * ffi. \ 0 Ax E f-L(*S)]

Let x ::::l y mean that x is infinitely near y, i.e., denote standard part whenever appropriate, i.e . ,

y

= st (x)

:=

y E us & x

::::l

x - y E f-L( *S) ; also let st y:

ns (*S) will denote the set of near-standard vectors of ns (*S)

*S, i.e. ,

: = us + f-L( *S) .

f : *E ---+ *F is S-continuous on A c:;: *E if for all x, y E A , whenever x ::::l y ; by definition, any standard function f is S-continuous if *f is. Theorem 18. 1 .2 implies that S-continuity of internal functions is nothing else than continuity measured with standard tolerances; this and condition (18. 1 .3) imply: A function

f(x)

::::l

f(y).

Suppose A E *P(E) . An internal function f A ---+ *F is S-continuous at x E A iff Vv E U fp Vc E U ffi. ::lr E u rE ::36 E Uffi. Vy E A [r(x - y) < 6 =? v(f(y) - f(x)) < c] .

Lemma 18. 1 . 1

In particular

A function f : A c:;: E ---+ F between locally convex spaces E and F is 1. continuous (on A) iff it is S-continuous on *A n ns (*E) ; 2. uniformly continuous (on A) iff it is S-continuous on *A.

Theorem 18.1.4

And

Given locally convex spaces E and F and an internal *k linear function l *E k ---+ *F, the following conditions are equivalent 1. l is S-contmuous.

Theorem 1 8 . 1 . 5 :

18.2.

257

Smoothness

2. l ( fin (*E)) c:;: fin(*F) (i. e. the internal S-continuous maps are the internal finite maps). 3. l (tL(*E) ) c:;: tL(*F) . It might also be interesting to notice the following Theorem 1 8 . 1 . 6 Let E and F and ¢ : (A x fin (*E)) ---+ *F be an

be locally convex spaces, A be a subset of *E internal function which is linear in the second coordinate. If ¢ is S-continuous in the sense that Vx. y E A Vu, v E fin( *E) [x :::::J y & u :::::J v =? cjJ(x, u) :::::J cjJ(y, v)] , then, for all x E A , ¢(x, ·) is a finite map. Proof. Assume ¢ : (A x fin (*E)) ---+ *F is S-continuous, x E A c:;: *E and that u E fin (*E) ; for all t E fL , tu :::::J 0, therefore t¢(x, u) ¢(x, tu) :::::J ¢(x, 0) 0, hence, by theorem 18.1 .3, ¢(x, u) E fin(*F) as required. D Observe that, when S is normed, an element x E *S is infinitesimal (resp. finite) iff ll x ll E fL (resp. llxll E 0) and the following holds. =

=

Corollary 18. 1 . 1 Given normed spaces E and F and a k-linear function l : E k ---+ F, the following conditions are equivalent 1. l is continuous 2. Vx E *S [ llxll :::::J 0 =? ll l (x) ll :::::J O J 3. Vx E *S [ ll xll E 0 =? ll l(x) ll E 0 ] . Of course theorem 18. 1.5 and corollary 18. 1.1 essentially say that , for linear maps continuity is equivalent to continuity at zero.

18.2

Smoothness

Let E and F be complete locally convex spaces, A be an internal subset of *E, f : A ---+ *F and l ( - ) : A ---+ *L(E, F) be internal and write f (x + tv) = f(x) + tl x (v) + tL (x E A , v E *E, t E *ffi., L E *F) . (18.2. 1) :::::J

f i s GS-differentiable at x with GS-derivative lx, i f L 0 whenever v E fin (*E) and t :::::J 0; f is uniformly differentiable at a E *E with uniform derivative l () if f is GS-differentiable at x and lx is finite, for all x :::::J a. We shall also call l : A x fin (*E) ---+ *F the derivative of f and define (x, v E *E) . df(x, v) := l(x, v) : = lx (v) := dfx (v)

18.

258

The power of Gateaux differentiability

Observe that , when one takes x and v in E within equation (18.2. 1), the GS derivative df ( x. v) is just the Gateaux derivative ( G-derivative for short) of f at x along v. The following theorem is well known too (see [13] ) .

Theorem 18.2.1 If E and F are Banach spaces, the function f ---+

:

E

---+

F

has G-derivative df(x, y), for all x E E, and x df(x, ) : E ---+ L (E, F) is continuous for the uniform topology on L (E, F), then x ---+ df(x, ) is actually a Fnichet derivative. ·

The following is a simple generalization of proposition

·

2.4 in [15] .

Theorem 18.2.2 If the internal map f : *E

at all points of an internal set A E 1.

*P( E)

---+ *F is uniformly differentiable with derivative df, then

f is S-continuous on A

2. df : A x fin(*E) ---+ *F verifies (a) �f (A x fin(*E)) c:;: fin (*F) (b) df is S-continuous in the sense that Vx, y E A Vu, v E fin ( *E) [x :::::J y & u :::::J v =? dfx (u) :::::J dfy (v)] . Proof. Borrowing from [15] : suppose that f : *E ---+ *F is internal and uni formly differentiable at all points of the internal set A, that x, y E A and that x :::::J y; take A E *ffi. \ 0 such that .A(x - y) E f-L ( *E) (theorem 18.1 .3) and define t := ±; t is infinitesimal and, for some L E f-L(*F),

f(y + t.A(x - y)) - f(y) tdfy (.A(x - y)) + tL dfy (x - y) + tL E f-L(*F)

f(x) - f(y)

by theorem

18.1 .5,

and

1 is proven .

2 (a) is an immediate consequence of the fact that the maps dfx are finite and this is useful in proving S-continuity of df in (b) : take x, y E A and u, v E fin(*E) such that x :::::J y & u :::::J v; observe that df(x, u) - df (y, v)

=

dfx(u - v) + dfx (v) - dfy (v).

As the maps dfx arc finite, by theorem 18.1 .5, df(x, u - v) E f-L(*F) and it is enough to show that, under the present assumptions, dfx (v) - dfy (v) :::::J 0.

18.2.

259

Smoothness

Now, taking A and t as above and v E fin ( *E ) , there exist infinitesimal vectors L. rJ, J E *F such that

djy (x - y) + t L ( 18.2.2) f(x) - f(y) (18.2.3) tdfx(v) + tr] f(x + tv) - f(x) tdjy (v) + djy (x - y) + tJ. (18.2.4) f (y + t(v + .A(x - y))) - f(y) Summing (18.2.2)+( 18.2 .3)-(18.2.4), we obtain t(dfx (v) - djy (v)) = t ( L + rJ - 6) and dfx (v) - djy (v) :::::J 0 as required. D Moreover Stroyan also showed in [15] that uniform differentiability is a very good generalization of G-differentiability in that, among other properties, it overcomes the impossibility of topologizing L(E , F) in such a way that the eval uation map (x, y) f---+ l(x) : L(E, F) x E ---+ F be continuous when E and F are not nonnable locally convex spaces 5 ; we established the equivalence between uniform and Cqb differentiability of standard functions on complete locally con vex spaces, and discuss relations with a weaker kind of differentiability when fin ( *E ) = ns ( *E ) , in [12] . The main results in this context read as follows. Assume that U is a non-empty open subset of E, and k E N.

:

f U ---+ F, a E 17U + f-L (*E)

Definition 18.2.1 (Stroyan [15] ) The function f is k-uniformly differen tiable at a if there are finite maps d ( i ) f() U ---+ L( i ) ( E, F) , ( 1 <:.:: i <:.:: k) - the derivatives of f - such that, with d d ( l ) , for all x E fin ( *E ) and all t E f-L(*IR'.) , there exist rJ E f-L(*F) and infinitesimal maps T/z (2 <:.:: i <:.:: k), :

:=

so that

=

(18.2.5) f(a) + tdfa (x) + tr] f (a + tx) + l ( ( ) ) ) ( d z fa (·) + td z fa (x , · ) + tr]i ( - ) (1 <:.:: i < k) . (18.2.6) d " fa + tx(·) f is k-uniformly differentiable in U if it is k-uniformly differentiable at all a E 17 U + f-L(*E) . When f is 1-uniformly differentiable we just say that it is uniformly differentiable. The following may also be found in [15] . Theorem 18.2.3 When f is k-uniformly differentiable, not only .f itself is S-continuous but also the derivatives d ( z ) f ns ( *E ) x fin ( *E ) ---+ *F are S cordinuous: Vx, y E ns ( *E ) Vu, v E fin ( *E ) [x :::::J u & y :::::J v =? d (i ) fx (u) :::::J d( i ) fy (v)] (1 S: i <:.:: k) . =

:

5A

quite simple and clear explanation of this can be found in

[7,

page

2] .

18.

260

The power of Gateaux differentiability

Theorem 18.2.4 ( Taylor's formula [15]) f is k-uniformly differentiable in U if there are maps dj ( i) U ---+ L � '? (E, F) (1 <:.:: i <:.:: k), such that, whenever :

a E au + f-L(*E), x E fin(*E), t E f-L, there exists rJ E f-L(*F) such that k i j(a + tx) = f(a) + L �t df�i ) (x, · · · , x) + t k rJ. i= l z.

Relations with F-differentiability on Banach spaces are actually studied in [14, chap. 5] . On what regards standard definitions of smoothness we take from [6] , whereto we refer the reader for details, namely on convergence

structures.

Definition 18.2.2 Let N denote the filter of neighborhoods of zero in JR:., B be

a filter on E and, for any filter :F in a (real) vector space, N:F denote the filter generated by { {rb l r E N & b E F} l N E N & F E F } ; also, for k E N, let B k be the filter in E k generated by {B 1 x · · · x Bk l B1 , · · · Bk E B} and, for any filter :F in L ( k) (E, F), F(B k ) be generated by {U 1 E F l (B) I F E :F & B E B}; finally, if¢ is afunction and :F is a filter on the domain of¢, ¢(F) is generated by {¢(F) I F E :F} . B is quasi-bounded if NB converges to zero; Aqb denotes the conver gence structure of quasi-bounded convergence; when a filter :F converges to x E E with respect to Aqb, we write :F E Aqb (x) . 2. A filter :F in L ( k) (E, F ) qb-converges to zero if :F(Bk ) converges to zero in F whenever B is quasi-bounded in E. 3 . A junction l : U ---+ L ( k ) (E, F) is qb-continuous if l(:F) E Aqb(l(a)) whenever a E U and :F is a filter in E converging to a. 1.

4.

Recall that U is an open subset of E and f : U ---+ F. We say that f is

of class c;b if

(a) f is qb-continuous. (b) There exist maps d'i( . ) : U ---+ L ( z ) (E, F) (1 <:.:: i <:.:: k) such that i. d' i( . ) is qb- continuous, for all i = 1 , · · , k ii. For all a E U, all x E E and all i = 0, · · · , k - 1 ·

i = d + 1 fa( x, · ) lim t --> 0 �t ( d'fa+ tx - Ja) ' for the topology of pointwise convergence on L i (E, F) .

18.3.

261

Smoothness and finite points

Theorem 18.2.5 ([12] ) The function f is of class c;b , with derivatives di i( . ) , ( 1 :S i :S k) iff it is k-uniforrnly differentiable with the sarne derivatives. The following is also a consequence (for example see theorem ( 18.2.5) or of theorem 18.2.4.

[6] )

either of this

:

Theorem 18.2.6 The function f E ---+ F, between Banach spaces E and F, is of Frechet class C k , with derivatives di fc ) , ( 1 :S i :S k) iff it is k-uniforrnly

differentiable with the sarne derivatives.

18.3

Smoothness and finite points

Recall that E and F denote real locally convex spaces.

Lemma 18.3.1 Suppose A E *P(E) . The following conditions are equivalent for any internal rnap f A ---+ *F which is GS-differentiable at all x E A n fin(*E), with GS-derivative df : (A n fin(*E)) X fin(*E) ---+ *F. 1. f is uniformly differentiable at all x E A n fin (*E) 2. f is S-continuous on A n fin (*E) 3. df( (A n fin(*E)) x fin(*E)) C:: fin(*F) :

Proof. For the sake of simplicity, we assume A = *E; the proof of the general case is an easy adaptation thereof. Suppose that df : fin(*E) x fin(*E) map f : *E ---+ *F.

---+

*F is the GS-derivative of the internal

(1 =? 2) This is 1 in theorem 18.2.2 above. (2 =? 3) Assume that f is S-continuous on fin(*E), let df() : fin(*E) ---+ *L(E, F) be the GS-derivative of f. Pick x and v in fin(*E); we must show that dfx (v) is finite; suppose this is not the case, take a semi-norm 1 E fp such that r(dfx (v)) t/c 0 and let t := r(df: ( v ) ) ; t is infinitesimal and, by hypothesis, there exists L E f-L(*F) such that f(x + tv) = f(x) + tdfx (v) + tL; but then 1 r(tdfx(v)) I (f(x + tv) - f(x) - tL) � 0 =

=

which is impossible; it follows that dfx (v) must be finite as required. (3 =? 1) In the presence of GS-differentiability, 3 completes the definition of uniform differentiability. D Say that a standard function f : E ---+ F is GS, or uniformly, differentiable at x E *E with derivative dfx if respectively *f is GS, or uniformly differentiable, at x with derivative *df.

18.

262

The power of Gateaux differentiability :

Theorem 18.3.1 If a standard function f E ---+ F is GS-differentiable with standard derivative dfx at all x E fin(*E), the following conditions are equivalent 1.

f is uniformly differentiable on bounded sets with derivative df .

2. f is umformly continuous on bounded sets. 3.

For any bounded subset B of E2 , df(B) is bounded in F.

4.

df is uniformly continuous on bounded subsets of E2 .

As we said before, the proof is easy. Start by keeping in mind that

Theorem 18.3.2 A subset B of the locally convex space S is bounded iff *B C:: fin ( *S ) ; in particular, if E and F are normed and L(E: F) is endowed with the usual uniform norm, a subset B of L(E, F) is bounded iff

V¢ E *B Vx E fin(*E) ¢(x) E fin(*F).

(18.3.1)

Proof of thm. 18.3 . 1 . Equivalences (1 ¢? 2 ¢? 3 ) are immediate conse quences of lemma 18.3.1 and theorem 18.3.2 (note that , by theorem 18.3.2 above. *B n fin(*E) = *B if and only if B is bounded) .

(3 =? 4) Follows from theorems 18.3.2 and 18.2.2. ( 4 =? 2) If we show that condition 4 implies that dfx is a finite map when ever B is a bounded subset of E and x E *B, we are done; but this follows from theorem 18.1 .6. D As an application to Banach spaces: :

Corollary 18.3.1 Let E and F be Banach spaces and f E ---+ F be a S continuous and GS- differentiable function at all x E fin (*E) with standard

denvatzve df, then 1.

df zs actually the Fnichet derivative of f and f is of class C 1 .

2. (Mean Value) Denotmg [x, y] the line segment from x to y, Vx, Y E E 3z E [x, y] ll f(x) - f(y) ll :::; ll dfz ll · ll x - Y ll ;

(18.3.2)

therefore f is Lipschitzian on bounded sets, i.e., for each bounded subset B of E, there exists K E rn:. such that Vx, y E B ll f(x) - f(y) ll :::; K ll x - Y ll .

(18.3.3)

263

18.4. Smoothness and the nonstandard hull

For all infinitely near finite vectors x, y, z E *E , there exists an infinites imal L E *F such that

3.

f(x) - f(y)

=

dfz (x - Y ) + ll x - Y IIL

( 18.3.4)

Proof. 1 . As f is S-continuous on fin(*E) . it is uniformly continuous on bounded sets and hence, by theorem 18.3. 1 , of Frechet class C 1 . 2 . Condition (18.3.2) is true in view of part 1 above; therefore condi tion ( 18.3.3) is true (by theorems 18.3.1 and 18.3.2) because fin(*E) is *convex. 3 . We borrow from [14, page 97] : the Frechet derivative x r--+ dfx is S-continuous at finite points by 2b in theorem 18.2.2 and we may apply Trans fer of the Mean Value condition ( 18.3.2) to the function

g

='

V r--+

J ( V ) - dfz ( V - ) Z

,

whose derivative v r--+ dgv = dfv - dfz is infinitesimal whenever therefore verifies, for some c E [x , y] , hence c :::::J z,

II J ( x ) - J ( y ) - dfz (x - y ) ll ll x - Y ll 18.4

=

ll g(x) - g(y) ll

s;

v :::::J

l dgc ( l l xx -- yY l l ) I :::::J

z, and

0.

D

Smoothness and the nonstandard hull

This section evolves along main ideas due to Manfred Wolff. Recall that · :::::J is an equivalence relation on *S, with equivalence classes x := x + f-L(*S) and let S be the nonstandard hull of *S with semi-norms i' or •

S

:=

fin( *S ) ;

�

& ry (x)

:=

st ( r ( x ) ) (x E fin(*S) ) .

Nonstandard hulls are complete spaces, but may vary with the chosen model of analysis A: actually Banach spaces with invariant nonstandard hulls are finite dimensional, but this is not the case with locally convex non-normable ones; spaces with invariant non-standard hulls were characterized in [5] and were named HM-spaces by Keith Stroyan (relevant data is summarized in detail in [14, chap. 10] ; also see [10, sec. 3.9] for nonstandard hulls of metric spaces) . Any internal function between locally convex spaces. f : A C:: *E r--+ *F that is S- continuous on fin(*E) n A and such that J (fin(*E) n A) C:: fin(*F) has a natural non-standard hull j : C:: E ---+ P too defined by

A

A ](x)

{x l x E fin(*E) n A } j(x) (x E A n fin(*E)) ;

18.

264

The power of Gateaux differentiability

note that j is not a standard function in the sense we have been considering, for it certainly is not a priori an element of the model of analysis A, nevertheless an application of Nelson's Algorithm ( [8] , [11]) will provide us with definitions of differentiability for functions without reference to nonstandard extensions. From now on f : *E ---+ *F denotes an internal GS-differentiable function with derivative df such that

f ( fin(*E))

C:::

(18.4. 1 )

fin(*F)

and

(18.4.2)

whenever this is a good definition, namely when df is S-continuous on fin(*E) and x, v E fin(*E) . For any GS-differentiable function h : E ---+ F ,

t:.. ( h x , v, t) 18.4.1

:=

( � (h(x + tv) - h(x) - dh(x v) ) ) ,

(x, v E *E; t E * ffi. \ {0}) .

Strong uniform differentiability :

Definition 18.4 . 1 An internal function f *E ---+ *F is Strongly Uni formly Differentiable (SUD for short) if it is uniformly differentiable on fin(*E) and f(fin(*E)) C::: fin(*F) . Let P and In these terms, ously verified

Q f

denote respectively gauges of semi-norms for E and F. is SUD when the two following conditions are simultane

\i(x, v) E *E 2 [(x, v) E fin(*E) 2 =? df(x, v) E fin(*F)] \i(x, v) E *E2 \it E * ffi. [(x, v) E fin( *E) 2 1\ 0 cj. t E f-L =? t:.. (j , x , v, t) E f-L ( *F ) j ; these expand respectively to

\i(x, v) E *E2 [vstp E *P 3 8t m E *N p(x) + p(v) � m =? \i8t q E *Q3 8t n E *N q ( df(x , v)) � n] \i(x, v) E *E 2 \it E 'R vstp E *P 3 8t m E *N p(x)+p(v) � m 1\ \ist n E *N 0 < i t I � � =? \ist q E *Q q(t:.. ( f, x, v, t)) � 1 ;

[

]

which reduce respectively to the following, where we leave the domains implicit,

\i(x, v) vst q 3 8t (p, n) vst m [p(x) + p(v) � m =? q (df(x, v)) � n] \i(x, v, t) \i8t q 3 8t (p, n) \i8t m p (x) + p(v) � m

[

1\ 0 < l t l

�

� =? q(t:.. ( j , x , v , t)) � 1] ;

18.4.

265

Smoothness and the nonstandard hull

Nelson's algorithm then gives

\j8t q \18t m 3 st fzn p x N V(x, v) 3(p, n) E P x N [p(x) + p(v) � m(p, n) =? q (df(x, v)) � n ] \18t q \18t m 3 st fin p x N V(x, v, t) 3(p, n)

[

p(x) + p(v) � m(p, n)

1\

0 < ltl

(18.4.3)

PxN

E

1

� ; =?

q( !:::. (j, x, v, t)) � 1 ] .

(18.4.4)

Formulas (18.4.3 ) and (18.4.4) together define strong uniform differentia bility of an internal function and we actually proved the following

Theorem 18.4. 1 An internal GS-differentiable function f : *E ---+ *F is S UD

iff

\j8t q \18t m 3 st fin P x N V(x, v, t) 3(p, n)

[ [p(x) + p(v)

�

m(p, n)

1\

0 < ItI

�

E

PxN

� ] =?

[ q (df(x, v))

�

n

1\

q (!:::. ( j, x, v, t) )

�

]

(18.4.5)

1] :

When f is standard, transfer of ( 18.4.5) provides the definition, 18.4.2 below, of SU differentiability for functions between locally convex spaces in any "universe" A:

Definition 18.4.2 A standard function h : S ---+ T between locally convex spaces S and T, with unbounded gauges of semi-norms respectively I: and 8 , is Strongly Uniformly Differentiable, with derivative dh ( ) : S ---+ L(S, T) , (S UD for short) when, leaving implicit that a E I:, T E e, m : I: X N N , x E S, v E S and t E JR \ { 0}, --->

VT Vm

::J f z n c

[ [a (x) + a(v)

X

N c:;: I:

�

X

m(a, n)

e

V(x, v, t) 3 (p, n)

1\

0 < ltl

�

Ec

X

N

� ] =?

[T (df(x, v)) � n

1\

T( !:::. (h, x, v, t))

�

]

(18.4.6)

1] .

NB: variants of this formula where any inequality � is taken to be

strict, i.e., to be < , are equivalent.

18.

266

18.4.2

The power of Gateaux differentiability

The non-standard hull

Theorem 18.4.2 Let E and F be locally convex spaces and f : *E ---+ *F be

an internal GS-differentiable function with derivative df such that f ( fin(*E))

( 18.4. 7)

� fin(*F) .

f is SUD iff j : E ---+ F is uniformly differentiable with derivative dj : E2 ---+ F given by d](x , v) := djr;;:; ) . Proof. Let us first assume that the internal function f : *E ---+ *F is SUD with derivative df, and takes finite vectors of *E into finite vectors of *F. Define j3

Q := { I] I q E Q } ;

:= {P I p E P}

although there might exist continuous semi-norms not of the type ,-y, P and Q arc unbounded directed families of semi-norms, so that condition ( 18. 1 .4) still holds with obvious adaptations. Also recall that

](x) = f(x) and observe that equation

defines dj well, in view of theorems 18. 1 . 5 and Take q E Q and rrL : Q x N ---+ N . Define

M(p, n)

:=

m(p, n)

18. 1.6 and lemma 18.3. 1 .

(p E P, n E N) ;

M is a standard function. therefore we may deduce from exists a standard finite set P x N � *P x 'N such that

(18.4.5)

that there

V(x, v, t) E *E2 x 'R 3(p, n) E P x N

[p (x) + p(v) < M(p, n) 1\ 0 < l t l

:<::

�

=?

]

q(df(x, v)) :<:: n 1\ q( !:::. (f, x, v, t)) < 1 . P and N being standard and finite, N

:= N

'

p

define

:= {P I p E P} ,

267

18.4. Smoothness and the nonstandard hull

and take (x, v, t) ; whatever the representatives x and v, there is a correspond ing pair (p, n) E P x N, which may be vary with x and v, but always veri fies ( 18.4.3) ; suppose that

�; ( 18.4.8) n all the�bers involved here are standard, p(x) :::::J p(x) & p(v) :::::J p(v) as well as q(df(x, v)) q(df(x, v)); in particular, it follows p(x)

+ p(v) < M(p, n)

1\ 0

< ltl

�

:::::J

p(x) + p(v)

M(p, n )

m(p, n)

< so that

q(df(x, v))

n

< an thus

-------

(j(df(x, v))

<

n

that is

(j(d}(x, v ) )

<

n

and the first factor in the consequent of ( 18.4.5) follows for j as required; for the remaining factor of the consequent, observe that, actually

p(x) + p(v) < M(p, n )

1\ 0

< l tl

1

� -;

n

implies

q

( � (f(x + tv) - j(x) - dj(x, v) ) )

too; t and n are standard, 0 #- t and 18.3. 1 , thus 1 t finally

( f (X + tv) - j (X )

-

dj ( X, V )

)

= q(t:.(J, x, v, t)) <

1

f as well as df are S-continuous by lemma :::::J

(

1 ' t j (X + t v )

A

- j (X)

-

-------

dj ( X, V )

)

;

(j(t:.(}, , x, v, t) ) � 1 . The proof that j is SUD with derivative dj is finished. Now suppose that j is itself uniformly differentiable with derivative d} : E2 ---) F. Take q E ()"Q and a standard function m : *(P X N) ---) *N; since ( ) is 1 - 1 M (p, n) := m(p, n ) ·

268

18. The power of Gateaux differentiability

defines a function from P x N into N and we may apply (18.4.6) appropriately written in terms of j, and obtain the corresponding finite set P x N � P x N; pick x , v E E, t E ffi. \ {0} and use ( 1 8 . 4 . 6 ) , with < for <:.:: , in order t o obtain (p, n) E P x N so that

[iJ (x) + fj(v)

<:.::

M(fj, n) 1\ 0 < l t l

<:.::

[<1 ( d](x, v))

�]

=?

< n 1\ q(l� (j, x, v, t)) < 1 ] ;

therefore, if

<:.:: -

fj ( x) + fj ( v )

<:.:: - ,

then hence or

1 n

p( x ) + p(v) < m(p, n) 1\ 0 < l t l <:.::

M (fj, n) 1\ 0 < l t l

( 18.4. 9)

( 18.4. 1 0)

1 n

q ( d](x, v)) < n 1\ q (!:::. ( ], x, v, t)) < 1 q ( dn;:;)) < n 1\ q(!:::. (j, x, v, t)) < 1 .

I t immediately follows that

q (df( x , v) )

<:.:: n,

because both q (ifr;;;:; ) ) and the n above are standard; moreover and when t >f, 0 q(!:::. ( f, x , v, t)) <:.:: 1 ;

f) is linear ( 18.4. 1 1 )

it so happens that t � q (!:::. (f, x , v , t ) ) i s always an internal function and we actually have shown that

[

Vq E uQ 3n E u N o < l t l < which is the same as

Vq E ()"Q [o # t � o

=*

� =? q(!:::. (f, x , v, t))

<:.::

1

]

q(!:::. (f, x , v, t ) ) � o]

and thus (18.4. 1 1 ) also holds when t � 0. Again because ( ) is 1- 1 , we may define p : = {P I p E P} -

and we have shown that ( 18.4.5) holds.

D

269

References

References [1] S . ALBEVERIO, J- E . FENSTAD, R. H 0EGH-KROHN and T. LINDSTR0M ( eds. ) , Nonstandard Methods in Stochastic Analysis and Mathematical Physics. Academic Press, 1 986. [2] LEIF 0. A RKERYD, NIGEL .J. CUTLAND and C. WARD HENSON (cds . ) , Nonstandard Analysis, Theory and Applications, Kluwer, 1997. [3] C. WARD HENSON, A Gentle Introduction to Nonstandard Extensions, in Nonstandard Analysis, Theory and Applications, Arkeryd, Cutland & Henson (eds. ) , Kluwer, 1997. [4] C. WARD HENSON and L . C . MOORE, "The Nonstandard theory of topo logical vector spaces", Trans. of the AMS, 172 ( 1 972) 405-435. [5] C . WARD HENSON and L . C . M ooRE , " Invariance of The Nonstandard Hulls of Locally Convex Spaces", Duke Math . .T . . 40 93- 205. [6] H . H . KELLER, Differential Calculus in Math 417, Springer-Verlag, 1 974.

in Locally Convex Spaces, Le

e

. Notes

[7] ANDREAS KRIEGL and PETER W . MICHOR, The Convenient setting of Global Analysis, Math. Surveys and Monographs 53, AMS ( 1 997) 405-435. [8] FRANCINE D IENER and KEITH STROYAN, Syntactical Methods in In finitesimal Analysis, in Cut land ( ed. ) , Nonstandard Analysis and its Ap plications, CUP, 1 988. [9] T oM LINDSTR0lvl , An invitation to Nonstandard Analysis. in Cutland (ed. ) , Nonstandard Analysis and its Applications, CUP, 1 988. [10] PETER LOEB and MANFRED WOLFF (eds . ) , the Working Mathematician. Kluwer, 2000.

Nonstandar-d Analysis for

[ 1 1] E. NELSON, " Internal Set Theory: a new approach to Nonstandard Anal ysis", Bull. AMS, 83 ( 1 977) . [12] VfTOR N EVES, " Infinitesimal Calculus in HM spaces", Bull. Soc. Math. Belgique, 40 ( 1 988) 1 77-198. [13] JACOB T. ScHWARTZ, 1969.

Nonlinear Functional Analysis, Gordon & Breach,

270

18. The power of Gateaux differentiability

[14] KEITH D. STROYAN and W . A . J . LUXEI\IBURG : ory of Infinitesimals, Academic Press, 1976.

Introduction to the The

[15] KEITH D . STROYAN, " Infinitesimal Calculus in Locally convex Spaces: 1 . Fundamentals", Trans. Amer. Math. Soc . , 240 ( 1978) , 363-383.

Nonstandard Palais- Smale conditions Natalia ��Iartins* and Vftor N eves**

Abstract

the Palais-Smale con cli tiou (PS) be low, some of them generalizations, but still sufficient to prove Mountain Pass Theorems, which arc quite import ant in Critical Point Theory. \!lie

19. 1

present uonst.anclarcl versious of

Preliminaries

For notation, other preliminaries and basic references on Nonstandard Analysis we suggest section 1 in article [13] in this volume, namely we assume that we have two set-theoretical structures A and B - the former a. model of "the relevant" Analysis - and a 1-1 function * ( · ) : A -. B which satisfies the Transfer and Polysatmation Principles. Specific notation and results follow. Recall that *JR denotes the set of hyperreal numbers and 0 the set of finite hyperreal numbers; if x E "'ll.t is infinitesimal we write x � 0. If x, y E *JR are such that x - y � 0, we say that x is infinitely close to y and write 'E � y. Also remember that finite hyperreal numbers have a standard part: .

Theorem 1 9 . 1 . 1 (Standard Part Theorem)

E

0 ihe1·e exists a ·u.niq'lLe r E lR sv,ch that :r; � t·; ·r is called the standard part of :r and is denoted by st ( x ) or 0x. Moreover, for all x, y E 0, st(x + y) = st (x) + st(y) , st(xy) = st(x)st(y) and if x � y then st (x) � st(y) .

rf x

*Departamento de Matematica, Universidade de Aveiro. nataliam@mat . ua . pt

Work for !:his article was partially supported by the R&D unit Center for Rc :>em·ch in Optimi?.atiou and Control ( CEOC) of t.hc University of Jlwciro and grm1t. POCTijMAT/41683/2001 of the Portuguese Foundation for Science m1d Technology

·�

(FCT)

via

FEDER.

vneves@mat . ua . pt

272

19. Nonstandard Palais-Smale conditions

Next we restrict some definitions to normed spaces; just recall that a normed space is a locally convex space whose topology is defined by a sin gle norm and also recall that, for any set A E A, "A :=

{*a : a E A},

and often A and "A are identified for simplicity.

Definition 19. 1 . 1 Let (E, 11 ·11) be a real normed space and x E *E. 1. x E fin(*E), i. e., x is finite if llxl l E 0;

2. x

E f-L(*E) , x E f-L(*E);

i. e.,

x

is infinitesimal if llxl l

3. x E ns(*E) , i.e., x is near-standard x :::::J a, which means x - a :::::J 0; 4. x E pns(*E), exists a E 17E

:::::J 0

in *JR; x

if there exists a

:::::J 0

E 17E

means

such that

i. e., x is pre-near-standard if for all positive c E lR there such that ll x - a ll < c.

It follows from the above definitions that ns(*E) c;; pns(*E) and ns(*E) fin(*E) . In general, ns(*E) of: pns(*E) and ns(*E) of: fin(*E).

c;;

Theorem 1 9 . 1 . 2 Let (E, 11 · 11) be a real normed space. 1. E is complete if and only if pns(*E) ns (*E); =

=

2. E is finite dimensional if and only if fin(*E) ns(*E); 3. A c;; E is compact if and only if *A c;; ns(*E) and st(*A) = A . The nonstandard extension of N, *N, is the set of hypernatural numbers and *N= denotes the set of infinite hypernatural numbers. It is also useful to keep in mind the following properties of sequences and functions in a real normed space (E,

11 · 11) .

Proposition 19. 1 . 1 Suppose ( xn ) n EN is a sequence in E. Then 1. ( X n ) n E N is bounded if and only if Vn E *N= X n E fin(*E);

2. ( xn ) nEN converges to a E E if and only if Vn E *Noc X n :::::J a; 3. ( x n ) n EN has a convergent subsequence if and only if ::lm Xrn E ns(*E).

E *N=,

Theorem 1 9 . 1 . 3 Let E and F be two real normed spaces and f E ---+ F . The function f is continuous o n a E E if and only if

Vx E *E [ x :::::J a =;. f(x)

:::::J

f(a)] .

273

19.2. The Palais-Smale condition

19.2

The Palais-Smale condition

:t\'Iany results in Critical Point Theory involve the following condition, orig inally introduced in 1 964 [10] by Palais and Smale:

If E is a real Banach space, the C 1 junctional j satisfies the Palais-Smale condition iffor all ( Xn ) n EN E EN ,

Definition 1 9 . 2 . 1

:

E ---+ lR

(f(xn )) n EN is bounded and nlim -HXJ j'(x n ) = 0 =? ( Xn ) n EN has a convergent subsequence.

(PS)

Suppose (E, 1 1 · 1 1 ) is a real Banach space with continuous dual E' and duality pairing ( ·, · ) . Let C 1 ( E, lR) denote the set of continuously Frechet differentiable functionals defined on E; a is critical point ( resp. almost critical point) of f if f'(a) = 0 (resp. f'(a) :::::J 0) , i.e . , if (j'(a), v) = 0 (resp. (j'(a), v) :::::J 0) for all v E E (resp. v E fin(*E) ) : j(a ) is a critical value of f if j -1 (J(a)) contains critical points. If f : E ---+ lR is Frechet differentiable we will denote by K the set of all critical points of f, that is,

K = {x E E : j ' (x) = O} and, for each that is,

c

E R

Kc

will denote the set of critical points with value

Kc = {x E E : .f ' (x) = 0 1\ f(x) = c} = K n f -1 (c) .

The following is easy to prove.

Su,ppose that f E C1 (E, IR) satisfies (PS) . Then 1. If f is bounded, K is a compact set. 2. For each a, b E lR such that a ::; b, {u E E : a :s; j(u) :s; b 1\ j'(u) = O} = f -1 ( [a, b] ) n K

Proposition 19.2.1

is a compact set. Note that

f satisfies and

(PS) and

K is compact

K is compact and f is bounded

c;l?

as can be seen with the following two examples.

c;l?

f is bounded

f

satisfies (PS ) ,

c,

19.

274

Nonstandard Palais-Smale conditions

Example 1 9 . 2 . 1 The real function f( x ) = x3 ( x E IR) satisfies (PS), is compact and f is not bounded . Example 19.2.2 Let

{ 2;2-

x2 if X E if X tt

[ - 1 , 1] [- 1 , 1]

f is bounded, K = {0} but f does not satisfies

(PS ) .

f (X)

=

K = {0}

Example 19.2.3 The real functions exp ( x ) , cos ( x ) , sin ( x ) and all the con stant functions defined on lR do not satisfy (PS ) . Now we will present an important class of functionals that satisfies ( PS ) . Proposition 19.2.2 C1 (E, IR) is coercive fies (PS ) .

If E is a finite dimensional real Banach space and f E (i. e. f(x) ----+ +oo whenever ll x ll ----+ +oo), then f satis

Proof. Let (x n ) n EN C:.: E be such that (f(x n ) ) n EN is bounded and limn _, = f'(x n ) = 0. Then, for all n E *Noc, f(x n ) E 0 ( Proposition 19. 1 . 1 ) . Since f is coercive, for all n E *Noc , X n E fin(*E) . E is finite dimensional then , by Theorem 19.1 .2, fin ( *E ) = ns ( *E ) and therefore, for all n E *N= , X n E ns ( *E ) . Finally, Proposition 19. 1 . 1 implies that (x n ) n EN has a convergent subsequence. D

19.3

Nonstandard Palais-S male conditions

The following definition often shortens statements

Suppose f E C1 (E, IR) . We say that a sequence (un ) n EN is a Palais-Smale sequence for f zf

Definition 19.3.1

(j(un )) n EN is bounded and nlim ->CXJ j'(un ) = 0. Therefore,

f satisfies the Palais-Smale condition ( PS ) if every Palais-Smale se quence for f has a convergent subsequence.

275

19.3. Nonstandard Palais-Smale conditions

Suppose E is a real Banach space and f E C 1 (E, IR) . Next we present some nonstandard variations of (PS ) : (PSO)

( un ) n EN is a Palais-Smale sequence -U::lm E *N= Um E ns (*E)

(PS1 )

f (u) E 0 1\ f'(u)

f(u) E 0 1\ f'(u) :::::J 0 -Uu E fin(*E) 1\ st ( f (u ) ) is a critical value of f

(PS2)

(PS3)

(PS4)

(un ) n EN

( Un) n EN is a Palais-Smale sequence -Uis bounded 1\ Vn E *N= st (f(un ) ) is a critical value of f

(un ) n EN

( un ) n EN is a Palais-Smale sequence -Uis bounded 1\ ::l n E *N= st (f (un ) ) is a critical value of f

Proposition 19.3.1 If f E C 1 (E, IR) (PS1 )

:::::J 0 =? u E ns(*E)

<=?

[f (u) E

0 1\ j' (u)

:::::J

0

then

=?

u E ns(*E)

1\ st (f(u))

is a critical value

]

of f .

Proof. The implication -¢== is trivial. For the proof of the other implication, suppose that u E *E is such that f (u) E 0 and f'(u) :::::J 0. By (PS1 ) , there exists a E 17E such that u :::::J a and, therefore. from the continuity of f and f' it follows that (Theorem 1 9 . 1 .3) f (a) too, so that f (a)

=

:::::J f ( u)

and f' (a)

:::::J f' ( u) :::::J 0

st (f(u)) and f (a) is a critical value of f.

Nonstandard versions of (PS) and (PS) itself are related as follows.

D

276

19. Nonstandard Palais-Smale conditions

Theorem 19.3.1 (PS1)

For any real Banach space =;.

(PS)

¢?

(PSO)

=;.

(PS2)

E we have ¢?

(PS3)

=;.

(PS4) .

Proof. It is clear that (PS) ¢? (PSO), (PS1) =;. (PSO) and (PS3) therefore (PS 1 ) =;. (PS) ¢? (PSO) and (PS3) =;. (PS4)

*E

=;.

(PS4) ,

are true. First we prove that (PSO) =? (PS2 ) . Let u E such that f(u ) E 0 and For each f ' (u) :::::J 0. Fix M E JR+ such that l f (u) l < M. Suppose u � n E N, define the standard set

Hn { x E E : :=

Since, for each

fin(*E).

�

IJ ( x ) l < M 1\ ll f' ( x ) ll <

1\ l l x l l > n } ·

n E N,

*Hn = { x E *E : l f( x) l < M 1\ ll f' ( x ) ll < � 1\ ll x l l > n } . we conclude that *Hn # 0 and the Transfer Principle says that Hn # 0. For each n E N, take Xn E Hn. Then ( x n)n E N is a Palais-Smale sequence but, for all n * N oc , Xn � fin(*E), and hence. for all n * N = , Xn � ns(*E). This i s a contradiction with (PSO) and therefore u E fin(*E). u

E

E

E

Now we will prove that if f(u) critical value of f. Let a = st (f (u) ) and, for each n

Fn {X E E : :=

Since, for each

E

0 and

f ' (u) :::::J 0, then st (j(u)) i s a

E N, define

I J ( x ) l < M 1\ I I J' ( x ) ll <

�

1\ I a - f( x ) l

<

� }.

n E N,

*Fn {X E *E : IJ ( x ) l < M 1\ ll f' ( x ) ll < � 1\ lo: - f(x) l < �} then , *Fn # 0 and therefore Fn # 0 . For each n E N take Xn E Fn. Then ( x n)n E N is a Palais-Smale sequence u

E

=

and from (PSO) we conclude

Since f

E C 1 (E, IR) , a :::::J f( xm ) :::::J j( a ) and 0 :::::J j' ( xm )

:::::J

j' ( a )

.

277

19.3. Nonstandard Palais-Smale conditions Therefore a = j(a) , f'(a) is proven. Lets prove that ( P 52) Then,

=

0,

a

is a critical value of f and (PSO)

=? (PS2)

=? ( P 53) . Let ( un ) n EN be a Palais-Smale sequence.

From (PS2) we conclude that

Vn E *N= [ un E fin(*E) 1\ st (j(un ) ) so we may conclude that

is a critical value of f ]

( un ) n EN is bounded and

Vn E *N= st(j(u n ) )

is a critical value of

f.

Finally we will prove that ( P 53) =? ( P 52) . Let u E *E such that f ( u) E 0 and f'(u) :::::J 0. Let M E JR+ such that l f(u) l < M. Suppose u � fin(*E) ; we construct a Palais-Smale sequence ( Xn )nEN using the sets H71 , as in the proof of the first part of the proof that ( P SO) =? ( P 52) , in such a way that ( Xn ) n EN is not bounded, which contradicts (PS3 ) . Suppose now that f ( u) E 0 and f' ( u) :::::J 0 . We need t o prove that a = st (j ( u)) is a critical value of f. .Just as in the second part of the proof of (PSO) =? (PS2 ) , we can build a Palais-Smale sequence (x n ) n EN such that for all n E *N= , j(x n ) :::::J o:. From (PS3) we conclude that for all n E *N= , D st(j(xn ) ) = a is a critical value of f. More can be said when

E is separable:

When E is a separable Banach space, (PS) and (PS1 ) are equivalent; in other words: if E is a separable Banach space, a C1 junction f : E ---+ lR verifies the Palais-Smale condition if and only if almost critical points where f is finite are near-standard. Theorem 19.3.2

Proof. Suppose f satisfies (PS) and u E *E is such that f(u) E 0 and f'(u) :::::J 0. If u � ns(*E) , it follows from Theorem 19. 1 . 2 that u � pns(*E), that is, ::lc E JR+ Vy E uE ll u - Y ll > c. Let V : = { Vp : p E N} be dense in E. We will construct a Palais-Smale sequence (xn ) n EN in E such that for all N E *N = , X N � ns(*E) which contra dicts (PS ) . Let M E JR+ b e such that l f(u) l < M. For each n E N, define Cn

:=

{ x E E : IJ(x) l < M 1\ llf'(x) ll < � 1\ Vp E N [p :::; n =? llx-vv ll > c ] } ·

278

19. Nonstandard Palais-Smale conditions

Since, for each n E N, u E *C71 , we conclude that *Cn # 0 and therefore Cn # 0 . For each n E N, take Xn E crl' Let N E *N = and v E 17E . Since v = E, there exists P o E N such that ll v - Vp0 II < � . Since we conclude that x N � ns(*E) which contradicts (PS ) .

D

It is not clear whether this equivalence is true in general. The following result is consequence of the fact that if E is finite dimensional, then E is separable and fin(*E) = ns(*E ) . Theorem 19.3.3 ( PS 1 )

If E is finite dimensional ¢?

(PS)

¢?

(PSO) ¢? (PS2) ¢? (PS3) ¢? (PS4) .

Another easy observation:

Any C1 functional in a Banach space which verifies (PS4) and admits a Palais-Smale sequence, has at least one critical point.

Proposition 19.3.2

Proof. Suppose that ( un ) nEN is a Palais-Smale sequence for the functional f E C 1 (E, IR) . Then there exist m E *N= such that st ( f ( um) ) is a critical value of f. Hence, there exists a E E such that j(a) st (j(um ) ) and f'(a) 0. D =

=

Next we present an example which shows that

(PS2) f? ( PS 1 ) . Example 19.3. 1 Let H be an infinite dimensional Hilbert space and define j : H ----+ lR x f---+ f ( x ) = g ( ll x ll 2 - 1 ) where g : lR ----+ lR i s given by

g(t)

�

{

0 t 2 exp

_

__L t2

if t :<.:: 0 if t > 0

Observe that g is a C1 function and

g ' ( t) :::::: 0 Also,

¢}

h:H x

[ t :<.:: 0

v

t

----+ lR f---+

llxll2 - 1

::::::

0 ].

( 1 9.3 . 1 )

19.3.

279

Nonstandard Palais-Smale conditions

is a C1 functional and \fa E H

Vx E H

(h'(a) , x)

=

•

2a x

where • denotes the inner product in H. Therefore, f is a C1 functional. We will prove that f does not satisfy ( P 51) but satisfies ( P 52) . By Theorem 19.1.2 we can take u E *H such that u E fin(*H) \ ns(*H) and l l u l l = 1 . Hence, j(u) = 0 and f'(u) = 0, which shows that f does not satisfy (P51 ) . Note that v tf_ fin(*H) =? f(v) tf- 0 ·

·

and

f'(v) :::::J 0

<=?

<=?

ll f'(v) l l :::::J 0 Vx E fin(*H) (f'(v), x)

=

(2v x)g'(l l v l l 2 - 1 ) :::::J 0. •

(19.3.2)

Next we will prove that

f'(v) :::::J 0 =? [ l lv l l

�

1

V

llvll :::::J 1 ] .

(19.3.3)

f'(v) :::::J 0 and v # 0, by (19.3.2) either 2v ll�l l :::::J 0, and thus ll v ll :::::J 0, or g '(llvll2 - 1) :::::J 0, so that, by ( 19.3. 1 ) , llvll � 1 or llvll :::::J 1; in all possible cases ( 19.3.3) holds. •

If

Since

[ llv ll

<:::

1

and 0 is a critical value of

j,

we may conclude that

f(v)

E 0 1\

proving that

V

ll v ll :::::J 1 ] =? f(v) :::::J 0

j'(v) :::::J 0 =? v E fin(*H) f

does satisfy

1\

st(f(v))

0 is a critical value of f

( P 52) .

Remark 19.3. 1 Other consequences of Example 1 . I f we assume H to b e separable, Theorem

(P52) 2.

=

Moreover, the fact that that the sets

19.3 . 1 :

19.3.2 shows that

f? (PS ) .

f E C1 (E, IR) and satisfies (P52), does not imply (a

are compact (see Proposition

19.2. 1 ) .

<::: b)

19.

280

Nonstandard Palais-Smale conditions

19.1.1

It follows easily from Proposition lent to

that condition (PS3) is equiva

( un ) n EN

(un ) n EN

is a Palais-Smale sequence -Uis bounded 1\ all convergent subsequences of converge to a critical value of f

(j(un ) ) nEN

and (PS4) is equivalent to

( un ) n EN

( un ) n EN

is a Palais-Smale sequence -Uis bounded 1\ there is a subsequence of (!( un ) ) n EN which converges to a critical value of f.

Moreover, in the finite dimensional case, these two standard conditions are equivalent .

19.4

Palais-Smale conditions per level

In this section we present a weaker compactness condition for C1 functionals introduced in 1980 [3] by Brezis, Caron and Nirenberg. In the survey book [7] the reader can find more variants of the (PS ) condition. c

Suppose f E C1 (E, IR) and E R We say that (un ) n E N is a Palais-Smale sequence of level {for f) if

Definition 19.4.1

c

lim f(un ) n---j . CXJ

=

lim f'(un ) c and n---j . CXJ

=

0.

f satisfies the Palais-Smale condition of level (PS )c, if every Palais Smale sequence of level has a convergent subsequence. c,

c

Remark 19.4.1 Suppose that

f satisfies

(PS ) , then

f E C1 (E, IR) .

f satisfies

Then

(PS) c for all

c

1.

If

2.

If f satisfies (PS) c, then the set of critical points of value compact.

E

R c,

Kc , is

Example 19.4. 1 The function exp(x) : lR ____, lR satisfies (PS ) c for all c except for c = 0. The real functions sin( x) and cos( x) defined in lR satisfy (PS )c for all c except for c = 1 and c = - 1 .

19.5. 19.5

281

Nonstandard variants of Palais-Srnale conditions per level

Nonstandard variants of Palais-Smale conditions per level

As above, E is a real Banach space, Obvious adaptations of conditions following conditions per level:

(PSO)c

( P 51 )

( un ) n EN

c

(PS2)c Note that if

and

c

and

ER (PS2)

is a Palais-Smale sequence of level -U::lm E *N= Um E ns(*E)

f(u) :::::J

c

1\

1\

c

=

c

1\ -U-

c

f'(u) :::::J 0

st (f(u) )

is a critical value of f

( un ) n EN is a Palais-Smale sequence of level

hence, the adaptation of conditions alent to the standard condition:

provide the

f'(u) :::::J 0 =? u E ns(*E)

f(u) :::::J u E fin(*E)

f E C1 (E, IR) (PSO) , (PS1)

c,

then

(PS3) and (PS4) to this context are equiv

is a Palais-Smale sequence of level c -U( un ) n EN is bounded 1\ c is a critical value of f.

( un ) n E N

19.3.1, 19.3.2 and 19.3.3 can be easily proven. Theorem 1 9 . 5 . 1 For any real Banach space E we have (PS1)c =} (PS)c B (PSO)c =} (PS2)c• Theorem 19.5.2 Suppose E is a real separable Banach space and f E C 1 (E, IR) . Then f satisfies (PS)c if and only if f satisfies (PS1 )c. Theorem 19.5.3 If E has finite dzmension, then (PS1)c
=

19.

282

19.6

Nonstandard Palais-Smale conditions

Mountain Pass Theorems

Some important results in Critical Point Theory, such as the Mountain Pass Theorems and some variants of Ekeland's Variational Principle, can be obtained using a deformation technique. This technique was introduced in 1934 by Lusternik and Schnirelman [8] and consists in deforming a given C 1 functional outside the set of critical points. In 1983 Willcm proved in [14] the Quantitative Deformation Lemma; a very technical lemma that involves the concept of pseudo-gradient vector field (sec [14] or [9] ) . For

S s;;; E , a E JR+

and

c E IR,

r : = {x E E : f(x) :S c} where

we usc the following notations and

Sa := {x E E : dist(x, S) :S a}

dist(x, S) = inf { l l x - Y ll : y E S}.

Lemma 19.6.1 (Quantitative Deformation Lemma) S S: E, c E IR, c, 6 E JR+ be such that

\iy E f -1 ( [c - 2c, c + 2r::l ) n S25 x

Let f E C1 (E, IR) ,

[ ll f' (y) ll 2 8; ] .

Then there exists rJ E C( [O, 1] E, E) such that 1. rJ(O, y) y; 2. rJ(t, y) = y if y It' f -1 ( [c - 2r:: , c + 2r:: ] ) n S2s; =

3.

rJ(1, r+E n S)

s;;;

r- E ,.

for all t E [0, 1] , rJ(t, ) : E ----+ E is a homeomorphism; 5. \it E [0, 1] , \iy E E ll rJ (t, y) - Y l l :S 6 ; 6. for each y E E, f ( rJ( y)) is non increasing;

4.

·

·,

7. \it E ]0, 1] , \iy E r n Ss f(rJ(t, y))

< c.

An easy consequence of the Quantitative Deformation Lemma is the fol lowing variant of Ekeland's Variational Principle (see [9, page 14] ) . Corollary 19.6.1 Let f E C 1 (E, IR) b e bounded from below. Then, for any c E JR+ , there exists u E E such that

f(u)

:<:: inf

xE E

f(x) + c and ll f '( u) ll

< vfc.

19.6.

283

Mountain Pass Theorems

Applying the Transfer Principle to Corollary Corollary 19.6.2 Let f a point u E *E such that

19.6.1 we obtain

E C1 (E, IR) be bounded from below. Then there exists

f(u) :::::J inf f(x) and j'(u) :::::J 0. x EE Now we can deduce =

Let f E C 1 (E, IR) be bounded below and c infxE E f(x) . If f satisfies (PS2)c then c is a critical value of f. Proof. By Corollary 19.6.2 we conclude that there exists u E *E such that f(u) :::::J c and f'(u) :::::J 0. Since f satisfies (PS2)c, c st(f(u)) is a critical value of f. D Theorem 19.6.1 is a generalization of the classical result ( see [7, page 16] ) : Theorem 19.6.1

=

Theorem 19.6.2 Let f E C1 (E, IR) be bounded f satisfies ( PS )c then c zs a cntical value of f.

below and c

=

infxEE f(x) .

If

We now state the Mountain Pass Theorem introduced by Ambrosetti and Rabinowitz in 1973 [1]. Theorem 19.6.3 (Mountain Pass Theorem, Ambrosetti-Rabinowitz) Let E be a real Banach space and f E C 1 ( E, lR) . Suppose that 1.

there exists e E E and r E JR+ such that llell > r and max{f(O), f( e ) } < b inf f(x); :=

[[ x [[=r

:

2. f = {r E C( [0, 1], E) : 1 (0) = 0 A r(1) = e } and c = inf max f(r(t) ); 1Er tE[O,l] 3. f satisfies (PS) . Then c :2: b and c is a critical value of f . We usually say that if f : E ---+ lR satisfies condition 1 of Theorem 19.6.3, then f satisfies the mountain pass geometry ( with respect to 0 and e) . Remark 19.6.1 Conditions 1 and 2 of Theorem 19.6.3 are not enough to imply that c is a critical value of f as we can see with the following example ( sec [7, page 36] ) . The function f : IR2 ---+ lR defined by f( x, y) = x2 + ( x + 1 )3y 2 satisfies the mountain pass geometry with 0 = (0, 0), e = ( -2, 3) and r = ! · (0, 0) is a strict local minimizer and is the only critical point of f. Therefore, there is no z E IR2 such that f(z) = c > 0 and f'(z) = 0.

19.

284

Nonstandard Palais-Smale conditions

In 1980 Brezis, Caron and Nirenberg obtained in [3] a generalization of the Mountain Pass Theorem of Ambrosetti-Rabinowitz for functionals satisfying the (P S) c condition: Theorem 19.6.4 (Mountain Pass Theorem, Brezis- Coron-Nirenberg) Let E be a real Banach space and f E C1 ( E, lR) . Suppose that 1.

there exists e E E and r E JR+ such that li e I I max{f(O) , f(e)}

2.

=

r {r E C( [O, 1] , E) : 1(0)

=

<

0 !\ 1(1)

b

:=

=

>

r and

inf f(x); ll x ll =r

e} and

c :=

inf max

1Er tE[ O .l]

f(r(t)) ;

f satisfies (PS)c · Then 2: b and is a critical value of f . 3.

c

c

This result and the Mountain Pass Theorem of Ambrosetti-Rabinowitz are easy consequences of the Quantitative Deformation Lemma, since it is possible to prove that for every C 1 functional that satisfies conditions 1 and 2 of both theorems. there exists a Palais-Smale sequence of level c (see [9, page 19]). We now deduce the following generalization of the Mountain Pass Theorem of Brczis-Coron-Nirenberg. Theorem 19.6.5

pose that 1.

Let

E

be a real Banach space and f E

there exists e E E and r E JR+ such that li e I I max{f (O), f(e)}

2. f 3.

Then

=

{r E C( [O, 1] , E) : 1(0)

f satisfies c

2:

=

<

0 !\ 1(1)

b =

:=

>

C1 (E, IR) .

Sup

r and

inf f(x); ll x ll =r

e } and

c :=

inf max

1Er tE[O,l]

f(r(t) ) ;

(PS4) c · c

b and is a critical value of f.

Proof. From the Quantitative Deformation Lemma there exists a Palais-Smale sequence of level c . Since f satisfies (PS4) c, c is a critical value of f. D

285

References

References [1] A. A!VI BROSETTI and P . RABINOWITZ, " Dual vatiational methods in crit ical point theory and applications", J. Funct . Anal., 14 (1973) 349-38 1 . [2] LEIF 0 . ARKERYD, NIGEL J . CUTLAND and C . WARD HENSON (eds . ) , Nonstandard Analysis, Theory and Applications, Kluwer, 1997. [3] H. B REZIS, J . M . C oRON and L. N IRENBERG, "Free vibrations for a non linear wave equation and a theorem of P. Rabinowitz", Comm. Pure Appl. Math. , 33 ( 1980) 667-684. [4] NIGEL J. CUTLAND (ed.) , CUP, 1988.

Nonstandard Analysis and its Applications,

[5] NIGEL J . CUTLAND, VfTOR NEVES, FRANCO OLIVEIRA and J OSE S ousA PINTO (cds.), Developments in nonstandard mathematics, Long man, 1995. [6] IVAR EKELAND "Nonconvex minimization Problems", Bulletin (New Se ries) of the American l\Iathematical Society, 1 ( 1979) .

An In troduction to Minimax Theorems and their Applications to Differential Equations, Kluwer Academic Publishers, 200 1 . L . LUSTERNIK and L . SCHNIRELMANN, Methodes topologiques dans le problemes variationnels, Hermann, Paris, 1934. .JEAN MAWHIN. Crztical Point Theory and Applicatwns to Nonlinear Dif ferential Equations, VIGRE Minicourse on Variational Methods and Non

[7] MARIA DO RosARIO GROSSINHO and STEPAN AGOP TERSIAN,

[8] [9]

linear PDE, University of Utah, 2002 . [10] R . S . PALAIS and S . SMALE, "A generalized Morse theory", Bull. Amer. Math. Soc., 70 (1964) 165-172.

Minimax Methods in Critical Point Theory with Applicatwns to Dzfferentwl Equations, CBl\IS Regional Conference, 65,

[ 1 1] PAUL H. RABINOWITZ,

AMS, Providence, Rhode Island, 1986. [12] K . D . STROYAN and W . A . .J . LUXEMBURG, Infimtesimals, Academic Press. 1976.

Introduction to the theory of

[13] V. NEVES, The power of Gateaux differentiability, this volume. [14] M . WILLEM, Lectures on Critical Point Theory, Trabalho de Matermitica 199, Fund. Univ. Brasilia, Brasilia. 1983.

Averaging for ordinary different ial equat ions and functional differential equat ions Tewfik Sari* Abstract A nonstauclard approach to averaging theory for ordiuary di fferential equations and functional differential equations is developed. We define a notion of perturbation and we obtain averaging results under weaker conditions than the results in the literature. The classical averaging the

orems apptoximate the solutions of the system by th e solutions of the averaged system, for L ipschitz continuous vector fields, and when the so lutions exist on the same interval as the �;olutions of the averaged system. vVe extend these resnlt.s to perturbations of vector fields which are uui

tormly continuous in the spatial variable with l'E'Spect to the time variable and without any restriction on the interval of existence of the solution.

20 . 1

Introduction

In the early seventies, Georges R.ceb, who learned about Abraham Robin son's Nonstandar-d A nalysis (NSA) [29[ , was convinced that NSA gives a lan guage which is well adapted to the study of perturbation theory of differen tial equations (see [6, p. 374[ or [251). The a,'domatic presentation Internal Set Theo1·y (IS T) [26[ of NSA given by E. Nelson corresponded more to the Reeb's dream and was in agreement with his con vi cti on "Les entie'f's na:i.fs ne remplissent pas N ". Indeed, no formalism can recover exactly all the actual phenomena, and nonslanda:rd objecls which may be considered as a. formal ization of non- naive objects are already elements of our usual (standard) sets. We clo not need any use of sta.-rs and enlarg ements. Thus, the Reebian school adopted IST. For more informations about Recb's dream and convictions sec the Recb's preface of Lut z and Gaze's book [25] , St.ewart,'s book [40] p. 72, or Lobry's book [23[. The Reebian school of nonstandard perturbation theory of differ-ential eqv.a, tions produced various and numerous studies and new results as attested by a * univcrsiic de Haute Alsace, l'vlulhousc, Fra.11ce. T , Sari@uha . fr

20.2.

Deformations and perturbations

287

lot of books and proceedings (see [2, 3, 4, 7, 8, 9, 10, 12, 23, 25, 30, 37, 42] and their references) . It has become today a well-established tool in asymptotic theory, see, for instance, [17, 18, 20, 24, 39] and the five-digits classification 34E18 of the 2000 Mathematical Subject Classification. Canards and rivers (or Ducks and Streams [7] ) are the most famous discoveries of the Reebian school. The classical perturbation theory of differential equations studies defor mations, instead of perturbations, of differential equations (see Section 2.1). Classically the phenomena arc described asymptotically, when the parameter of the deformation tends to some fixed value. The first benefit of NSA is a natural and useful notion of perturbation. A perturbed equation becomes a simple nonstandard object , whose properties can be investigated directly. This aspect of NSA was clearly described by E. Nelson in his paper Mathematical Mythologies [30] , p. 159, when he said "For me, the most exciting aspect of non

standard analysis is that concrete phenomena, such as ducks and streams, that classically can only be described awkwardly as asymptotic phenomena, become mythologized as simple nonstandard objects. "

The aim of this paper is to present some of the basic nonstandard techniques for averaging in Ordinary Differential Equations (ODEs) , that I obtained in [32, 36] . and their extensions, obtained by M. Lakrib [19] , to Functional Dif ferential Equations (FDEs) . This paper is organized as follows. In Section 20.2 we define the notion of perturbation of a vector field. The main problem of perturbation theory of differential equations is to describe the behavior of tra jectories of perturbed vector fields. We define a standard topology on the set of vector fields� with the property that f is a perturbation of a standard vector field fo if and only if f is infinitely close to fo for this topology. In Section 20.3 we present the Stroboscopic Method for ODEs and we show how to use it in the proof of the averaging theorem for ODEs. In Section 20.4 we present the Stroboscopic Method for FDEs and we show how to use it in the proofs of the averaging theorem for FDEs. The nonstandard approaches of averaging are rather similar in structure both in ODEs and FDEs. It should be noticed that the usual approaches of averaging make use of different concepts for ODEs and for FDEs: compare with [5, 31] for averaging in ODEs and [13, 14, 15, 22] for averaging in FDEs.

20.2 20. 2 . 1

Deformations and perturbations Deformations

The classical perturbation of differential equations

theory of differential equations x = F(x, c ) ,

studies families

(20.2. 1 )

20.

288

Averaging

where x belongs to an open subset U of JRn , called phase space, and c belongs to a subset B of JRk , called space of parameters. The family (20.2.1) of differential equations is said to be a k-parameters deformation of the vector field P0 (x) = P(x, co), where co is some fixed value of c. The main problem of the perturbation theory of differential equations is to investigate the behavior of the vector fields P(x , c) when c tends to c0 . The intuitive notion of a perturbation of the vector field Po which would mean any vector field which is close to Po does not appear in the theory. The situation is similar in the theory of almost periodic functions which, classically, do not have almost periods. The nonstandard approach permits to give a very natural notion of almost period (see [16, 28, 33. 41] ) . The classical perturbation theory of differential equations considers deformations instead of perturbations and would be better called defoTmation theory of dijjeTential equations. Ac tually the vector field P(x, c) when c is sufficiently close to co is called a perturbation of the vector field P0 (x ) . In other words, the differential equation

x

=

Po (x)

(20.2.2)

is said to be the unperturbed equation and equation (20.2.1), for a fixed value of c, is called the perturbed equation. This notion of perturbation is not very satisfactory since many of the results obtained for the family (20.2. 1) of dif ferential equations take place in all systems that are close to the unperturbed equation (20.2.2). Noticing this fact, V. I. Arnold (see [1] , footnote page 157) suggested to study a neighborhood of the unperturbed vector field P0 (x) in a suitable function space. For the sake of mathematical convenience, instead of neighborhoods, one considers deformations. According to V. I. Arnold, the situation is similar with the historical development of variational concepts, where the directional derivative ( Gateaux differential) preceded the derivative of a mapping (Frcchet differential) . Nonstandard analysis permits to define a notion of perturbation. To say that a vector f is a perturbation of a stan dard vector field fo is equivalent to say that f is infinitely close to fo is a suitable function space, that is f is in any standard neighborhood of fo . Thus, studying perturbations in our sense is nothing than studying neighborhoods, as suggested by V. I. Arnold.

20.2.2

Perturbations

Let X be a standard topological space. A point x E X is said to be infinitely close to a standard point x0 E X, which is denoted by x c:::- x0, if x is in any standard neighborhood of x0. Let A be a subset of X . A point x E X is said to be nearstandard in A if there is a standard x0 E A such that x c:::- x0.

20.2.

289

Deformations and perturbations

Let us denote by

N SA

=

{x E X : :3st xo E A x

c:::-

xo},

the external-set of nearstandard points i n A [34] . Let E b e a standard uniform space. The points x E E and y E E are said to be infinitely close, which is denoted by x c:::- y, if (x, y) lies in every standard entourage. If E is a standard metric space, with metric d, then x c:::- y is equivalent to d(x , y) infinitesimal.

Let X be a standard topological space X. Let E be a standard uniform space. Let D and Do be open subsets of X, Do standard. Let f : D ---+ E and fo : Do ---+ E be mappings, fo standard. The mapping f is said to be a perturbation of the mapping fo, which is denoted by f fo, if N SDo C D and f(x) fo (x) for all x E N SDo . Let :Fx,E be the set of mappings defined on open subsets of X to E : Definition 1

c:::-

c:::-

:Fx , E = { (!, D) : D open subset of X

and

f : D ---+ E} .

Let us consider the topology on this set defined as follows. Let The family of sets of the form

(fo, Do) E :Fx,E .

{(!, D) E :Fx.E : K C D Vx E K (j(x), fo (x)) E U}, where K is a compact subset of Do, and U is an entourage of the uniform space E, is a basis of the system of neighborhoods of (!0, D0 ). Let us call this topology the topology of uniform convergence on compacta. If all the mappings are defined on the same open set D, this topology is the usual topology of uniform convergence on compacta on the set of functions on D to E.

Assume X is locally compact. The mapping f is a perturbation of the standard mapping fo if and only if f is infinitely close to fo for the topology of uniform convergence on compacta.

Proposition 1

Proof. Let f : D ---+ E be a perturbation of fo : Do ---+ E. Let K be a standard compact subset of D0. Let U be a standard entourage. Then K C D and f(x) c:::- fo (x) for all x E K. Hence (j(x), fo (x)) E U. Thus f c:::- fo for the topology of uniform convergence on compacta. Conversely, let f be infinitely close to fo for the topology of uniform convergence on compacta. Let x E N SDo . There exists a standard xo E Do such that x c:::- xo . Let K be a standard compact neighborhood of x0, such that K C Do (such a neighborhood exists since X is locally compact) . Then x E K C D and (j(x) , j0 (x)) E U for all standard entourages U, that is N sDo C D and f ( x) c:::- fo ( x) on N sDo . Hence f is a perturbation of fo. D

20.

290

Averaging

The notion of perturbation can be used to formulate Tikhonov's theorem on slow and fast systems whose fast dynamics has asymptotically stable equi librium points [24] , and Pontryagin and Rodygin's theorem on slow and fast systems whose fast dynamics has asymptotically stable cycles [39] . In the fol lowing section we use it to formulate the theorem of Krilov, Bogolyubov and Mitropolski of averaging for ODEs. All these theorems belong to the singular perturbation theory. In this paper, by a solution of an Initial Value Problem (IVP) associated to an ODE we mean a maximal (i.e. noncontinuable) solu tion. The fundamental nonstandard result of the regular perturbation theory of ODEs is called the Short Shadow Lemma. It can be stated as follows [36, 37] . Let g : D ---+ IRd and g0 : Do ---+ IRd be continuous vector fields, D, Do C 0 IR+ x IRd . Let ag and a be initial conditions. Assume that g0 and a8 are standard. The IVP =

dXjdT g(T, X), X(O)

=

a0

(20.2.3)

is said to be a perturbation of the standard IVP =

=

(20.2.4) dXjdT go (T, X), X(O) a8, 0 if g g0 and a ag. To avoid inessential complications we assume that equation dX/ dT g0 (T, X) has the uniqueness of the solutions. Let ¢0 be the solution of the IVP (20.2.4) . Let I be its maximal interval of definition. Then, by the following theorem any solution of problem (20.2.3) also exist on I and is infinitely close to ¢o . Theorem 1 (Short Shadow Lemma) Let problem (20. 2. 3) be a perturba tion of problem (20. 2.4). Every solution ¢ of problem (20. 2. 3) is a perturbation of the solution c/Jo of problem (20. 2.4), that is, for all nearstandard t in I, ¢(t) is defined and satisfies ¢(t) ¢o (t). Let us consider the restriction 1jJ of ¢ to NsI. By the Short Shadow Lemma, for standard t E I, it takes nearstandard values 1/J(t) ¢0 (t) . Thus its shadow, which is the unique standard mapping which associate to each standard t the standard part of 1jJ ( t) , is equal to ¢0 . In general the shadow of ¢ is not equal to ¢0 . Thus, the Short Shadow Lemma describes only the "short time behaviour" "='

"='

=

"='

"='

of the solutions.

20.3

Averaging In ordinary differential equations

The method of averaging is well-known for ODEs. The fundamental re sult of this theory asserts that , fm small c > 0, the solutions of a nonau tonomous system

x = f (tjr:: , x, r:: ) ,

where

x = dxjdt,

(20.3. 1)

291

20.3. Averaging in ordinary differential equations are approximated by the solutions of the averaged autonomous system iJ

=

F (y) ,

where

F(x)

=

1 lim --+ T =T

1 f(t, x, T

0

0)

dt.

(20.3.2)

The approximation of the solutions of (20.3. 1) by the solutions of (20.3.2) means that if x ( t , c ) is a solution of (20.3. 1 ) and y(t) is the solution of the averaged equation (20.3.2) with the same initial condition, which is assumed to be defined on some interval [0, T] , then for c c:::: 0 and for all t E [0, T] , we have x(t, c ) c:::: y(t). The change of variable z ( T ) = x(u) transforms equation (20.3. 1) into equation 1 (20.3. 3) z = c f ( T, z , c ) . where z ' = dz j dT . Thus, the method of averaging can be stated for equation (20.3 .3), that is, if c is infinitesimal and 0 ::; T ::; T/c then z ( T, c) c:::: y(cT) . Classical results were obtained by Krilov, Bogolyubov, Mitropolski, Eck haus, Sanders, Verhulst (see [5, 31] and the references therein). The theory is very delicate. The dependence of f ( t, x, c ) in c introduces many complications in the formulations of the conditions under which averaging is justified. In the classical approach, averaging is justified for systems (20. 3 . 1 ) for which the vector field f is Lipschitz continuous in x. Our aim in this section is first to formulate this problem with the concept of perturbations of vector fields and then to give a theorem of averaging under hypothesis less restrictive than the usual hypothesis. In our approach, averaging is justified for all perturbations of a continuous vector field which is continuous in x uniformly with respect to t. This assumption is of course less restrictive than Lipschitz continuity with respect to x.

20.3.1

KBM vector fields

Let U0 be an open subset of IRd . The continuous vector field fo : IR+ x U0 ---+ JRd is said to be a Krilov-Bogolyubov-Mitropolski (KBM) vector field if it satisfies the following conditions Definition 2

1.

The function x ---+ fo ( t, x) is continuous in x uniformly with respect to the variable t.

2. For all x E Uo the limit F(x) 3.

=

=

limr--+ = � J0T fo(t, x)

dt exists.

The averaged equation y(t) F(y(t)) has the uniqueness of the solution with prescribed initial condition.

20.

292

Averaging

Notice that, in the previous definition, conditions (1) and (2) imply that the function F is continuous, so that the averaged equation considered in condition (3) is well defined. In the case of non autonomous ODEs, the definition of a perturbation given in Section 20.2 must be stated as follows. Definition 3 Let U0 and U be open subsets of IRd . A continuous vector field f : IR+ x U ---+ IRd is said to be a perturbation of the standard continuous vector x

field fo : IR+ Uo ---+ JRd if U contains all the nearstandard points in Uo , and f(s, x) fo (s, x) for all s E IR+ and all nearstandard x in Uo. c:::-

Theorem 2 Let fo : IR+ x U0 ---+ IRd be a standard KBM ao E Uo be standard. Let y( t) be the solution of the IVP

y (t)

=

F(y(t)),

y(O)

=

vector field and let (20.3.4)

ao,

defined on the inteTval [0, w[, 0 < w < oo. Let f IR+ U ---+ JRd be a perturbation of fo. Let c > 0 be infinitesimal and a ao. Then every solution x(t) of the IVP (20.3.5) x(O) = a, x(t) f (tjr:: , x(t)) ' is a perturbation of y ( t), that is, for all nearstandard t in [0, w [, x( t) is defined and satisfies x(t) y(t). c:::-

:

x

=

c:::-

The proof, in the particular case of almost periodic vector fields, is given in Section 20.3.4. The proof in the general case is given in Section 20.3.5.

20.3.2

Almost solutions

The notion of almost solution of an ODE is related to the classical notion of c- almost solution.

function x( t) is said to be an almost solution of the standard differential equation x G(t, x) on the standard interval [0, L] if there exists a finite sequence 0 to < < t N + 1 L such that for- n 0, , N we have

Definition 4 A

=

=

c:::-

·

·

·

tn+ l tn, x(t) and x(tn+ l ) - x(tn) tn+ l - tn

=

=

·

c:::-

x(tn) for t E [t n , t n+ l ] ,

c:::-

G(tn , x(tn )).

·

·

The aim of the following result is to show that an almost solution of a standard ODE is infinitely close to a solution of the equation. This result which was first established by .J. L. Callot (sec [1 1 , 27]) is a direct consequence of the nonstandard proof of the existence of solutions of continuous ODEs [26] .

293

20.3. Averaging in ordinary differential equations

Theorem 3 If x ( t) is an almost solution of the standard differential equation x = G(t, x) on the standard interval [0, L] , x(O) y0, with y0 standard, and the IVP y = G(t, y), y(O) = y0, has a unique solution y(t), then y(t) is defined at least on [0, L] and we have x(t) y(t), for all t E [0, L] . "='

"='

D

Proof. See [ 1 1 , 36]

Let us apply this theorem to obtain an averaging result for an ODE which does not satisfy all the hypothesis of Theorem 2. Consider the ODE (see [ 1 1 , 27. 36] )

tx . . X. ( t ) = Sln c

(20.3.6)

The conditions (2) and (3) in Definition 2 are satisfied with F(x) = 0. Thus, the solutions of the averaged equation arc constant. But condition ( 1 ) of the definition is not satisfied, since the function f(t, x) = sin (tx) is not continuous in x uniformly with respect to t. Hence Theorem 2 docs not apply. In fact the solutions of (20.3.6) are not nearly constant and we have the following result: Proposition 2 If c > 0 is infinitesimal then, in the region t 2 x > 0 solutions of (20. 8. 6) are infinitely close to hyperbolas tx = constant . In region x > t 2 0, they are infinitely close to the solutions of the ODE

the the

2 2 (20.3. 7) x = G(t, x) , where G(t, x) = vlx - t - x . t Proof. The isocline curves h = { ( t, x) : tx = 2k7rc} and I� = { ( t, x) : tx = (2k + �) 7rc} define, in the region t 2 x > 0, tubes in which the trajectories are trapped. Thus for t 2 x > 0 the solutions are infinitely close to the hyperbolas tx = constant. This argument does not work for x > t 2 0. In this region, we consider the microscope

T

=

t - tk

X

=

c ' where ( tk , X k ) are the points where a solution h· Then we have

X - Xk c

x( t)

' of (20.3.6) crosses the curve

X(O) = 0. By the Short Shadow Lemma (Theorem 1 ) , X(T) is infinitely close to a solution of dXjdT = sin (x k T + t k X). By straightforward computations we have

X -X tk + l - tk

k k+l '------

":::

G(t k , X k ) .

Hence, in the region x > t 2 0, the function x( t) is an almost solution of the ODE (20.3. 7) . By Theorem 3, the solutions of (20.3.6) are infinitely close to the solutions of (20.3.7). D

20.

294

20.3.3

Averaging

The stroboscopic method for ODEs

�+

In this section we denote by G : x D ----+ function, where D is a standard open subset of function such that 0 E I C Definition 5 We say that respect to G if there exists

�+.

�d.�d

�d

a standard continuous Let x : I ----+ be a

x satisfies the Strong Stroboscopic Property with > 0 such that for every positive limited t0 E I with x(to) nearstandard in D, there exists t1 E I such that f-L < t1 - to 0, [to , t1] c I, x(t) x(to) for all t E [to, t1], and f-L

c::::

c::::

x(t l ) - x(to) G ( t0, x t0 )) . ( t1 - to The real numbers t0 and t1 are called successive instants of observation of the �

--------'-'------'---'- _

stroboscopic method. Theorem 4 (Stroboscopic Lemma for ODEs) Let a0 E D be standar-d. Assume that the IVP y(t) G (t, y(t)), y(O) a0, has a unique solution y defined on some standard interval [0, L] . Assume that x(O) a0 and x satisfies the Strong Stroboscopic Property with respect to G. Then x is defined at least on [0, L] and satisfies x(t) y(t) for all t E [0� L] . =

=

c::::

c::::

Proof. Since x satisfies the Strong Stroboscopic Property with respect to G, it is an almost solution of the ODE x = G(t, x). By Theorem 3 we have x(t) c:::: y(t) for all t E [0, L] . The details of the proof can be found in [36] . D The Stroboscopic Lemma has many applications in the perturbation theory of differential equations (see [11, 32, 35, 36� 38 , 39]). Let us use this lemma to obtain a proof of Theorem 2.

20.3.4

Proof of Theorem 2 for almost periodic vector fields

·,

Suppose that fo is an almost periodic in t, then any of its translates fo (s + x0) is a nearstandard function, and fo has an average F which satis fies [16, 28, 33, 41]

s+Tfo (t, x) dt, 1 T-->rx> T s s E �+ · F 1 1 s+ T F(x) T s fo (t, x) dt,

F(x) uniformly with respect to

=

1

lim -

Since

c:::: -

is standard and continuous, we have

(20.3.8)

20.3.

295

Averaging in ordinary differential equations

IR

for all s E + , all T ::::::' oo and all nearstandard x in U0 . Let x : I ---+ U be a solution of problem (20.3.5) . Let t0 be an instant of observation: t0 is limited in I, and x0 = x(to) is nearstandard in U0 . The change of variables

X transforms

(20.3.5)

=

_x---'-(t o + cTc) _- x_o ' c _ _ __

_

into

dX/dT

=

f(s + T, xo + eX) ,

s

where

=

tojc.

By the Short Shadow Lemma ( Theorem 1 ) , applied to g(T, X) f(s + T, xo + eX) and go (T, X) fo ( s + T, xo) , for all limited T > 0, we have X(T) ::::::' J0Y fo (s + r, xo)dr. By Robinson's Lemma this property is true for some unlimited T which can be chosen such that cT 0. Define t1 t0 + cT. =

=

::::::'

Then we have

=

1

l

+ y 1 s Y ::::::' -1 0 fo( s + r, xo) dr = T s fo (t, xo) dt ::::::' F(xo). T T Thus x satisfies the Strong Stroboscopic Property with respect to F. Using the Stroboscopic Lemma for ODEs ( Theorem 4) we conclude that x(t) is infinitely close to a solution of the averaged ODE (20.3.4). x(t l ) - x(to) t1 - to

----'----'---'----'- =

20.3.5

X ( T)

--

Proof of Theorem 2 for KBM vector fields

Let fo be a KBM vector field. From condition (2 ) of Definition 2 we deduce + that for all s E + , we have F(x) = limy_, = � fss Y fo (t, x) dt, but the limit is not uniform on s . Thus for unlimited positive s , the property ( 20.3.8 ) docs not hold for all unlimited T, as it was the case for almost periodic vector fields. However, using also the uniform continuity of fo in x with respect to t we can show that (20.3.8 ) holds for some unlimited T which are not very large. This result is stated in the following technical lemma [36] .

IR

IR

IRd

Let g : + x M ---+ be a standard continuous function where M is a standard metric space. We assume that g is continuous in m E M uniformly with respect to t E + and that g has an average G(m) limy_, = � J0Y g(t, m) dt. Let c > 0 be infinitesimal. Let t E + be limited. Let m be nearstandard in M . Then there exists a > c, a 0 such that, for all limited T 2 0 we have + 1 s YS g(r, m) dr ::::::' T G ( m ) , where s = tjc, S = o:jc. Lemma 1

S

1 8

IR

::::::'

IR

=

20.

296

2

The proof of Theorem found also in [36] .

Averaging

needs another technical lemma whose proof can be

Let g IR+ JRd JRd and h IR+ JRd be continuous functions. Suppose that g(T, T X) h(T) holds for all limited T E IR+ and all limited X E JRd , and J0 h ( r) dr is limited for all limited T E IR+ . Then, any solution X(T) of the IVP dXjdTT g(T, X), X(O) 0, is defined for all limited T E IR+ and satisfies X(T) "=' J0 h(r) dr. Lemma 2

:

:

----+

x

"='

=

----+

=

Proof of Theorem 2. Let x : I ----+ U be a solution of problem (20.3.5). Let t0 E I be limited, such that x0 = x(t0) is nearstandard in U0. By Lemma 1 , applied to g = fo, G = F and m = x(to), there is a > 0, a "=' 0 such that for all limited T 2 0 we have

T 1 rs + S fo (r, xo) dr "=' TF (xo), s .}s

where

s = to/c, S = o:jc.

(20.3.9)

The change of variables

X(T) transforms

(20.3.5)

=

x (to + o:T) - xo 0:

into

dXjdT

=

f(s + ST, xo + o:X).

By Lemma 2, applied t o g(T, X) = f(s and (20.3.9) , for all limited T >

+ ST xo + o:X) and h(T) = fo (s + 0, we have T s +TS X(T) "=' fo (s + Sr, xo) dr = -1 fo (r, xo) dr ":' TF (xo). 0 s s Define the successive instant of observation of the stroboscopic method t1 by t1 = to + a. Then we have ST, xo),

lo

1

x(t l ) - x(to) = X ( 1) "=' F(xo). t1 - to Since t1 - to = a > c and x(t) - x(to) = o:X(T) "=' 0 for all t E [to, t1] , we have proved that the function x satisfies the Strong Stroboscopic Property with respect to F. By the Stroboscopic Lemma, for any nearstandard t E [0 , w[, x(t) i s defined and satisfies x(t) "=' y(t). D

20.4.

297

Functional differential equations

20.4

Functional differential equations

Let C = C ([ - r, 0] , IRd ) , where r > 0, denote the Banach space of continuous functions with the norm 11 ¢ 11 = sup{ II ¢(B) II : e E [ - r, O] } , where 1 1 · 11 is a norm of JRd . Let L 2 t0. If x : [-r, L] ---+ IRd is continuous, we define Xt E C by setting Xt (B) = x(t + B), e E [ - r, OJ for each t E [0, L] . Let g : JR + x C ---+ lRd , (t, u) f--+ g(t, v. ) , be a continuous function. Let ¢ E C be an initial condition. A Functional Differential Equation (FDE) is an equation of the form

x(t) = g (t, xt) ,

xo = ¢.

This type of equation includes differential equations with delays of the form =

x(t) G(t, x(t), x(t - r) ) , where G : IR+ IRd IRd ---+ JRd . Here we have g(t, u ) G(t, u(O) , u( - r)). The method of averaging was extended [13, 22] to the case of FDEs x

x

=

the form where

of

(20.4. 1)

c is a small parameter.

In that case the averaged equation is the ODE

(20.4.2) where F is the average of of the form

f.

It was also extended

x(t)

=

f (t/c, x t ) .

[14]

to the case of FDEs

(20.4. 3)

In that case the averaged equation is the FDE

(20.4.4) Notice that the change of variables x(t) = z(t/c) does not transform equa tion (20.4. 1) into equation (20.4.3) , as it was the case for ODEs (20.3.3) and (20.3.1), so that the results obtained for (20.4. 1) cannot be applied to (20.4.3) . In the case of FDEs of the form (20.4. 1) or (20.4.3), the clas sical averaging theorems require that the vector field f is Lipschitz continuous in x uniformly with respect to t. In our approach, this condition is weak ened and we only assume that the vector field f is continuous in x uniformly with respect to t. Also in the classical averaging theorems it is assumed that the solutions z(T, c) of (20.4. 1) and y(T) of (20.4.2) exist in the same interval [0, T /c] or that the solutions x(t, c) of (20.4.3) and y(t) of (20.4.4) exist in the same interval [0, T] . In our approach, we assume only that the solution of the averaged equation is defined on some interval and we give conditions on the vector field f so that, for c sufficiently smalL the solution x(t, c) of the system exists at least on the same intervaL

298

20. Averaging

20.4.1

Averaging for FDEs in the form z'(r)

=

E:f (r, Z-r )

c is a small parameter zo c/J . The change of variable x(t) z(t/c) transforms this equation in (20.4. 5) x(t) f (tjc, Xt ,c ) , x(t) c/J(t/c), t E [ - cr, 0] , where Xt , E E C is defined by Xt ,E (B) x( t + cB) for B E [ - r, 0] . Let f IR+ x C -+ JRd be a standard continuous function. We assume that (H1) The function f : u f (t, u) is continuous in u uniformly with respect to the variable t. (H2) For all u E C the limit F(u) limr-Hx> � J0T f (t, u) dt exist s. We identify IRd to the subset of constant functions in C , and for any vector c E JRd , we denote by the same letter, the constant function u E C defined by u(B) c, e E [ - r, 0] . Averaging consists in approximating the solutions x(t, c) of (20.4. 5) by the solution y(t) of the averaged ODE (20.4. 6) y(t) F (y(t)) , y(O) ¢(0) . According to our convention , y(t) , in the right-hand side of this equation. is the constant function u t E C defined by ut (e) y(t), e E [-r, O] . Since F is We consider the IVP, where

=

=

=

=

=

:

f--+

=

=

=

=

=

continuous, this equation is well defined. We assume that

( H 3)

The averaged ODE (20.4.6) has the uniqueness of the solution with pre scribed initial condition.

(H4) The function f is quasi-bounded in the variable u uniformly with respect to the variable t, that is, for every t E IR+ and every limited u E C, f (t, u) is limited in JRd . Notice that conditions (H1), (H2) and ( H 3) are similar to conditions (1 ) , (2) and (3) of Definition 2 . In the case of FDEs we need also condition (H4) . In classical words, the uniform quasi boundedness means that for every bounded subset B of C, f (IR+ x B) is a bounded subset of IRd . This property is strongly related to the continuation properties of the solutions of FDEs (see Sections 2 . 3 and 3. 1 o f [1 5] ) . Theorem 5 Let f : IR+ x C -+ IRd be a standard continuous function satisfying

the conditions (Hl ) - (H4). Let cjJ be standard in C. Let L > 0 be standard and let y : [0, L] -+ IRd be the solution of (20.4. 6). Let c > 0 be infinitesimal. Then every solution x(t) of the problem {20. 4 . 5) is defined at least on [- cr. L] and satisfies x(t) "::: y(t) for all t E [0 , L] .

20.4.

299

Functional differential equations

20.4. 2

The stroboscopic method for ODEs revisited

In this section we give another formulation of the stroboscopic method for ODEs which is well adapted to the proof of Theorem 5. Moreover , this formulation of the Stroboscopic Method will be easily extended to FDEs ( see Section 20.4.4) . We denote by G : IR x IRd ---+ IRd , a standard continuous function. Let x : I ---+ JRd be a function such that 0 E I C IR .

+

+

We say that x satisfies the Stroboscopic Property with respect to G if there exists f-L > 0 such that for every positive limited t0 E I, satisfying [0, t0] C I and x(t) is limited for all t E [0, t0] , there exists t1 E I such that f-L < t1 - to "::: 0, [to, t1] c I, x(t) "::: x(to) for all t E [to , t1], and

Definition 6

x(t l ) - x(to) t l - to

----'-----'------'.----'-

_

�

G ( t0, x ( to )) .

The difference with the Strong Stroboscopic Property with respect to G con sidered in Section 20.3.3 is that now we assume that the successive instant of observation t1 exists only for those values t0 for which x(t) is limited for all t E [0, t0] . In Definition 5, in which we take D = IRd , we assumed the stronger hypothesis that t1 exists for all limited to for which x(to) is limited.

Let ao E D be standaTd. Assume that the IVP y(t) G (t, y(t)), y(O) ao, has a unique solution y defined on some standard interval [0, L] . Assume that x(O) "::: ao and x satisfies the Stroboscopic Property with respect to G. Then x is defined at least on [0, L] and satisfies x(t) y(t) for all t E [0, L] .

Theorem 6 (Second Stroboscopic Lemma for ODEs) =

=

c:::-

Proof. Since x satisfies the Stroboscopic Property with respect to G, it is an almost solution of the ODE x = G(t, x). By Theorem 3 we have x(t) "::: y(t) for all t E [0, L] . The details of the proof can found in [19] or [21] . D Proof of Theorem 5. Let x : I ---+ JRd be a solution of problem (20.4.5). Let t0 E I be limited, such that x(t) is limited for all t E [0, t0] . By Lemma applied to g = f, G = F and the constant function m = x(to), there is a > a "::: 0 such that for all limited T 2: 0 we have

1 S

ls+TS f(r, x(to)) dr "::: TF(x(to)), s

where

s = to/c, S = o:jc.

(20.4. 7)

Using the uniform quasi boundedness of f we can show ( for the details see or [21]) that x(t) is defined and limited for all t "::: t0 . Hence the function e

X ( , T)

=

x(to + aT + cB) - x(to) , B E [-r, O] . T E [0, 1] , 0:

1, 0,

[19]

20.

300

is well defined. In the variable

X ( - , T)

(20.4.5)

system

Averaging

becomes

ax (0, T) = f(s + ST, x(to) + aX( - , T)). oT Using assumptions (H1 ) and (H4) together with (20.4. 7), we obtain after some computations that for all T E [0, 1] , we have

X (O, T)

"'='

{T f(s + Sr, x(to)) dr = 1 1s+TSf (r, x(t0)) dr

Jo

S

8

"'='

TF(x(t0)).

Define the successive instant of observation of the stroboscopic method t1 = to + a. Then we have

t1

by

x(t i ) - x(to) X (O, 1 ) F(x(to)). t l - to Since t1 - to a > c and x(t) - x(to) aX(O, T) 0 for all t E [to, t1 ] , we have proved that the function x satisfies the Stroboscopic Property with respect to F. By the Second Stroboscopic Lemma for ODEs, for any t E [0, L] , D x(t) is defined and satisfies x(t) y(t). "'='

=

=

"'='

=

"'='

20.4.3

Averaging for FDEs in the form x (t)

We consider the IVP, where

x(t)

=

c

=

is a small parameter

f (tj c , Xt ) ,

xo

=

f (tjE:, Xt )

(20.4.8 )

¢,

We assume that f satisfies conditions ( H1 ) , ( H2 ) and ( H4 ) of Section Now, the averaged equation is not the O D E (20.4.6), but the FDE

y(t) = F ( Yt ) ,

20.4. 1.

(20.4.9)

Yo = ¢.

Averaging consists in approximating the solutions x(t, c ) of ( 20.4.8 ) by the solution y(t) of the averaged FDE (20.4.9). Condition (H3) in Section 20. 4. 1 must be restated as follows

(H3)

The averaged FDE ( 20.4.9 ) has the uniqueness of the solution with pre scribed initial condition.

d Theorem 7 Let f : IR+ C ---+ IR be a standard continuous function satisfying the conditionsd (Hl)-(H4) . Let ¢ be standard in C. Let L 0 be standard and let 0 be infinitesimal. y : [0, L] IR be the solution of problem (20.4.9). Let x

---+

>

c >

Then every solution x(t) of the problem {20.4.8) is defined at least on [ - r, L] and satisfies x(t) y(t) for all t E [-r, L] . "'='

20.4.

301

Functional differential equations

2 0.4.4

The stroboscopic method for FDEs

Since the averaged equation (20.4. 9) is an FDE, we need an extension of the stroboscopic method for ODEs given in Section 20.4.2. In this section we denote by G : IR x C ---+ IRd , a standard continuous function. Let x : I ---+ IRd be a function such that [ - r, OJ C I C IR .

+

+

We say that x satisfies the Stroboscopic Property with respect to G if there exists fJ· > 0 S1Lch that for every positive limited to E I, satisfying [0, to] C I and x(t) and G(t , Xt) are limited for all t E [0 , to] , there exists t 1 E I such that fJ < t1 - to 0, [to, t1] C I, x(t) x(to) for all t E [to, t1] , and

Definition 7

c:::-

c:::-

x(t x(to) ------'- l )---'- - -----'- ---'- - G ( to, X to ) . t1 - to �

Notice that now we assume that the successive instant of observation t1 exists for those values t0 for which both x ( t) and G ( t, xt ) are limited for all t E [ 0, t0 ] . In the limit case r = 0 , the Banach space C is identified with IRd and the function X t is identified with x(t) so that , G(t , X t ) is limited, for all limited x(t) . Hence the "Stroboscopic Property with respect to G " considered in the previous definition is a natural extension to FDEs of the "Stroboscopic Property with respect to G " considered in Definition 6.

Let ¢ E C be standard. A s sume that the IVP y(t) G (t, Yt ), Yo ¢, has a unique solution y defined on some standard interval [ -r, L] . Assume that the function x satisfies the Stroboscopic Property with respect to G and x0 c:::- ¢. Then x is defined at least on [-r, L] and satisfies x(t) y(t) for all t E [-r, L] .

Theorem 8 (Stroboscopic Lemma for FDEs) =

=

c:::-

Proof. Since x satisfies the Stroboscopic Property with respect to G, it is an almost solution of the FDE x = G(t, Xt ) · For FDEs, we have to our disposal an analog of Theorem 3. Thus x ( t) c:::- y ( t) for all t E [0, L] . The details of the proof can found in [19] or [21] . D Proof of Theorem 7. Let x : I ---+ JRd be a solution of problem (20.4.8) . Let t0 E I be limited, such that both x(t) and F(xt ) are limited for all t E [0, t0] . From the uniform quasi boundedness of f we deduce that x( t) is S-continuous on [0, t0] . Thus Xt is nearstandard for all t E [0, t0] . By Lemma 1 , applied to g f, G F and m Xt 0 , there is o: > 0, o: c:::- 0 such that for all limited T 2 0 we have =

=

1

S

1s +TS f (r, X 8

=

t 0 ) dr

c:::-

T F ( Xt0 ) ,

where

s

=

to/c , S

=

o:jc.

(20.4.10)

20.

302

Averaging

Using the uniform quasi boundedness of f we can show ( for the details see or [2 1]) that x(t) is defined and limited for all t ,--- t0. Hence the function

[19]

X ( e , T ) = x(to + aT + B) - x(to + B) , B E [-r, 0] , T E [0, 1] , a is well defined. In the variable X ( · , T) system (20.4.8) becomes ax · oT (0, T) = f(s + ST, X t0 + aX( , T) ). Using assumptions ( H 1) and (H4) together with (20.4.10), we obtain that for all T E [0, 1] , we have

{T

1 X(O, T) ,--- J f (s + Sr, xt 0 ) dr = s o

1s+ TS f (r, x ) dr ,--- TF (xt ) . 0 t0 s

Define the successive instant of observation of the stroboscopic method t1 = to + a. Then we have

t1

by

x(t l ) - x(to) X (O, 1) "-' F ) (x to t1 - to Since t1 - to a > c and x(t) - x(to) aX (O, T) 0 for all t E [to, h ] , we have proved that the function x satisfies the Stroboscopic Property with respect to F. By the Stroboscopic Lemma for FDEs, for any t E [0, L] , x ( t ) is defined and satisfies x(t) ,--- y(t) . D =

=

=

,---

References

[1]

V . I . ARNOLD ( ed. ) , Dynamical Systems Sciences, Vol. 5, Springer-Verlag, 1994.

V, Encyclopedia of Mathematical

[2]

H . BAR REAU and .J . HARTHONG (editeurs ) , dard, Editions du CNRS, Paris, 1989.

La mathematique Non Stan

[3]

E. BENOIT ( Ed. ) , Dynamic Springer-Verlag, Berlin, 1991.

Proceedings, Luminy

[4]

I . P . VAN DEN BERG , Nonstandard Asymptotic in Math. 1249, Springer-Verlag, 1987.

Bifurcations,

Analysis,

1990,

Lectures Notes

[5] N . N . BOGOLYUBOV and Yu. A . I\IITROPOLSKI, Asymptotic Methods in the Theory of Nonlinear Oscillations, Gordon and Breach, New York, 1961.

303

References

[6]

J . W . DAUBEN , Abraham Robinson, The Creation of Nonstandard Anal ysis, A Personal and Mathematical Odyssey, Princeton University Press, Princeton, New Jersey, 1995.

[7]

F. DIENER and M . DIENER (eds . ) , Universitext , Springer-Verlag, 1995.

[8]

F. DIENER and G . REEB,

[9]

M. DIENER and C . LOBRY (editeurs) , Analyse non tation du reel, OPU (Alger) , CNRS (Paris) , 1985.

Nonstandard Analysis in Practice,

Analyse non standard,

Hermann,

1989.

standard et repnisen

[10] M . DIENER and G. WALLET (editeurs) , Mathematiques finitaires et anal yse non standard, Publication mathematique de l'Universite de Paris 7, Vol. 31-1, 31-2, 1989. [11]

J. L. CALLOT and T. SARI , Stroboscopie et moyennisation dans les sys tcmes d'cquations diffcrcntielles a solutions rapidement oscillantes, in

Mathematical Tools and Models for Control, Systems Analysis and Sig nal Processing, vol. 3, CNRS Paris, 1983.

[12]

A . FRUCHARD and A. TROESCH (editeurs), Colloque Trajectorien a la memoire de G. Reeb et J. L. Callot, Strasbourg-Obernai, 12-16 juin 1995, Prepublication de l'IRtvi A, Strasbourg, 1995.

[13]

J . K . HALE, "Averaging methods for differential equations with retarded arguments and a small parameter", .J. Differential Equations, 2 (1966)

57- 73.

[14] [15]

J . K . HALE and S . M . VERDUYN LUNEL, "Averaging in infinite dimen sions", J. Integral Equations Appl . , 2 (1990) 463-494. .T . K . HALE and S . M . VERDUYN LUNEL , lntmduction to Functional Dif Applied Mathematical Sciences 99, Springer-Verlag,

ferential Equations, New York, 1993.

[16]

L . D . KLUGLER, Nonstandard Analysis of almost periodic functions, in Applications of Model Theory to Algebra, Analysis and Probabzhty, W.A.J. Luxemburg ed. , Holt, Rinehart and Winston, 1969.

[17]

M. LAKRIB, "The method of averaging and functional differential equa tions with delay", Int. J . Math. Sci. , 26 (2001) 497-511.

[18]

M . LAKRIB, " On the averaging method for differential equations with delay", Electron. J. Differential Equations, 65 (2002) 1-16.

304

20. Averaging

Stroboscopie et moyenmsation dans les equations differen tielles fonctionnelles a retard, These de Doctorat en Mathematiques de

[19] 1\ I . LAKRIB,

l' Universite de Haute Alsace, Mulhouse, 2004. [20] 1\ I . LAKRIB and T. SARI, "Averaging results for functional differential equations", Sibirsk. Mat. Zh. , 45 ( 2004) 375-386; translation in Siberian Math. J . , 45 (2004) 3 1 1-320. [2 1] 1\ I . LAKRIB and T. SARI , Averaging Theorems for Ordinary Differential Equations and Retarded Functional Differential Equations. http : //www . math . uha . fr/ps/20050 1 lakrib . pdf [22] B . LEHMAN and S . P . WEIBEL . "Fundamental theorems of averaging for functional differential equations", J . Differential Equations, 152 ( 1 999) 160-190. [23] C. LOBRY, 1 989.

Et pourtant . . . ils ne remplissent pas N !,

Aleas Editeur, Lyon,

[24] C. LOBRY, T. SARI and S. TOUHAMI, " On Tykhonov's theorem for con vergence of solutions of slow and fast systems", Electron. J. Differential Equations, 19 ( 1998) 1-22. [25] R. LUTZ and M . GOZE, Nonstandard Analysis: a practical guide applications, Lectures Notes in Math. 881, Springer-Verlag, 1 982.

with

[26] E . NELSON, "Internal Set Theory", Bull. Amer. 1\Iath. Soc . , 83 ( 1 977) 1 165- 1 1 98. [27] G. REEB, Equations differentielles et analyse non classique ( d'apres .J . L. Callot) , in Pmceedings of the 4th International Colloquium on Differ ential Geometry ( 1 978) , Publicaciones de la Universidad de Santiago de Compostela, 1 979. [28] A. RoBINSON , " Compactification of Groups and Rings and Nonstandard Analysis". Journ. Symbolic Logic, 34 ( 1 969) 576-588. [29] A. ROBINSON. 1 974.

Nonstandard Analysis,

American Elsevier, New York,

[30] .J .-l\1. SALANSKIS and H . SINACEUR (eds.) , Le Colloque de Cerisy, Springer-Verlag, Paris, 1992.

Labyrinthe du Continu,

[31] J . A . SANDERS and F. VERHULST, Averaging Methods in Nonlinear Dy namical Systems, Applied Mathematical Sciences 59, Springer-Verlag, New York, 1985.

305

References

[32]

T. SARI, "Sur la theorie asymptotique des oscillations non stationnaires", Asterisque 109-1 10 (1983) 141-1 58.

[33]

T. SARI , Fonctions presque periodiques, in Actes de l 'ecole d 'ete Analyse non standard et representation du reel. Oran-Les Andalouses 1984. OPU Alger - CNRS Paris, 1985.

[34]

T. SARI, General Topology, in Nonstandard Analysis in Practice, F. Di ener and M. Diener (Eds. ) , Universitext, Springer-Verlag, 1995.

[35]

T. SARI, Petite histoire de la stroboscopie, in Colloque Trajectorien a la Memoire de J. L. Callot et G. Reeb, Strasbourg-Obernai 1995. Publication IRMA, Univ. Strasbourg (1995) , 5-15.

[36]

T . SARI, Stroboscopy and Averaging, in Colloque Trajectorien a la Me moire de J. L. Callot et G. Reeb, Strasbourg-Obernai 1995, Publication IRMA, Univ. Strasbourg, 1995.

[37]

T. SARI, Nonstandard Perturbation Theory of Differential Equations, Ed inburgh, invited talk in International Congres in Nonstandard Analysis and its Applications, ICMS, Edinburgh, 1996. http : //www . math . uha . fr/sari/papers/i cms 1996 . pdf

[38]

T . SARI , Averaging in Hamiltonian systems with slowly varying parame ters, in Developments in Mathematzcal and Experimental Physics, Vol. C, Hydrodynamics and Dynamical Systems, Proceedings of the First Mexican Meeting on Mathematical and Experimental Physics, El Colegio Nacional, Mexico City, September 10-14, 2001 , Ed. A. Macias, F. Uribe and E. Diaz, Kluwer Academic/Plenum Publishers, 2003.

[39]

T . SARI and K . YADI, "On Pontryagin-Rodygin's theorem for convergence of solutions of slow and fast systems", Electron. J. Differential Equations, 139 (2004) 1-17.

[40]

I. STEWART,

[41]

K . D . STROYAN and W . A . .J . LUXEMBURG , infinitesimals, Academic Press, 1976.

[42]

I I I8

1987.

The Problems of Mathematics,

Oxford University Press,

Introduction to the theory of

rencontre de Geometrie du Schnepfenried, Feuilletages, Geometrie symplectzque et de contact, Analyse non standard et applications, Vol. 2, 10-15 mai 1982 , Asterisque 109-110, Societe Mathematique de France, 1983.

Path-space measure for stochastic differential equation with a coefficient of polynomial growth Toru Nakamura

A a-- additive measure over

*

Abstract

a spa.ce of paths

is constructed to give the so lution to the Fokker-Planck equation associated with a st.oehastic differ ential equation with coefficient fuuction of polynomial growth by making use of nonstauda.rd analysis.

21.1

Heuristic arguments and definitions

Consider a stochastic differential equation,

dx(t)

=

f (x(t)) dt + db(t),

(21 . 1 .1 )

where f ( x) is a real-valued function and b(t) a Browni an motion vvith vari an ce 2Dt for a time interval t. vVe wish to construct a m easme over a space of paths for (21 . 1 .1 ) . To my knowledge, the coefficient. fund.ion has been assumed to have at most linear growth, that is [.f(x) [ .::::; const· [ x [ for sufficiently lar ge x, otherwise some paths explode to infinity in finite tunes. As an example, let .f(x) = [ x [ l+6 (8 > 0) and define explosion t ime e for each continuous path x(t) by limt-->e-o x(t) = ±oo. Then, it is proved that P(e oo) < 1 and more strongly P( e = oo ) 0. For general case, see Feller's test for explosion in [1]. Despite the explosion, we shall consider f(x) of polynomial growth of an arbitrary order and define a measure over a space of paths. vVe use nonstandard analysis because it has a very convenient theory, Loeb measure theory [2, 3] , which enables us t.o construct. a standard O"-additive measure in a simple way. ·

=

=

*Department of Mat.hemat.ics. S undai Preparatory School, Kanda-S urugadai. Chiyoda kn, Tokyo 1 0 1-0062, .Japan.

21.1.

307

Heuristic arguments and definitions

Let us interpret (21 . 1 . 1) as a law for a particle momentum x = x(t) at time t with the force f ( x( t)) for drift and the random force db( t) / dt acting on the particle. By a time t > 0 some particles may disappear to infinity along the exploding paths, but others still exist with finite momentum so that they should make up a "probability" density of a particle momentum. We wish to usc the word "probability" though its total value may be less than 1 because of the disappearance of particles to infinity. Especially when f ( x( t)) is a repulsive force, that is, its sign is the same as that of x(t) , a particle can get larger momentum compared to the case where f (x(t)) is absent. However, once it gets very large momentum, it can hardly come back to the former one because the drift force f (x(t)) acts on the particle so as to increase its momentum. Thus, the particles which have gone to infinity by a time t > 0 could not contribute to the probability density at t. This consideration suggests that we can introduce a cutoff at an infinite number in momentum space if it is necessary in order to define a measure over a space of paths so that the probability density should be constructed by a path integral with respect to the measure. Eq. (21.1.1) gives the forward Fokker-Planck equation for the probability density U(t, x) of a particle momentum x at time t, 0

0 2 D o 2 U(t, x) - { J(x) U(t, x) } . (21 . 1 .2) ot ox ox We assume that the drift coefficient f(x) and the initial function U(O. x) satisfy

U(t. x)

=

the following conditions:

(A1)

For some natural number large x.

n

E N,

l f(x) l

:<::

const · l x l n

for sufficiently

(A2) f(x) E C 2 (IR) . (A3) f(x) 2 /(4D) + f'(x)/2 is bounded from below. We denote the min {f(x) 2 /(4D) + f'(x)/2 1 x E IR}.

bound by

c =

(A4) U(O, x) E C 2 (IR) and its support is a bounded set . Rewrite (21.1 .2) into a difference equation with infinitesimal time-spacing c and momentum-spacing 6 V'fDc using a forward difference quotient 0 1 0 U(t, x) ==? - { U(t + r:: , x) - U(t, x) } t c =

for time-derivative, and central ones 0

ox

1 U(t, x) ==? 26 { U(t, x + 6) - U(t, x - 6) }

21.

308

and

Path-space measure for stochastic differential equation

oEJ2x2 U(t, x) =? 1)21 { U(t, x + 6) + U(t, x - 6) - 2 U(t, x) }

for space-derivative. The result is

U(t + c, x) � { 1 + .f (x - 6)Jc/ (2D ) } U(t, x - 6) ( 21.1.3 ) + � { 1 + .f (X + 6) (- JcI (2D ) ) } u ( t ' X + 6) ' which indicates that the coefficients H 1 + .f ( .T - 6)Jc/( 2D )} should be as signed to the infinitesimal line-segment with end points ( t, x ( t) ) ( t, x ( t + c) - 6) and ( t + c. x(t + c) ) of a *-polygonal path x( s), and !{1 + .f (x + 6) (- Jc/ ( 2D ) ) } to the segment with end points ( t, x(t) ) (t, x(t + c) + 6) and ( t + c , x(t + c )) . Taking into account the approximation 1 .f (x(t) ) (±v 2Dc ) - 1 .f (x(t) ) 2 c log [ 1 + .f (x(t ) ) (± Jc/ ( 2D ) )] � 4D 2D 1 .f (x(t)) { x(t + c) - x(t)} - 1 .f (x(t)) 2 c , 4D 2D =

=

=

�

=

let us interpret the coefficients as

� { 1 + .f (x(t) ) (±Jc/(2D) ) }

� � exp [lt+c 2�.f (x( s) ) db ( s) - 1 t+c 4�.f ( x( s) ) 2 ds] .

The first integral on the right-hand side is the Ito-integral. Thus, we define *-path w, *-measure f-L for each w, and U(t, by

x)

Definition 1 (1)

(2)

v be [t/c] with Gauss ' parenthesis. For each internal function {0, 1 , · , v - 1} { - 1 , 1} and y E IR, define Xk by Xk y + ��:� o: ( i )6, and w by the *-polygonal path with vertices (0, y), ( c , x l ), (vc , xv)· Let a

:

·

·

=

---+

·

·

·

,

Define * -measure f-L by f-L (w)

=

1v exp [ rt 1 .f (w ( s) ) db (s ) - rt 1 .f (w ( s) ) 2 ds ( 21.1. 4) Jo 2D Jo 4D 2 ]

with the first integral in the exponent being the Ito integral, and U (t. x) L U(O, w(O) ) tL(w) ( 21.1.5 ) =

w

where the sum is taken over all w satisfying w( vc )

=

x.

21.2.

309

Bounds for the *-measure and the *-Green function

We note that the *-measure mula

[4].

21.2

( 21.1. 4)

corresponds to the Girsanov for

Bounds for the *-measure and the *-Green function

l\Iaking use of the I to formula t

1 f(w( s) ) db( s) -1 { F(w(t) ) - F(w(O)) } - � {t f' (w(s) ) ds 2D 2 }0 }0 2D where F'(x) f(x), we obtain a bound for f-L as f-L(w) 21v exp [ 2� {F(w(t) ) - F(w(O)) } - lot{ 4� J(w ( s) ) 2 + �j' (w(s)) }ds] (21.2.1) :::; 21v exp [ 2� {F(w (t)) - F(w(O) )} - ct] . The constant c in the last line was given in the assumption ( A3 ) . Let us fix the end points of w at finite numbers y and x, and consider a {

=

=

=

space of *-paths,

w(c[t/ cl) x } , and define the *-Green function for the interval [0, t ] by (21.2.2 ) Q(t, x : 0, y ) 216 L f-L (w) . wEP(t,x:O,y) Then, U(t, x) defined in ( 21. 1.5) is written as U (t, x) L U(O, y)Q(t, x : 0, y) 26. ( 21.2.3) y The infinitesimal spacing corresponding to dy is not 6 but 2 6 in ( 21.2.3 ) because only the paths that start every other point y could reach the end point x . Since 26 exp [- (x - y) 2 ] ( 1 + O (c l/2 ) ) "" 2_ _ (21.2. 4) v 4 l/ 4 D nD 2 2 t ( � t) wEP(t,x:O,y) P(t, x : 0, y )

=

{w

I

w(O) y =

and

=

=

=

'

the *-Green function is bounded as

F(y) Q(t, x : O, y) :::; exp [ F(x) 2D - 2D - ct ] 1 exp [ (x4-;�) 2 ] ( 1 + 0 (c l/2 )) , (21.2.5) (4nD t) 1 /2 x

21.

310

Path-space measure for stochastic differential equation

and hence

(21.2.6)

U(O , y )

Since the support of is assumed in (A4) to be a bounded set, the right-hand side of (21.2.6) is near-standard. To define a standard function as the standard part of we introduce two time scales of different orders, one for the Brownian motion, r:: , and the other, T, for the changes in the drift term in (21 . 1 . 1) ; we choose c = 0(T3), for example. The spacing c is finer and stands for the time-spacing of *-random walks, and T is long enough to cover many steps of the *-random walks, yet short enough for the change in the drift term to be small. Define a standard function t, as the standard part of the value of at the coarse-grained lattice point of time !;_:

U,

U

U

U ( x)

U(t,

x) = st U(i, x)

where

i = T [t/T] ,

(21.2.7)

which we expect to be the solution to the Fokker-Planck equation

21.3

(2 1 . 1 .2) .

Solution t o the Fokker-Planck equation

x)

In order to prove that U(t, in (21.2.7 ) is the solution to (21 . 1 .2 ) , or more concretely to estimate h in (2 1.3.7 ) below, we should truncate the *-paths at an infinite number as

A PA(i, x : O, y) = { w E P(t, x : O, y) l \is E [O, i] l w ( s)I < A } . The magnitude of A will be determined later in (21 .3.5) . Then the correspond ing *-Green function YA and UA arc defined by ( 21.3.1 ) YA(t, x : 0, y) = 261 L fl (w) wEPA (t,x: O , y)

and

UA(L x) = L U(O, y)QA(i, x : 0, y) 26. y

( 21.3.2 )

Let us first calculate the difference between ( 21 .2.2) and ( 21.3.1 ) . Consider a *-path 0, y) and put min { s 0, \ Define a *-path by turning upside down the section of the path for

w E P(t, x : y) PA(t, x : w'

t(w) =

I l w ( s) l = A}. w( s)

311

2 1 .3 . Solution t o the Fokker-Planck equation

0 0. P(t, x : O, y) \PA(t, x : O, y)

2A - y -2A - y

the interval :<.:: s :<.:: t (w ) so that w' should start or at time s = The section of w'(s) for t(w ) :<.:: s :<.:: t is that of w (s) for the same interval. In this way, we can define a one-to-one correspondence from wE to w' E Then by (2 1 .2.4) ,

P(t, x : 0, 2A - y) UP(t, x : 0. - 2A - y).

and hence

F(y) ] 1 l9(t, x : O, y ) - 9A(t, x : O, y) l :<; exp [ F(x) 2D - 2D - ct (4nDt) l/2 x (2 1 .3 . 3) { exp [ (2A - y - x) 2 ] + exp [ (- 2A - y - x) 2 J } x ( 1 + ( 1 ;2 ) ) 4D t

c

4D t

,

which implies

(21 .3.4)

x.

for any finite Now, we are ready to prove that wish to evaluate the standard part of

for an infinitesimal parameter as

A

a

=

� {U( t +

U(t, x)

a, X )

is the solution to (21 . 1 .2) . We

- U ( t, ) } X

kT (k E *N) given arbitrarily. Choose the truncation

(21.3.5 )

0,

where (3 > a standard constant, is just introduced to make the argument of logarithm dimensionless . Then by (21 .3.4) ,

UA (t, x)

for any standard natural number n, meaning that can be identified with up to negligible error. Therefore we shall hereafter deal with the truncated instead of Then the difference quotient we should calculate is

U(t, x) UA

U.

21.

312

Path-space measure for stochastic differential equation

where

In order to calculate the sum in h , we wish to expand the summand as power series in � except for the exponential function, which is possible if � is sufficiently small. Since

(21.3.9) satisfies

YA (f

+ a, x : f, x + �) :S exp [ 2� { F(x) - F(x + �) } - ca] �2 1 exp [ 4D a ] ' ( 4n D a ) 1 / 2 x

(21.3.10)

-

the summand in (21.3. 7) contains exp [- 4b ] as a factor which enables us to u restrict � to be sufficiently small, 1 � 1 in rough estimation. I n reality, it is restricted to =

for some small

a

such as

O (a 112 ),

1/10 as shown in the following.

Note that

(21.3.11) for some m E N by the truncation at A (A1) . Then by (21.3. 10) ,

=

O ( l log

,Ba l) and the assumption

21.3.

313

Solution t o the Fokker-Planck equation

(D /{3) ! ({Ja) !-a ,

e llog /3ul m ) (4Du . ] o (e- l log/Jul m ) .

in the last line the infinitely large number If I� I :2: In fact, can be controlled by the half of the factor e - e I

1 [ 8Da ] - [ 8({3a)2 a

exp - £ < exp Hence, the sum 2.:::�; in canst · 1�;"12(D I

(21 .3.7)

=

for such � is estimated as

�2 4( nDa) 112-;-c- [ 8Da ] d� o (aae - 8(P") 2a ) o(an ) /3)1/2 (/3u ) 1/2-a

J

1

-

exp

- --

I

=

=

for any n E N, so that it can be neglected. Now, we have only to consider � of infinitesimal order equal or less than so that we can expand

F(x) - F(x + �)

=

1 , a2-a -U(x) - I�2 f'(x) + O (a2-3a ). 3

2.

Next, we wish t o replace the integral in

(21 .3.9)

(21 .3.12)

by

l t+u f(w( s ) ) 2 ds + 1 lt+u f' (w(s) ) ds '"-' a J (x) 2 + a J' (x) . (21 .3.13) 4D ? 4D 1

2

t

t

It is justifiable if paths going very far at some time neglected. Let us take an infinitesimal

where a' is small but a ' > a , for example, Then, the sum over which satisfies bounded by

{ [ ( 4nDa)l l 2 exp

a

=

a

(2A' - �) 2 ] + exp [ ( 2A' + �) 2 ] } 4D a 4D a O (a- l l 2 e -(t3:)2a ) o (an ) =

_

can be

=

(21 .3.14)

=

for any n E N in the same way as used the fact � = = consider satisfying

w

[t , t + a]

2/1 0 if we take 1/1 0 . l w ( s) - x l :2: A' for some s E [t, t + a] is

w

canst

'

sE

(21.3.3), and hence negligible.

O (a! -a ) o (A') because

a

'

>

a.

l w ( s ) - xl :::; (D / f3) ! (f3a) !-a'

Here, we have Thus, we have only to

21.

314

for all

s

E

Path-space measure for stochastic differential equation

[t , t + a], and hence

[ 4D f(x) 2 - �2 f'(x) J 1 exp [- - 1 {f ( w ( s)) 2 - f ( x) 2 } ds 4D Ho- � 1 { f' (w(s)) ds - f'(x) } ds ] exp [- � f ( x) 2 - � j' ( x) J ( 1 + (a�- a' ) ) . 4

= exp - _5!___ x

=

Putting

:t

o-

(21.3. 15) =

0

(21 .3. 12), (21.3. 15) "'

� wEPA (t+ ,x:t,x+ l; ) o-

into

H

x

and

_1:_ 2 v' -

[-__r__ ] ( 1 + O (a3/ 2 ) ) exp / 4Da ( 4nD a) l 2 26

(2 1.3.9), we obtain e + a, x : f, x + �) - ( 4nDa1 ) l/2 exp [- 4Da J= { 2 2 Da � 2 - 2Da f(x) 2 o ( a ) = - � f(x) - � + f (x) + + } 4 D 8D2 2D 2 1 4�Da J · 4( nDa) 1 /2 exp [- -

YA (f

1

x

(21.3.16)

x

Lastly, we usc the expansion

(21.3.17) with for h . For h , we usc the expansion taken one step further,

with

21.3.

315

Solution to the Fokker-Planck equation

because (21.3.16) contains a factor of order O(a � -a ) , whereas there is no such factor in h . We shall not give the proof of these expansions, for it requires elementary but a little long estimations. Estimations similar to this case can be found in [5, 6, 7] . Putting (21.3. 16) and (21.3. 17) into (21 .3.7) and replacing the sum by integral which is permitted because 6 = o(T 31 2 ), we obtain h

J

=

{ - � j(x) - �2 + 2Da f (x) + �2 - 2Da f(x) 2 + o(a) }

� � � � ( D 1 ;9) 112 (iJu) 1/2-a x

4D

2D

1

8D 2

� 2 d� {uA (t, x) + �Do UA (t, x) } (4nD1a) 112 exp [ - 4Da J

{ ( - 2D� f(x) - �2 +4D2Da f (x) + �2 8D- 2Da 2) 2 j(x) UA (t . x) 1 2 1 � f(x)D ( x) o(a) } �2 d�. . -UA t + o [ 1 2D 4Da J (4nDa) 1 2 J

1

�� ( D 6) 1/2 (;Ju ) 1/2-a

exp

-

Since the domain of integration can be extended to *JR by the factor �2 1 (47rDu ) I /2 exp [ 4Du l '

-

h

=

- 2Da f(x) 2 ) UA ( x) t J { ( - 2D� J(x) - �2 +4D2Da f (x) + f,2 8D 2 1 e f(x)Do UA (t , x) + o(a) } exp [ - e J d� - 2D 4Da (4n Da) 1 1 2 a { f'(x)UA (t, x) + f(x)Do UA (t , x) } + o(a) . 1

* !R

(21 .3.19)

= -

Similarly, putting

(21 .3. 18)

into

(21 .3.8),

we obtain

(21 .3.20) Therefore,

� {UA (t + a, x) - UA(t x) } DDJ UA (t, x) - { f' (x)UA (t, x) + j(x)Do UA (t, x) } + o(1 ) =

(21 .3.21)

holds for any infinitesimal a kT, which means that U ( t, x) st UA (t , .1: ) is differentiable with respect to t and its t-derivative is the standard part of =

=

316

2 1 . Path-space measure for stochastic differential equation

the right-hand side of (2 1 .3.21 ) . Though we omit the proof, it is proved that U(t, x ) is twice differentiable with respect to x and their values are and Taking the standard part of the both-hand sides of (2 1 .3 . 2 1) , we obtain

(21.3.22) namely U (t, x ) is the solution to the Fokker-Planck equation (2 1 . 1 .2) . Let us finally note that, by Loeb measure theory, a standard a-additive measure over a space of paths can be derived from the *-measure f-L we have constructed in this paper so far.

References

[1]

H . P . M cK E AN , JR., London. 1969, (§ 3 . 6) .

Stochastic Integrals, Academic Press,

[2]

P . A . LOEB, "Conversion from nonstandard to standard measure spaces and applications in probability theory", Trans. Amer. lVIath. Soc . , 2 1 1 ( 1975) 1 13-1 22.

[3]

K. D. STROYAN and .J . l\1 . BAYOD , Foundations of infinitesimal stochastic Studies in Logic and the Foundations of Mathematics, North Holland. Amsterdam, 1986.

[4]

I. V. GrGRSANOV, " On transforming a certain class of stochastic processes by absolutely continuous substitution of measures", Theor. Prob. Appl . , 5

New York and

analysis,

( 1960) 285-301 .

[5]

T . NAKAMURA, " A nonstandard representation of Feynman's path inte grals", J. Math. Phys. , 32 (1991) 457-463.

[6]

T . NAKAMURA, "Path space measures for Dirac and Schri:idinger equations: Nonstandard analytical approach", J. Math. Phys., 38 (1997) 4052-4072.

[7]

T. NAKAMURA, "Path space measure for the 3+ 1-dimensional Dirac equa tion in momentum space", J. Math. Phys . , 41 (2000) 5209-5222.

Optimal control for N avier-Stokes equations Nigel J. Cutland

*

and Katarzyna Grzesiak

**

Abstract We survey receut results on existence of optimal controls for stochastic

Navier-Stokes equations in 2 and 3 dimensions using Loeb space methods.

22.1

Introduction

this paper we give a brief survey of recent results 1 concerning the exis tence of optimal controls for the stochastic Navier-Stokes equations (NSE) in a bounded domain D in 2 and 3 space dimensions; that is, D C JRd with d 2 or 3. The controlled equations in th eir most general form are as follows ( sec the uext section for details) : In

=

·n (t) = no +

{t { .fo

+ .lot

g

-

IJ

A v, ( ., )

-

B ( 'U ( s)) + f ( s, n( s) , O (s , v.) ) } d.s

(22.1. 1)

(s, u(s)) dw(s)

Here the evolving velocity field ·u = u(L, w) is a stochastic process with values in the Hilbert space H � V1(D) of divergence free functions with domain D; t his gives the (random) velocity 'lt(l, x, w) E JR.d o f the fluid a t any time t and point. x E D. The most general kind of control 0 that we consider act.s through the external forcing term f, and takes the form 0 : [0, T] x 1-l ----7 M wh ere 1-l is the space of paths in H and the control space Jv[ is a compact metric space. · Mathematics Department, University of York UK. ,

nc507@york . ac . uk

**Department of Finance, \;\/yzsza Szkola Biznesu, National-Louis University, Nowy Sacz. Poland. j ermakowicz@wsb-nlu . edu . pl

1The results reported here are developed f1·om the second author's PhD thesis [16] writ.t.en under the supervision of the first. author. Full details may be found in the papers 191 and 11 Oj.

318

22. Optimal control for Navier-Stokes equations

In certain settings however it is necessary to restrict to controls of the form : [0, T ] ----+ lvl that involve no feedback, or those where the feedback only takes account of the instantaneous velocity u( t). The terms vA, B i n the equations are the classical terms representing the effect of viscosity and the interaction of the particles of fluid respectively; the term B is quadratic in u and is the cause of the difficulties associated with solving the Navier-Stokes equations (even in the deterministic case g = 0) . The final term in the equation represents noisy external forces, with w denoting an infinite dimensional Wiener process . The methods involve the Loeb space techniques that were employed in [4, 6] to solve the stochastic Navier-Stokes equations with general force and multi plicative noise (that is, with noise g (s, u(s)) involving feedback of the solu tion u) , combined with the nonstandard ideas used earlier in the study of optimal control of finite dimensional equations (see [7] for example) . For the 3-d case it is necessary to utilize the idea of approximate solutions developed in [ 1 1 ] for the study of attractors. We assume a fixed time horizon for the problem, and then the aim is to establish the existence of an optimal control that minimizes a general cost J that has a running component and a terminal cost, modelled by

e

J(B , u) = JE

(lT h (t, u(t) , B(t, u)) dt + h (u(T) ) ) .

(22 . 1 .2)

The existence of an optimal control (in some cases, a generalized or relaxed control) can be established in a number of settings. The results for d = 2 are somewhat stronger than for d = 3, which is a reflection of the well-known distinction between these two cases even for the uncontrolled equations: in di mension d = 2 there is uniqueness of solutions and the solutions are strong ( es sentially this means that the field u(t, x ) is differentiable in x for each t) whereas for d = 3 the solutions that exist for all time are weak and uniqueness is a ma jor open problem2 . Consequently even the formulation of the optimal control problem is more difficult, and optimal controls are obtained using our methods for more restricted classes of controls compared to the results for d = 2. The plan of the paper is as follows. First (Section 22.2) we provide some details of the Hilbert space setup for the equations and their solution, and then recall the basic nonstandard ideas concerning controls. The results for dimen sion d = 2 are then outlined in Section 22.3, and in the final section we do the same for d = 3. We omit proofs, referring the interested reader to [9] and [10] , although in some cases where new ideas are introduced we provided sketches. Alternative approaches to control theory for stochastic Navier-Stokes equa tions have been studied in [3, 13, 18] , where the controls are assumed to be 2

In fact this is o n e o f the Millennium problems. Uniqueness

but for d

=

�s

known for strong solutions

3 �uch �olution� can only be obtained for �hart timco.

319

22.2. Preliminaries stochastic processes

e

:

[o, T]

x

n

____, H

adapted to the information filter given by complete observations of the fluid velocity. In [3] for example, the author assumes that there is a constant p > 0 such that l e(t, w) l < p for all t E [O, T] and w E D, while in [13, 18] it is assumed that the controls satisfy the condition IE

(loT l e(t) l 2 dt)

< 00 .

In these papers the control is subject to the action of a linear operator. The stochastic minimum principle is derived providing a necessary condition for an optimal control, and a dynamic programming approach is presented to give a sufficient condition. It is shown that the minimum value function is a viscosity solution to the Hamilton-Jacobi-Bellman equation associated with the problem. In [13] the authors obtain smooth solutions to the H.TB equation which justifies the dynamic programming approach.

22.2 22.2.1

Preliminaries Nonstandard analysis

For optimal control theory, nonstandard analysis is a natural tool to use because one can always find a "nonstandard" optimal control - this is simply eN for any infinite N, where (en)nEN is a minimizing sequence of controls. The task then is to see whether e N can be transformed into a standard optimal control of some kind. For finite dimensional DEs and SDEs this idea was developed in [7] and related papers. In the study of the Navier-Stokes equations (particularly the stochastic ver sion) Loeb space techniques have proved very powerful, providing for example the first general existence proof for the stochastic Navier-Stokes equations in dimensions up to 4 (see [6] ) and more recently new results concerning the existence of attractors (see [5] , [ 1 1] and [ 1 2] ) . I n the work reported here the nonstandard techniques used in these two areas are combined. We work in a standard universe V V(S ) where S is a base set that contains all the objects of interest, and take an N 1 -saturated extension *V(S ) C V(*S) . For ease of reference we gather together in an Appendix the most important facts about the nonstandard representation of the spaces used in the study of the NSE equations.

=

22.2.2

The stochastic Navier- Stokes equations

The classical form of the uncontrolled stochastic Navier-Stokes equations with zero boundary condition on a time interval 0 :<:: t :<:: T is as follows:

22 . Optimal control for Navier-Stokes equations

320

{

du = { v �u - < div u = 0 u lan = 0 u(O, x)

=

u. \7 > u - \7p + j(t, u) } dt + g ( t, u)dw(t)

uo (x)

These equations are considered in a bounded domain D C IRd ( d = 2 , 3) which is fixed throughout the paper, with the boundary oD of class C 2 . Here u : [0, T] X D X n ----+ JR3 is the random velocity field, v is the viscosity, p is the pressure, and f represents external forces; u0 is the initial condition. The diffusion term g together with the driving Wiener process w represents additional random external forces, or noise. Underlying this model is a filtered probability space 0 = (D, :F, (:Ft ) t >o , P) . For the conventional Hilbert space formulation, write L 2 ( D ) = ( L 2 ( D )) d and let V = { u E Crf' (D, IRd ) : div u = 0 } . Then H is the closure o f V i n L 2 ( D ) with the norm given by

(u, v)

lul2

=

(u, u) , where

J ui ( x )v i ( x)dx. L 2= 1 d

=

D

and V is the closure of V in the norm

l ui + ll u ll where ll u ll 2

(

)

=

((u, u)) and

ou o ((u, v)) = L ox ' oxv · J J ]=1 d

H and V are real Hilbert spaces, V dense in H. The dual space to V 1s denoted by V' with the duality extending the scalar product in H and V c H = H' c V ' . Write A for the Stokes operator on H (the self-adj oint extension of the pro jection of -�) which is densely defined in H; it can be extended to A : V ----+ V' by Au [v ] = (( u, v)) for u, v E V. The operator A has an orthonormal ba sis of eigenfunctions { ek h E N = [ C H with eigenvalues 0 < A k / oo . For u E H write u = 2...::: ukek. Write Hn for the finite dimensional subspace Hn = span{ e 1 , e 2 , . . . , en } and Prn for the projection onto Hn . A family of spaces Hr for r E lR is defined as follows: for r 2 0 Hr

=

=

{ U E H : kL Ak Uk < 00} =1

321

22.2. Preliminaries with the norm given by

=

l ui ; = L >-.'k uk . k= l and H- r is the dual of Hr . We may represent H - r by H- r

=

{ (uk)kEN :

=

L x ;;r u k < 00} k=l

In terms of this family we have H0 = H, H 1 = V, and H - 1 = V' with the norms l u i = l u l o and ll u ll = l u l l · The quadratic function B is given by (B(u), y) = b(u, u, y) , where

b(u, v, y)

=

d J u� (x) 80�j (x)y1 (x) dx i

L

2,J = l D

whenever the integral is defined. This describes the nonlinear inertia term in the equation. The trilinear form b has many properties [19] including the following that we need here.

b(u, v, v) 0, l b(u, v, y) l :::; c l u l i ll u ll � lvl i llvll � II Y II =

(22.2. 1) (22.2.2)

There are additional properties that hold only i n dimension d = 2. The inequality (22.2.2) gives the second of the following properties. Proposition 22. 2 . 1

For u E V

I Au lv' = ll u ll 1 3 I B(u) lv' :::; cl u l2 ll u P In the above setting the evolution form of the Navier-Stokes equations (without explicit control) in the space V' (the vector \lp 0 in this space) is given by =

lot -vAu(s) - B (u(s)) + f (s, u(s))} ds t + lo g (s , u (s)) dw(s)

u(t) = uo +

{

(22.2.3)

with the initial condition u0 E H. The first integrals are Bochner integrals in V'. The driving noise process w is an H-valued Wiener process with covari ance Q , a fixed non-negative trace class operator (see [6, 14] for details) , and

322

22. Optimal control for Navier-Stokes equations

the stochastic integral is the extension of the Ito integral to Hilbert spaces due to Ichikawa [17] . I n the study of the Navier-Stokes equation there are several types of solu tion. The most general is a weak solution, but in the case d = 2 we also have strong solutions. The definitions are as follows (see [6] ) : Definition 22.2.2 ( a ) An adapted process u : [0. T] X n ----+ H is a weak solution to the stochas tic N avier-Stokes equations (22.2.3) if (i) for P-a.a.

w the path u( -, w)

has

u(-, w) E L = ( [ O , T] ; H) n L2 ( [0, T] ; V ) n C( [O , T] ; Hweak ) , (ii) for all (iii)

tE

[0, T] the equation (22.2.3) holds as an identity in V',

u satisfies the energy inequality IE

By

(

T

)

l u(t) l 2 + l{o ll u (t) ll 2 dt < oo. t E[O, T] sup

solution henceforth in this

paper we mean

(22 . 2 . 4)

weak solution.

(b) A weak solution is strong if (i) for P-a.a.

w the path u(-, w)

has

u( · , w) E L = ([ o , T] ; V ) n L 2 ( [0, T] ; H2 ) n C([ O . T] ; Vweak ) , u(- , w ) E C( [O , T] : H)) ; w the path u ( , w ) has

(which implies that (ii) for P-a.a.

(22.2.5)

sup

tE[O .T]

-

T

ll u (t) ll 2 + l{o Au(t) 2 dt < oc .

(22 . 2 . 6)

When controls are introduced into the forcing term f the equation (22.2.3) takes the form

u ( t)

=

lot -vAu (s) - B (u( s)) + t + lo g ( s , u(s)) dw(s) .

uo +

{

f ( s , u( s ), B(s, u))} ds

In the next section we discuss the types of control that we consider.

323

22.2. Preliminaries

The following general existence result for solutions to the sNSE was first proved in [4] ( see also [6] for an exposition ) , using Loeb space methods. First define Km = { v : I v i i <:.:: m } C::: V, with the strong topology of H . I n the theorem below, continuity o n each Km turns out t o b e the appropriate condition for the coefficients f, g; this is weaker than continuity on V in either the H-norm or the weak topology of V.

Suppose that uo E H and

Theorem 22.2.3

f:

[0,

oc ) x V ----+ V' ,

g:

[0, 00) X v

----+ L ( H, H )

are jointly measurable functions with the following properties (i) f ( t, · ) E C(Km, V�cak) for all m, (ii) g(t, ) E C(Km, L ( H, H ) weak ) for all m, (iii) l f (t, u) l v' + l g(t, u) IH ,H <:.:: a(t) (l + l ui) where a E L 2 ( 0, T ) for all T. Then equation (22.2. 3) has a solution u on a filtered Loeb space. ·

In dimension d = 2 a stronger existence theorem ( and uniqueness, given Lipschitz coefficients ) has been established - this will be noted when required in Section 22.3.

22.2.3

Controls

The simplest controls considered in this paper are those with no feedback, as follows. A:o is customary in optimal control theory, it is often necessary to extend this class to its natural completion, which is the class of generalized or relaxed controls. These, and the topology on them are as defined thus ( as in [7] ) . The compact metric space lvl is fixed for the whole paper. Definition 22.2.4

(i) The class C of ordinary controls is the set of measurable functions e : [0, T] ----+ AI, where M is a fixed compact metric space. (ii)

The class D of relaxed controls is the set of measurable functions cp [O, T] ----+ M1 (M), where M1 (M) is the set of probability measures on M. (We regard C C::: D by identifying a E M with the Dirac measure 6a ) .

:

The weak or narrow topology on It is given by means of the set JC of bounded measurable functions z : [0, T] x M ----+ lR with z(t, ) continuous for all t E [0 , T] . The action of B E It on z E }( is defined by ·

324

22. Optimal control for Navier-Stokes equations

B(z)

T

=

j z(t, B(t)) dt. 0

Then the topology on hoods the sets

11: is defined by specifying as subbase of open neighbour

{B : I B(z ) l < c} c>O , zE IC· The effect of a relaxed control o n a function z E JC is obtain by first extending the domain of each z E JC to [0, T] x M1 (M) as follows: for a probability measure f-L E M 1 ( M) define z(t , f-L )

=

j z(t, m) df-L (m) .

M

The weak (or narrow) topology is then extended to ::D by defining the action of cp E ::D on z E JC to be

where for brevity we use C{Jt := cp (t) . The fundamental results about controls that we shall usc arc summarized below; for details consult for example [7] , [15] , [2] or [20] (noting that a relaxed control is a particular case of a Young measure). Theorem 22.2.5

The set of relaxed controls ::D is compact.

This means that for an internal ("nonstandard") control (*ordinary or *relaxed) there is a well defined standard part 0 E ::D. (In [7] it is shown how this can be defined explicitly using Loeb measures.) Note that for a control cp E ::D we have 0 ( * cp) = cp. For the next result we define a uniform step control to be an ordinary control e such that there is a partition of [0, T] into intervals of constant length with e constant on each partition interval. Theorem 22.2.6

The set of uniform step controls 11:5 is dense in ::D .

To handle controls in a nonstandard setting the next definition and re sult is basic. Definition 22.2. 7 Z : * [0, T] x *M ---+ *R is a bounded uniform lifting of z E JC if Z is an internal *measurable function such that (i) Z is finitely bounded

22.3. Optimal control for

(ii)

for a.a.

T E * [0, T] ,

d

=

325

2

for all

a

E *M : Z(T,

o:

) :::::J

z(0T, 00:).

Let
and Z be a bounded uniform lifting of z. Then
Theorem 22.2.8 ([7] )

:::::J

T

T

j Z(T, lj)T )dT j z(t, CfJt )dt :::::J

0

0

In later sections we will define wider classes of controls of the form e = where u(-) denotes the trajectory =

B(t, u) for u E H or even e B(t, u( · )) {u( s ) : O :<; s :<; t} up to time t. 22.3

Optimal control for d = 2

Throughout this section we take d 2 and utilize the stronger existence and uniqueness results that obtain in this case. =

22.3.1

Controls with no feedback

First we consider the simplest kind of optimal control problem, taking the set of controls to be the no-feedback controls 11: and its generalization :D. The controlled equation then takes the form

u( t )

=

lot { -vAu(s) - B (u(s)) t + lo g ( s , u( s )) dw(s)

uo +

+ f (s, u(s) , cp ( s ))} ds

(22.3 . 1 )

for a control i n :D . Provided f and g are suitably Lipschitz then solutions to this equation are unique. Theorem 22.2.3 is refined as follows.

Let d 2. Consider the following conditions on the func tions f, g: there exist constants c > 0 and a(-) E L 2 (0, T) such that: ( ) f : [0, T] H x M --+ V' is jointly measurable and (i) l f(t, u, m ) - f(t, v , m ) l v' :<:: c l u - v i (ii) f(t, u, ) is continuous (iii) for a. a. t E [0, T]

Theorem 22.3 . 1 a

=

x

·

for all u, v E H

l f ( t , u, m ) l v' and m E M.

:<::

a (t)(l + l ui)

326 (b)

22. Optimal control for Navier-Stokes equations

g:

is jointly measurable and (i) l g(t, u) - g(t, v)IH , H <::: c l u - v i (ii) for a. a. t E [0, T] [0, T]

x

H

---+

L(H, H)

l g (t, u) IH,H <::: a ( t )(l + lui) for all u, v E H. There is a filtered probability space 0 (D, :F, (:Ft ) t >o, P) carrying a Wiener process with the univ ersal property that for any f, g satisfying the abo v e conditions (for any a( t) and c) and any ordinary or relaxed control the equa tion (22. 3. 1) has a unique solution on 0 satisfying the energy inequality =

w,

IE

l u(t) l 2 + { T ll u(t) ll 2 dt < E lo tE[O,TJ

(

sup

)

(22.3.2)

with the constant E uniform over the set of controls; in fact E E ( uo, a (-)) . =

The uniqueness of solutions requires the property

l b(u, v, y)l <::: c l u l � ll u ll � lvl � llvll � I I Y I I

of the form b which is only valid in dimension 2. We now fix a space 0 as given by this theorem. This may be a Loeb space (as in [6, Theorem 6.4. 1 and 6.6.2] ) , but for the current purpose this is not essential3 : we simply assume that there is such a space 0 in the basic standard universe of discourse and do not care about its provenance. (In Section 22.3 . 1 1 it will b e necessary t o specify the space 0 more precisely.) For a given control cp and initial condition u0 we define

u'P

=

the unique solution for the control cp with

u(O)

=

u0

(Strictly we should write u'h to denote the underlying space, but where not mentioned this will be n.)

22.3.2

Costs

To formulate an optimal control problem it is necessary to consider the cost of a control. Here we consider a general cost comprising a running cost and a terminal cost, defined as follows. 3

Thi� i� i n contra�t to Section 22 . 3 . 1 1 and abo when working in d

necessary to specify the space 0 more precisely.

=

3 , where it will be

22.3. Optimal control for

d

=

327

2

Definition 22.3.2 We assume given (and fixed for the paper) two functions h : [0, T] x H x M ---+ R and h : H ---+ R with the following properties.

(i) h is jointly measurable, non-negative, continuous in the second and the third variables and satisfies the following growth condition: for a.a. t E [0, T], all u E H and m E M l h (t, u, m) l :::; a(t) (l + l ui) .

(ii) h : H ---+ R is non-negative, continuous and has linear growth: that is, lh(u) l :::; c (l + lui) for all u E H . The cost for a control cp (ordinary o r relaxed) is J ( cp ) defined by T h (t, u'P( t ) , cp ( t )) dt + h (u'P( T )) .:1( cp) IE =

(lo

)

(Strictly we should write Jo,(cp) to denote the underlying space, but if the space is not mentioned then we mean D.) The minimal cost for ordinary controls is

.:Jo

=

inf{ .:J(e) : e E Q:}

and the minimum cost for relaxed controls is

Jo 22.3.3

=

inf { .:T( cp) :

cp E �}

Solutions for internal controls

In the present context we are not concerned with proving existence of solu tions to the stochastic Navier-Stokes equations (sNSE) - either by Loeb space methods or any other technique: we are simply assuming the basic existence result above. However, for the purposes of obtaining an optimal control we wish to transfer this result to give the existence of an internal solution U if> to the internal equation on the internal space * 0 controlled by an internal control ; of course the solution will live on the adapted Loeb space L(*O ) rather than the original space 0, and later we see how we can come back to 0 . The proof of the following result, making the first part of the above proce dure precise, is almost identical to that involved in the proof of existence - the main difference being that in the existence proof the intemal process U lives in H N whereas here the process uif> lives in *H . Here is the result needed.

328

22. Optimal control for Navier-Stokes equations

Theorem 22.3.3 Let uo E H and f, g, orem 22. 3. 1 and Definition 22. 3.2, and

h and h satisfy the conditions of The suppose that
be the unique solution (which exists by transfer of Theorem 22. 3. 1) on * 0 to the internal equation * (22. 3. 1) with control ; that is U if> ( T )

=

loT { *AUif> (a) - *B (Uif> (a)) + *f (a, Uif> (a) , T + lo *g (a, U if> (a)) dW(a)

*uo +

-v

( a)) } da + (22.3.3)

and satisfying the internal energy inequality

(22.3.4)

Then U if> has a standard part u o uif>, which is a solution on L ( *O ) to the standard equation (22. 3. 1) with control 0, i. e. u = uoq, and =

(22.3.5)

22.3.4

Optimal controls

The main theorem for controls with no feedback is now easy to derive. Theorem 22.3.4 Let uo E H and f, g, h and h satisfy the conditions of The orem 22. 3. 1 and Definition 22. 3. 2. Then there is an optimal relaxed control ipo,

i.e. such that J ( ipo)

=

Jo

Proof. Take a sequence of controls ( ek ) k E N c e: such that

K E * N; then *J• n (BK) Jo for the internal ordinary control BK : * [0, T] ----+ *A1. Corresponding to this control is the unique internal solution Fix an infinite

::::::J

ueK to the internal controlled Navier-Stokes equation (22.3 .3) which obeys the energy inequality (22.3.4). Let ipo : = oeK : [0, T] ----+ M 1 (1\1) and apply Theorem 22.3.3 to the control

so

22.3. Optimal control for

d

=

329

2

It only remains to show that ..JL ( *O ) ( IPo ) = ..J ( �Po ) . First notice that ..J ( �Po ) is a real number so ..J (ipo) = *..J• o (*ip0 ). Apply Theorem 22.3.3 to the internal control *'Po and the internal solution u *

so

ipo is optimal.

D

Actually the above theorem is easily derived from the following more gen eral consequence of Theorem 22.3.3. Theorem 22.3.5 topology on ::D .

The function ..J is continuous with respect to the narrow

Proof. This is routine using the nonstandard criterion for continuity, together with Theorem 22.3.3 and the argument in the final part of the above proof. D This result, together with the density of that Jo = ..Jo and so we have Theorem 22.3.6

of relaxed controls.

11: in

::D (Theorem 22.2.6) means

The optimal relaxed control ipo is also optimal in the class

As usual in optimal control theory, if the force term f is suitably convex then from an optimal relaxed control it is easy to obtain an optimal ordi nary control.

22.3.5

HOlder continuous feedback controls

(d

=

2)

The natural generalization of the control problem for the stochastic 2D Navier-Stokes equation is to allow e to depend on the instantaneous fluid ve locity as well as on time:

u(t)

=

uo +

lo { -vAu(s) - B (u( s)) lo g (s, u(s)) dw( s ) . t

t

+

+ f (s, u(s), B( s , u(s))) } ds +

(22.3.6)

It is straightforward to extend the approach outlined above for no-feedback controls to the special situation when the controls considered are Holder con tinuous feedback controls with uniform constants, as follows.

330

22. Optimal control for Navier-Stokes equations

Definition 22.3. 7 For a given set of constants ( o:, ,6, 1, c1 , c2 ) with o:. ,6 E (0. 1 ) and 1 , C l , c2 > 0 the set of Holder continuous feedback controls C( o:, ,6, I: q , c2 ) is the set of jointly measurable functions

e [o, TJ :

x

H

____,

M

which are locally Holder continuous in both variables with these constants; that is: (22.3.7) for all

t, s E [0, T] such that I t - s l ::; 1 and all u, v E H such that l u - v i ::; f .

We will see that this definition guarantees the existence of an optimal ordinary control; the Holder continuity allows us to construct the standard part of an internal control as an ordinary control without the need to consider measure-valued (i.e. relaxed) controls. The cost of a control is as before: (22.3.8) where h and h. satisfy the conditions of Definition 22.3.2. The minimal cost is now defined as JO = inf { .J(B ) : B E Q: ( o:, ,6, / , c1 , c2) }

o

(of course, strictly this should be .J (o: , ,6, 1, c1 ; c2 )). The standard part of an internal control is defined in the obvious way:

Definition 22.3.8 Suppose that 8 *M. Define e E Q:(o:, ,6, / , cl , c2 ) by

E *Q: ( o: , ,6, { , c 1 , c2 ), so 8 : * [0 , T] *H ---+ x

B(t, u) 08 ( t, u) 08 ( T, U) for any T :::::J t and u :::::J u in H (because of the condition (22.3.7) ) . =

=

Write e

=

08.

Now we have the counterpart o f Theorem 22.3.3 for Holder continuous controls.

Let 8 E *Q:(o:, ,B, ,, c1 , c2 ) be an internal control and j, g satisfy the conditions of Theorem 22. 3. 1. Let U8 be the unique solution on * 0 to the internal equation with control 8: U8 ( T ) *uo + + )0ro T{ -v*AU8 (T) - *B (U8 (T)) + *j (T, U8 (T) , 8 (T, U8 (T))) } dT (22.3.9) T + *g (T, U8( T)) d W(T )

Theorem 22.3.9

=

i

22.3. Optimal control for

d

=

331

2

and satisfying the internal energy inequality T

)

I U8(T) I 2 + {o II U8 (T) II 2 dT < E. rE*[O ,T] J Then its standard part u = o ue is a solution on L (* O.) to the standard equa tion (22. 3. 6) with control 0 8 (z. e. u U0 8 ), and IE·p

(

sup

=

*J·o ( 8 ) � ..1L ( *Sl ) C8 ) .

This gives the following with proof almost identical to that of Theorem 22.3.4.

Let f, g satisfy the conditions of Theorem 22. 3. 1. There exists an optimal admissible control B0 E Q:(q , c2 , (3, / ) , i. e. such that

Theorem 22.3. 10

J(Bo )

=

22.3.6

o:,

Jo .

Controls based on digital observations

(d

=

2)

The most general results for optimal control o f 2 D stochastic N avier-Stokes equations involves feedback controls that depend on the entire past of the path of the process u through digital observations made at a fixed finite number of points of time. This model of control was discussed in [1] and [7] for finite dimensional stochastic equations; in the latter paper digital observations were made at random times, but we do not include that feature here. The controlled system takes the form

du(t) = { -vAu( t ) - B(u(t), u( t )) + f ( t, u, B(y(u), t))} dt (22 . 3 . 1 0) + g ( t, u)dw(t) with initial condition u(O) = u0 E H. In this equation u = u( -) denotes the path of the process u( t) up to the present time (which will be clear from the context) . The function y ( u) in the control e denotes the digital observations or read-out given by the path u, described in the next sections. 22.3.7

The space

H

For this system the conditions on f, g will be strengthened slightly so that the solutions to (22.3. 10) are strong, and thus the paths of solutions to the equation belong to the space 7-{ where Let

lui

7-{

=

C([O, T] ; H) n L = ([O, T] ; V ) .

denote the uniform norm o n this space; that is

l ui = 0 sup l u( t ) l 9S T

and note that with this norm 7-{ is separable.

332

22.3.8

22. Optimal control for Navier-Stokes equations

The observations

Controls will be based on the information received from digital observations of the solution path u E 'H made at observation t imes

0 < t1

< ... <

tv < T

that are fixed for the problem. For completeness we write t0 = 0 and tp+ l but observations are not made at these times. The digital observation (taking its value in N) at time ti is denoted by Yi where

T,

Y�

:

'H ---+ N

(i

1, .

=

The complete set of observations for a path

u

. . , p)

is recorded as

y(u) = ( Yl ( u ) , . . . , Yv (u)) The i th observation y, is assumed to be non-anticipating in the sense that depends only on the past u I [0, t�J of the path u up to time t�. The digital observation functions y, are fixed for the whole problem.

22 . 3 .9

y� ( u )

Ordinary and relaxed feedback controls for digital observations

The feedback controls considered here are as follows. M is the fixed com pact metric space as before. Definition 22.3. 1 1 An ordinary feedback control based on digital ob servations is a measurable function

e NP :

x

[o, T]

____,

M

where e is non-anticipating in the sense that if t k :<.:: t < tk + l for k = 1 , . . . , p, then B(y, t) depends on t and only the first k components of y. namely (y 1 , . . . , Yk)· For t E [0, tl) a control e depends only on t. A relaxed feedback control based on digital observations is a mea surable function where 1111 ( M) is the space of probability measures on M and i.p is non anticipating in the same sense as for an ordinary control. Write 6 and fJ for these sets of ordinary and relaxed feedback controls.

22.3. Optimal control for

d

=

333

2

For brevity, in the current context , by a control (ordinary or relaxed) we mean a feedback control based on digital observations. The interpretation of a relaxed control is as before. Nonstandard (i.e. internal) controls have standard parts that are given as follows, using the compactness of non-feedback controls (Theorem 22.2.5) . Let

X

* [0, T]

---+

*M ( M)

It is sufficient to construct the standard part of
in the sense of the weak topology on ::D (sec the remark following Theorem 22 .2.5). For k = 1, . . . , p and n E I'JP write n I k = (n 1 , . . . , n k ) and notice that
=

Each

*[t k , t k+ l [ ), so,

From this we define the standard part r.p = o

for t E [t k, t k + l [ · (Note that for fixed n E I'JP if we write W (T) =
in 11: is dense in ::D .

is compact and the subset Q:S of uniform step controls

The following is the extension of Theorem 22.2.8 to show how the standard part of a digital feedback control acts.

334

22. Optimal control for Navier-Stokes equations =

Let
. If z : [0, T] lvl ---+ lR is a bounded, measurable function continuous on A1 and Z : * [0, T] *M ---+ * JR is a bounded uniform lifting (according to Defini tion 22. 2. 'l) then for any n E NP

Proposition 22.3 . 13 x

22.3.10

x

Costs for digitally observed controls

The cost of a control is defined in the expected way: for a control r.p ( ordi nary or relaxed) define (22.3 . 1 1 ) where u is any solution to the equation with control r.p . The cost functions h and lL are given, with properties set out below. As usual the aim is to find a control which minimizes the cost over the set of controls. We will see that the solution to the equation is unique for a given control so we can define the cost J(r.p) = J(r.p , u'P ) (where u'P is the solution for the control r.p) . Then the optimal control problem is reduced to finding a control such that Jo

:= inf{J(B) : e an ordinary feedback control }

is obtained. As is to be expected, in general the set of ordinary feedback controls docs not have sufficient closure properties and we prove existence of an optimal relaxed control.

22.3.11

Solution of the equations

As noted above the conditions on the functions f, g, h, h need to be strengthened (and modified to take account of the form of feedback) . We fix the following set of assumptions for the current discussion.

There exist constants a > 0 and L > 0 such that: f : [0, T] 7-{ M ---+ H is jointly measurable, bounded, non-anticipating, Lipschitz contmuous in the second variable and continuous in the third variable. Specifically (i) l f(t, u, m ) l :::; a (ii) u I t = v I t ===? f(t, u, m) = f ( t, v , m ) (iii) l f(t, u, m) - f(t, v, m) I H :::; L l u - v i

Conditions 22. 3 . 14 ( a)

x

x

22.3.

Optimal control for

·

x

-----+

=

x

(c)

(d)

x

·

(e)

335

f(t, u, ) continuous for a. a. t E [0, T] , for all u, v E 7-{ and m E M . g : [0, T] 7-{ ---+ L(H, V) is jointly measurable, bounded, non-anticipating and Lipschitz continuous in the second variable. Specifically (i) l g(t, u) IH . v :::; a (ii) u I t = v I t ===? g(t, u) = g(t, v) (iii) l g(t, u) - g(t, v) IH . H :::; L lu - v i for a. a. t E [0, T] and all u, v E 7-l . For convenience we also assume that g is diagonal, i. e. V n E N Sd(n) where Sd (n) denotes the gn Prn(g I Hn ) : [0, T] x 7-{ space of n n diagonal matrices (that is, a21 f. 0 iff i f. j). g(t, u) is invertible and l g - 1 (t, u)f(t, u, m) l :::; a for a. a. t E [0, T] , all u E 7-{ and m E M . h : [0, T] 7-{ fl,f ---+ lR i s jointly measurable, non-negative, bounded, non-anticipating in the second variable and continuous in the second and third variables. Specifically (i) l h(t, u, m) l :::; a (ii) u I t = v I t ===? h(t, u, m) = h(t, v, m) (iii) h(t, ) is continuous for a. a. t E [0, T] , all u, v E 7-{ and m E M. h : 7-{ ---+ lR is non-negative, continuous and bounded. Specifically l h(u) l :::; a for all u E 7-l . (iv)

(b)

d=2

,

x

·

Exist ence and uniqueness o f solutions t o t h e equation With the in formation, control and cost structures now defined, we return to the equa tions (22.3. 10) under consideration. Since these equations involve feedback of the entire past of the process both directly in the coefficients f, g and through the control r.p there are new considerations concerning existence and unique ness. The first task therefore is to show how the basic existence theorem of [6] can be extended to prove the following.

There is an adapted Loeb space 0 = ( D , F, (Ft ) t E[O ,TJ , P) carrying a Wiener process of covariance Q, with the following property. For any f, g satisfying Conditions 22. 3. 14 {a, b, c), any initial condition u0 E H and any feedback control r.p (ordinary or relaxed) the equation (22. 3. 10) has a unique strong solution u u'P on n .

Theorem 22.3 . 15

w

=

22.

336

Optimal control for Navier-Stokes equations

The proof of this is rather long and somewhat complicated technically. The essence is to solve the internal hyperfinite dimensional Galerkin approximation in H N with the internal control *cp giving an internal solution u*'P ' and then take the standard parts. The latter involves transferring the Girsanov theorem to the hyperfinite setting, and the use of special topologies defined on the spaces C(H, H) and C(H x M, H) to make them separable so that Anderson's Luzin Theorem can be applied.

22.3.12

Optimal control

For the rest of this section the space n = ( D, :F, (:Ft ) tE[ O ,TJ ' P) and Wiener process w are fixed as those given by Theorem 22.3.15. The following is proved by taking the internal control in the proof of the previous theorem to be an arbitrary internal control

Suppose that
is the so lution to the following equation in H N on the internal space s1 ( n, .A, ( .AT ) T E * [O,T] , II) of the previous theorem:

Theorem 22.3.16

=

dU(T) { -v'AU(T) - BN (U(T) , U(T)) + FN (T, U, is the unique strong solution to the controlled equation (22. 3. 1 0) with cp 0 and =

=

=

where J() = * IE

(lor h (T, uif> , ), T)) dt + *h (Uif> ) ) *

The existence of an optimal relaxed control now follows easily: Theorem 22. 3 . 1 7

The cost function J is continuous on ::D

Proof. Suppose that CfJn --+ cp in ::D . Theorem 22.3.16 shows that ] ( *cpn ) :::::J J ( cpn ) and so J( cpM ) :::::J *J ( cp M ) for small infinite AI . Again using Theorem 22.3.16 we have J( C{JM ) :::::J J ( 0CfJM ) = J ( cp ) . Thus *J ( cpM ) :::::J J ( cp ) for small infinite M and so J ( cpn ) --+ J ( cp ) as required. D The compactness of ::D and the density of step-controls now gives:

22.4.

Optimal control for

=

d 3

337 =

For feedback controls based on digital observations Jo Jo (the infimum for relaxed controls) and there is an optimal relaxed control ipo with J( !fo) Jo Jo.

Theorem 22. 3 . 1 8 =

=

Optimal control for d

22.4

=

3

The extra complications of the 3D stochastic N avier-Stokes equations mean that we are only able to consider the simplest kind of controls. The controlled equation considered takes the form

u(t)

=

lo. t { -vAu(s) - B (u(s)) + f (s, u(s) , B( s))} ds + t + l g (s, u(s)) dw(s)

uo +

(22.4. 1)

which has the same appearance as equation (22.3.1) for no-feedback controls in dimension d = 2. The possible non-uniqueness of solutions means that the very definition of cost and optimality has to be refined - sec below - and this requires a space that is rich enough to support essentially all possible solutions for any given control. For this it is necessary to employ a Loeb space. First we fix the conditions on the coefficients f, g in the equation. These are the natural extension of the basic conditions for the fundamental existence of solutions in Theorem 22.2.3.

There exists an L 2 function a(t) 2 0 such that f [0, T] H M V ' is jointly measurable with linear growth, con tinuous in the second variable on each Kn and continuous in the third variable, i. e. (i) l f(t, u, m) lv ' :::; a(t) ( 1 + l u i ), (ii) f(t, , m) E C(Kn ; V�eak ) for all finite n, where Kn {u E V ll u ll :::; n} as before (with the strong H-topology). (iii) f (t, u, ) continuous for a. a. t E [0, T] , all u E H and all m E M . g [0, T] H L(H, H) is jointly measurable with of linear growth, and continuous in the second variable on each Kn , i. e. (i) l g(t, u) IH . H :::; a(t) (1 + l u i ) , (ii) g(t, ) E C( Kn ; L(H, H)weak) for all finite n, for a. a. t E [0, T] and all u E H.

Conditions 22.4 . 1 (a)

:

x

x

·

·

(b)

:

x

·

--+

--+

22.

338

22.4.1

Optimal control for Navier-Stokes equations

Existence of solutions for any control

Theorem 22.2.3 then gives the following, where as before the key feature to note is the uniform energy bound.

There is a filtered probability space 0 = (D, :F, (:Ft ) t >o, P) with the universal property that for any f, g satisfying Conditions 22. { 1 (a, b) and any ordinary or relaxed control, the equation (22. { 1) has a solution sat isfying the energy inequality

Theorem 22.4.2

IE with a constant E

..

..

=

(

T

)

l u(t) l 2 + { ll u(t) ll 2 dt < E lo t E[O ,TJ sup

(22.4.2)

E ( uo, a) uniform over the set of controls.

The space fulfilling this result in [6, Theorem 6.4. 1] is a particular Loeb space, which we need to specify later. Note that in the current 3D setting solutions may not be unique (this is an open problem) .

22.4.2

The control problem for 3D stochastic Navier-Stokes equations

In order to formulate the optimal control problem for the 3D stochastic N avier-Stokes equations it is necessary to have a space 0 as in Theorem 22.4.2 that has solutions for all controls. Fix such a space; then bearing in mind the possible non-uniqueness of solutions, the optimal control problem is formulated as follows. For a given control cp define

UP := { u : u is weak solution on n to (22.4. 1) for cp and satisfying (22.4.2) } Then, taking the functions J(cp, u)

= IE

(22.4.3)

h, h satisfying the conditions of Definition 22.3.2 let

(loT h (t, u(t) , cp (t)) dt +

h ( u ( T))

).

(22.4.4)

The cost for cp is then defined by J(cp)

:=

inf{J(cp. u)

:

u,

E UP} .

(22.4.5)

Now set Jo

=

inf { J(B, u)

: e E Q:, u E U8 }

=

inf{J(B)

: e E Q: }

22.4. Optimal control for

=

d 3

339

which is the optimal cost for ordinary controls. Likewise

Jo

=

inf { ..1 ( ip, u) : ip E

::D, u E U'P}

=

inf { ..1 ( ip) : ip E

::D }

is the optimal cost for relaxed controls. The optimal control problem is to find if possible an optimal control B and optimal solution u E U8 such that ..J(B , u)

=

..J(B)

=

..Jo

and similarly for relaxed controls

..l ( !f , u) = ..l ( IP ) = Jo . Remark The above definitions are of course relative to the space 0 so strictly this should be acknowledged by writing ..]0° , etc. However, the space n defined in the next section will be fixed for the rest of the paper. -

22.4.3

The space

n

In order to obtain an optimal control and optimal solution to the 3D stochastic N avicr-Stokcs equations the space must be rich enough to carry a good supply of solutions for each given control; we will see that the Loeb space used in [4] (sec also [6] ) to solve the general existence problem for the stochastic N avier-Stokes equations in dimensions ::; 4 is just what is needed. For the rest of this paper we take 0 to be this space, defined as follows. Fix an infinite natural number N and take the internal space where D is the canonical space of continuous functions * C ( * [0, T] ; HN ) , and II is the Wiener measure induced by the canonical Wiener process W(T) E HN having covariance operator QN ( = PrN * Q PrN ) . The filtration (A T ) TE * [o . T] is that generated by the internal process W ( T) . Take the Loeb measure and Loeb algebra P

:Ft

=

n a(AT )

<0

t T

v

=

L(II) , :F L (A) =

and put

N

where N denotes the family of P-null sets. This gives the adapted Loeb space n

which carries the process covariance Q.

=

w

( D, F, (Ft ) t E[o .rJ , P )

=

0 W,

which is a Wiener process in H with

340

22. Optimal control for Navier-Stokes equations

The fundamental existence result for the 3D stochastic Navier-Stokes equa tions on the space 0 in [ 4] or [ 6] is proved by taking the internal solution U ( T, w) to the Galerkin approximation of dimension N, so U(T, w ) is an internal pro cess with values in HN . The existence of a uniform finite energy bound allows the definition of the standard part process u = o U in H, which is the required solution. We do not need to repeat this here in order to establish existence, but the proof of the key theorem below concerning approximate solutions uses similar techniques.

22.4.4

Approximate solutions

Recall the main idea for using nonstandard methods in deterministic op timal control theory used in 2-dimensions. Take a minimizing sequence of controls Bn and consider a nonstandard control 8 = *BK for infinite K and a solution U = U 8 to the equation for this control. Then u = o u is a standard solution for the control 1.p = 08 which is thus optimal. The problem with this approach in the 3D stochastic setting is that the solution U associated with 8 would live in *H and be carried by the nonstandard space *0 (provided we can make sense of that ) . We showed in Section 22.3 how the uniqueness of solutions for d = 2 allows us to move back to the space n to complete the story, but here an alternative approach is needed. This is provided by the notion of approximate solutions, first introduced in [ 1 1] in order to establish results on attractors for 3D stochastic Navier-Stokes equations. The rather technical notion required in [ 1 1] has been adjusted below to suit the current needs - and is somewhat simpler than the corresponding notion in [ 1 1] . Definition 22.4.3 (Approximate solutions) Fix an initial condition uo E H and let E = E ( uo, a) . Let
as follows.

(a) For each j E *N denote by X1 the internal class of internal processes u : * [0, T] X n ---+ HN that are * adapted to the filtration (AT ) ' with the following properties: (i) With IT-probability � 1 -

� on n, for all T

E

* [0, T] and all

T I Uk(T) - Uk(O) - { -vAUk(a) - Bk (U(a), U(a)) } da T T - Ff (a. U(a)) da - Gk (a, U(a)) dW(a) l :S Tj

lo

lo

where Bk = (*B, *e k), Gk = (*g, *e k) ·

lo

Fi: (a. U(a))

=

k ::; j : (22 .4.6)

( *f (a, U(a) ,
and

22.4.

Optimal control for

=

d 3

341

(ii)

(22.4.7) (iii)

I U(O) - uo l

1

<:.:: --: J

(22.4.8)

(b) Define x = nj EN xf . This is the set of approximate solutions to the controlled stochastic N avier-Stokes equations for control . Remark The sequence of internal sets X1<�'> is decreasing with j and since the set X involves Xj only for finite j then we have X_k � X for all K E *N \ N . The theory o f [11] , adapted t o the modified definition o f approximate solu tion here gives the following key result. First we must define the internal cost j ( , U) for an internal process U E H N on Sl and control by

which is different from *..J•n (
Let be an internal control (that is, E *::D) and U E x . Then for P-a. a. w E D, IU(T, w) l is finite for all T E * [O, T] and U(-, w) is weakly S -continuous. The process defined by u(t, w )

=

0U(T, w )

for T :::::J t belongs to U o q, (where 0 is the standard part of q> in the Weak topology on ::D as described in Section 22. 2. 3) and

(b)

Let r.p E ::D be a standard control and u E UP. Then there exists with u = o U as defined in part (a), and hence J(*r.p, U)

:::::J

u

E x *'P

..J(r.p, u) .

Proof (Sketch) . The proof of (a) follows quite closely the proof of Theo rem 6.4. 1 in [6] . The main difference here compared with that proof is (i) the presence of the control in the drift term f, which is dealt with routinely using

22.

342

Optimal control for Navier-Stokes equations

Anderson's Luzin Theorem, and (ii) the fact that internally we only have an approximate equation. That is, we have :::::J instead of = in the internal equa tion for each finite co-ordinate of the solution. That is, from the approximate equality (22.4.6) with j = K for some infinite K, using overspill, we have

Uk (T)

:::::J

Uk (O) + +

lo { -vAUk (a) - Bk (U(a) , U(a))}da lo Ff (a, U(a) )da lo Gk (a, U(a)) dW(a) T

T

+

T

for finite k. This gives equality after taking standard parts. For (b) the theory in [11] can be adapted easily to obtain suitable liftings of the solution u, giving an internal process U E X*.P such that u = 0U. The remaining fact about the cost follows from part (a) with

22.4.5

Optimal control

The results concerning optimal control of the stochastic 3D equation follow from the next more general theorem. Theorem 22.4.5

Let f, g, h and h satisfy Conditions 22.4. 1.

(a) For every control r.p there is an optimal solution u.P E U .P with J ( r.p, u.P) = J(r.p) (b) Let I.{Jn be a sequence of controls and Un E U.P n, with J ( I.{Jn , un) converging.

Then there is a control r.p and u E U.P with J( r.p, u) = nlim J( I.{Jn , un ) · -->oo

Proof. (a) For a control r.p E ::D take a minimizing sequence ( U k ) k EN C U.P such that J(r.p, uk ) � J(r.p) . Using Theorem 22.4.4 (b) , for each k take an approximate solution uk E x *.p such that U k = 0 uk and .J (*r.p, Uk ) :::::J J ( r.p, Uk ) . Weakening this gives and for each finite k so by Nl-saturation there is an infinite K with uK E x; c:: x *.p and J(*r.p, UK) :::::J J(r.p, uK) · Then Theorem 22.4.4 (a) gives 0 UK E U.P with J(r.p, 0 UK) = J(r.p) , so we may take u.P = 0UK.

22.4. Optimal control for

=

d 3

343

(b) The proof is similar to the proof of (a) . For each k take an approximate solution uk E x *'P k such that U k = ouk and J(*r.pk , Uk ) :::::J J( yk , U k ) · Using N l saturation there is an infinite K with UK E x: K � x *'P K and J(*r.pK , UK) :::::J

J1 = l imn_, = J(r.pn, Un) · Now let r.p o ( *r.pK) and u J(r.p, u) = 0J(*r.pK, UK) = Jl =

=

0 UK;

Theorem 22.4.4 (a) gives

u

E

U'P

and D

Corollary 22.4.6

(a) There exists a relaxed control r.po (and solution u'P 0 E U 'PO ) that achieves

the mmimum cost for ordinary controls (i. e. such that J(r.po, u'P0 ) J( r.po) Jo) ; (b) There is an optimal relaxed control rpo and optimal solution u
=

=

Proof. For (a) take a minimizing sequence of ordinary controls (and solutions) and for (b) take a minimizing sequence of relaxed controls and solutions. Then apply (b) of the previous theorem. D Remark 22.4. 7 One consequence of the possible non-uniqueness of solutions is that unlike in dimension 2 we cannot prove that Jo = Jo, so this theory is rather less satisfactory than that for 2D.

22.4.6

Holder continuous feedback controls

(d

=

3)

Recall that in dimension d = 2 , the existence of optimal ordinary feedback controls in a general setting was ensured by taking Holder continuous feedback controls (Section 22.3.5) . The technique of approximate solutions allows the extension of this idea to the 3D stochastic setting. Controls take the form B(t, u(t)), and the controlled equation (22.4. 1) becomes

u(t)

=

lot { - vAu(s) - B (u(s)) + f (s, u(s), B(s, u(s))) } ds + t + lo g ( s , u(s)) dw(s)

uo +

(22.4.9)

with cost function

J(B, u) for

u E U'P

=

JE

(loT h (t, u(t) , B(t, u(t)))dt + h (u(T)) )

(22 .4. 1 0)

22.

344

We maintain the Conditions

Optimal control for Navier-Stokes equations

22.4. 1

on the coefficients

f, g, h, h.

The controls considered are the Holder continuous feedback controls 11:(o:, (3, 1, Cl , c2 ) given by Definition 22.3.7. The controls we consider are Holder continuous feedback controls defined as follows. So we fix constants (o:, f3, r. q , c2) and as before write e:H = Q:(o:, (3, , , C l , C2) · We continue to work with the space n as defined above (Sec tion 22.4.3) ; it is routine to see that for any control B E Q:H there is a solution to (22.4.9) having the energy bound (22.4.2) . As before write U8 for the set of all such solutions, and then the minimal cost is defined as expected: for B E Q:H extend the earlier definition to give: J ( B ) = inf { J(B, u) and then set

JoH

=

: u E U8 }

inf { J(B ) : B E Q:H } .

The standard part of an internal control is defined in the natural way: Definition 22.4.8 Suppose that 8 E * Q:H , so 8 : * [0, T] 08 = B E Q:H by B(t, u) = 08 ( t , u ) = 08(T, U)

for any T :::::J tion (22.3.7)).

2 2 . 4 .7

t

and

U

:::::J

u

x

*H

---+

*M. Define

in H (this makes sense because of the condi

Approximate solutions for Holder continuous controls

For an initial condition u0 E H and an internal control 8 E * Q:H the sets X18 and X8 are defined as in Definition 22.4.3 with F/( modified in the obvious way:

F/; (a, U(a))

=

(*f (a, U(a) , 8(a, U(a))) , *e ) k

Then we have the counterpart of Theorem controls.

22.4.4

for Holder continuous

Theorem 22.4.9

(a) Let 8 be an internal Holder continuous control (that is, 8 E *Q: H ) and

8 uEX . U ( , w) is

,

Then for a. a. w E n I U(T, w) l is finite for all T E * [O, T] and weakly S -continuous. The process defined by u(t, w) 0U ( T, w ) for T :::::J t belongs to U o e and

-

=

22.4. Optimal control for

d

=

3

345

(b) Let B E �H be a standard Holder continuous control and u E U8 . Then

there exists U E x *e with u = 0U as defined in part (a), and hence J(*B, U) ;::; :J(B, u) .

The proof is similar to the proof of Theorem 22.4.4 except for dealing with the terms that contain the control , and this is routine using the Holder continuity property. Finally we have the counterpart of Theorem 22.4.5.

Let f, g, h and h satisfy Conditions 22.4. 1. (a) For every Holder continuous control e there is an optimal solution u 8 E U 8 with :J(B, u8 ) :J(B) . (b) Let Bn be a sequence of controls in �H and Un E U8 n , with :J(Bn , u n ) converging. Then there is a control B E �H and u E U8 with :J(B, u) limn_, J ( Bn , Un )

Theorem 22.4.10

=

CXJ

=

·

Proof. Proved from Theorem 22.4.9 just as Theorem 22.4.5 followed from Theorem 22.4.4. D The existence of an optimal control now follows. Corollary 22.4 . 1 1 There is an optimal Holder continuous timal solution u 80 E U 80 with :J(Bo, u 80 ) = :J(Bo) = :J0H .

control Bo and op

Proof. Take a minimizing sequence Bn of controls in �H and solutions Un E U e with :J(Bn , u8n ) --+ :J0H and apply (b) of the previous theorem. D

Appendix: Nonstandard representations of the spaces Hr For ease of reference the most important facts about the nonstandard rep resentation of members of the spaces Hr are summarized here, together with other crucial nonstandard matters. For full details consult [6] . The space *H has a basis (*e n ) n eN = (En ) n eN given by the nonstandard extension *e of the function e : N --+ H. For each U E *H there is a unique internal sequence of hyperreals (Un ) n eN such that *=

22.

346

Optimal control for Navier-Stokes equations

and the subspace HN of *H defined by

HN

=

N

{ L Un En : n =l

U E *IRN , U internal }

O n H ( and also V or any of the spaces Hr) there are both weak and strong topologies. Here are the most important nonstandard facts.

Let U E *H (or H N ) Then U is (strongly) nearstandard to u in H (denoted U :::::J u ) if I U - *u l :::::J 0 (and then l UI :::::J l u i ). U is weakly nearstandard to u in H (denoted U :::::Jw u) if ( U, *v) :::::J ( u, v) for all v E H and l u i � 0 1 UI (allowing this to be oo). If U is strongly nearstandard then it is weakly nearstandard and the stan dard parts agree, so we write o U for the standard part in whichever topol ogy it may be nearstandard. If l UI is finite then U is weakly nearstandard and (0 U) n 0 (Un ) for fi nite n . If II U II is finite then U is strongly nearstandard in H. If I U I , I V I < oo then U :::::J w V � U� :::::J Vi V i E N.

Proposition 22.4. 1 2 (a) (a) (b)

(c) (d) (e)

=

Similar facts obtain for other spaces in the spectrum Hr. From (22.2.2) we have the following lemma ( a slight extension of the Crucial Lemma ( Lemma 2.7.7) of [6] ) , which is crucial for taking standard parts of the quadratic term B(U). which is the source of many of the difficulties when discussing the Navier-Stokes equations. Le mma 22.4 . 1 3 ( Crucial Lemma)

finite, and z E V then where u

=

ou

and v

=

If U, V E * V with II UII and II VII both

*b(U, V, *z) b(u, v, z) 0V (with u, v E V . ) Hence, if II U II < oo *B (U) :::::J b(u) in V ' (weakly) :::::J

For the proof consult any of [ 6. 9, 10] . Taking standard parts of the term AU needs the following observation. Le mma 22.4 . 14

If U E * V with IIUII finite with u *AU:::::J Au in V ' (weakly)

=

ou

then

347

References

References

[1]

S . ALBEVERIO, J . E . F ENSTAD , R. H OEGH- KROHN and T. LINDSTROM,

Nonstandard Methods in Stochastic Analysis and Mathematical Physics, Academic Press, New York, 1986.

[2]

E . .J . BALDER, New fundamentals of Young measure convergence, in Cal culus of Variations and Differential Equations (Eds. A. Ioffe,S. Reich & I. Shafrir) , CRC Research Notes 410, CRC Press, Boca Raton, 1999.

[3]

H . I . B RECKNER, Approximatwn and Optimal control of the Stochastic Navier-Stokes Equation, PhD Thesis, Halle (Saale) , 1999.

http : //webdo c . sub . gwdg . de/ebook/e/2002/pub/mathe/ 99H10 1 /of _index . htm

[4]

M . CAPINSKI and N . J . CuTLAND. "Stochastic Navier-Stokes equations", Acta Applicanda Mathematicae, 25 ( 1991) 59-85.

[5]

M. CAPINSKI and N . J . CUTLAND , " Existence of global stochastic flow and at tractors for N avier-Stokes equations", Probability Theory and Related Fields, 1 1 5 (1999) 121-151.

[6]

M . CAPINSKI and N . J . C C TLAND , Nonstandard Scientific, 1997.

Fluid Mechanics, World

Methods for Stochastic

[7]

N . J CUTLAND , " Infinitesimal methods in control theory: deterministic and stochastic", Acta Appl. Math. , 5 (1986) 105- 135.

[8]

N . .J . CUTLAND , Loeb measure theory, in Developments in nonstandard mathematics, Eds. N.J. Cutland, F. Oliveira, V. Neves, J . Sousa-Pinto, Pitman Research Notes in Mathematics Vol. 336, Longman, 1995.

[9]

N . J . CUTLAND and K. GRZESIAK, "Optimal control for 2-dimensional stochastic N avier-Stokes equations", to appear in Applied :tvlathematics and Optimization.

[10]

N . J . CUTLAND and K. GRZESIAK, "Optimal control for 3-dimensional stochastic N avier-Stokes equations", Stochastics and Stochastic Reports, 77 (2005) 437-454.

[11]

N . J . CUTLAND and H . J . KEISLER. " Global attractors for 3-dimensional stochastic Navier-Stokes equations". Journal of Dynamics and Differential Equations", 16 (2004) 205-266.

.

.

22.

348

[12]

Optimal control for Navier-Stokes equations

N . J . CuTLAND and H . J . KEISLER, "At tractors and neoattractors for 3D stochastic Navier-Stokes equations", Stochastics and Dynamics, 5 (2005)

487-533.

[13] [14]

G . DAP RATO and A. DEBUSSCHE. " Dynamic programming for the stochastic N avier-Stokes equations", Mathematical Modelling and N umer ical Analysis, 34 (2000) 459 -475. G . DAPRATO and J . ZABCZYK , Stochastic Equations Cambridge University Press, Cambridge, 1992.

sions,

in Infinite Dimen

[15]

R.V. GAMKRELIDZE,

[16]

K. G RZESIAK, Optimal Control for Navier-Stokes Equations using Non standard Analysis, PhD Thesis, University of Hull, UK, 2003.

[17]

A . ICHIKAWA , " Stability of semilinear stochastic evolution equations", Journal of f\Iathcmatical Analysis and Applications, 90 (1982) 12-44.

[18]

S . S . SRITHARAN (ed. ) , Optimal Control of Viscous in Applied Mathematics, Philadelphia, 1998.

[19]

R. TEMAM, Navier-Stokes edition, 1984.

[20]

M. VALADIER, Young measures, in Methods of Nonconvex Analysis (A. Cellina, ed. ) , Lecture Notes in f\Iathematics 1446, Springer-Verlag, Berlin,

1978.

1990.

Principles of Optimal Control, Plenum, New York,

Equations,

Flow, SIAM Frontiers

North-Holland, Amsterdam, third

Lo cal-in-time existence of strong solutions of the n-dimensional Burgers equation via discretizations Joao Paulo Teixeira*

Consider the u1.

=

Abstract

e quation :

111l11.

- (·u V')u + f ·

for

a:

E

[0, 1]"

and t E

(01

)

x ,

together v.rith periodic boundary conditions and initial condition ·u ( t.. 0)

_q(:c) .

=

This corresponds a Navier-Stoket; problem where t he incompress ibility condition has been dropp ed . The maj or difficulty iu existence proofs for this simplified problem is the unbo unded advection term,

(n · \7)'!1..

We present a proof of local-in-time existence of a smooth solution based on a di scr e ti zation by a suitable Euler scheme. It will be shown that this solution exists in an int erval with C depending only on n and tlte values of the Lipschitz constants of f and u a.t time The argument given is based directly on local estimates of the solutions of the discrctizcd problem.

[0. T) , where T ::; t;,

23 . 1

0.

Introduction

The Burgers equation 'Ut

- II.I::!..U + (u \l)·u = .{ ·

x E D c �-" , t > O

provides an example of a model for flows that takes into account the interaction between diffusion and ( nonlinear ) advection. This is probably the simplest nonlinear physical model for turbulence. *

lllstitnto Superior Tecn.ico, Lisbon, Portugal . j teix@math . ist . utl . pt

350

23. Burgers equation via discretizations

This equation is a simplification of the Navier-Stokes equations where the incompressibility constraint on the flow has been dropped. The Navier-Stokes equations are usually studied by taking projected solutions of the Burgers equations onto a subspace of divergence free functions. The main difficulty in the analysis of nonlinear flows, the advection term, ( u · 'V)u, is present in both equations. It is a consequence of the incompressibility condition that there are only trivial Navicr-Stokes flows for n = 1 . This is not the case in Burgers flows, where the case n = 1 is already nontrivial. Cole [6] and Hopf [10] have studied the problem in the real line:

{

Ut - VUxx + UUx u(x, t) = uo (x)

=

0

X

E

X

ER

IR, t > 0

(23. 1 . 1 )

Using the transformation of variables

v u = -2v -,

(23 . 1 .2)

V.r

(23. 1 . 1) was reduced to and initial value problem for the heat equation:

{

Vt - VVxx = 0

v(x, t)

=

vo(x)

X

E

X

ER

IR, t > 0

(23 . 1 .3)

(Where u� = -2 v ..!jvo ) . This enabled them to get a formula for a solution of problem (23. 1 . 1 ) . in terms of a Gaussian integral. Thus, and under very mild conditions on u0 , a solution, u, exists for all x E lR and t > 0, and is smooth. It would be reasonable to expect that the n > 1 case should behave in a similar way. The usual standard theory for this type of equations uses a weak formulation of the problem in Sobolev spaces, a Galerkin approximation to show existence of weak solutions and, finally, regularity estimates. How ever, these methods only lead to partial results. For well-posed problems with generic initial and boundary data, the best that can be obtained is local in time existence of regular solutions. It can also be shown that regular (strong) solutions are unique (if they exist) . See [4, 13, 9, 12, 16] for accounts on these methods. To gain some new insight into this problem, we develop a hyperfinite ap proach to the following model problem on a compact domain.

Let 'll' n = IRn ;zn be an n-dimensional torus. Assume that f is locally Lipschitz continuous on 'll'n X [0, oo ) and u0 E C2 •1(1!'n ) (that is, uo is twice differentiable and all its second partial derivatives are Lipschitz continuous on 'll'n ). Let D = 'll'n X (0, oo ) . Let E JR+ . Let f IRn ;zn ---+ IR, Model Problem:

v

:

351

23. 1 . Diffusion-advection equations in the torus

u0 : JRn ;zn ---+ JRn . Our task is to study the initial value problem for the Burgers equations:

{

Ut - vb.. u + (u · 'V)u = f u = uo

D on 'll'n X {0}. in

(23. 1 . 4)

A (strong) solution of this problem is a sufficiently smooth u satisfying (23 . 1 . 4) . By "sufficiently smooth u" we mean that u : 'll'n x [0, oo ) ---+ lR is such that u( - , t) E C2 ( 1l'n ) for all t E [0, oo ) and u ( x , ·) E C 1 ( [0. oo ) ) for all x E 'll'n .

23 . 2

A discretization for the diffusion-advection equations in the torus

We now look for a discretized version of problem (23 . 1 .4) . Since we are mainly interested in the existence result, we will do this in the simplest way possible. First we work in the standard universe. To discretize 'll'n , we introduce an h-spaced grid on 'll'n . Choose lvf E N 1 , and let h = Jv1 . Then, let: 11':\1

=

{ 0, h, 2h, . . . , (M - l ) h , 1 r

=

h (Z mod Mt.

Consistently with our interpretation of 'll'M as a discrete version of the toms, given any x ( m 1 , m2 , . . . , mn ) h and

'll'n , we define addition in 1!'�1 as follows: y = (h, l2 , . . . , ln )h in 11':\1 , let:

=

This makes addition well-defined in 'll'M ; furthermore, it will behave in a similar way to addition in 'll'n . In particular, the set of grid-neighbors of any x E 'll'M ,

{x ± hei :

i = 1, 2, . . . , n }

is well-defined. As for the discretization of time, consider T and define:

lf:

=

{ 0, k, 2k, . . . , (K - l)k, T }

E JR+ and K E N 1 . =

k (N n [0, K) );

Let

k

=

J? ,

To each triple d = (M, K, T), with M , N E N 1 and T E JR+ , we associate a discretization as defined above . We will , later on, introduce some restrictions on the set of admissible d. For now, given any d = (lvf, K, T) , with M, N E N 1 and T E JR+ , we let :

352

23. Burgers equation via discretizations Dd

=

']['M

X

(II u {T})

The elements of Dd are called gridpoints. Any function U whose domain is a subset of Dd is called a gridfunction. The discrete Laplacian can be defined as a map b.. d : JR15d ---+ JR15d given by:

(

)

1 n U(x + h e,, t) - 2U(x, t) + U(x - h ei , t) . (23.2. 1 ) h2 L i=l The definition of addition in 'll'M makes this well-defined. The discrete version of the nonlinear parabolic operator, P, that occurs in the diffusion-advection equations is defined as the map Pd : JRDd ---+ JRDd b.. d U(x, t)

=

given by:

" U(x, t + k) - U(x, t) - vudU ( x, t ) k U(x + he , , t) - U(x - he , , t) . L....., U · ( x, t ) +� 2h i= l '

With we get:

PdU(x, t) �

=

,�;,, (U(x, t + k) - ( 1 - 2nvA)U(x , t) -

- >.

� ((

( �

)

�

(23.2.2)

)

))

v - U; (x, t) U(x I he ; , t) + v + U; (x, t) U(x - he ; , t)

The discretized version of problem (23 . 1 .4) is, then:

{

PdU(x, t) f(x, t) U(x, 0) uo(x, 0) =

=

if

(x, t)

if

X E

E

Dd

1l'A1 ·

(23.2.3)

Let us look more closely at the finite difference equation in (23.2.3). If we solve for U(x, t + k), we get: =

U(x, t + k) (1 - 2nv.A) U(x, t) n (, h h t) U(x+hei , t) + v+ 2 Ui (x, t) U(x - he i . t)�) U (x, +A v i 2 \ + .Ah 2 f(x, t)

�(

)

(

)

(23.2.4)

23.2.

353

Standard estimates for the discrete problem

Equation (23.2.4), together with the initial condition in (23.2.3), gives a recursive formula for the unique solution of problem (23.2.3). Define a map

by:

+ + � (0 - � v, (x t)) U(x+he, t) + 0+�V,(x, t)) U(x - he,, ,v,

(U, V)(x, t)

=

(1 - 2nvA) U(x, t)

.x

,

whenever

(x, t)

E Dd . Equation

+

U ( x, t k)

=

(23.2.4)

(23.2.5)

can now be written as:

+

In what follows, we will require the right-hand side of (23.2.5) to be in the form of a weighted average, that is, all coefficients multiplying U(x, t) and U(x ± he, , t) must be positive. For this, we will always assume that:

1 A<2m/

( 23.2.6)

We also require that: for all ( x, t ) E Dd ,

(23.2. 7)

where ud is the solution of (23.2.3), relative to d. This means that the set of admissible discretizations is:

{

(M, K, T) E I'h

x

I'h

x

JR+ :

1 A<2nv

1\

V( x, t )

-

E Dd

2v I Ud ( x. t ) l < h

}

Note that condition (23.2.7) is easily satisfied when we work in the nonstandard universe. Whenever h is infinitesimaL 2/: will be infinitely large; so, if U is kept finite, then condition (23.2.7) holds.

23.3

Some standard estimates for the solution of the discrete problem

In this section, we derive estimates that will not require a nonstandard discretization.

23.

354

A

Burgers equation via discretizations

Consider any ( standard) discretization, Dd , let:

c

I I U IIL� ( A)

max

(x,t)EA

[U] L� ( A) [[UlJL� ( A)

d.

Given

U

Dd ---+

IRn

and

I U(x, t) l

I U(x, t) - U(y, t) l lx - Y l I U(x, t + k) - U(x, t) l max k (x , t) , (x , t +k )EA max

(x,t) , (y , t)EA #y

=

Here, I · I denotes the Euclidean norm on ]RJ . We begin by showing some properties of

Let U, V E (IRn ) 15d . Let h > 0 be such that: 2v . h< V I II L'; (Dd ) Then, for all (x, t) E Dd : I
Lemma 1

(23.3. 1)

(x, t) E Dd: I <�>(U V) (x, t) l <::: I (1 - 2nvA) U(x, t) l +

Proof. For any ,

A

I

Let

� ( I (v - � V; (x, t) ) U(x + he; , t) H (v + � V; (x, t) ) U(x - he; , t) l )

M

=

II U II Lc; (l5d ) ·

By condition

(23.3.1):

h v ± 2 V(x, t) > O. Thus:

I <�>(U, V)(x, t) l +A Since

�((

<:::

�

(1 - 2nvA) I U(x, t) l +

( �

)

)

v - V; (x, t) I U(x + he; , t) l + v + V; (x, t) I U(x - he , t) l

I U(x, t) l

<:::

I
M,

we get:

)

� (v - h Vi(x, t) + v + h Vi(x, t) ) n

<:::

(1 - 2nvA) M + AM

<:::

(1 - 2nvA) M + AM2vn

2

=

M.

2

D

23.3.

355

Standard estimates for the discrete problem

Lemma 2

Let U, V, W, Z E (IRn y15 d . Then, for all (x, t) =

E Dd :

(U, W)(x, t) - (V, Z )(x, t)
Proof. For all

(x, t)

(

)(

)

E Dd :

(U, W)(x, t) - ( V, Z ) (x, t)

=

)

v( (v- % (v %

(1 - 2n :A. ) U( x , t) - V(x, t) +

� ( (v- % W, (x, t) ) U(x + he; , t) - z, (x, t) ) V(x + he, t) + + (v + � W; (x, t) ) U(x - he , t) - + z; (x , t) ) V(x - he ; , t) ) ( 1 - 2n:A. v ) ( U(x, t) - V(x, t) ) + n h + A � ( (v- W,(x , t) ) ( U(x + he , , t) - V(x + he , , t) ) + + (v + � W, (x, t) ) ( U(x - h e, , t) - V(x - h e2 , t) )) + + >.

=

2

8(

Ah n +2 ( Zi (x, t) - Wi (x, t) ) V(x + he , , t) =

)(

(U - V,W)(x, t) + Ah n 2 L Z, ( x, t) - W, ( x, t) V ( x + h ei , t) - V ( x - h e2, t) ,=1

Lemma 3

(

)

( Zi (x, t) - Wi (x, t) ) V(x - h e, , t)

Let d be a discretization such that: 2v h < uo ll ll u'c + T ll f ll u)O

If U is the solution of the discrete problem,

(23. 2. 3),

In partzcular, d is an admissible discretization.

then:

)

.

D

23.

356

Burgers equation via discretizations

Proof. Let M = ll uo ll uxo and L = l l f ll uxo . We show, by induction on 0, k, . . . . T, that for all T = 0, k, . . . , t and for all x E 'll'j\1 :

I U ( x, t) l

t

<::: M + tL.

By the initial condition in problem (23.2.3), the result holds for t = 0. As suming it is valid for some t E lJ: , we have, by the hypothesis on h and the induction hypothesis, that:

2v

2v < M + tL I U ( x, t) l ' x {0. k, 2k, . . . , t}. Hence, by Lemma 2: h<

for all ( x, t) E 'll'j\1

I U ( x, t + k) l 23.4

M + TL <

2v

<::: I

=

M + (t + k)L. D

Main estimates on the hyperfinite discrete problem

The nonstandard analytical setup we now use is as follows. We work in a superstructure (V(IR) , *V(IR), *) . We will omit the stars on all standard functions of one or several variables and usual binary relations. Any x E *IRn is called finite iff there exists m E N such that l xl < m. Each finite x E *JR can be uniquely decomposed as x = r + c: , where r E lR and c is an infinitesimal; r is called the standard part of x, and denoted by st x. If x, y E *IR are such that x - y is infinitesimal, then we say that x is infinitely close to y, and write x :::::J y. Similarly, if x, y E *IRn : x :::::J y

lx - Yl

iff

:::::J 0 iff

Xi

:::::J

Yi ,

for each i

=

1,

. . .

,

n.

Let j, l E N. If x E * ]RJ is finite then let:

If F : A

c

*JRJ

---+

*IR1 is an S-continuous function, define °F ( st x )

For sets A E ]RJ . let:

0A

=

{ st x :

=

op by:

st ( F (x )) Vx E A.

(23.4. 1)

}

"x is finite" and x E A .

Each "circle" map as introduced above i s sometimes called a standard part map.

23.4. Main estimates on the hyperfinite discrete problem

357

In this, and other similar definitions of this work, we consider *V, where V = { � d } . *V is an internal *family of linear maps indexed in the internal set of all admissible d; each �d E *U is a *linear map acting on the vector space of internal gridfunctions U : Dd ---+ *R Similarly, we can consider *U, where

{

U = Pd : 111 E *N, T E *JR+ , K E *N and

}

TM 2 < -1 2nv , K

--

*U is now an internal family of linear maps, indexed on an internal set. Each HiiK E *U is then a *linear map acting on the vector space of internal grid functions. For the norms and seminorms introduced in the previous Section, we can proceed similarly. We get families of internal norms and seminorms which, by transfer, satisfy the following. Given U : Dd ---+ * IRn and A C Dd :

I I U IIL� ( A) [U] L� ( A)

=

[[U]] L� ( A) Lemma 4

{

=

{

}

* max I U(x , t) l : (x, t) E A ,

} }

* max IU(x, t) - U(y, t)l : (x, t) , (y, t ) E A and x # y , =

lx Yl * max I U(x, t + k) - U(x, t) l : (x, t), (x, t + k) E A . k

{

_

Let d be an admissible discretization such that h is infinitesimal. Let

If U is the solution of the discrete problem

{23. 2. 3)

then, for T < 2n1Lo :

Lo [U] L� (Dd ) rv< 1 2nLoT -

To work internally, we will begin by requiring a weaker condition on h, namely:

Proof.

2v ---,-,- -----,h < -,----l l uo l l u)C + T llfl l u)O ·

(23.4.2)

Lemma 3 shows that this is a sufficient condition for admissibility of d, which is all that is needed for now.

23. Burgers equation via discretizations

358

Fix z E 'll'M and write V(x, t)

U(x + z, t + k) - U(x, t + k) =

=

=

U(x + z, t) . Then, by Lemma 2:

-.h2 (f(x + z, t) - f(x, t))

(

+ >-.h 2 (J(x + z, t) - f(x, t))

)(

)

By *recursion, we will construct a function L : {0 , k, 2k, . . . , T} that, for all t 0 , k, . . . T and x, z E 'll'M :

---+

*IR such

=

I U(x + z , t) - U(x, t) l :::; L(t) lzl

(23.4.3)

Let L (t) be given by:

For t

{

=

0,

Lo L(t) + nk(L(t)) 2 + nkL6

L(O) L(t + k)

we get:

I U(x + z, 0 ) - U(x, 0 ) I

=

lu.o(x + z) - u.o(x) I :::; [ u.o ] Ld (1I''M) lzl :::; Lo I z l .

Now, we assume that for all T

=

0,

k, . . . , t and x E 'll'j\1

I U(x + z, T) - U(x, T) l :::; L(t) lzl . Since d is admissible ( and so I V(x, T) l that I
=

I U(x + z, T) l

<

2{), Lemma 1 implies

I U(x + z, t + k) - U(x, t + k) l :::; L(t) l zl Ah n + 2 L L(t) lziL(t)2h + Ah 2 [j]Lx lzl i= l :::; L(t) + nk(L(t)) 2 + nkL6 l z l .

(

)

This shows inequality (23.4.3) by *induction. Note that L( t) defines the Euler iterates for the standard ODE initial value problem:

{

y' y(O) Lo =

(23.4.4)

23.4. Main estimates on the hyperfinite discrete problem Since

y

'

359

2ny 2 , and thus: Lo Y (t) -< 1 - 2n L0 t

2 0 , y (t) 2 L0 . Hence,

y

'

:<::

Now, use the fact that h is infinitesimal. By the convergence of the Euler iterates to the solutions of problem (23.4.4) , we conclude that: Lo L(t) :::::J y (t) :<:: 1 - 2n L t o This estimate is valid for t in any compact interval where problem (23.4.4) has solution; in particular, the estimate will be valid for t < 2n1Lo . D

Let d be an admissible discretization such that h is infinitesimal. Let Lo be as in Lemma 4 and

Lemma 5

Mo = max ([[uo lJL�(1I':\I ) ' � ( [[f]] L�(l5d) f12, Lo ) '

[[uolJL�(1I':\I ) = * max { v� uo (x) - (u · 'V)uo(x) + f(x, 0 ) :

with

x E 1!'%J }

If U is the solution of the discrete problem {23. 2.3) then, for T <

2n1Lo :

o [[ u]] L�(Dd) < 1 - M 2n Lo T Proof. Again we begin by requiring that h satisfies the weaker statement: 2v (23.4.5) h < uo ll ll u " + T i l f i l L Write V(x, t) U(x, t + k) . Then, by Lemma 2: U(x, t + 2k) - U(x, t + k) -.h 2 (f(x, t + k) - f(x, t)) =
=

=

=

l=l

+ >-.h2 (f(x, t + k)

- f(x, t))

)

)

By *recursion, we again construct a function !VI : {0 , k, 2k, . . . , T} that , for all t = 0 , k, . . . T and x E 1l'A1 :

I U(x, t + k) - U(x, t) l

:<::

M(t)k.

---+

*JR such

(23.4.6)

23. Burgers equation via discretizations

360

Let M ( t) be given by:

{

Mo M(t) + nkM(t)L(t) + nkMJ Here, L is the function introduced in the proof of Lemma 4. For t 0 , we get: n n U(x, k) - U(x, 0 ) k :::::J V� UO ( X ) - ( UO · \7 ) UO ( X ) + j ( X , 0 ) < [[uolJL:J ( 1I':\I ) < Mo. Now, we assume that for all T 0 , k, . . . , t and x E 1!'�1 M(O) M(t + k)

=

=

I U(x, T + k) - U(x, T) l <::: M(t)k. Since d is admissible ( and so I V(x, T) I I U(x, T + k) l < 2/: ) , Lemma 1 implies that I

i=l

<

(M(t) + nkM(t)L(t) + nkMJ ) k

M ( t + k)k.

This shows inequality (23.4.6) by *induction. Now, we note that M(t) defines the Euler iterates for the standard ODE initial value problem:

{

=

n z y + nMJ (23.4. 7) Mo. Here, y( t) is the solution of problem (23.4. 7) , as given in the proof of Lemma 4. Since z' 2 0 , z(t) 2 Mo, so nz(t)y(t) 2 nMoLo 2 nMJ. Hence, z'(t) <::: 2nz(t)y(t) , and thus: Mo z(t) <- 1 - 2nLot z' z(O)

=

Now, use the fact that h is infinitesimal. By the convergence of the Euler iterates to the solutions of problem (23.4.4) , we conclude that:

M (t) :::::J z(t) -<

Mo 1 - 2nL0t

23.5. Existence and uniqueness of solution

361

This estimate is valid for t in any compact interval where problem (23.4.4) has solution; in particular, the estimate will be valid for t < 2n1Lo . D

23 . 5

Existence and uniqueness of solution

By the results of section 23.4, the function U(x, t) is S-continuous for t E 1 . Note that the set of admissible values for T < Lo 2n depends only on L0, that is, on the size of the Lipschitz constants of u0 and f. Therefore, u(st x, st t) st U(x, t) (23.5. 1) is well-defined and continuous on 'll'n x [0, T] . Also, Lemmas 4 and 5 imply that u is globally Lipschitz on 'll'n x [0, T] . To show that u solves (locally in time) the problem and also to prove a uniqueness result, we need the following:

{0, k, . . . , T}, with T

=

Let a : 'll'n x [0, T] ---+ lR be Lipschitz continuous. Then, the problem, in D Vt - vb.. v + (a · 'V)v f (23.5.2) on 'll'n X {0}. uo has a unique solution v E C2 • 1 ( 1l'n X [0, T] ) . Furthermore, if v is a solution of problem (23. 5. 2) then (in the nonstandard universe) v satisfies: for all (x, t) E Dd , there exists an c :::::J 0 such that v(x, t + k) -.h 2 f(x, t) + r:: >-. h 2 . Proof. Consider problem (23.5.2). Since a is given, the vectorial differential equation Vt v b.. v + (a 'V)v f is just an uncoupled system of n scalar second order parabolic equations. Since a is a Lipschitz continuous function on ']['n X [0, T] and Uo E C2 • 1 ('ll'n ), by the theory of parabolic equations (e.g. Friedman, [8]) , the problem has a unique (strong) solution, v E C2, 1 ( 1l'n X [0, T] ). By the regularity of v: f(x, t) Vt (X, t) - vb.. v (x, t) + (a(x, t) · 'V)v(x, t) Pd v(x, t) v(x, t + k) - v(x, t) vu" v x, t + � a" x, t v(x + h e2, t) - v(x - he i , t) - d ( ) 6 ( ) k 2h l=l Lemma 6

{

=

v =

=

-

_

• •

·

=

=

>,;,.' (v (x, t + k) - (1 - 2n v,\)n(x, t) -

:::::J

- ), � ( (v - �a; (x, t) ) n(x + he; , t) + (v + � a,(x t) ) n(x - he, , t) ) )

23. Burgers equation via discretizations

362

Consequently, for each (x, t) E Dd, there exists an c :::::J 0 such that:

,

v(x t + k) + >. =

=

v

(1 - 2n .A ) v(x , t) +

� ((v- � a, (x, t) ) v(x + he; , t) + (v + � a; (x, t) ) v(x - he, , t)) +

+ .Ah 2 f(x , t) + r:: .A h 2
D

Here is our existence lemma: Lemma 7 Let u and T be as given by equation (23. 5. 1 ). Then u E C( 2 ; l ) ( 'll'n x [0, T] ) and u is a (strong) solution of the Burgers equation problem (23. L4) . Proof.

Consider the problem:

{

Vt

v

- Vb,. V + ( U · \7) V

=

=

j

in D on 'll'n

uo

X

{0}.

By Lemma 6 the problem has a unique (strong) solution, u E C2, 1 ( 1l'n X [0, T] ) and, in the nonstandard universe, for all (x, t) E Dd, there exists an c :::::J 0 such that: v(x , t + k)
=

=

T =

I U(x, t) - * v(x , t) l ::; rt. This implies that, for all (x, t) E Dd: u( st x, st t) st U(x, t) st *v(x, t) v( st x, st t), =

=

=

as wanted. (As usual, from this point on we omit stars on *v) . For t = 0 the statement follows from the initial conditions. Now, assuming the statement holds true for some t jk, j E *N: =

U(x, t + k) - v(x, t + k)

= =

(

)(

)

Ah n u (x, t) - U (x, t) +2 v(x + hei, t) - v(x - he 2 , t) . , L " i= l

23.5. Existence and uniqueness of solution

363

Since v( - , t) is a C2 function on the compact set 'll'n , there exists a finite C so that l v(x + he, , t) - v(x - hei , t) I :S 2hC. Also, by the definition and continuity of u , Ui (x, t) - Ui (x, t) = 6 :::::J 0. By the induction hypothesis and Lemma 3, I
I U(x, t + k) - v(x, t + k) l

rt + ck + )..2h n6 2hC = rt + k (c + nC6) :S r(t + k ) . :S

D

As consequence of the above Lemma, we get: Theorem 1

C2, 1 ('ll'n ) . Let

Let f be locally Lipschitz continuous on 'll'n

(

1

x

[0, oo ) and u0

1

E

)

L -- max [uo] L=d ('II'nM ) -n ( [f] £=d (75d ) ) 1 / 2 , [[uolJ Ud"' ( � nM ) -n ( [[fll u"'d ( Dd ) ) 1 /2 · >

>

Here, [[uolJ Ud"' (1I'M" ) is an upper-bound for the Lipschitz constant of the function �uo (x) + (uo · 'V)uo(x) + f(x, O) . Then, for any T < 2�L ' the problem

{

Ut = v�u - (u · 'V)u + f u = uo has a strong solution, u E C2, 1 (1l'n X [0, T] ) .

in D on 'll'n

X

{0}.

The uniqueness theorem follows from:

Let u and v be solutions of the Burgers equation problem [0, T] , for some T > 0. Let Lv = [v] L d (1I'n x [O , T] ) . Then:

Lemma 8

on 'll' n

X

u(x, t) = v(x, t)

for all

X

E

(23. 1..4)

{-1-, }

'll'n , 0 :S t < min T 2nLv

Proof. Using the nonstandard statement of Lemma 6, (note that u and v are given functions) , we conclude that for all (x, t) E Dd, there exist infinitesimal 61 and 62 such that:

u(x, t + k) =
v(x, t + k)

=

-.h 2 .

23. Burgers equation via discretizations

364

}

{

Let 8 be the largest ik, i E *N, such that ik :S min 2n1L v , T . Let r E � + be arbitrary. We will show by induction on t 0, k, 2k, . . . . 8 that for all 0, k, 2k, . . . , t and x E 'll'j\;1 : =

T =

l * u(x, t) - * v(x, t) l

This implies that, for all (x, t)

E

:<::

rt.

Dd :

u(st x, st t)

=

s

v( t x, st t),

as wanted. (From hereon, we omit stars on *u and *v). For t = 0 the statement follows from the initial conditions. Now, assuming the statement holds true for some =

t jk, j E *N:

u(x, t + k ) - v(x, t + k ) Let

c =

61 - 62

::::::

=

-.h 2

0. Using Lemma 2: =

u(x, t + k ) - v(x, t + k ) -.h u(x, t + k ) - v(x, t + k ) < rt + c:k + 2 nrt 2hLv ,

l

Since t :<:: e

:<::

l

rt + k ( c + rntLv ) .

2nL ' it follows that: lu(x, t + k ) - v(x, t + k ) l

:<::

r(t + k ).

Let u and v be solutions of our Model Problem [0, T] , for some T > 0. Then u v.

Theorem 2

=

D

(23. 1..4)

Assume there exists t E [0, T] such that, for some x v(x, t) ; let B be the infimum of such t. Note that: u(x, B) v(x, B) , for all x E 'll' n .

Proof.

=

E

on 'll'n

x

'll'n , u(x, t) #

(23.5.3)

References

365

If e 0 then equality (23.5.3) follows from the initial conditions; otherwise, it follows from continuity of u and Consider the problem: =

{

v.

Wt

w

=

=

vD.. w - (w · V')w + g u(x , e)

in D on 1rn

X

(23.5.4)

{0}.

where g(x, t) f(x, e + t), for all (x, t) E 'll'n X [0, T - B] . Consider WI , W2 : 'll'n X [0, T - B] ---+ lR given by WI (x, t) u(x, e + t) and WI (x, t) ( x , e + t). But W I and w2 are solutions of problem (23.5.4) such that, for arbitrarily small t > 0: =

This contradicts Lemma 8.

=

=

v

D

References [1] .T . M . BURGERS, "A Mathematical Model Illustrating the Theory of Tur bulence", Advances in Applied Mechanics, 1 ( 1948) 171 - 199. [2] M. CAPINSKI and N . J . CUTLAND, "A Simple Proof of Existence of Weak and Statistical Solutions of Navier- Stokcs Equations", Proceedings of the Royal Society, London. Ser. A, 436 ( 1992) 1 - 1 1 . [3] R. COURANT and D . HILBERT. Methods of mathematical physics, Inter science Publishers, New York, 1953- 1962. [4] P. CIARLET, The finite element method for elliptic problems, North Hol land, New York, 1978. [5] C . CHANG and H . J . KEISLER, Model Theory, North Holland, Amsterdam, 1990. [6] J. D . COLE, "On a Quasilinear Parabolic Equation Occuring in Aerody namics", Quant. Appl. Math., 9 ( 1951) 225-236. [7] N. CUTLAND , Nonstandar-d Analysis and its Applications, London Math ematical Society, Cambridge, 1988. [8] A. FRIEDMAN , Partial differentzal equatwns of parabolic type, Prentice Hall, Inc., Englewood Cliffs, N . .T., 1 964. [9] D . HENRY, Geometric theory of semilinear parabolic equations, Springer Verlag, New York, 1981.

23. Burgers equation via discretizations

366

[10] E. HOPF, "The Partial Differential Equation Ut + UUx nications Pure and Applied Math. , 3 (1950) 201 -230.

=

VUxx ",

Commu

[11] H . J KEISLER, Foundations of Infinitesimal Calculus, Prindle, Weber and Schmidt, 1976. .

.

[12] .J .L. LIONS, Quelques methodes de resolution des problems aux limites nonlineaires, Dunod, Gauthier-Villars, Paris, 1969. [13] J .L. LIONS, Control of systems governed by partial differencial equations, Springer-Verlag, New York, 1972. [14] A . J . MAJDA and A.L. BARTOZZI, Vorticity and Incompressible Flows, Cambridge Texts in Applied Math., Cambridge, 2002. [15] K . D . STROYAN , Introduction to the Theory of Infinitesimals, Academic Press, New York, 1976. [16] LUTHER W. WHITE, "A Study of Uniqueness for the Initialization Prob lem in Burgers ' Equations", .T. of Math. Analysis and Applications, 1 72 (1993) 412-43 1.

C alculus with infinitesimals Keith D . Stroyan *

24. 1

Intuitive proofs with "s1nall" quantities

Abraham Robinson discovered a rigorous approach to calculus with in finitesimals in 1960 and published it in [9] . This solved a 300 yem· old problem elating to Lcibni z and Newton. Extending t he ordered field of (Declekind) "real" numbers Lo indude infiniLosinmls is noL difficult algebraically, buL calcu lus depends on approximations with transceudental functions. Robinson used mathematical logic to show how to extend a.ll real ftmctions in a way that preserves their properties in a precise sense. These properties cm1 be used to develop calculus with infinitesimals. Infinitesimal numbers have always fit basic intuitive approximation when certain quantities are "small enough," but Leibniz, Euler, and many others could not make the approach free of contra diction. Sec:tion 1 of this article uses some intuitive approximations to derive a few fundamental results of analysis. We usc approximate equality, x � y, only in an intuitive sense that " x is sufficiently close to y". I-I. Jerome Keisler developed simpler approaches to Robinson's logic and began using infinitesimals in begilming U.S. calculus courses in 1 969. The experimental and first edition of his hook were used widely iu th e 1970's. Sec tion 2 of this article completes the intuitive proofs of Section 1 nsiug Keisler's approach to infinitcsimals from [6J . 24 . 1 . 1

Continuity and extreme values

Theorem 1 The Extreme Value Theorem Suppose a function f [x] is continuous on a compact interval [a, b] . Then f[x] attains both a maximttm nnd minimum, that is, there are points Xl\.fAX and Xmin in [ a, b] , so that for eve1·y other x in [a, b] , f [xmin] :::; f [x] :::; f [XMAX] · ·university of Iowa,

USA.

keith-stroyan@uiowa . edu

370

24. Calculus with infinitesimals

Formulating the meaning of "continuous" is a large part of making this result precise. We will take the intuitive definition that f[x] is continuous means that if an input value x1 is close to another, x2 , then the output values a.rc close. We summarize this as: ! [:1:] is continuous if and only if a. ::::; x1 g::: x2 ::::; b ==} .f[x r] g::: j [x2]. Given this property of f[x] , if we partition [a, b] into tiny increments,

a. < a +

k(b - a) 2(b - a) l (b - a) < ··· < b < ··· < a+ < a+ H H H

the maximum of the function on the finite partition occurs at one (or more) of the points :t:AJ = a + k ( a) . This means thai, for any other ptutit;iou point j (b-a) -rr- : ' [X M ] 2: }· [.!.:- - ( . _ X 1 - Q, + ---J J Any point a ::::; x ::::; b is within f.J of a partition point x 1 a + j(bi7a) , so if H is very large, :x; g::: :r; 1 and

��

=

f[x u ] ;:::: f[x t ]

g:::

.f [x]

and we have found the approximate maximum. It is not hm·d t,o make t.his idea into a sequential argument where XJ\!J [I-I] depends on H, but there is quite some trouble to make the sequence X M[H] converge (using some form of compactness of [a, b] .) Robinson's theory simply shows that the hyperreal X lvJ chosen when 1/H is infinitesimal, is infinitely near an ordinary real number where the maximum occurs. We complete this proof as a simple example of Keisler's Axioms in Section 2. 24. 1 . 2

Microscopic tangency i n one variable

In beginning cakulus you learned that the derivc1bvo measures the slope of the line tangent to a curve y = f[x] at a particular point, (x, J[x] ) . We begin by setting up convenient "local variables" to use to discuss this problem. If we fix a particular (x, f [x] ) in the x-y-coordinates, we can define new parallel coordinates ( dx , dy) through this point. The (dx, dy)-origin is the point of tangency to the curve. y

dy

m

dx

*� J;fx =

Figure 24. 1 . 1 : IVIicroscopic Tangency

24. 1 . Intuitive proofs with "small" quantities

371 =

A line in the local coordinates through the local origin has equation dy mdx for some slope m. Of course we seek the proper value of m to make dy = mdx tangent to y = f[x] . You probably learned the derivative from the approximation lim

.6- x ----> 0

f[x + �x] - f [x] �X

=

j ' [x] .

If we write the error in this limit explicitly, the approximation can be expressed as f[x +��-f[x] = f' [x] + c or f[x + �x] - f [x] = f' [x] · �x + c �x. Intuitively we say f [x] is smooth if there is a function f'[x] that makes the error small, c 0, in the formula ·

:::::::

f [x + 6x] - f[x]

=

f'[x] · 6 + c · 6x

(24. 1 . 1)

:::::::

when the change in input is small, 6x 0. The main point of this article is to show that requiring c to be infinitesimal whenever 6x is infinitesimal and x is near standard gives an intuitive and simple direct meaning to C 1 smooth. Requiring only that c tend to zero only for fixed real x, or pointwise , is not sufficient to capture the intuitive approximations of our proofs. The nonlinear change on the left side of ( 24. 1 . 1 ) equals a linear change, m 6x, with m = f' [x] , plus a term that is small compared with the input change. The error c has a direct graphical interpretation as the error measured above x + 6x after magnification by 1j6x. This magnification makes the small change 6x appear unit size and the term c 6x measures c after magnification. ·

·

y

Figure 24. 1. 2: Magnified Error When we focus a powerful microscope at the point (x, f[x] ) we only sec the linear curve dy = m dx, because c 0 is smaller than the thickness of the line. The figure below shows a small box magnified on the right. ·

:::::::

372

24. Calculus with infinit,esimals dy

dy

X

Figure 24. 1.3: A !v!agnificd Tangent

24. 1 .3

The Fundamental Theorem of Integral Calculus

Now we usc the intuitive microscope approximation (24. 1 . 1 ) to prove:

Theorem 2 The Fundamental Theorem of Integral Calculus: Part I b Suppose we want to find fa f[x] dx . If we can find another function F[x] so that the derivative satisfies F'[x] = f[x] for every x , a :::; :c :::; b, then

1b

f [x] dx

=

F [b] - F[a]

The definition of the integral we use is the real number approximated by a sum of small slices, b-6x

.l J[x] dx � f[x] b

�

·

8x, when r5x

st.C'p li.r

�

0

Figure 24.1 .4: Sum Approximations

24. 1 .4

Telescoping sums and derivatives

We know that if F[x] has derivative F' [x] = f [x] , the differential approxi mation above says,

F [x + r5x] - F[x] so we can smn both sides

=

f[x] · 8x + c: 8x •

24. L Intuitive proofs with "small" quantities b-i5x

L F[x + 6x] - F[x]

x =a

,.,t('p O x

=

373 b-i5x

b-!5

L f [x] · 6x + L c · 6x

The telescoping sum satisfies, b-i5x

L F[x + 6x] - F[x]

x=a s t e p Ox

=

F[b'] - F[a]

so we obtain the approximation,

1ab f [x] dx ;::; x=a L f[x] · 6x = F[b'] - F[a] - L c · 6x x=a b-i5x

b-i5x

"tep O x

-, t c p O x

This gives, b-i5x

b-i5x

b-i5x

x=a step Ox

x=a step O x

x=a step 6x

L f[x] · 6x - (F[b'] - F[a] ) <

L c · 6x < L b-i5x

< m ax [ l c l ] · L 6x x=a

or

I: .f[x] dx

step Ox

;::; 0 ;::;

=

lei

· 6x

max [ l c l ] · (b' - a )

� b;;;!� f[x] · 6x ;::; F[b'] - F[a] . Since F[x] 1s continuous, step O x

F[b'] ;::; F[b] , so I: f [x] dx

=

F[b] - F[a] .

We need to know that all the epsilons above are small when the step size is small, c ;::; 0, when 6x ;::; 0 for all x a, a + 6x , a + 26, · · · . This is a uniform condition that has a simple appearance in Robinson's theory. There is something to explain here because the theorem stated above is false if we take the usual pointwise notion of derivative and the Riemann integral. ( There are pointwise differentiable functions whose derivative is not Riemann integrable. See [ 5 ] Example 35, Chapter 8, p.l07 ff. ) The condition needed to make this proof complete is natural geometrically and plays a role in the intuitive proof of the inverse function theorem in the next example. =

24 . 1 . 5

Continuity of the derivative

We show now that the differential approximation

f [x + 6x] - f[x]

=

f' [x] · 6 + c · 6x

374

24. Calculus with infinitesimals

forces the derivative function J' [x] to be continuous,

Let XI R:: x2 , but :r 1 -=f x2 . Usc the differential approximation with x = X I and 15x = x2 - :1:1 and also with x = x2 and 15x = x1 - x2 , geometrically looking at the tangent approximation from both endpoints .

.f [x2] - f[x 1] = .f ' [xr] (.'1:2 - x1 ) + c:1 (x2 - x1) J [x1 ] - .f [x2] = .f' [x2] · (x1 - x2 ) + c:2 · (x i - x2 ) ·

·

Adding these equations, we obtain

Dividing by the nonzero term (.1:2 - x1 ) and adding .f' [x2] t o both sides, we obtain, j' [x2] = .f' [xi] + (c:1 -£2 ) or j' [x2] R:: f' [xl] . since the difference between t,wo small errors is small. This fact can be used to prove:

Theorem 3 The Inverse Function Theorem {f f'[xo] -=f 0 then f[x] has an inverse function in a small neighborhood of xo , that, is, if y R:: Yo = .f[x o] , then there is a wnique x R:: xo so that y = .f[x] . We saw above that the differential approximation makes a microscopic view of the graph look linear. If y R:: Yl the linear equation dy = m dx with m. f' [xr] can be inverted to find a first approximation to the inverse, ·

=

Yo = m (x1 - xo) x 1 = xo + � (y - Yo) Y-

·

y

1 X OX x ""----'1"-Xo

Figure 24. 1 .5: Approximate Inverse

24. 1 . Intuitive proofs with "small" quantities

375

=

We test to see if f [x 1 ] y. If not, examine the graph microscopically at (x 1 , yl ) = (x 1 , j[x 1 ] ) . Since the graph appears the same as its tangent to within c and since m = f'[x l ] :::::J J'[x 2 ] , the local coordinates at (x 1 , yl ) look like a line of slope m. Solving for the linear x-value which gives output y, we get Y - Y l m (x 2 - x l ) X2 = Xl + � (y - j[x 1 ]) =

·

Continue in this way generating a sequence of approximations, x 1 = xo + G[x n ] , where the recursion function is G[�] x + � (y- f[�] ) . The distance between successive approximations is 1 lx 2 - x 1 l I G[x l ] - G[xoJ I :<:: 2 lx 1 - xo l 1 1 1 lx 3 - x 2 l = I G[x 2 ] - G[x 1J I :<:: 2 · lx 2 - x 1 l :<:: 2 · 2 · lx 1 - xo l

� ( y - yo ) , Xn+ l

=

=

=

·

by the Differential Approximation for G [x] . Notice that G'[�] 0, for � :::::J x o , so IG'[�] I < 1/2 in particular, and

=

1 - f'[�] /m :::::J

I G[x l ] - G[xoJ I :<:: 21 · lx 1 - xo l I G[x2] - G[x 1J I :<:: 21 · l x2 - x 1 l :<:: 212 · l x 1 - xo l 1 lxn + l - Xnl < -n · lx 1 - xo l 2 lxn+ l - xo l < lxn+ l - Xn l + lxn - Xn - 1 1 + · · · + lx 1 - xo l < 1 + � + . . · + n · lx 1 - xo l 2 2

(

�)

A geometric series estimate shows that the series converges, Xn ---+ x :::::J x 0 and f[x] = y. To complete this proof we need to show that G[�] is a contraction on some noninfinitesimal interval. The precise definition of the derivative matters because the result is false if f' [x] is defined by a pointwise limit. The function f[x] = x + x 2 sin[n/x] with j[O] = 0 has pointwise derivative 1 at zero, but is not increasing in any neighborhood of zero. 24. 1 . 6

Trig, polar coordinates , and Holditch's formula

Calculus depends on small approximations with transcendental functions like sine, cosine, and the natural logarithm. Following arc some intuitive ap proximations with non-algebraic functions.

24. Calculus with infinitesimals

376

Sine and cosine in radian measure give the :z; and y location of a poil1t on the unit circle measured a distance e along the circle. Now make a small change in the angle and magnify by 1/6@ to make the change appear unit size. y 1

Figure 24. 1 .6: Sine increment�; Since magnification does not change lines, the radial segments from the origin to the tiny increment of the circle meet the circle at right angles and appear under magnification to be parallel lines. Smoothness of the circle means that under powerful magnificat,ion, the circulc:Lr increment appears to be a line. The difference in values of the sine is the long vertical leg of the increment "triangle" above on the right. The apparent hypotenuse with length 6() is the circular increment. Since the radial lines meet the drclo at right angles tho large triangle on the unit circle at the left, with long leg cos[e] and hypotenuse 1 is similar to the increment triangle, giving

We write approximate similarity because the increment "triangle" actually has one circular side that is �-straight. In any case, t his is a convincing argument that dd�n [e] = cos[e] . A similar geometric argument on the increment triangle shows that d([;;s [e] = - sin [B] . 24. 1 . 7 Tho

The p olar area differential

derivation of sine and eosine is related to the area differential in polar coordinates. If we take an anglo i ncrement of 6@ and magnify a view of the circular arc on a circle of radius r· , the length of Lhe circnlar increment is r· · c)(}, by similarity.

24. 1 . Intuitive proofs

377

with "small" quantities

y

Figure 24. 1 .7: Polar increment

A magnified view of circles of radii r and r · + 6r bct.ween t.he rays a t angles () and () + 6() appears to be a rect an gl e with sides of lengths or and r c)() If this were a t·.rue rec t angle , its area would be 7' • (j() 6r , but it is ouly an approximate rectangle Technically, we can show that the area of this region is ·r 6() 6r p l us a term that is s mall comp ared with thi s infinit esimal , ·

.

·

.

·

Keisler's Infinite Sum Theorem 16] assures us that error and integrate with respect to TdfJdr. Theorem 4 Holditch 's formula

The area swept out by a tangent of length con·uex curue in the plane is A = 7fR2 •

R

as

we

·it

·

can neglect this size

traverses an arbitmTY

Figure 24. 1 .8: p-ip- coordin ates can sec t;his i nteres t i ng result by using a variation of pol ar coordinates and t h e infinitesimal polar area increment above. Since the curve is convex , ea,ch tangent, meets the cmve at, a unique angle, !p, and each poi nt in the region swept ou t by the tangents is a distance p along that ta n gent 'vVe

.

24. Calculus with infinitesimals

378

We look at an infinitesimal increment of the region in p-
Figure 24 .1 .9: p-only increment Next, looking at. t.hc base point. of the tangent on the cnrvc, moving t.o the

correct

Figure 24.1 .10: rp and p increment of height p Orp and base op, or area oA ·

=

p Orp Op: ·

·

Figure 24. 1 . 1 1 : da = p Orp 8p ·

·

24. 1 . Intuitive proofs with "small" quantities

3 79

Integrating gives the total area of the region

{ R { 2 n pdcpdp = nR2 . Jo Jo 24. 1 .8

Leibniz's formula for radius of curvature r

The radius of a circle drawn through three infinitely nearby points on a curve in the (x, y ) -plane satisfies

where s denotes the arclength. For example, if y = f[x] , so ds =

d dx �=1

(

)1 + (f' [x] ) 2 dx. then

)

f ' [x] = J1 + (f' [x] ) 2

If the curve is given parametrically, y Jx' [t] 2 + y' [t] 2 dt, then 1

r

24. 1 .9

( )

d dys - dxd

f" [x]

y[t] and x = x[t] , so ds

y ' [t]x" [t] - x ' [t] y " [t] (x'[t] 2 + y' [t] 2 ) 3/ 2

Changes

Consider three points on a curve lC with equal distances b.s between the points. Let a1 and a1 1 denote the angles between the horizontal and the segments connecting the points as shown. We have the relation between the changes in y and a: . [ al = b.y s1n (24. 1 . 2) -

b.s

The difference between these angles, b.a, is shown near PIII ( figure 24. 1 . 12). The angle between the perpendicular bisectors of the connecting segments is also b.a, because they meet the connecting segments at right angles. These bisectors meet at the center of a circle through the three points on the curve whose radius we denote The small triangle with hypotenuse gives r.

r

(24. 1.3)

24. Calculus with infinitesimals

380

s

Figure 24. 1 . 12: Changes in and 24 . 1 . 1 0

a

Small changes

Now we apply these relations when the distance between the successive points is an infinitesimal os. The change -0 sin[o:]

= -0

( ��)

= sin[o:]

- sin[ a - 5o:] = cos [o:] . oa + {} . oa, (24. 1 . 4)

with {} :::::J 0, by smoothness of sine (see above) . Smoothness of sine also gives, .

[ ] 00:

00:

sm 2 = 2 + rJ . oa, with rJ :::::J 0. r

Combining this with formula (24. 1 .3) for the infinitesimal case (assuming # 0) , we get OS 00: = + 00: , with :::::J 0. -

r

L

L

•

Now substitute this in (24.1 .4) to obtain

-o

( oyos )

By trigonometry, cos[o:]

=

=

os

cos[o:] --;::- + ( os, with ( :::::J 0. ·

oxjos, so

u�)

0 1 os 1 - � = -;: + C ox :::::J -;: '

24. 1 . Intuitive proofs with "small" quantities

381

as long as g� is not infinitely large. Keisler's Function Extension Axiom allows us to apply formulas (24. 1.3) and (24. 1.4) when the change is infinitesimal, as we shall see. We still have a gap to fill in order to know that we may replace infinitesimal differences with differentials (or derivatives) , especially because we have a difference of a quotient of differences. First differences and derivatives have a fairly simple rigorous version in Robinson's theory, just using the differential approximation (24. 1 . 1 ) . This can be used to derive many classical differential equations like the tractrix, catenary, and isochrone, see: Chapter 5, Differential Equations from Increment Geometry in [11]. Second differences and second derivatives have a complicated history. See [2]. This is a very interesting paper that begins with a course in calculus as Leibniz might have presented it. 24. 1 . 1 1

The natural exponential

The natural exponential function satisfies y[O] = 1 dy -=y dx We can use (24. 1. 1 ) to find an approximate solution, y[6x]

=

y[O] + y' [O] · 6

=

1 + 6x

Recursively, y[26x] y[36x] y[x]

y[6x] + y' [6x] · 6x y[6] ( 1 + 6x) (1 + 6x) 2 y[26x] + y' [26x] · 6x y[26x] · (1 + 6x) (1 + 6x) 3 =

( 1 + 6x t 1ox , for x

·

=

=

=

0, 6x, 26x, 36x,

=

·

·

·

This is the product expansion :::::J (1 + 6 ) 1 / ox , for 6x :::::J 0. No introduction to calculus is complete without mention of this sort of "in finite algebra" as championed by Euler as in [3] . A wonderful modern interpre tation of these sorts of computations is in [7] . W. A. J. Luxemburg's reformula tion of the proof of one of Euler's central formulas sin [ z] z TI%"= 1 (1 - c::7r ) 2 ) appears in our monograph, [13] . e

=

24. Calculus with infinitesimals

382

24. 1 . 12

Concerning the history of the calculus

Chapter X of Robinson's monograph [10] begins:

The history of a subject is usually written in the light of later developments. For over half a century now, accounts of the history of the Differential and In tegral Calc1Ll1LS have been based on the belief that even though the idea of a number system containing infinitely small and infinitely large elements might be consistent, it is useless for the development of Mathematical Analysis. In consequence, there is in the writings of this period a noticeable contrast between the severity with which the ideas of Leibniz and his successors are treated and the leniency accorded to the lapses of the early proponents of the doctrine of lim its. We do not propose here to subject any of these works to a detailed criticism. However, it will serve as a starting point for our discussion to try to give a fair summary of the contemporary impression of the history of the Calculus . . . I recommend that you read Robinson's Chapter X. I have often wondered if mathematicians in the time of Weierstrass said things like , 'Karl ' s epsilon-delta stuff isn ' t very interesting. All he does is re-prove old formulas of Euler.' I have a non-standard interest in the history of infinitesimal calculus. It really is not historical. Some of the old derivations like Bernoulli's derivation of Leibniz' formula for the radius of curvature seem to me to have a compelling clarity. Robinson's theory of infinitesimals offers me an opportunity to see what is needed to complete these arguments with a contemporary standard of rigor. Working on such problems has led me to believe that the best theory to use to underly calculus when we present it to beginners is one based on the kind of derivatives described in Section 2 and not the pointwise approach that is the current custom in the U.S. I believe we want a theory that supports intuitive reasoning like the examples above and pointwise defined derivatives do not.

24. 2

Keisler's axioms

The following presentation of Keisler's foundations for Robinson ' s Theory of Infinitesimals is explained in more detail in either of the ( free .pdf ) files [12] and the Epilog to Keisler's text [6] . 24.2 . 1

Small, medium , and large hyperreal numbers

A field of numbers is a system that satisfies the associative, commutative and distributive laws and has additive inverses and multiplicative inverses for nonzero elements. ( See the above references for details. ) Essentially this means the laws of high school algebra apply. The binomial expansion that follows is

24.2. Keisler's axioms

383

a consequence of the field axioms. Hence this formula holds for any pair of numbers x and b.x in a field. To compare sizes of numbers we need an ordering. An ordered field has a transitive order that is compatible with the field operations in the sense that if a < b, then a + c < b + c and if 0 < a and 0 < b, then 0 < a · b. (The complex numbers can't be ordered compatibly because i 2 - 1.) An infinitesimal is a number satisfying 1 6 1 < 1Im for any ordinary nat ural counting number, m 1 , 2, 3, Archimedes' Axiom is precisely the statement that the (Dcdckind) "real" numbers have no positive infinitcsimals. (We take 0 as infinitesimal.) Keisler's Algebra Axiom is the following: =

=

24.2 . 2

·

·

·

.

Keisler's algebra axiom

The hyperreal numbers are an ordered field extension of the real numbers. In particular, there is a positive hyperreal infinitesimal, 6.

Axiom 5

Any ordered field extending the reals has an infinitesimal, but we just include this fact in the axiom. There arc many different infinitcsimals. For example, the law a < b =? a + c < b + c applied to a 0 and b c 6 says 6 < 26. If k is a natural number, k6 < � ' for any natural m, because 6 < k ·� when 6 is infinitesimal. All the ordinary integer multiples of 6 are distinct infinitesimals, =

=

=

. . . < -36 < -26 < -6 < 0 < 6 < 26 < 36 < . . . c

Magnification at center with power 1 I 6 is simply the transformation x ---+ ( x - c) I 6, so by laws of algebra, integer multiples of 6 end up the same integers apart after magnification by 1 I 6 centered at zero.

-3

-2

-1

1

0

2

3

Figure 24.2. 1 : Magnification by i Similar reasoning lets us place � , � , on a magnified line at one half the distance to 6, one third the distance, etc. ·

·

·

24. Calculus with infinitesimals

384

0

c5 c5

;, 2 u --

c5

3 2

Figure 24.2.2: Fractions of 6 Where should we place the numbers 62 , 63 · · · ? On a scale of 6, they arc infinitely near zero, � ( 62 0) 6 :::::J 0. Magnification by 1 /62 reveals 6 2 , but moves 6 infinitely far to the right, b ( 6 0) J > m for all natural m = 1 , 2, 3. · · · =

-

-

=

Figure 24.2.3: 63 on a 62 scale Laws of algebra dictate many "orders of infinitesimal" such as 0 < · · · < 63 < 62 < 6. The laws of algebra also show that near every real number there are many hyperreals, say near 3.14159 · · · 1r =

... <

1f

-

36 <

1f

-

26 <

1f

-

6<

1f

1f

1f

1f

< + 6 < + 26 < + 36 < . . .

A hyperreal number x is called limited (or "finite in magnitude") if there is a natural number m so that lx l < m. If there is no natural bound for a hyperreal number it is called unlimited (or "infinite") . Infinitesimal numbers are limited, being bounded by 1. Theorem 6 Standard Parts of Limited Hyperreal Numbers

Every limited hyperreal number x differs from some real number by an in finitesimal, that zs, there is a r-eal r so that x :::::J r. This number- is called the "standard part" of x, r = st [x] . Define a Dedekind cut in the real numbers by A { s s :<:.:: x} and D B = {s : x < s}. st [x] is the real number defined by this cut. The fancy way to state the next theorem is to say the limited numbers are an ordered ring with the infinitesimals as a maximal ideal. This amounts to simple rules like "infsml limited infsml." Proof.

=

x

:

=

Theorem 7 Computation rules for small, medium, and large

( a) If p and q are limited, so are p + q and p · q

24.2. Keisler's axioms

385

( b)

Ij r:: and 6 are infinitesimal, so is c + 6.

( c)

If 6

;:::j

0

and q is limited, then q · 6

;:::j

0.

( d ) 1/0 is still undefined and 1/x is unlimited only when x ;:::j 0. Proof. These rules are easy to prove as we illustrate with (c) . If q is limited, there is a natural number with l q l < k. the condition 6 0 means 161 < k � , so l q · 6 1 < � proving that q · 6 ;:::j 0. D ;:::j

24.2 . 3

The uniform derivative o f x 3

Let's apply these rules to show that f[x] = x 3 satisfies the differential approximation with f' [x] 3x 2 when x is limited. We know by laws of alge bra that =

(x + 6x) 3 - x 3 = 3x 2 6x + c · 6x, with c = ((3x + 6x) · 6x) If x is limited and 6x ;:::j 0, (a) shows that 3x is limited and that 3x + 6x is also limited. Condition (c) then shows that c = ( (3x + 6x) · 6x) ;:::j 0 proving that for all limited x f[x + 6x] - f [x] j' [x] · 6x + c · 6x =

with c ;:::j 0 whenever 6x ;:::j 0. Below we will see that this computation is logically equivalent to the state ment that lim 0 (x+ ��) 3- x:l 3x 2 , uniformly on compact sets of the real line. �x ----+ It is really no surprise that we can differentiate algebraic functions using algebraic properties of numbers. You should try this yourself with f [x] = x n , f[x] 1/x. f[x] JX, etc. This does not solve the problem of finding sound foundations for calculus using infinitesimals because we need to treat transcen dental functions like sine, cosine , log. x

=

24.2.4

=

=

Keisler's function extension axiom

Keisler's Function Extension Axiom says that all real functions have exten sions to the hyperreal numbers and these "natural" extensions obey the same identities and inequalities as the original function. Some familiar identities are sin[ a + {3] = sin[ a] cos[f3] + cos[ a] sin[f3] log[x · y] = log[x] + log[ y]

24. Calculus with infinitesimals

386

The log identity only holds when x and y are positive. Keisler's Function Extension Axiom is formulated so that we can apply it to the Log identity in the form of the implication (x > 0 and y > 0) =? log[x] and log[y] are defined and log[x · y] = log[x] + log [y] The Function Extension Axiom guarantees that the natural extension of log[·] is defined for all positive hypcrrcals and its identities hold for hypcrrcal numbers satisfying x > 0 and y > 0. We can state the addition formula for sine as the implication (o: = a and (3 = {3) =? sin [a:] , sin[{3] , sin[ a: + {3] , cos[a] , cos[f3] are defined and sin[o: + {3] = sin[a:] cos[f3] + cos[a:] sin [f3] The addition formula is always true so we make the logical real statement (sec 24.2. 7 below) begin with ( a = a and {3 = {3) . 24.2.5

Logical real expressions

Logical real expressions are built up from numbers and variables using functions. (a) A real number is a real expression. (b) A variable standing alone is a real expression. (c) If e 1 , e 2 , · · , en are a real expressions and f [x 1 , x 2 , · · · , xn] is a real func tion of n variables. then f[ e1 , e 2 , · · , en] is a real expression. ·

·

24.2.6

Logical real formulas

A logical real formula is one of the following: (i) An equation between real expressions, E1 = E2 . (ii) An inequality between real expressions, E1 < E2 , E1 E1 2' E2 , or E1 # E2 .

<

E2 , E1 > E2 ,

(iii) A statement of the form .. E is defined" or of the form "E is undefined. "

24.2. Keisler ' s axioms 24. 2 . 7

387

Logical real statements

Let S and T be finite sets of real formulas. A logical real statement is an implication of the form, S =? T. The functional identities for sine and log given above are logical real statements. Axiom 8 Keisler's Function Extension Axiom Every real function f[x l , x 2 , · · , xn] has a "natural"

extension to the hy perreals such that every logical real statement that holds for all real numbers also holds for all hyperreal numbers when the real functions in the statement are replaced by their natural extensions. ·

There are two general uses of the Function Extension Axiom that under lie most of the theoretical problems in calculus. These involve extension of the discrete maximum and extension of finite summation. The proof of the Extreme Value Theorem below uses a hyperfinite maximum, while the proof of the Fundamental Theorem of Integral Calculus uses hyperfinite summation and a maximum. Theorem 9 Simple Equivalency of Limits and Infinitesimals Let f [x] be a real valued function defined for 0 < l x - a l < � with �

a fixed positive real number. Let b be a real number. Then the following are equivalent: (a) Whenever the hyperreal number x satisfies a # x :::::J a, the natural exten sion function satisfies f [x] :::::J b. (b) For every real accuracy tolerance e there is a sufficiently small positive real number 1 such that if the real number x satisfies 0 < l x- a l < 1 , then lf[x] - b l < e . Condition ( b ) is the familiar Weierstrass "epsilon-delta" condition ( written with e and I · ) Notice that the condition f[x] :::::J b is NOT a logical real statement because the infinitesimal relation is NOT included in the formation rules for forming logical real statements. Proof. We show that ( a) =? ( b ) by proving that not ( b ) implies not ( a) , the contrapositive. Assume ( b ) fails. Then there is a real e > 0 such that for every real 1 > 0 there is a real x satisfying 0 < l x - a l < 1 and l f[x] - b l 2: e . Let X [r] = x be a real function that chooses such an x for a particular f. Then we have the equivalence 1

> 0 ¢? (X [r] is defined, 0 < IX [r] - a l <

1,

l f[X[r]] - b l

2:

e)

24. Calculus with infinitesimals

388

By the Function Extension Axiom this equivalence holds for hyperreal numbers and the natural extensions of the real functions X[·] and f [·]. In particular, choose a positive infinitesimal 1 and apply the equivalence. We have 0 < IX [I ] - al < I and l f[X [r]] - b l > e and e is a positive real number. Hence, f[X [r]] is not infinitely close to b, proving not ( a ) and completing the proof that ( a) implies ( b ) . Conversely, suppose that ( b ) holds. Then for every positive real e, there is a positive real 1 such that 0 < l x - al < 1 implies lf[x] - b l < e. By the Function Extension Axiom, this implication holds for hyperreal numbers. If � :::::J a, then 0 < I� - a l < 1 for every real 1, so If[�] - b l < e for every real positive e. In other words, f [�] :::::J b, showing that ( b ) implies ( a) and completing the proof of the theorem. D Other examples of uses of the Function Extension Axiom to complete the proofs of the basic results of Section 1 follow. 24.2.8

Continuity and extreme values

We follow the idea of the proof in Section 1 for a real function f[x] on a real interval [a, b] . Coding our proof in terms of real functions. There is a real function x M [h] so that for each natural number h the max imum of the values f[x] for x = a + k6.. x , k = 1 , 2, , h and 6.. x = (b - a) /h occurs at x 111 [h] . We can express this in terms of real functions using a real function indicating whether a real number is a natural number, ·

J

[x]

=

·

·

{ 0,1 . 1f�f x -= 11,, 2,2, 3,3, : : : X

=

When the natural extension of the indicator function satisfies I[k] 1 , we say that k is a hyperinteger. ( Every limited hyperinteger is an ordinary positive integer. As you can show with these functions. ) The maximum of the partition can be described by

We want to extend this function to unlimited "hypernatural" numbers. The greatest integer function Floor [x ] satisfies, I [Floor [x ]] 1, 0 :S x - Floor [x ] :S 1 . The unlimited number 1/6, for 6 :::::J 0 gives an unlimited H Floor [x] with I[H] 1 and =

=

=

24.2. Keisler ' s axioms

389

There is a greatest partition point of any number in [a , b] , P[h, x] 1 and 0 < a + Floor [ h �.:::� ]bha with a :::; P[h, x] :::; b & I [ h P [��- a ] x - P[h, x] :::; 1/h. When we take the unlimited hypernatural number H we have x - P[H, x] :::; 1 /H 0 and P[H, x] a partition point in the sense that (a :::; x :::; b & I [H P [��l- a J = 1 ) , so we have =

;:::j

f[P[H, x]] :::; f[xM [H]]. Let rM st [x 11t [H]], the standard part. Since a :::; X M [H] :::; b, a :::; rM :::; b. Continuity of the function in the sense x 1 ;:::j x2 =? f [xl] ;:::j f [x2] gives =

f[x] ;:::j f[P[H, x]] :::; f[x M [H]] for any real x in [ a, b] .

;:::j

f[rM ] , so f[x] :::; f[rM ]

One important comment about the proof of the Extreme Value Theorem is this. The simple fact that the standard part of every hyperreal x satisfying a :::; x :::; b is in the original real interval [a. b] is the form that topological com pactness takes in Robinson's theory: A standard topological space is compact if and only if every point in its extension is near a standard point, that is, has a standard part and that standard part is in the original space. 24.2 . 9

Microscopic tangency i n one variable

Suppose f[x] and f' [x] are real functions defined on the interval (a, b), if we know that for all hyperreal numbers x with a < x < b and a -:j::, x -:j::, b, f[x + 6x] - f [x] J' [x] · 6x + c- · 6x with c ;:::j 0 whenever 6x ;:::j 0, then arguments like the proof of the simple equivalency of limits and infinitesimals above show that f' [x] is a uniform limit of the difference quotient functions on compact subintervals [a, ,6] C (a, b). More generally, in [12] we show: =

Theorem 10 Uniform Differentiability

Suppose f [x] and J' [x] are real functions defined on the open real interval a, b) ( . The following are equivalent definitions of, "The function f[x] is smooth with continuous derivative f' [x] on (a, b) . " ( a) Whenever a hyperreal x satisfies a < x < b and x is not infinitely near a or b, then an infinitesimal increment of the extended dependent variable is approximately linear on a scale of the change, that is, whenever 6x 0 f[x + 6x] - f [x] j' [x] · 6x + c · 6x with c ;:::j 0. ;:::j

=

( b) For every compact subinterval [a, {3] C (a, b), the real limit f[x + � x] - f[x] lim j ' [x] uniformly for :::; x :::; {3 . �x ---+ 0 � =

a

24. Calculus with infinitesimals

390

( c) For every pair of hyperreal x 1 f' [c] .

:::::!

x 2 with a < st [x i ]

(d) For every c in (a, b), the real double limit,

=

c < b,

lim

Xl -----+ C, X2 -----+ C

f [xt] f[xX22] -Xl -

2 ]:__:f [xt] = f[xX2 Xl

:::::!

f' [c] .

f [x+�x�- f [x] zs (e) The tmditional pointwise defined deTivative Dx f = � lim ----+ 0 x continuous on (a, b) .

Continuity of the derivative follows rigorously from the argument of Sec tion 1 , approximating the increment f[ x 1 ] - f[x 2 ] from both ends of the interval [x 1 , x 2 ] . It certainly is geometrically natural to treat both endpoints equally, but this is a "locally uniform" approximation in real-only terms because the x values are hyperreal. In his Geneml Investigations of Curved Surfaces ( original in draft of 1825, published in Latin 1827, English translation by l\1orehead and Hiltebeitel, Princeton, N.J , 1902 and reprinted by the University of Michigan Library ) , Gauss begins as follows: A curved S1Lrface is said to possess continuous C1Lrvature at one of its points A, if the directions of all stmight lines drawn from A to points of the surface at an infinitely small distance fmm A are deflected infinitely little from one and the same plane passing through A. This plane is said to touch the surface at the point A. In [4] , Chapter A6, we show that this can be interpreted as C 1 -embedded if we apply the condition to all points in the natural extension of the surface. 24. 2 . 10

The FUndamental Theorem of Integral Calculus

The definite integral J: f[x] dx is approximated in real terms by taking sums of slices of the form

f[a] · �x + f[a + �x] · �x + f[a + 2�x] · �x + · · · + f[b'] · �x, where b' a + h · �x and a + ( h + 1) · �x > b Given a real function f [x] defined on [a , b] we can define a new real function S[a , b, �x] by S[a, b, �x] f[a] · �x + f [a + �x] · �x + f [a + 2�x] · �x + · · · + f[b'] · �x, where b' = a + h · �x and a+ ( h + 1) · �x > b. This function has the properties =

=

of summation such as

I S[ a, b. �x J I :::; l f[a] l · �x + l f[a+�x] l · �x + l f[a+2�x] l · �x + . . · + l f[b'] l · � x I S[a, b, �x J I :::; M ax [ lf[xJ I : x = a , a + �x, a + 2 �x, . . . , b' · (b - a)

]

24.2. Keisler ' s axioms

39 1

We can say we have a sum of infinitesimal slices when we apply this function to an infinitesimal 6x,

1a f[x] dx � x=a L f [x] · 6x b-o

b

step 5x

or

1b f [x] dx

=

[�

l

b-ox

st

f[x] · 6x , when 6x � 0

,_,h,p O x

Officially, we code the various summations with the functions like S [ a , b, 6x] (in order to remove the function f[x] as a variable.) We need to show that this is well-defined. that is, gives the same real standard part for every in finitesimal, S [a , b, 6x] � S[a , b, L] and both are limited (so they have a common standard part.) When f [x] is continuous, we can show this "existence," but in the case of the Fundamental Theorem, if we know a real function F[x] with F' [x] = f[x] for all a ::; x ::; b, the proof in Section 1 interpreted with the extended summation functions and extended maximum functions proves this "existence" at the same time it shows that the value is F[b] - F [a] . The only ingredient needed to make this work is that max [ l �:[x, 6x] l : x = a, a + 6x . a + 26x, · · , b' ] = c[ a + k 6x, 6x] ·

�

0

This follows from the Uniform Differentiability Theorem above when we take one of the equivalent conditions as the definition of "F'[x] = j[x] for all a -< x < b." Notice that c[x, �x] is the real function f[x +'t�-f[x] - f' [x] , so we can define an infinite sum by extending the real function

Sc [ a , b � �x] 24. 2 . 1 1

=

b -ox

L l ei · 6x

x=a .step 6x

The Local Inverse FUnction Theorem

In [1] Michael Behrens noticed that the inverse function theorem is true for a function with a uniform derivative even just at one point . (It is NOT true for a pointwise derivative. ) Specifically, condition (d) of the Uniform Differentiability Theorem makes the intuitive proof of Section 1 work.

24. Calculus with infinitesimals

392

Theorem 1 1 The Inverse Function Theorem If m is a nonzero real number and the real function x :::::J xo, a real xo with Yo = f[xo] and f[x] satisfies

f[x] is defined for all

f[x 2 ] _ ] - _::_ f[x-___:_ l ---'- :::::J m whenever x 1 :::::J x 2 :::::J xo X2 - X l then f[x] has an inverse function in a small neighborhood of x0, that is, there is a real number � > 0 and a smooth real function g[y] defined when I y - Yo I < � with f[g[y]] = y and there is a real c > 0 such that if lx - xo l < c, then l f[x] - Yo l < � and g[f [x]] = x. .:___---_:_ -'-

This proof introduces a "permanence principle." When a logical real formula is true for all infinitesimals, it must remain true out to some positive real number. We know that the statement " l x - x0 1 < 6 =? f [x] is defined" is true whenever 6 :::::J 0. Suppose that for every positive real number � there was a real point r with l r - x0 1 < � where f [r] was not defined. We could define a real function U[�] = r. Then the logical real statement Proof.

� > 0 =? (r = U[�] , l r - xo l < � ' f[r] is undefined ) is true. The Function Extension Axiom means it must also be true with �

=

6 :::::J 0, a contradiction, hence, there is a positive real � so that f[x] is defined whenever lx - xo l < �.

We complete the proof of the Inverse Function Theorem by a permanence principle on the domain of y-values where we can invert f[x] . The intuitive proof of Section 1 shows that whenever I Y - Yo I < 6 :::::J 0, we have lx 1 - xo l :::::J 0, and for every natural n and k,

l xn - xo l < 2 1 x 1 - xo l , l xn+ k - x k l < 2 k1_1 l x 1 - xo l , f [xn ] is defined, I Y - f[xn+ l l l < � � Y - f[xnJ I Recall that we re-focus our infinitesimal microscope after each step in the recursion. This is where the uniform condition is used. Now by the permanence principle, there is a real � > 0 so that whenever I Y - Yo I < � ' the properties above hold, making the sequence Xn convergent. Define g[y] = n----+ Limrx> Xn · D 24. 2 . 12

Second differences and higher order smoothness

In Section 1 we derived Leibniz' second derivative formula for the radius of curvature of a curve. We actually used infinitesimal second differences, rather than second derivatives and a complete justification requires some more work.

References

393

One way to re-state the Uniform First Derivative Theorem above is: The curve y = f[x] is smooth if and only if the line through any two pairs of infinitely close points on the curve is near the same real line, X l ::::::: X2 =}

f[x l ] - f[x 2 ]

::::::: m

X l - X2

A natural way to extend this is to ask: What is the parabola through three infinitely close points? Is the (standard part) of it independent of the choice of the triple? In [8] , Vftor Neves and I show: Theorem 12 Theorem on Higher Order Smoothness

Let f[x] be a real function defined on a real open interval (a, w). Then f[x] is n-times continuously differentiable on (a, w) if and only if the nth _order differences 6n f are S-continuous on (a , w) . In this case, the coefficients of the interpolating polynomial are near the coefficients of the Taylor polynomial, whenever the interpolating points satisfy x 1

:::::::

·

·

·

::::::: Xn

::::::: b.

References

[1]

MICHAEL BEHRENS , A Local Inverse Function Theorem, in Victoria Sym Lecture Notes in Math. 369, Springer Verlag, 1974.

[2]

H . J . M . Bos, "Differentials, Higher-Order Differentials and the Derivative in the Leibnizian Calculus", Archive for History of Exact Sciences, 14 1974.

[3]

L . EULER, Introductio in Analysin Infinitorum, Tomus Primus, Lausanne, 17 48. Reprinted as L. Euler, Opera Omnia, ser. 1, vol. 8. Translated from the Latin by .J. D. Blanton, Introduction to Analysis of the Infinite, Book I, Springer-Verlag, New York, 1988.

[4]

JON BARWISE (editor) , The Handbook of Mathematical Logic, North Hol land Studies in Logic 90, Amsterdam, 1977.

[5]

BERNARD R. GELBAUl\1 and JOHN M . H. OLMSTED , in Analysis, Holden-Day Inc. , San Francisco, 1964.

[6]

H . JEROME KEISLER, Elementary Calculus: An Infinitesimal edition, PWS Publishers, 1986. Now available free at http : //www . math . wisc . edu/ -ke i sler/calc . html

posium on Nonstandard Analysis,

2nd

Counterexamples Approach,

24.

394

[7]

Calculus with infinitesimals

l\ I ARK McKINZIE and CURTIS T C CKEY , "Higher Trigonometry, Hyper real Numbers and Euler's Analysis of Infinities". Math. Magazine, 74

( 2001) 339-368.

[8]

VITOR NEVES and K. D . STROYAN, "A Discrete Condition for Higher Order Smoothness", Boletim da Sociedade Portugesa de Matematica, 3 5

(1996) 81-94.

[9] ABRAHAM ROBINSON , "Non-standard Analysis.. , Proceedings of the Royal Academy of Sciences, ser A, 64 ( 1961 ) 432-440 [10] ABRAHAM

ROBINSON, Non-standard Analysis, North-Holland Publishing Co. , Amsterdam, 1966. Revised edition by Princeton University Press, Princeton, 1996.

[11]

K . D . STROYAN, Projects for Calculus: The Language of Change, on my website at http : //www . math . uiowa . edu/%7Estroyan/Proj e ct sCD/estroyan/ indexok . htm

[12]

K . D . STROYAN, Foundations of Infinitesimal Calculus. http : //www . math . uiowa . edu/%7Estroyan/backgndct lc . htm

[13]

K . D . STROYAN and W . A . .J . LUXEl\IBURG , Introduction to the Theory of Infinitesimals. Academic Press Series on Pure and Applied Math. 72, Academic Press, New York, 1976.

Pre-University Analysis Richard 0 )Donovan *

This paper is

a

Abstract

Hrbacek's article showin g

how his

ap

Conceptual difficulties arise in elementary pedagogical approaches.

In

follow-up of

proach can be pedagogically tmiversity level.

K.

helpful

when introducing analysis at pre

most cases it remains difficult to explain at pre-university level how the

derivative is calculated at nonstandard values or how

an

internal func

tion is clefiucd. 1-Irbacek provides a modified version of 1ST lSI

(rather

P6raire's RIST) which seems to reduce all tlw,se difficulties. This system is hricHy

preR ented

here i11 its pedagogical form with

an

app licat io n to

the derivative. It must be understood as a state-of-the-art report1 .

25.1

Introduction

Infinitesimals aTe interesting when teaching analysis because they give meaning to symbols s uch as dx, dy or df(x) and all related formulae. Also, symb ol ic manipulations are usually simpler thtn with limits. In secondary school, real numbers arc introduced with no formal justifica tion. It is sometimes sh ow n that v'2 tJ_
1

College Andr&.Chavanne, Geneve, richard . o -donovan�edu . ge . ch

Switzerland.

The author has been teaching an alysis with infinitesimals at pre-university level for been possible thanks to many exch anges with Karel Hrbacek and also many helpful remarks by Keith Stroyan.

several years. This paper has

25.

396

Pre-University Analysis

Hoskins writes: "the logical basis of NSA needs to be made clear before it can be used safely" [6] . Most published books on the subject seem to confirm this view. Di Nasso, Benci and Forti state: "Roughly, nonstandard analysis consists of two fundamental tools: the star-map and the transfer principle" [2] . Foundational aspects are fundamental but they should not be a prerequisite to study infinitcsimals. Frcgc and Wittgcnstcin arc not studied in kindergarten prior to learning how to add natural numbers. Our motivations may have a more pedagogical origin than those exposed by Hrbacek in [8] , but globally the reasons why we feel unsatisfied with the avail able textbooks are the same. Because of these difficulties, some colleagues have been tempted to work with infinitesimals by ignoring the question of transfer altogether. But without transfer, analysis with infinitesimals is analysis in a non-archimedean field, and these fields are known to have very special and unwanted properties as shown in [4] and [13]. Nonstandard analysis ensures that the properties of completeness arc transferred to a certain class of objects.

25.2

Standard part

A major difficulty in nonstandard analysis is the existence of external func tions. Consider the "updown" function: j:

X f---+

2 St ( X ) - X ·

At standard scale it is indistinguishable from the identity function but at the infinitesimal scale the function is everywhere decreasing with a rate of change (slope? ) equal to -1 and it is S-continuous and satisfies the interme diate value property! Why does it not satisfy that if a continuous function has a negative slope everywhere on an interval , then the function is decreasing on that interval? It appears difficult to explain to students that statements which use the standardness predicate are "external" but that we can nonetheless use it safely for the definition of the derivative. How can a student understand when it is acceptable to use st[ ] and when is it not? Some students have pointed out that if the derivative is given by st (M�x) ) for all x, then f'(x + 6) = f' (x). The classical textbooks such as [9] , [14] or [5] use an indirect definition for the derivative at nonstandard points which makes a drastic distinction in nature between standard and nonstandard numbers: at standard points, the derivative is calculated using that point and other nonstandard neighbours but at nonstandard points some form of transfer is used. How can we be convincing when explaining why there are two different definitions?

25.3. Stratified analysis

397

Even Stroyan's uniform derivative [14] is not totally satisfying here because it requires that we recognise a "real function", which in turn means that we need a definition of what such a function is, and what an extension is.

25.3

Stratified analysis

Stratified analysis may be an answer. An interesting aspect is that the theory is adapted for elementary teaching at the same time as it is being developed. The interactions are mutual. In the adaptation for high-school , we have deliberately avoided references to the words "standard" and "nonstandard" for reasons already discussed in [10] . Instead of the binary relation C::: introduced by Hrbacek, we have suggested to use a concept of level, symbolised by v ( x ) . If a C::: b then a E v (b): a is relatively standard to b means that a is at the level of b. For high-school, we feel that this is a simpler concept. All of the adaptations have been discussed with Hrbacek so that they satisfy both the educational requirements and the theoretical consistency. The following is a possible description given to the students:

Levels We get to know the fa miliar natura l n u m bers 1 , 2, 3 , . . . when we learn to cou nt. But it would be a rare child that actually cou nts ( i n steps of 1) to more than a thousa nd or so. It is not necessary to keep going further, beca use somewhere at this point one gets the idea that the cou nti ng process never ends (this is cal l ed : potential infi n ity) , a n d one i s capable of forming the set of natural n u m bers N , i . e. it is possible to consider the col lection of all natural n u m bers as one "th i ng" ( this is ca lled: actual infinity) . It is neither necessary nor possible to cou nt all natura l n u m bers to do it! Also, sums, products, differences, quotients, powers, roots, etc. of these fa m i l iar n u m bers are or can be known at this stage. All these fa miliar n u m bers are at the sa me level. I f x is at the level of y, it means that x can be known when y ca n be know n ; written X E v(y) Fa miliar n u m bers are a l l at the coarsest level . v(O) = v ( 1 ) = v

(�)

= v ( v3)

398

25.

Pre-University Analysis

But then there are natura l n u m bers u n known at this level , n u m bers that are so l a rge (we will say "u nlimited") that they do not belong to v(O). Let K be such an u n l imited natura l n u m ber, i.e. K � v ( O ) . It ta kes a higher level of knowledge to know K, but once we do, we also know 2K, K + 1 . . . and many other objects which all belong to v(K). I n increasing our knowledge about n u m bers, we do not lose former knowledge, so 1 , 2, 3, . . . are a lso at v(K) : they rema i n known. Also, the reciproca l : -k is a n "infi nitesimal" that becomes known at the level of K. Hence a n u m ber such as, say, 3 + 15K is at v(K) (a finer level) 3 and is i nfi n itely close to 3. But aga i n , this level of knowledge does not exha ust the set N, so there are n u m bers not at v ( K ) - hence not at v ( O ) , etc. The levels potenti a l ly expand forever and never fu l l y exha ust N. •

There are many levels, each making new n u m bers known.

•

At v ( O ) there are no i nfi nitesima ls.

Infinitesimals are defined relatively to a level. a-infinitesimals and a unlimited are numbers which are not at ( a) and which are, respectively, less in modulus, greater in modulus, than any nonzero number at ( a). For an a-limited number � ' the a-shadow (noted sha (�)) is the unique number at v(a) which is a-infinitesimally close to �- The x-shadow replaces the concept of standard part in this relativistic framework2 . We give a more restricted definition of levels than Hrbacck's. We only include numbers in the levels, not sets. This slight difference can be considered as a restriction of the definition given by Hrbacek not a contradiction: the only sets we use explicitly in our course are intervals, N or JR. One of the immediate advantages of stratified analysis is the possibility to easily identify "acceptable" statements - those that will eventually transfer. A statement about x is acceptable if it makes either no reference to levels or only to the level of x. The same holds if instead of x there is a list x x1 , x 2 , . . . A function f is defined by a rule that tells us what f ( x) is when x is in the domain of f. The rule may depend on some parameters, Pl , P2 , . . . A function is acceptable if the rule either does not refer to levels or refers only to v(x, p 1 , p2 , . . . ). (Although probably not necessary at this school level, transfer ensures that if there are two different rules using parameters from different levels which give the same value for all values of the variable, then they define 2 Again. for rca�on� related to politic� in the educational realm, �hadow ha� been preferred v

v

=

to "st andard p art". coar�cr lcvcb information.

It also yields a fairly intuitive image that numbers cast a shadow on

( provided

they arc finite with rc�pcct to that level ) ; �hadow� contain lc��

25.4.

Derivative

399

the same function and its level is the coarsest one where the function can be defined) . f : x f---+ 2 sh0 (x) - x i s not an acceptable function because there is an absolute reference to v( 0) in the shadow predicate. If adapted as 2 shx ( x) - x then, as shx ( x) = x, it simplifies to the identity function. Theorems about continuity are for (acceptable) continuous functions and this function does not satisfy the conditions. The student can see this with no "external" knowledge. It seems reasonable to state that from then on, acceptable functions will be simply called functions (so the statements of theorems will look very much like the classical ones). ·

·

25.4

Derivative

--+

The derivative of a function is defined by: Let f :]a; b [ lR be a function and x E]a; b [ f is differentiable at x iff there is an L E v(x, f) such that, for any < x, f > infinitesimal h , with x + h E ]a; b[, S

h < x ,j>

( f(x

+ h) h

f(x)

)

=

L

then the derivative is f' (x) = L It is a direct observation that the derivative is "acceptable". For f : x f---+ x 2 at x = 2 the derivative is simple and works exactly as in any other nonstandard method. For x in general. the quotient simplifies to 2x + h. The only parameter of f is 2 E v(O) hence v(x, f) v(x). As h is x-infinitesinml, shx (2x + h) = 2x. For a direct calculation of f' (2 + c5), we have 2 + c) E v(c5) hence a [2 + 6]-infinitesimal is also a 6-infinitesimal. Let h be a 6-infinitesimal.

=

Thus stratified analysis satisfies one of our major requirements: a single definition of the derivative which applies to all numbers. In [10] it is shown that using the definition of infinitesimals, the students could work out the rules of computation as exercises and find that for standard a and infinitesimal c), a · c) is infinitesimal. The same exercises adapted to ac ceptable statements yield that if 6 is a-infinitesimal, then a · 6 is a-infinitesimal.

25 . Pre-University Analysis

400

25.5

Transfer and closure

Some definitions have been re-written several times and the definitions we use today may not be final. All the proofs of theorems in the syllabus must be checked and cross-checked. The only principles that are needed are Hrbacek's closure principle: If f is an accept able function, then f ( x ) E v ( x, f) for all x in the domain of f. This extends the observation that operations with "familiar" numbers do not yield infinitesimals. The other is a simple form of transfer which states that an acceptable statement is true for v ( a ) iff it is true for all v ((J) with a E v ( (J ) . This adaptation to pre-university level is not completed yet . We hope to finish a first version of a handout , with proofs, within a year or so. Our goal is still t he same: for most people, infinitesimals "are there", but can they be used to make maths easier to teach and learn and still remain rigorous? We hope to be able to contribute an answer to this question in a not too distant future.

References [1] Occam's razor, In

Encyclopaedia Britannica,

1 998. CD version.

[2] VIERI BENCI , MARC FORTI and MAURO Dr NASSO, The eightfoldpath to nonstandard analysis, Quaderni del Dipartimento di :t\1athematica Ap plicata "U. Dini", Pisa, 2004. [3] MARTIN BER Z, "Non-archimedean analysis and rigorous computation", International .Journal of Applied Mathematics, 2 (2000) 889-930. [4] MARTIN BERZ, " Cauchy theory on Levi-Civita fields", Contemporary Mathematics, 319 (2003) 39-52.

Analyse Non Standard, Hermann , 1989. Standard and Nonstandard Analysis, Ellis Horwood,

[5] F . DIENER and G . REEB , [6] RoY HOSKINS ,

1990.

[7] RoY HOSKINS, 2004. personal letter. [8] KAREL HRBACEK , Stratified analysis? , in this volume. [9] .JEROME KEISLER, Elementary Calculus, University of Wisconsin, 2000. www . math . wi s c . edu/-ke isler/cal c . html [10] JOHN KIMBER and RICHARD O ' DONOVAN, Non standard analysis at pre-university level, magnitude analysis. In D. A. Ross N . .J . Cutland, M . D i Nasso (eds.), Nonstandard Methods and Applications in Mathematics, Lecture Notes in Logic 25, Association for Symbolic Logic, 2006.

References

401

[ 1 1]

MAURO Dr NASSO and VIERI BENCI, Alpha theory, an elementary ax iomatics for nonstandard analysis, Quaderni del Dipartimento di Mathe matica Applicata "U. Dini", Pisa, 200 1 .

[ 1 2]

CAROL SCHUMACHER, Chapter Zero,

[13]

KHODR SHAMSEDDINE

[14]

K . D. STROYAN, Mathematical Background: Foundations of Infimtesimal Calculus, Academic Press, 1997.

Addison-Wesley. 2000.

and MARTIN BERZ, "Intermediate values and inverse functions on non-archimedean fields", IJMMS, (2002) 165-1 76.

www . math . u iowa . edu/%7Estroyan/backgndctlc . html

The Strength of Nonstandard Analysis

Read more

Nonstandard Analysis

Read more

Nonstandard Analysis

Read more

Nonstandard Analysis

Read more

Nonstandard Analysis

Read more

Nonstandard Analysis

Read more

Nonstandard analysis

Read more

Nonstandard Analysis

Read more

Nonstandard Analysis

Read more

Nonstandard Analysis

Read more

Nonstandard Analysis

Read more

Nonstandard Analysis

Read more

Nonstandard Analysis 001

Read more

Nonstandard Analysis. Recent Developments

Read more

Nonstandard Analysis, Axiomatically

Read more

Nonstandard Asymptotic Analysis

Read more

Nonstandard analysis, axiomatically

Read more

Applied Nonstandard Analysis

Read more

Applied nonstandard analysis

Read more

Applied nonstandard analysis

Read more

Nonstandard Analysis: Axiomatically

Read more

Nonstandard Analysis and Vector Lattices

Read more

Victoria Symposium on Nonstandard Analysis

Read more

Nonstandard Analysis, Axiomatically (Springer Monographs in Mathematics)

Read more

Nonstandard Analysis, Axiomatically (Springer Monographs in Mathematics)

Read more

An Introduction to Nonstandard Real Analysis

Read more

An introduction to nonstandard real analysis

Read more

An introduction to nonstandard real analysis

Read more

An Introduction to Nonstandard Real Analysis

Read more

An Introduction to Nonstandard Real Analysis

Read more

Recommend Documents

The Strength of Nonstandard Analysis

Nonstandard Analysis

Nonstandard glasses NONSTANDARD ANALYSIS Alain Robert University of Neuch~tel Switzerland Translated and adapted by ...

Nonstandard Analysis

Nonstandard Analysis

Martin Väth Nonstandard Analysis Birkhäuser Verlag Basel · Boston · Berlin Contents Preface vii 1 Preliminaries §...

Nonstandard Analysis

Nonstandard Analysis

Martin Väth Nonstandard Analysis Birkhäuser Verlag Basel · Boston · Berlin Contents Preface vii 1 Preliminaries §...

Nonstandard analysis

NONSTANDARD ANALYSIS DR. J. PONSTEIN With love, love, love to those five women, who caressed me. J. Ponstein former ...

Nonstandard Analysis

Martin Väth Nonstandard Analysis Birkhäuser Verlag Basel · Boston · Berlin Contents Preface vii 1 Preliminaries §...

Nonstandard Analysis

Martin Väth Nonstandard Analysis Birkhäuser Verlag Basel · Boston · Berlin Contents Preface vii 1 Preliminaries §1...

Nonstandard Analysis

Martin Väth Nonstandard Analysis Birkhäuser Verlag Basel · Boston · Berlin Contents Preface vii 1 Preliminaries §...