Peter A. Loeb · Manfred P.H. Wolff Editors
Nonstandard Analysis for the Working Mathematician Second Edition
Editors
Peter A. Loeb, Department of Mathematics, University of Illinois, Urbana, IL, USA
Manfred P.H. Wolff, Mathematical Institute, University of Tübingen, Tübingen, Germany

ISBN 978-94-017-7326-3
ISBN 978-94-017-7327-0 (eBook)
DOI 10.1007/978-94-017-7327-0
Library of Congress Control Number: 2015946066
Mathematics Subject Classification: Primary: 03H05, 03H10, 03H15, 11U10, 12L15, 26E35, 28E05, 46S20, 47S20, 54J05; Secondary: 11B05, 11B13, 11B30, 11B75, 46B08, 46B20, 47A10, 47A58, 47D06, 47H09, 47H10, 54D30, 60G51, 60H07, 60J65, 91A06, 91B99

Springer Dordrecht Heidelberg New York London
© Springer Science+Business Media Dordrecht 2015

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made.

Printed on acid-free paper
Springer Science+Business Media B.V. Dordrecht is part of Springer Science+Business Media (www.springer.com)
Preface
This book is addressed to mathematicians working in analysis, probability, and various applications. The aim is to provide an understandable introduction to the basic theory of nonstandard analysis in Part I, and then to illuminate some of the most striking applications. Much of the book, in particular Part I, can be used in a graduate course; problems are posed in those chapters. After Part I, each chapter takes up a different field for the application of nonstandard analysis, beginning with a gentle introduction that even nonexperts can read with profit. The remainder of each chapter is then addressed to experts, showing how to use nonstandard analysis in the search for solutions of open problems and how to obtain rich new structures that produce deep insight into the field under consideration. The applications discussed here are in functional analysis including operator theory, topology applied to compactifications, probability theory including stochastic processes, economics including game theory and financial mathematics, and combinatorial number theory.
In all of these areas, the intuitive notion of an infinitely small or infinitely large quantity plays an essential and helpful role in the creative process. For example, Brownian motion is often thought of as a random walk with infinitesimal increments; the spectrum of a self-adjoint operator is viewed as the set of “almost eigenvalues”; an ideal economy consists of an infinite number of agents, each having an infinitesimal influence on the economy. Already at the level of calculus, one often views the integral as an infinitely large sum of infinitesimal quantities. Of course, the notion of an infinitesimal quantity has been used in mathematics for over 2200 years.
Although it was one of the leading ideas employed during the period of mathematical development from Leibniz and Newton to Cauchy, it eluded rigorous treatment until the work of Abraham Robinson between 1957 and 1966, culminating in 1966 with the publication of his book “Non-standard Analysis”. That work finally established a rigorous foundation for the use of infinitesimals in mathematics. More precisely, Robinson constructed an ordered field extension *ℝ of the reals, ℝ, so that all of the sequences, functions and, indeed, relations of real analysis had a unique extension to the equivalent structure built from *ℝ, and all statements true for the real structure remained true for the extended objects in the extended structure. This essential property, known as the Transfer Principle, is the pivotal result of Robinson’s nonstandard analysis.
With subsequent contributions to this new discipline from many mathematicians in the late 1960s, including a new result in standard functional analysis obtained by Robinson together with A.R. Bernstein, it became clear that nonstandard analysis was much more than a foundation of infinitesimal calculus. It promised to become a powerful tool in all branches of analysis. Then, an important breakthrough was initiated by W.A.J. Luxemburg’s 1969 paper developing nonstandard hulls, and was extended with the work of C.W. Henson and L.C.R. Moore. The prior construction of ultrapowers of the reals begun by E. Hewitt and generalized to ultraproducts of Banach spaces by J.L. Krivine in the mid-1960s had become a powerful tool in functional analysis. Nonstandard functional analysis using nonstandard hulls, discussed here in Manfred Wolff’s Chap. 4, is a far-reaching generalization of these applications of ultraproducts. Chapter 4 deals with old and new applications of nonstandard analysis to the theory of Banach spaces and linear operators. In particular, it considers the structure theory of Banach spaces, basic operator theory, strongly continuous semigroups of operators, approximation theory of operators and their spectra, and the Fixed Point Property.
The early 1970s produced another breakthrough with P.A. Loeb’s construction of a measure space out of a “hyperfinite” discrete analogue of finite probability spaces. The simplest example was based on a sequence of H coin tosses for an infinitely large natural number H. Immediately, Loeb’s general construction was successfully used by R.M. Anderson to formulate Brownian motion exactly as a random walk with infinitesimal increments based on those coin tosses. The general procedure to extract a measure space out of some given “internal” one is now called Loeb measure theory.
This theory is briefly introduced at the end of Chap. 3, and it is fully explored starting from first principles in Horst Osswald’s Chap. 6. Applications to stochastic processes, including the Itô integral as well as the Malliavin calculus, are further detailed in Chap. 7, while Yeneng Sun’s Chap. 8 contains an application solving the measurability problem that arises when one considers an infinite number of equally weighted independent agents, and more generally, a continuum of independent random variables. Most of the results of Chap. 8 come from Sun’s recent papers and are based on the richness of the Loeb construction applied to product spaces.
Successful applications of nonstandard analysis have occurred in many applied areas such as mathematical physics (an example is L. Arkeryd’s research starting in the early 1980s on gas kinetics and the Boltzmann equation) and mathematical economics. Work in the latter area was initiated with the seminal 1975 paper of D.J. Brown and A. Robinson on nonstandard exchange economies. There is a need in economic theory for models of economies with a very large number of equally weighted agents, each of which has only a negligible influence on the economy. Standard models have taken as the set of agents the unit interval [0,1] supplied with Lebesgue measure. A more natural model is the “hyperfinite set” of agents used by Brown and Robinson. Yeneng Sun’s Chap. 9 of this book describes many uses of this model when combined with Loeb measure theory; it also shows why Lebesgue spaces must fail for many economic applications.
Other applications of nonstandard analysis include the exploitation of compactness using the “S-topology” on the nonstandard extension of a topological space to obtain quite general and intuitive constructions of compactifications; see Chap. 5 by Insall, Loeb and Marciniak. Important and extensive applications of nonstandard analysis to combinatorial number theory are the subject of Jin and Di Nasso’s Chaps. 10 and 11 in this book.
For the reader just learning nonstandard analysis, we point again to our Part I, which begins with a simple form of nonstandard analysis suitable for the results of calculus and basic real analysis. The presentation is intended to give the reader a feeling for the fundamental arguments of nonstandard analysis with a minimum use of model theory. The reader who begins with no background in mathematical logic should easily pick up what is needed to continue. Chapter 2 of Part I is devoted to general nonstandard analysis and presents the heart of Robinson’s theory. It is written so that the interested reader learns all that is needed for later applications without being forced to read detailed model-theoretic constructions, some of which are postponed to the appendix of the chapter. Part I concludes in Chap. 3 with further applications.
The authors have been asked from time to time about the relation between Robinson’s nonstandard analysis and the subject of “internal set theory,” initiated by Edward Nelson in the 1977 Bulletin of the American Mathematical Society. A good recent development of that framework with applications can be found in Nader Vakil’s text cited here in the references to Chap. 2. What is the difference? The Robinson framework adds to the standard mathematical “world” a second “nonstandard” mathematical world.
There are no infinitely large integers in the standard world, but they do exist in the nonstandard world. Robinson chose the name “nonstandard analysis” because the nonstandard world is used to analyze the standard one. Internal set theory, on the other hand, works with only the nonstandard world, but recognizes some elements of that world as being “standard”. Important developments in the Robinson framework have taken objects formed in the nonstandard world and adjoined them to the standard world. For example, equivalence classes of “remote points” become compactifying boundary points of standard topological spaces. Quotients of points in the nonstandard extension of Banach spaces become new Banach spaces in the standard world. Measure spaces formed from nonstandard point-sets become rich measure spaces in the standard world. These constructions do not make sense in internal set theory because there is no standard world. In working through the foundations for nonstandard analysis presented in this book, the reader will gain many new and helpful insights into the enterprise of mathematics. Once these foundations are understood, research formulated in the framework of internal set theory can be easily understood with just some translation of terminology. The editors have found, however, that the reverse is not generally true. Therefore, the reader may best be served by starting here at least with Part I, or with a similar introduction to Robinson’s theory.
The editors would like to thank the contributors to this second edition for their outstanding contributions to this project. We also thank Erik Talvila, who made numerous helpful suggestions for improvements of the first edition. Finally, the editors dedicate this book to one of the founders of nonstandard analysis; he is our mentor, colleague, and friend, W.A.J. (Wim) Luxemburg.

Champaign-Urbana, IL, USA    Peter A. Loeb
Tübingen, Germany    Manfred P.H. Wolff
February 2015
Contents

Part I  An Introduction to Nonstandard Analysis

1  Simple Nonstandard Analysis and Applications
   Peter A. Loeb
   1.1  Introduction
   1.2  A Simple Construction of a Nonstandard Number System
   1.3  A Simple Language
   1.4  Interpretation of the Language L
   1.5  Transfer Principle for R
   1.6  The Nonstandard Real Numbers
   1.7  Sequences
   1.8  Topology on the Reals
   1.9  Limits and Continuity
   1.10 Differentiation
   1.11 Riemann Integration
   References

2  An Introduction to General Nonstandard Analysis
   Peter A. Loeb
   2.1  Superstructures
   2.2  Language for Superstructures
   2.3  Interpretation of the Language for Superstructures
   2.4  Monomorphisms and the Transfer Principle
   2.5  Ultrapower Construction of Superstructures and Monomorphisms
   2.6  Special Index Sets Yielding Enlargements
   2.7  A Result in Infinite Graph Theory
   2.8  Internal and External Sets
   2.9  Saturation
   References

3  Topology and Measure Theory
   Peter A. Loeb
   3.1  Metric and Topological Spaces
   3.2  Continuous Mappings
   3.3  Convergence
   3.4  More on Topologies
   3.5  Compact Spaces
   3.6  Product Spaces
   3.7  Relative Topologies
   3.8  Uniform Continuity and Uniform Spaces
   3.9  Nonstandard Hulls
   3.10 Compactifications
   3.11 The Base and Antibase Operators
   3.12 Measure and Probability Theory
        3.12.1 The Martingale Convergence Theorem
        3.12.2 Representing Measures in Potential Theory
   References

Part II  Functional Analysis

4  Banach Spaces and Linear Operators
   Manfred P.H. Wolff
   4.1  Introduction
   4.2  Basic Nonstandard Analysis of Normed Spaces
        4.2.1 Internal Normed Spaces and Their Nonstandard Hull
        4.2.2 Standard Continuous and Internal S-continuous Linear Operators
        4.2.3 Special Banach Spaces and Their Nonstandard Hulls
        4.2.4 Notes
   4.3  Advanced Theory of Banach Spaces
        4.3.1 A Brief Excursion to Locally Convex Vector Spaces
        4.3.2 General Banach Spaces
        4.3.3 Banach Lattices
        4.3.4 Notes
   4.4  Elementary Theory of Linear Operators
        4.4.1 Compact Operators
        4.4.2 Fredholm Operators
        4.4.3 Notes
   4.5  Spectral Theory of Operators
        4.5.1 Basic Definitions and Facts
        4.5.2 The Spectrum of an S-bounded Internal Operator
        4.5.3 The Spectrum of Compact Operators and the Essential Spectrum
        4.5.4 Closed Operators and Pseudoresolvents
        4.5.5 Notes
   4.6  Selected Applications
        4.6.1 Strongly Continuous Semigroups
        4.6.2 Approximation of Operators and of Their Spectra
        4.6.3 Super Properties
        4.6.4 The Fixed Point Property
        4.6.5 References to Further Applications of Nonstandard Analysis to Operator Theory
        4.6.6 Notes
   References

Part III  Compactifications

5  General and End Compactifications
   Matt Insall, Peter A. Loeb and Małgorzata Aneta Marciniak
   5.1  Introduction
   5.2  General Compactifications
   5.3  End Compactifications
   5.4  Product Spaces
   References

Part IV  Measure and Probability Theory

6  Measure Theory and Integration
   Horst Osswald
   6.1  Introduction
   6.2  Loeb Measures
        6.2.1 Loeb Measure Spaces
        6.2.2 Loeb Measures over Gaußian Measures
        6.2.3 Loeb Measurable Functions
        6.2.4 Loeb Spaces over the Product of Internal Spaces
        6.2.5 The Hyperfinite Time Line T
        6.2.6 Lebesgue Measure as a Counting Measure
        6.2.7 Adapted Loeb Spaces
   6.3  Standard Integrability for Internal Measures
        6.3.1 The Definition of S-integrability and Equivalent Conditions
        6.3.2 μ_L-integrability and Sμ-integrability
        6.3.3 Integrable Functions defined on N_K^n × [0,1[^m
        6.3.4 Standard Part of the Conditional Expectation
        6.3.5 Characterization of S-integrability
        6.3.6 Keisler’s Fubini Theorem
        6.3.7 Hyperfinite Representation of the Tensor Product
        6.3.8 On Symmetric Functions
   6.4  Internal and Standard Martingales
        6.4.1 Stopping Times and Doob’s Upcrossing Result
        6.4.2 The Maximum Inequality
        6.4.3 Doob’s Inequality
        6.4.4 The Burkholder Davis Gundy Inequalities
        6.4.5 S-integrability of Internal Martingales
        6.4.6 S-continuity of Internal Martingales
        6.4.7 The Standard Part of Internal Martingales
   References

7  Stochastic Analysis
   Horst Osswald
   7.1  Introduction
   7.2  The Itô Integral for the Brownian Motion
        7.2.1 The S-Continuity of the Internal Integral
        7.2.2 The S-Square-Integrability of the Internal Itô Integral
        7.2.3 Adaptedness and Predictability
        7.2.4 The Standard Itô Integral
        7.2.5 Integrability of the Itô Integral
        7.2.6 The Wiener Measure
   7.3  The Iterated Integral
        7.3.1 The Definition of the Iterated Integral
        7.3.2 On Products of Iterated Integrals
        7.3.3 The Continuity of the Standard Iterated Integral Process
        7.3.4 The W_{C_H}-Measurability of the Iterated Itô Integral
        7.3.5 I_n^M(f) is a Continuous Version of the Standard Part of I_n^M(F)
        7.3.6 Continuous Versions of Iterated Integral Processes
   7.4  Beginning of Malliavin Calculus
        7.4.1 Chaos Decomposition
        7.4.2 A Lifting Theorem for Functionals in L^2_W(C_L)
        7.4.3 Computation of the Kernels
        7.4.4 The Kernels of the Product of Wiener Functionals
        7.4.5 The Malliavin Derivative
        7.4.6 A Commutation Rule for Derivative and Limit
        7.4.7 The Clark-Ocone Formula
        7.4.8 A Lifting Theorem for the Derivative
        7.4.9 The Skorokhod Integral
        7.4.10 Product and Chain Rules for the Malliavin Derivative
   7.5  Stochastic Integration for Symmetric Poisson Processes
        7.5.1 Orthogonal Increments
        7.5.2 From Internal Random Walks to the Standard Poisson Integral
        7.5.3 Iterated Integrals
        7.5.4 Multiple Integrals
        7.5.5 The σ-Algebra D generated by the Wiener-Lévy Integrals
   7.6  Malliavin Calculus for Poisson Processes
        7.6.1 Chaos
        7.6.2 Malliavin Derivative
        7.6.3 Exchange of Derivative and Limit
        7.6.4 The Clark-Ocone Formula
        7.6.5 The Skorokhod Integral
        7.6.6 Smooth Representations
        7.6.7 The Product Rule
        7.6.8 The Chain Rule
   References

8  New Understanding of Stochastic Independence
   Yeneng Sun
   8.1  The General Context
   8.2  The Specific Problems
   8.3  Difficulties in the Classical Framework
   8.4  The Resolution
   8.5  Exact Law of Large Numbers
   8.6  Converse Law of Large Numbers
   8.7  Almost Equivalence of Pairwise and Mutual Independence
   8.8  Duality of Independence and Exchangeability
   8.9  Grand Unification of Multiplicative Properties
   8.10 Discrete Interpretations
   8.11 Notes
   References

Part V  Economics and Nonstandard Analysis

9  Nonstandard Analysis in Mathematical Economics
   Yeneng Sun
   9.1  Introduction
   9.2  Distribution and Integration of Correspondences
        9.2.1 Distribution of Correspondences
        9.2.2 Integration of Correspondences
   9.3  Nash Equilibria in Games with Many Players
        9.3.1 General Existence of Nash Equilibria in the Loeb Setting
        9.3.2 Nonexistence of Nash Equilibria in the Lebesgue Setting
   9.4  Nash Equilibria in Finite Games with Incomplete Information
        9.4.1 Nonexistence of Nash Equilibria for Games with Information
        9.4.2 Approximate Nash Equilibria for Large Finite Games and Idealizations
        9.4.3 General Existence of Nash Equilibria for Games with Information
   9.5  Exact Law of Large Numbers and Independent Set-Valued Processes
   9.6  Competitive Equilibria in Random Economies
   9.7  General Risk Analysis and Asset Pricing
        9.7.1 General Risk Analysis for Large Markets
        9.7.2 The Equivalence of Exact No Arbitrage and APT Pricing
   9.8  Independent Universal Random Matching
   9.9  Notes
   References

Part VI  Combinatorial Number Theory

10 Density Problems and Freiman’s Inverse Problems
   Renling Jin
   10.1 Introduction
   10.2 Applications to Density Problems
        10.2.1 Sumset Phenomenon
        10.2.2 Plünnecke Type of Inequalities for Densities
   10.3 Applications to Freiman’s Inverse Problems
        10.3.1 Freiman’s Inverse Problem for Cuts
        10.3.2 Freiman’s 3|A| − 3 + b Conjecture
        10.3.3 Freiman’s Inverse Problem for Upper Asymptotic Density
   References

11 Hypernatural Numbers as Ultrafilters
   Mauro Di Nasso
   11.1 Introduction
   11.2 The u-equivalence
   11.3 Hausdorff S-topologies and Hausdorff Ultrafilters
   11.4 Regular and Good Ultrafilters
   11.5 Ultrafilters Generated by Pairs
   11.6 Hyper-Shifts
   11.7 Nonstandard Characterizations in the Space (βℕ, ⊕)
   11.8 Idempotent Ultrafilters
   11.9 Final Remarks and Open Questions
   References

Index
Part I
An Introduction to Nonstandard Analysis
Chapter 1
Simple Nonstandard Analysis and Applications Peter A. Loeb
P.A. Loeb, Department of Mathematics, University of Illinois, 1409 West Green Street, Urbana, IL 61801, USA
e-mail: [email protected]
© Springer Science+Business Media Dordrecht 2015
P.A. Loeb and M.P.H. Wolff (eds.), Nonstandard Analysis for the Working Mathematician, DOI 10.1007/978-94-017-7327-0_1

1.1 Introduction

The notion of an infinitesimal number has been used in mathematical arguments since before the time of Archimedes, some 2200 years ago. We understand infinitesimals in terms of a scale set by ordinary numbers. For a positive number to be infinitesimal, it must be greater than 0 and yet smaller than any positive number you might write using a decimal expansion. Infinitesimals played a fundamental role in the development of the calculus, starting in the late 1600s with the work of Newton and Leibniz. They, in particular Leibniz, used infinitesimal numbers to define the derivative and the integral. Even today, the modern definitions of limits, derivatives and integrals can be simplified with the use of infinitesimal numbers.
Consider, for example, calculating the total force of water on a dam. If we use infinitesimals, we can calculate the total force by cutting the dam up into horizontal strips of infinitesimal width. In each strip, the pressure changes by only an infinitesimal amount. By taking the pressure at any point in the strip and multiplying by the area of a rectangle that approximates the strip, we will get the actual force on the strip except for an error that is infinitesimal even in comparison with the width of the strip. Since the widths of the strips add up to the height of the dam, the sum of these errors over all of the strips will still be infinitesimal. Therefore, the total force on the dam is the nearest ordinary number to the sum of the approximations we obtain for each strip. Here we come to a question we will answer later: What do we mean by the sum over all of the strips?
It is helpful to think in terms of infinitesimals in branches of mathematics beyond calculus. For example, Brownian motion is the random motion of microscopic particles suspended in a liquid or gas. Under a microscope you can clearly see this zig-zag random motion caused by the collision of the particles with molecules in the suspending medium. An important part of mathematical probability is the construction of mathematical models for this motion. A very convenient model has each particle performing a random walk with steps of infinitesimal length. That is, one divides time into infinitesimal intervals, and in each interval, the particle moves in a straight line over an infinitesimal distance. At the end of each time interval, the particle chooses at random a new direction for its motion. Many probabilists do not know how to make a mathematically correct version of this construction. Nevertheless, they still think of Brownian motion in these terms. For them, it is a useful fiction, but in this book, we will see how to turn that fiction into rigorous mathematics.
Infinitesimals have always been regarded as a useful fiction that facilitates mathematical computation and invention. This fiction has always had its critics. Nevertheless, infinitesimals were used during the 18th and 19th centuries to flesh out what we now call “the Calculus”. Then, in the late 19th century, this infinitesimal foundation, which seemed unlikely ever to be made clear, was replaced with what is now called the ε-δ method. In the middle of the last century, mathematicians began removing infinitesimals from all mathematics courses. Thus infinitesimals gradually faded from use, persisting only as an intuitive aid to conceptualization by physicists and engineers, and even by mathematicians when setting up multiple integrals.
In 1960, the mathematical logician Abraham Robinson gave a clear, mathematically correct foundation for the use of infinitesimals in all branches of mathematics. (See [6].) Robinson’s foundation started with a mathematical object such as the real number system, or the real number system together with another object such as a topological space or a Hilbert space. For now, we don’t have to worry about what the second object might be. As befitted a logician, Robinson set up a formal language to express facts about the mathematical system with which he was working.
That system formed what Robinson called a standard model for the theorems expressed in his formal language. One thinks of the standard model as a world that exists in some Platonic sense. The theorems in the language are correct statements about this world. What Robinson showed was that there is another mathematical object, called a nonstandard model, in which there exist positive infinitesimal numbers, and yet all of the theorems in the language are correct statements about this model as well. Informally, what we have are two worlds, the standard and the nonstandard, and the theorems about the first are also correct statements about the second. One uses the nonstandard model to analyze the standard model, and for this reason, Robinson called his method and results “nonstandard analysis.” One way to explain Robinson’s existence result is to invoke a theorem of Gödel, the compactness theorem. Take a name not used for anything in the standard number system, for example, Bach. To the theorems about the standard number system we add new statements: “Bach is bigger than 1”, “Bach is bigger than 2”, etc. We add one such statement for each natural number. The standard number system is not a model for the collection of theorems augmented by these statements about Bach. It is, however, a model for any finite subset of the augmented collection. To see this, all one has to do is look for the biggest number mentioned in a given finite set of statements and let Bach be the name of a number that is even bigger. Since every finite subset of our augmented collection of statements has a model, namely the ordinary numbers, it follows from Gödel’s compactness theorem that the entire augmented collection of statements has a model. That is, there is a number system extending the standard one in which Bach is the name of an infinite positive number. Bach’s reciprocal, i.e., 1 divided by Bach, is then an infinitesimal number.
1 Simple Nonstandard Analysis and Applications
Another approach to understanding Robinson’s result is to construct a simple number system with infinitesimals using sequences of real numbers. A sequence of positive real numbers tending to zero, such as the sequence 1, 1/2, 1/3, 1/4, …, becomes a positive infinitesimal. A sequence of real numbers increasing to infinity, such as the sequence 1, 2, 3, …, becomes an infinite nonstandard number. We will say more about this in the next section. For applications, it is usually simpler just to work with the properties of a nonstandard number system, keeping in mind that any theorem for the ordinary numbers is also a theorem, when properly interpreted, for the enlarged number system. What does it mean to say “when properly interpreted”? Briefly, we can’t formally specify what we mean when we say “all” subsets of a given set. Even for the set of natural numbers, the idea of all subsets cannot be formalized. This inability to formalize the notion of “all subsets” means that when interpreting theorems in the nonstandard model, we can cheat. We don’t interpret the word “all” to really mean “all”. We work instead with what are called internal sets, and interpret “all sets” to mean all internal sets. We will postpone the problem of what is meant by “all” and the explanation in terms of internal sets until the next chapter of this book. We will first consider in this chapter a simplified formulation of nonstandard analysis. Experience shows that a simplified formulation is often a necessary beginning for non-logicians who wish to apply this powerful mathematical tool. Our formulation is based on a modification of H.J. Keisler’s 1976 calculus book [3], which uses infinitesimals.
Keisler’s approach is based on Robinson’s foundations, and our version of that approach first appeared in the book [2] coauthored with A.E. Hurd. In Keisler’s calculus book, the nonstandard real numbers are an extension, with infinitesimals, of the usual number system. Every function defined for the standard real numbers also works for the enlarged set of numbers. If two finite systems of equalities and inequalities have the same solutions for the real number system, then they have the same solutions for the enlarged number system. These rules, which suffice for the calculus and basic real analysis, will be modified to form the simple introduction to nonstandard analysis presented in this chapter. Of fundamental importance for many of the applications in this chapter and applications later in this book is a generalization of the notion of a finite set. In the standard world, a set is finite if it can be enumerated with natural numbers, finishing with a largest natural number. Once we have extended the real numbers, we will have also extended the set of natural numbers, thus obtaining nonstandard natural numbers that are bigger than any ordinary natural number. Such numbers are called infinite or “unlimited” natural numbers. The number “Bach” mentioned earlier is an example. If a set can be formally enumerated with the standard and nonstandard natural numbers up to an unlimited natural number, then the set is called a hyperfinite set. Hyperfinite sets are infinite sets, but they have all of the formal properties of a finite set. In particular, since we can sum any finite set of real numbers, the summing
function in the nonstandard model also gives an answer for any hyperfinite set. Thus we have a meaning for a sum of infinitesimal numbers used in integration theory. Before we continue with our simple framework for real analysis, here is a brief description of some of the applications of nonstandard analysis that will be considered later in this book. Hyperfinite sets play a central role in such applications to many areas of mathematics beyond the calculus. In probability, for example, it is easy to analyze a finite coin toss; it is much harder to analyze an infinite coin toss. On the other hand, in a nonstandard real number system, where there are unlimited natural numbers, we can choose such a number, again call it Bach, and, at least in our imagination, we can perform a hyperfinite coin toss. That is, we can toss the coin Bach times. Now any particular outcome has probability equal to one divided by two raised to the power Bach. Moreover, this hyperfinite coin tossing space contains all standard infinite coin tosses. The problem is, to make contact with the rest of mathematics, one needs to superimpose on the set of hyperfinite coin tosses a standard probability space structure using standard numbers. In 1975 [4], the author showed how this could be done and developed a measure theory for hyperfinite probability spaces. In 1976 in [1], Robert Anderson used this measure theory for coin tossing spaces to construct a model for Brownian motion and the Itô stochastic integral. In Anderson’s model, the underlying set of elementary outcomes for Brownian motion is a set of random walks with infinitesimal steps. One divides time up into infinitesimal intervals, and at the beginning of each interval tosses a coin. If the toss is a head, one moves to the right; if the toss is a tail, one moves to the left. The step size is the square root of the time change. This is how to make rigorous the probabilist’s intuitive model for Brownian motion. 
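The random-walk picture can be tried out with an ordinary finite number N of coin tosses in place of the unlimited number Bach. This finite sketch only illustrates the idea; it is not Anderson’s hyperfinite construction. The function name random_walk and the encoding of a head as +1 and a tail as −1 are assumptions of the sketch. With N tosses on the time interval [0, T], each time step is Δt = T/N and each spatial step has length √Δt.

```python
import math
from typing import List, Sequence


def random_walk(tosses: Sequence[int], total_time: float = 1.0) -> List[float]:
    """Positions of a walk driven by coin tosses (+1 = head, -1 = tail).

    The interval [0, total_time] is split into N = len(tosses) steps of
    length dt; each toss moves the walker by +sqrt(dt) or -sqrt(dt).
    """
    n = len(tosses)
    if n == 0:
        return [0.0]
    dt = total_time / n
    step = math.sqrt(dt)  # step size is the square root of the time increment
    positions = [0.0]
    for toss in tosses:
        if toss not in (1, -1):
            raise ValueError("each toss must be +1 (head) or -1 (tail)")
        positions.append(positions[-1] + toss * step)
    return positions


# Four tosses on [0, 1]: dt = 0.25, so every step has length 0.5.
print(random_walk([1, 1, -1, 1]))  # [0.0, 0.5, 1.0, 0.5, 1.0]
```

Because each squared step equals Δt, the sum of the squared increments over [0, T] is exactly N·Δt = T, whatever the outcome of the tosses; this is the finite counterpart of the quadratic variation of Brownian motion.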
In the last 40 years, the theory of hyperfinite measure spaces has been used to obtain new mathematical results in probability theory and many other areas of mathematics including potential theory, number theory, mathematical physics, and mathematical economics and finance. In mathematical economics, which will be discussed by Yeneng Sun later in this book, a central problem is to study equilibria in economies with a very large number of individuals when each individual has only a negligible influence on the economy. For this, it is quite natural to consider an economy with a hyperfinite number of individuals, each individual having only an infinitesimal influence on the economy. In particular, hyperfinite measure theory and its extension to more general nonstandard measure spaces has been used by Yeneng Sun to finally make rigorous the treatment of a continuum of independent random variables and traders in an economy. The aim of this book is to make Robinson’s discovery and some of the subsequent research available to the working mathematician. As noted, the reader who begins with this chapter of the book should easily pick up what is needed to go on. To accommodate readers with little or no background in mathematical logic, we have begun, as in the first chapter of [2], with a simple construction of a nonstandard number system associated with a very simple system of logic. The simplicity of the system of logic helps to illuminate the way logic is used to obtain mathematical results. It also simplifies the proof of the “transfer principle”, which is the basic property of
nonstandard number systems. The skills the reader needs for more advanced work are developed through applications of this simple system to calculus and elementary real analysis. In the next chapter of this book, we develop the full theoretical background needed to discuss modern developments and applications.
1.2 A Simple Construction of a Nonstandard Number System
First, we extend the real numbers R to an ordered field ∗R containing infinitesimals. We denote the natural numbers by N and the set of all subsets of N by P(N). We will construct ∗R from sequences of real numbers. It is well known that sequences of real numbers with pointwise operations do not form a field. For example, let E be the set of even natural numbers, and let O be the set of odd natural numbers. The product of characteristic functions χ_E · χ_O is identically 0, and should represent 0. To have a field, therefore, one of the sequences χ_E or χ_O must also represent 0; we may decide which it should be. We will make this decision and all other such decisions in one step using a “free ultrafilter”.
Definition 1.2.1 A free ultrafilter in N is a collection 𝒰 ⊂ P(N) such that
(1) ∅ ∉ 𝒰,
(2) A ∈ 𝒰 & B ∈ 𝒰 ⇒ A ∩ B ∈ 𝒰,
(3) A ⊂ N & A ∉ 𝒰 ⇒ N \ A ∈ 𝒰,
(4) A a finite subset of N ⇒ N \ A ∈ 𝒰.
Properties 1, 2, and 3 make 𝒰 an ultrafilter. They imply that any set containing a set in 𝒰 is also in 𝒰. If we replace Property 3 with this weaker property, then, given Property 4, we have just a free filter in N. Ultrafilters are either free or fixed. There is a fixed ultrafilter for each n ∈ N; it is the collection {A ⊂ N : n ∈ A}. A free ultrafilter, on the other hand, contains the filter consisting of all complements of finite subsets of N. An easy application of Zorn’s Lemma shows that such an ultrafilter exists. The fixed ultrafilter consisting of all sets containing the singleton {n} corresponds to unit mass at n. Free ultrafilters on N correspond to 0-1 valued finitely additive measures on P(N) that vanish on finite sets.
Problem: Show that for any finite partition A_1, …, A_n of N and any ultrafilter 𝒰 in N, one and only one of the sets A_i is in the ultrafilter 𝒰.
Answer: Note that if V ∈ 𝒰 and V = A ∪ B with A ∩ B = ∅, then either A ∈ 𝒰 or B = V ∩ (N \ A) ∈ 𝒰, but not both, since ∅ ∉ 𝒰. Now, for the A_i’s, not more than one of the A_i’s can be in 𝒰 since ∅ ∉ 𝒰. Either A_1 or ∪_{i>1} A_i is in 𝒰. If the latter, then either A_2 is in 𝒰 or ∪_{i>2} A_i ∈ 𝒰. Continuing, we see that if A_i ∉ 𝒰 for i < n, then A_n ∈ 𝒰.
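Free ultrafilters cannot be exhibited explicitly, since their existence rests on Zorn’s Lemma, so an executable illustration has to use a fixed ultrafilter instead. The minimal sketch below assumes that a subset of N is given by its membership predicate (the names Subset, fixed_ultrafilter, and pieces_in are hypothetical). It checks the partition property from the Problem above for the fixed ultrafilter at 7 and shows why a fixed ultrafilter fails Property 4.

```python
from typing import Callable, List

Subset = Callable[[int], bool]          # a subset A of N, given by its membership test
Ultrafilter = Callable[[Subset], bool]  # decides whether a subset belongs to the ultrafilter


def fixed_ultrafilter(n: int) -> Ultrafilter:
    """The fixed ultrafilter {A ⊂ N : n ∈ A}."""
    return lambda A: A(n)


def pieces_in(U: Ultrafilter, partition: List[Subset]) -> List[int]:
    """Indices of the pieces of a partition of N that belong to U."""
    return [k for k, A in enumerate(partition) if U(A)]


U7 = fixed_ultrafilter(7)

# Partition N by residue mod 3; exactly one piece (residue 1, since 7 % 3 == 1) is in U7.
residues = [lambda i, r=r: i % 3 == r for r in range(3)]
print(pieces_in(U7, residues))  # [1]

# Property 4 fails: {7} is finite, but its complement N \ {7} is not in U7.
print(U7(lambda i: i != 7))  # False
```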
Definition 1.2.2 Given an ultrafilter 𝒰, we say a property holds a.e. if it holds on some set U ∈ 𝒰. For real sequences ⟨r_i⟩ and ⟨s_i⟩, we write ⟨r_i⟩ ≡ ⟨s_i⟩ when r_i = s_i a.e., that is, when {i ∈ N : r_i = s_i} is in 𝒰. It is clear that the relation ≡ is an equivalence relation. In particular, if ⟨r_i⟩ ≡ ⟨s_i⟩ and ⟨s_i⟩ ≡ ⟨t_i⟩, then ⟨r_i⟩ ≡ ⟨t_i⟩, since
{i ∈ N : r_i = t_i} ⊇ {i ∈ N : r_i = s_i} ∩ {i ∈ N : s_i = t_i}.
Definition 1.2.3 We will write [⟨r_i⟩] for the equivalence class containing the sequence ⟨r_i⟩, and we will use ∗R to denote the collection of equivalence classes. The set ∗R is called the set of nonstandard real numbers or hyperreal numbers. Its formation using an ultrafilter is called an ultrapower construction. Returning to the example given at the beginning of this section, we now have either χ_E ≡ 0 or χ_O ≡ 0. Thus, there is no longer a problem with zero divisors. We next define the operations + and · together with an order relation < for ∗R.
Definition 1.2.4 Given real sequences ⟨r_i⟩ and ⟨s_i⟩, we set
[⟨r_i⟩] + [⟨s_i⟩] = [⟨r_i + s_i⟩],
[⟨r_i⟩] · [⟨s_i⟩] = [⟨r_i · s_i⟩],
|[⟨r_i⟩]| = [⟨|r_i|⟩],
[⟨r_i⟩] < [⟨s_i⟩] if r_i < s_i a.e.
Proposition 1.2.5 The operations + and ·, the mapping |·|, and the ordering < are independent of the choice of representing sequences.
Proof The proof is left to the reader.
Note that the set of real numbers R (these are also called the standard numbers) is imbedded in the set of nonstandard real numbers ∗R via the map c ↦ [⟨c⟩]. For example, 5 is mapped to the equivalence class [⟨5⟩] containing the constant sequence ⟨5⟩. We write ∗c for [⟨c⟩], but we will later drop the star. We will call ∗c the extension of c. Similarly, we have an extension for every n-tuple of real numbers c_1, …, c_n given by ∗c_1, …, ∗c_n.
Proposition 1.2.6 The structure (∗R, +, ·, <) is an ordered field extension of the ordered field (R, +, ·, <).
Proof To show, for example, that the trichotomy law holds, we consider r = [⟨r_i⟩] and s = [⟨s_i⟩]: the sets {i ∈ N : r_i < s_i}, {i ∈ N : r_i = s_i}, and {i ∈ N : r_i > s_i} partition N, so exactly one of them is in 𝒰. The rest is left to the reader.
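The pointwise operations of Definition 1.2.4 act on representing sequences, and the ultrafilter only enters when deciding which sequences are equivalent. In the sketch below, a sequence is modeled as a Python function of the index i ≥ 1 (an assumption of the sketch, as are the helper names add, mul, and prefix). Only finitely many terms can be inspected, so the sketch shows the pointwise arithmetic and the zero-divisor problem from the beginning of this section, not equality in ∗R.

```python
from typing import Callable, List

Seq = Callable[[int], int]  # a sequence <r_i>, indexed by i = 1, 2, 3, ...


def add(r: Seq, s: Seq) -> Seq:
    """Pointwise sum <r_i + s_i>."""
    return lambda i: r(i) + s(i)


def mul(r: Seq, s: Seq) -> Seq:
    """Pointwise product <r_i * s_i>."""
    return lambda i: r(i) * s(i)


def prefix(r: Seq, length: int) -> List[int]:
    """The first terms r_1, ..., r_length."""
    return [r(i) for i in range(1, length + 1)]


chi_E: Seq = lambda i: 1 if i % 2 == 0 else 0  # characteristic function of the even numbers
chi_O: Seq = lambda i: 1 if i % 2 == 1 else 0  # characteristic function of the odd numbers

print(prefix(mul(chi_E, chi_O), 6))  # [0, 0, 0, 0, 0, 0]: the product is the zero sequence
print(prefix(add(chi_E, chi_O), 6))  # [1, 1, 1, 1, 1, 1]
```

Neither χ_E nor χ_O is the zero sequence, yet their product is. In ∗R exactly one of E and O lies in 𝒰, so exactly one of [⟨χ_E⟩] and [⟨χ_O⟩] equals 0 and the other equals 1.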
Problem: Show that if r = [⟨r_i⟩], then |r| = [⟨|r_i|⟩] is the absolute value of r in the usual sense.
Answer: If r ≥ 0, then r_i ≥ 0 a.e., so |r| = r = [⟨r_i⟩] = [⟨|r_i|⟩]. If r ≤ 0, then r_i ≤ 0 a.e., so |r| = −r = [⟨−r_i⟩] = [⟨|r_i|⟩].
Problem: Show that if r = [⟨r_i⟩], s = [⟨s_i⟩], and r < s, then there is a t ∈ ∗R with r < t < s.
Answer: Since r_i < s_i a.e., r_i < (r_i + s_i)/2 < s_i a.e. An element t ∈ ∗R that works is t = [⟨(r_i + s_i)/2⟩].
Problem: Show that there are infinitely many elements in ∗R greater than the number ω := [⟨1, 2, …, n, …⟩].
Answer: For each m ∈ N with m ≥ 2, mω = [⟨m, 2m, …, mn, …⟩] > ω, since mn > n for every n; moreover, these elements are distinct for distinct m.
Roughly speaking, a property holds for finitely many elements of ∗R if it holds a.e. on N for representing sequences of those elements. To extend this principle to properties involving functions, we extend each real-valued function f defined on R to a function defined on ∗R; the value of the extended function at [⟨r_i⟩] is [⟨f(r_i)⟩].
Definition 1.2.7 For any r ∈ ∗R,
(1) r is infinite or unlimited (positive or negative) if |r| > n for every standard n ∈ N;
(2) r is finite or limited if |r| < n for some standard n ∈ N;
(3) r is infinitesimal if |r| < 1/n for every standard n ∈ N.
Note that 0 is the only standard infinitesimal. The equivalence class [⟨1/i⟩] is infinitesimal and [⟨i⟩] is a positive, unlimited number in ∗R. We have already extended the functions +, ·, |·| and the relation <. These are examples of the ∗-transform of n-ary relations. To say more, we need the following definitions.
Definition 1.2.8 An n-ary relation P on a set S is a subset of S^n = S × S × ⋯ × S (n factors). We write ⟨a^1, …, a^n⟩ ∈ P or P⟨a^1, …, a^n⟩ if the n-tuple ⟨a^1, …, a^n⟩ satisfies the relation P. The complement in S of an n-ary relation P is the n-ary relation P′ = S^n \ P. A 1-ary relation in S is just a subset of S.
Example 1.2.9 For the 2-ary relation ≤, we may write ⟨5, 7⟩ ∈ ≤ or ≤⟨5, 7⟩, but we usually just write 5 ≤ 7.
Instead of writing 5 ∈ N, we may also write N⟨5⟩.
Definition 1.2.10 A function f of n variables is an (n + 1)-ary relation such that if ⟨a^1, …, a^n, b⟩ ∈ f and ⟨a^1, …, a^n, c⟩ ∈ f, then b = c. We will usually write b = f(a^1, …, a^n) instead of ⟨a^1, …, a^n, b⟩ ∈ f. The domain of f is the set of all ⟨a^1, …, a^n⟩ for which ∃b with ⟨a^1, …, a^n, b⟩ ∈ f. The range of f is the set of all b such that ∃⟨a^1, …, a^n⟩ with ⟨a^1, …, a^n, b⟩ ∈ f.
Example 1.2.11 The operations + and · are functions of 2 variables. However, we usually write 5 + 7 = 12 instead of ⟨5, 7, 12⟩ ∈ + or +(5, 7) = 12. When using a formal language, however, as in the next section, one should remember that 5 + 7 = 12 is shorthand for a more formal statement.
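For the infinitesimal [⟨1/i⟩] and the unlimited number ω = [⟨i⟩], the claims of Definition 1.2.7 and the Problems above come down to one observation: for each standard n, the relevant property fails only at the finitely many indices i ≤ n. By Property 4, together with the fact that sets containing a set in 𝒰 are in 𝒰, such a property holds a.e. The sketch below lists the failing indices up to a search bound (the helper name failures is hypothetical). The bound only limits what is displayed; here the full failure set is visibly {1, …, n}, since, for example, 1/i < 1/n exactly when i > n.

```python
from fractions import Fraction
from typing import Callable, List


def failures(prop: Callable[[int], bool], upto: int) -> List[int]:
    """Indices i in 1..upto at which the property fails."""
    return [i for i in range(1, upto + 1) if not prop(i)]


n = 5  # a standard natural number

# [<1/i>] < 1/n fails only for i = 1, ..., n, so it holds a.e.
print(failures(lambda i: Fraction(1, i) < Fraction(1, n), 1000))  # [1, 2, 3, 4, 5]

# [<i>] > n fails only for i = 1, ..., n, so omega is unlimited.
print(failures(lambda i: i > n, 1000))  # [1, 2, 3, 4, 5]

# 2*omega > omega holds at every index.
print(failures(lambda i: 2 * i > i, 1000))  # []
```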
Definition 1.2.12 The ∗-transform of an n-ary relation P is the relation ∗P where ⟨[⟨r_i^1⟩], …, [⟨r_i^n⟩]⟩ ∈ ∗P if ⟨r_i^1, …, r_i^n⟩ ∈ P for almost all i. It is a good exercise to show that the ∗-transform of an n-ary relation is well-defined, that is, it is independent of the choice of representatives of the equivalence classes. Note that the previous extensions of +, ·, <, and |·| all follow this general pattern. The ∗-transform of equality is true equality in ∗R, because the elements of ∗R are the equivalence classes. For a unary relation A, ∗A ⊇ A in the sense that for each a ∈ A, the equivalence class [⟨a⟩] containing the constant sequence ⟨a⟩ is also in ∗A. In general, we have the following fact.
Proposition 1.2.13 If P is an n-ary relation, then ∗P extends P; i.e., every n-tuple in P is in ∗P (or rather, the extension of the n-tuple is in ∗P).
Proof The proof is clear.
Example 1.2.14 The extension ∗[0, 1] of the unit interval contains all nonstandard reals between 0 and 1.
If f is a function of n variables, then ∗f extends f. Moreover, if D is the domain of f, then ∗D is the domain of ∗f. We will also work with the characteristic functions of relations. Given an n-ary relation P, we set χ_P(x_1, …, x_n) = 1 if ⟨x_1, …, x_n⟩ ∈ P and χ_P(x_1, …, x_n) = 0 if ⟨x_1, …, x_n⟩ ∉ P. An important consequence of the fact that we are working with a free ultrafilter, and not just a free filter, is given by the following result.
Proposition 1.2.15 Given an n-ary relation P, the extension of its characteristic function, ∗χ_P, is equal to the characteristic function of ∗P. That is, ∗χ_P = χ_{∗P}. This means that ∗χ_P(x_1, …, x_n) = 1 if ⟨x_1, …, x_n⟩ ∈ ∗P, and ∗χ_P(x_1, …, x_n) = 0 otherwise.
Proof We will present the proof for a unary relation P; the general proof is similar. For any sequence ⟨r_i⟩, the sequence ⟨χ_P(r_i)⟩ is defined and consists of just 0’s and 1’s.
The set of natural numbers N is the union of the two disjoint sets
{i ∈ N : χ_P(r_i) = 1} = {i ∈ N : P r_i} and {i ∈ N : χ_P(r_i) = 0} = {i ∈ N : P′ r_i}.
Since 𝒰 is an ultrafilter, exactly one of these two sets is in 𝒰. Therefore, either ∗χ_P([⟨r_i⟩]) = 0 or ∗χ_P([⟨r_i⟩]) = 1 in ∗R. It follows that ∗χ_P is a characteristic function on ∗R. We also see that ∗χ_P([⟨r_i⟩]) = 1 if and only if P r_i holds a.e., and this is true if and only if ∗P[⟨r_i⟩] holds in ∗R, so ∗χ_P = χ_{∗P}.
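The role of the ultrafilter in Proposition 1.2.15 can be seen on a sequence whose values alternate between the inside and the outside of P. In the sketch below, P is the unit interval [0, 1] and r_i = 1 + (−1)^i/i; both choices are illustrative assumptions. The 0-1 sequence ⟨χ_P(r_i)⟩ equals 1 on the odd indices and 0 on the even ones, so ∗χ_P([⟨r_i⟩]) is 1 if the set of odd numbers belongs to 𝒰 and 0 if the set of even numbers does. An ultrafilter always makes one of these choices, whereas a free filter need not contain either set.

```python
from fractions import Fraction


def chi_unit_interval(x: Fraction) -> int:
    """Characteristic function of P = [0, 1]."""
    return 1 if 0 <= x <= 1 else 0


def r(i: int) -> Fraction:
    """Representative sequence r_i = 1 + (-1)**i / i, with exact rational arithmetic."""
    return 1 + Fraction((-1) ** i, i)


values = [chi_unit_interval(r(i)) for i in range(1, 9)]
print(values)  # [1, 0, 1, 0, 1, 0, 1, 0]
```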
1.3 A Simple Language
Recall that the real numbers R can be constructed as a collection of equivalence classes of Cauchy sequences of rational numbers. One soon forgets the construction, however, and works just with the properties of R. For the same reason, except for proofs of some fundamental properties of ∗R, it is best to put aside the construction of ∗R as a collection of equivalence classes of real-valued sequences and work not with the construction, but with the properties of ∗R. Our simplified formulation of nonstandard analysis in this chapter treats only elements of ∗R and the n-ary relations on ∗R. We will distinguish those relations that are functions. In the next chapter, we will expand this simple approach to handle full mathematical structures. For this discussion, we need a formal language for R and another formal language for ∗R. We construct both as a single language by using the symbol S to stand for either number system. The language we construct in this chapter is restricted so that beginners can concentrate on formal sentences having just two forms. This restriction simplifies the interpretation and proof of the important “transfer principle”. It also helps in learning to exploit the difference between a mathematical object, such as the number system S, and formal statements in the language about such a mathematical object. Our restricted language L is built from the following logical symbols:
Connectives: ∧, → (“and”, “implies”).
Quantifier: ∀ (“for all”).
Parentheses: [, ], {, }, ⟨, ⟩.
Variables: x, y, x_1, a, n, etc. We need only a countable number of these. (Only a finite number will actually appear on these pages.)
Constants: For each s ∈ S we have a name s̲. We may have more than one name for s.
We have a relation symbol P̲ for each relation P, and in particular, a function symbol f̲ for each function f. These symbols are called the names of the relations and the functions.
We omit the underline for numbers and also for +, ·, <, and for some functions.
Remark 1.3.1 We are making a distinction between the name of an object and the object itself. Mathematicians are used to making such distinctions, as for example, distinguishing between numbers and numerals. The number eight, for example, has the names 8, 10, and 1000 with respect to its representation in base 10, 8, and 2, respectively. The number 8 also has the name VIII using Roman numerals. There is also the distinction between a point in n-space and the coordinates of the point in terms of a chosen coordinate system.
We now combine all of the logical symbols given above to form a formal language L.
Definition 1.3.2 A term is a combination of symbols defined inductively as follows:
(i) Each constant and each variable is a term.
(ii) If f̲ is a function symbol naming a function of n variables, and τ^1, …, τ^n are terms, then f̲(τ^1, …, τ^n) is a term.
A term with no variables is called a constant term.
Example 1.3.3 If S denotes the sum function and P the product, we can write S(P(3, x), P(4, y)) for 3x + 4y.
Definition 1.3.4 A simple sentence is a string of symbols from the language L taking one of the two following forms:
(a) Atomic Sentence: P̲⟨τ^1, …, τ^n⟩, where P̲ is the name of an n-ary relation and τ^1, …, τ^n are constant terms, i.e., terms with no variables. (We want to be able to say that an atomic sentence either holds or doesn’t hold.)
(b) Compound Sentence:
$$(\forall x_1)\cdots(\forall x_n)\Big[\bigwedge_{i=1}^{k} \underline{P}_i\langle\vec{\tau}_i\rangle \longrightarrow \bigwedge_{j=1}^{l} \underline{Q}_j\langle\vec{\sigma}_j\rangle\Big].$$
This is an abbreviation for an actual formal sentence. The symbol τ⃗_i stands for an n_i-tuple of terms; ⋀_{i=1}^k P̲_i⟨τ⃗_i⟩ stands for k relational statements concatenated with ∧; the only variables involved in the τ’s and σ’s are x_1, …, x_n. We say the variables x_1, …, x_n are bound by the quantifier ∀.
Example 1.3.5 We would usually write the atomic sentence =⟨+(3, 4), 7⟩ as 3 + 4 = 7.
Example 1.3.6 The Distributive Law can be written as follows, with S and P denoting the sum and product functions and R denoting the unary relation of being a real number:
$$(\forall x)(\forall y)(\forall z)\big[R\,x \wedge R\,y \wedge R\,z \longrightarrow {=}\langle P(x, S(y, z)),\ S(P(x, y), P(x, z))\rangle\big].$$
This is a formal way of writing
$$(\forall x)(\forall y)(\forall z)\big[R\,x \wedge R\,y \wedge R\,z \longrightarrow x \cdot (y + z) = x \cdot y + x \cdot z\big].$$
Note that (∀x)(∀y)[R x → =⟨x, x⟩] is a simple sentence, but the string of symbols (∀x)[R x → =⟨x, y⟩] is not a simple sentence because the variable y is not bound by a quantifier.
1.4 Interpretation of the Language L
We interpret the constant terms of our language L in the number system S using the following induction scheme:
(1) If a constant symbol s̲ names an element s ∈ S, we interpret it as s. If a constant symbol s̲ does not name an element of S, we say that s̲ is not interpretable in S.
(2) If τ is the constant term f̲(τ^1, …, τ^n), then τ is interpretable in S provided
(i) each of the terms τ^i is interpretable in S (naming s^i ∈ S), and
(ii) the n-tuple ⟨s^1, …, s^n⟩ named by τ^1, …, τ^n is in the domain of the function f named by the function symbol f̲.
In this case, we interpret the term τ as f(s^1, …, s^n). Otherwise, we say that τ is not interpretable in S.
Example 1.4.1 In R, sin(π/2 + 2π) is interpretable as 1, but sin(1 + tan(π/2)) is not interpretable. Here, and for the rest of this chapter, we omit the underline for constants, operations, and order relations.
Recall that an atomic sentence has the form P̲⟨τ^1, …, τ^n⟩, where P̲ is the name of an n-ary relation and τ^1, …, τ^n are constant terms, i.e., terms with no variables. We say that such an atomic sentence holds in S or is valid in S if
(1) each of the terms τ^i is interpretable in S (naming s^i ∈ S), and
(2) the n-tuple ⟨s^1, …, s^n⟩ named by τ^1, …, τ^n is in the relation P named by the relation symbol P̲.
Example 1.4.2 The atomic sentence =⟨tan(π/2), tan(π/2)⟩ does not hold in R since tan(π/2) is not interpretable in R. The atomic sentence =⟨+(3, 4), 7⟩ holds in R, but =⟨+(3, 4), 8⟩ does not hold in R.
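The inductive definition of interpretation, and of validity for atomic sentences, can be sketched as a small recursive evaluator. Floating-point arithmetic cannot represent π/2 exactly (math.tan(math.pi / 2) returns a large finite number rather than failing), so this sketch uses the reciprocal at 0 and the logarithm of a negative number as its uninterpretable cases. The representation of terms as nested tuples, the symbol names "recip" and "ln", and the table FUNCTIONS are assumptions of the sketch; a term whose value is None is not interpretable.

```python
import math
from typing import Callable, Dict, Optional, Tuple, Union

# A constant term is a number (naming itself) or a tuple (symbol, term_1, ..., term_n).
Term = Union[float, Tuple]

# Each function symbol names a partial function: (domain test, function).
FUNCTIONS: Dict[str, Tuple[Callable[..., bool], Callable[..., float]]] = {
    "+": (lambda x, y: True, lambda x, y: x + y),
    "recip": (lambda x: x != 0, lambda x: 1 / x),
    "ln": (lambda x: x > 0, math.log),
}


def interpret(term: Term) -> Optional[float]:
    """Value named by a constant term, or None if the term is not interpretable."""
    if not isinstance(term, tuple):
        return term
    symbol, *args = term
    values = [interpret(a) for a in args]
    if any(v is None for v in values):
        return None  # rule (i): some subterm is not interpretable
    domain, func = FUNCTIONS[symbol]
    if not domain(*values):
        return None  # rule (ii): the tuple of values is outside the domain
    return func(*values)


def atomic_holds(relation: Callable[..., bool], *terms: Term) -> bool:
    """P<t_1, ..., t_n> holds iff every t_k is interpretable and the values are in P."""
    values = [interpret(t) for t in terms]
    return all(v is not None for v in values) and relation(*values)


def eq(a: float, b: float) -> bool:
    return a == b


print(atomic_holds(eq, ("+", 3, 4), 7))              # True
print(atomic_holds(eq, ("+", 3, 4), 8))              # False
print(atomic_holds(eq, ("recip", 0), ("recip", 0)))  # False: recip(0) is not interpretable
print(interpret(("recip", ("+", 1, ("ln", -1)))))    # None
```

The third line mirrors Example 1.4.2: an equation between two copies of the same term fails when that term is not interpretable.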
We say that a compound sentence
$$(\forall x_1)\cdots(\forall x_n)\Big[\bigwedge_{i=1}^{k} \underline{P}_i\langle\vec{\tau}_i\rangle \longrightarrow \bigwedge_{j=1}^{l} \underline{Q}_j\langle\vec{\sigma}_j\rangle\Big]$$
holds in S or is valid in S if for each replacement of the variables x_1, …, x_n with constant symbols s̲^1, …, s̲^n, we have the following: If all of the resulting atomic statements P̲_i⟨τ⃗_i⟩ are valid in S, then, with the same replacement of the variables, all of the resulting atomic statements Q̲_j⟨σ⃗_j⟩ are valid in S.
Example 1.4.3 In R, (∀x)[ln(x) = ln(x) → sqrt(x) > 0] is valid, but (∀x)[R x → ln(x) = ln(x)] is not valid. (The atomic sentence ln(x) = ln(x) holds exactly when ln(x) is interpretable, that is, when x > 0.) The distributive law can be expressed as follows: (∀x)(∀y)(∀z)[1 = 1 → x · (y + z) = x · y + x · z].
We will often use a double arrow ←→ to abbreviate two separate formal sentences: In the first sentence, the left side of the double implication implies the right, and in the second sentence, the right side implies the left. Similarly, we will write a compound sentence such as
$$(\forall x_1)\cdots(\forall x_n)\Big[\bigwedge_{j=1}^{l} \underline{Q}_j\langle\vec{\sigma}_j\rangle\Big]$$
to stand for a compound sentence such as
$$(\forall x_1)\cdots(\forall x_n)\Big[5 = 5 \longrightarrow \bigwedge_{j=1}^{l} \underline{Q}_j\langle\vec{\sigma}_j\rangle\Big],$$
where the left side of the implication is always true. Such a sentence holds or is valid in S if for each replacement of the variables x_1, …, x_n with constant symbols s̲^1, …, s̲^n, all of the resulting atomic statements Q̲_j⟨σ⃗_j⟩ are valid in S.
We have no formal “not” in our simple language. If τ^1, …, τ^n are interpretable in S, then instead of saying P̲⟨τ^1, …, τ^n⟩ is not true, we can use the complement P′ and say that P̲′⟨τ^1, …, τ^n⟩ is true. The reader should note, however, that if a term in an atomic sentence P̲⟨τ^1, …, τ^n⟩ is not interpretable, then neither the atomic sentence P̲⟨τ^1, …, τ^n⟩ nor the atomic sentence P̲′⟨τ^1, …, τ^n⟩ is true in S. The connective “or” is also handled with some difficulty. For example, the Trichotomy Law is handled as follows, where ≮ and ≠ denote the complements of the relations < and =: (∀x)[R x ∧ x ≮ 0 ∧ x ≠ 0 → x > 0]. We do not have the existential quantifier ∃ in our simple language. This can be worked around using the model itself. If there is always a unique object that exists, then one can use the name of the object. For example, instead of writing (∀x)(∃y)[y + x = 0], we can write (∀x)[−x + x = 0]. Even when there is no unique object that exists, there is always a choice function ψ in the model that picks one of the objects that exists. For example, given any real number, there is always a larger number. If ψ picks one of those larger numbers, then the following sentence holds: (∀x)[R x → ψ(x) > x]. Such a function ψ is called a Skolem function. It picks one of the things that exists whether that thing is unique or not. If φ is the Skolem function that picks the unique multiplicative inverse, then the following sentence holds in R: (∀x)[R x ∧ x ≠ 0 → x · φ(x) = 1]. Informally, one can write (∀x)[R x ∧ x ≠ 0 → (∃y)[x · y = 1]]. The reader should realize, however, that this is only an abbreviation for the appropriate sentence in our formal language using a Skolem function to pick an element that works to make the statement valid.
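When S is finite, the definition of validity for a compound sentence can be applied literally by trying every replacement of the variables. The sketch below uses the field Z/7Z as a stand-in for S (an assumption; for R the definition is the same, but it cannot be checked by enumeration). It writes each atomic statement as a Python predicate on an assignment of values to variables, and models the Skolem function φ for multiplicative inverses as undefined at 0. The output shows why the hypothesis x ≠ 0, expressed with the complement of =, is needed. The helper names valid, inverse_law, not_zero, and always are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, List, Optional

P = 7
S = range(P)  # the finite structure Z/7Z, standing in for S

Assignment = Dict[str, int]
Atomic = Callable[[Assignment], bool]


def phi(x: int) -> Optional[int]:
    """Skolem function picking the multiplicative inverse mod 7; undefined (None) at 0."""
    return pow(x, -1, P) if x % P != 0 else None  # pow with exponent -1 needs Python 3.8+


def valid(variables: List[str], hypotheses: List[Atomic], conclusions: List[Atomic]) -> bool:
    """Check (forall variables)[AND hypotheses -> AND conclusions] over every replacement from S."""
    for values in product(S, repeat=len(variables)):
        env = dict(zip(variables, values))
        if all(h(env) for h in hypotheses) and not all(c(env) for c in conclusions):
            return False
    return True


def inverse_law(env: Assignment) -> bool:
    """Atomic statement x * phi(x) = 1; false when phi(x) is not interpretable."""
    y = phi(env["x"])
    return y is not None and (env["x"] * y) % P == 1


def not_zero(env: Assignment) -> bool:
    """Complement of =, applied to <x, 0>."""
    return env["x"] != 0


def always(env: Assignment) -> bool:
    """Plays the role of the always-true left side 5 = 5."""
    return True


print(valid(["x"], [not_zero], [inverse_law]))  # True
print(valid(["x"], [always], [inverse_law]))    # False: fails at x = 0
```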
Example 1.4.4 Let B be the range of a function f of n variables. The fact that B is the range of f is stated by the following two simple sentences taken together:
$$(\forall x_1)(\forall x_2)\cdots(\forall x_n)\big[f(x_1, \dots, x_n) = f(x_1, \dots, x_n) \longrightarrow B\,f(x_1, \dots, x_n)\big],$$
$$(\forall y)\big[B\,y \longrightarrow f(\psi_1(y), \dots, \psi_n(y)) = y\big].$$
In the first sentence, after the variables x_1, …, x_n are replaced by names of elements, the atomic statement f(x_1, …, x_n) = f(x_1, …, x_n) is valid if and only if f is defined at those elements. In the second of these sentences, each of the functions ψ_i picks a particular value of x_i that works. Informally, we can write this second sentence as
$$(\forall y)\big[B\,y \longrightarrow (\exists x_1)\cdots(\exists x_n)\, f(x_1, \dots, x_n) = y\big].$$
Problem: Write a simple sentence stating that a given nonempty set A ⊆ N has a first element.
Answer: Let m be the name of the first element of A. Here, we are using our knowledge about the actual set being described by our formal language. That is, we know all about A, and we know the name m of the first element. A sentence that works is (∀x)[A x → A m ∧ m ≤ x]. We cannot abbreviate with an existential quantifier here, because that would mean there is a Skolem function acting on sets of numbers that produces the number that exists. In this chapter, we are not considering functions acting on sets.
Problem: Characterize the fact that A is the domain of a real-valued function f of n variables.
Answer:
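$$(\forall x_1)\cdots(\forall x_n)\big[A\langle x_1, \dots, x_n\rangle \longleftrightarrow R\,f(x_1, \dots, x_n)\big].$$
Problem: Express with a sentence of L the continuity of a real-valued function f with domain A ⊆ R at a ∈ A.
Answer: For this, we must use a positive function δ of one variable. That variable is restricted to the positive real numbers, so for ε ≤ 0 the term δ(ε) is not interpretable, the left side of the implication fails, and the sentence imposes no condition. We assume we know all about this Skolem function δ that works for f and a; in particular, we know its domain. The sentence we want is
$$(\forall x)(\forall \varepsilon)\big[A\,x \wedge |x - a| < \delta(\varepsilon) \longrightarrow |f(x) - f(a)| < \varepsilon\big].$$
Again, informally, we can abbreviate the above sentence with the sentence
$$(\forall \varepsilon > 0)(\exists \delta > 0)(\forall x)\big[A\,x \wedge |x - a| < \delta \longrightarrow |f(x) - f(a)| < \varepsilon\big].$$
Here the existential quantifier must precede (∀x), because the Skolem function δ depends on ε alone and not on x.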
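Checking the sentence at finitely many rational points cannot prove continuity, but it shows concretely how the Skolem function δ replaces the existential quantifier and why δ must depend on ε alone. In the sketch, f(x) = x², a = 1, and δ(ε) = min(1, ε/3) are illustrative assumptions, as is the helper name holds_on_samples. For this choice, |x − 1| < δ(ε) forces 0 < x < 2, hence |x + 1| < 3 and |x² − 1| = |x − 1|·|x + 1| < ε. The domain A is all of R here, so the hypothesis A x is omitted. Only positive values of ε are sampled, because δ(ε) is not interpretable otherwise.

```python
from fractions import Fraction
from typing import Callable, Iterable

a = Fraction(1)


def f(x: Fraction) -> Fraction:
    return x * x


def delta(eps: Fraction) -> Fraction:
    """Skolem function for f at a = 1; defined only for eps > 0."""
    if eps <= 0:
        raise ValueError("delta is only defined for positive eps")
    return min(Fraction(1), eps / 3)


def holds_on_samples(d: Callable[[Fraction], Fraction],
                     xs: Iterable[Fraction], epsilons: Iterable[Fraction]) -> bool:
    """Check [|x - a| < d(eps) -> |f(x) - f(a)| < eps] for all sampled x and eps."""
    xs = list(xs)
    return all(abs(f(x) - f(a)) < eps
               for eps in epsilons
               for x in xs
               if abs(x - a) < d(eps))


xs = [Fraction(k, 100) for k in range(0, 201)]  # 0, 0.01, ..., 2
epsilons = [Fraction(5), Fraction(1), Fraction(1, 10), Fraction(1, 100)]

print(holds_on_samples(delta, xs, epsilons))        # True
print(holds_on_samples(lambda e: e, xs, epsilons))  # False: delta(eps) = eps is too large here
```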
1.5 Transfer Principle for ∗R
We can form two languages L_R and L_∗R from the language L by letting S be R and ∗R, respectively. We next show how to transform each simple sentence in L_R into a simple sentence in L_∗R. Here are the rules.
CONVENTIONS:
(1) The name c̲ of c ∈ R also names ∗c; we identify c and ∗c.
(2) If P̲ names the relation P, ∗P̲ names ∗P. In particular, if f̲ names a function f, ∗f̲ names ∗f. We leave off ∗ for some relations and functions such as <, +, and ·.
THE ∗-TRANSFORM OF TERMS:
(1) The transform of a constant or variable is that constant or variable.
(2) The ∗-transform of a term f̲(τ^1, …, τ^n) is ∗f̲(∗τ^1, …, ∗τ^n).
THE ∗-TRANSFORM OF SIMPLE SENTENCES:
(1) The ∗-transform of P̲⟨τ^1, …, τ^n⟩ is ∗P̲⟨∗τ^1, …, ∗τ^n⟩.
(2) The ∗-transform of
$$(\forall x_1)\cdots(\forall x_n)\Big[\bigwedge_{i=1}^{k} \underline{P}_i\langle\vec{\tau}_i\rangle \longrightarrow \bigwedge_{j=1}^{l} \underline{Q}_j\langle\vec{\sigma}_j\rangle\Big]$$
is
$$(\forall x_1)\cdots(\forall x_n)\Big[\bigwedge_{i=1}^{k} {}^{*}\underline{P}_i\langle {}^{*}\vec{\tau}_i\rangle \longrightarrow \bigwedge_{j=1}^{l} {}^{*}\underline{Q}_j\langle {}^{*}\vec{\sigma}_j\rangle\Big].$$
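The rules for the ∗-transform of terms are purely syntactic, so they can be sketched as a recursive function on term trees. In the sketch, a term is a string (a constant or variable) or a tuple (symbol, argument, …), the ASCII * stands for the star, and + and · are left unstarred by the convention above; these representation choices and the helper names star and show are assumptions. The sketch reproduces the transform of sin(x + ln(2 · y)) computed in Example 1.5.1.

```python
from typing import Tuple, Union

# A term is a name (constant or variable) or a tuple (function_symbol, arg_1, ..., arg_n).
Term = Union[str, Tuple]

UNSTARRED = {"+", "·"}  # by convention, the star is left off these (written infix)


def star(term: Term) -> Term:
    """*-transform: names are unchanged; f(t_1, ..., t_n) becomes *f(*t_1, ..., *t_n)."""
    if isinstance(term, str):
        return term
    symbol, *args = term
    new_symbol = symbol if symbol in UNSTARRED else "*" + symbol
    return (new_symbol, *[star(arg) for arg in args])


def show(term: Term, inside_infix: bool = False) -> str:
    """Render a term, writing + and · infix (parenthesized only inside another infix term)."""
    if isinstance(term, str):
        return term
    symbol, *args = term
    if symbol in UNSTARRED:
        body = f" {symbol} ".join(show(arg, inside_infix=True) for arg in args)
        return f"({body})" if inside_infix else body
    return symbol + "(" + ", ".join(show(arg) for arg in args) + ")"


t = ("sin", ("+", "x", ("ln", ("·", "2", "y"))))
print(show(t))        # sin(x + ln(2 · y))
print(show(star(t)))  # *sin(x + *ln(2 · y))
```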
Example 1.5.1 The ∗-transform of sin(x + ln(2 · y)) is ∗ sin(x + ∗ ln(2 · y)); note we do not write ∗ + or ∗ ·. Because we have used restricted languages, we now have an easy proof of the following fundamental result about the nonstandard number system ∗ R constructed from real sequences using a given free ultrafilter U on N. Theorem 1.5.2 (Transfer Principle) The ∗-transform of a simple sentence that is true for R is true for ∗ R. We will say a property is true by transfer if its validity for ∗ R follows from the transfer principle. Remark 1.5.3 It is easy to discuss the validity of the transform of an atomic sentence, since there are no variables, only constants. In determining whether a sentence of the form ⎤ ⎡ k l − − → → (∀x1 ) · · · (∀xn ) ⎣ P i τ i −→ Qj σ j ⎦ i=1
j=1
is valid when transferred, one should keep in mind the following principles: The transform of a constant or variable is that constant or that variable. The constants in the original sentence will all be names of elements of R. First one transforms the
1 Simple Nonstandard Analysis and Applications
sentence, and then one replaces the variables with names. Those names are names of elements of ∗R. That is, names of elements of ∗R replace the variables in the transformed sentence.

Example 1.5.4 The atomic sentence ≥⟨sin(π), cos(π)⟩ (written without underlines) is transformed to the atomic sentence ∗≥⟨∗sin(π), ∗cos(π)⟩. The transformed sentence holds in ∗R because the original sentence says that 0 ≥ −1, and since the relation ∗≥ and the functions ∗sin and ∗cos retain their values on elements of R, the transformed sentence says the same thing.

Example 1.5.5 Omitting underlines, the compound sentence
$$(\forall x)\big[\geq\langle x, 0\rangle \wedge \geq\langle \pi, x\rangle \longrightarrow \geq\langle \sin(x), 0\rangle\big]$$
is transformed to the sentence
$$(\forall x)\big[{}^*\!\geq\langle x, 0\rangle \wedge {}^*\!\geq\langle \pi, x\rangle \longrightarrow {}^*\!\geq\langle {}^*\sin(x), 0\rangle\big].$$
Suppose we fix an element $[\langle r_i\rangle] \in {}^*\mathbb{R}$ for which the left side of the implication in the transformed sentence is valid. (This means that when we replace x with the name of $[\langle r_i\rangle]$, the left side holds in ∗R.) There is an element U in the ultrafilter such that for all i ∈ U, $0 \le r_i \le \pi$, whence $\sin(r_i) \ge 0$. Therefore, ${}^*\sin([\langle r_i\rangle]) = [\langle \sin(r_i)\rangle] \mathrel{{}^*\!\geq} 0$. It follows that the transformed sentence holds in ∗R.

Proof of the Transfer Principle: Assume Φ is the atomic sentence $P\langle \tau_1, \dots, \tau_n\rangle$, and Φ holds for R. Then Φ contains only constants naming elements of R. Since the ∗-transforms of functions are extensions of the original functions, all of the terms in ∗Φ are interpretable in ∗R, and $\tau_1, \dots, \tau_n$ have the same interpretation as in R. Since ∗P extends P, we have $\langle \tau_1, \dots, \tau_n\rangle \in {}^*P$, so ∗Φ holds for ∗R. Now assume that Φ is the compound sentence
$$(\forall x_1)\cdots(\forall x_n)\Big[\bigwedge_{i=1}^{k} P_i\,\vec{\tau}_i \longrightarrow \bigwedge_{j=1}^{l} Q_j\,\vec{\sigma}_j\Big].$$
Choose n names $\rho_1, \dots, \rho_n$ of elements of ∗R that replace $x_1, \dots, x_n$ to make the transformed left-hand side $\bigwedge_{i=1}^{k} {}^*P_i\,{}^*\vec{\tau}_i$ hold in ∗R. Let $\langle r_i^1\rangle, \dots, \langle r_i^n\rangle$ be n sequences of real numbers such that $\langle r_i^j\rangle$ is in the equivalence class named by $\rho_j$, $1 \le j \le n$. There is a set U in the free ultrafilter 𝒰 such that for all i ∈ U, replacing the variables $x_1, \dots, x_n$ with the names $r_i^1, \dots, r_i^n$ makes the left-hand side $\bigwedge_{i=1}^{k} P_i\,\vec{\tau}_i$ hold in R. Since Φ holds in R, that same replacement makes the right-hand side $\bigwedge_{j=1}^{l} Q_j\,\vec{\sigma}_j$ hold in R. Since this is true for each i ∈ U, the right-hand side holds in ∗R when the variables $x_1, \dots, x_n$ are replaced by $\rho_1, \dots, \rho_n$. Therefore, ∗Φ holds for ∗R.
Note that this transfer principle for the restricted languages would be valid if we worked just with the Fréchet filter on N, i.e., the filter of cofinite subsets of N. A property that does not follow from this transfer principle is the property that for a relation P and its complement P′, $({}^*P)' = {}^*(P')$. For example, we do not know directly from the transfer principle that ∗≠ is ≠. We now give a direct proof of this property using the fact, already established in Proposition 1.2.15, that for a given n-ary relation P, ${}^*\chi_P = \chi_{{}^*P}$. Recall that the proof of Proposition 1.2.15 used the fact that if a subset A of N is not in the ultrafilter 𝒰, then N \ A is in 𝒰.

Proposition 1.5.6 Let P be an n-ary relation on R. Then $({}^*P)' = {}^*(P')$.

Proof For the proof, we transfer the following sentences and keep in mind that ${}^*\chi_P$ is defined and takes either the value 0 or the value 1 for all inputs.
(i) $(\forall x_1)\cdots(\forall x_n)[P\langle x_1, \dots, x_n\rangle \longleftrightarrow \chi_P(x_1, \dots, x_n) = 1]$,
(ii) $(\forall x_1)\cdots(\forall x_n)[P'\langle x_1, \dots, x_n\rangle \longleftrightarrow \chi_P(x_1, \dots, x_n) = 0]$.
It follows that for any $a_1, \dots, a_n$ in ∗R, ${}^*(P')\langle a_1, \dots, a_n\rangle$ holds if and only if ${}^*\chi_P(a_1, \dots, a_n) \ne 1$, and this is true if and only if ${}^*P\langle a_1, \dots, a_n\rangle$ does not hold in ∗R.
1.6 The Nonstandard Real Numbers

Having established the Transfer Principle, we now begin to use more informal language as an abbreviation of the formal language. We will, for example, drop the underlines from names. The reader should continue to keep in mind the difference between the name of an object and the object itself. We will continue to write R⟨x⟩ and N⟨x⟩ to indicate that we are working with sets such as R and N as unary relations. Here is one of the fundamental facts about the nonstandard reals.

Theorem 1.6.1 The number system (∗R, +, ·, <) is an ordered field extension of (R, +, ·, <).

Proof Since (R, +, ·, <) forms an ordered field, the result follows by transfer of the appropriate simple sentences. We sketch just some of that proof: Since $(\forall x)(\forall y)[\mathbb{R}\langle x\rangle \wedge \mathbb{R}\langle y\rangle \to x + y = y + x]$, we have $(\forall x)(\forall y)[{}^*\mathbb{R}\langle x\rangle \wedge {}^*\mathbb{R}\langle y\rangle \to x + y = y + x]$. That is, + is commutative in ∗R. To establish the existence of a multiplicative inverse, we use the (Skolem) function ψ defined on the nonzero real numbers with ψ(x) = 1/x for each x. Then $(\forall x)[x \ne 0 \to x \cdot \psi(x) = 1]$ holds in R. It follows by transfer that $(\forall x)[x \ne 0 \to x \cdot {}^*\psi(x) = 1]$ holds in ∗R. The extension ∗ψ of the function ψ continues to give the multiplicative inverse in ∗R. These sentences using ψ can be abbreviated with the sentence $(\forall x \ne 0)(\exists y)[x \cdot y = 1]$. For the order relation ≤, we have the trichotomy law by transferring the sentence $(\forall x)[x \not< 0 \wedge x \ne 0 \to x > 0]$. That is, the transfer is the trichotomy law for ∗R.
Proposition 1.6.2 If f is a function of n variables on R, then ∗f is a function of n variables on ∗R; it is an extension of f with ∗(dom f) = dom(∗f) and ∗(range f) = range(∗f).

Proof The fact that ∗f is a function follows either from our convention or from transfer of the sentence
$$(\forall x_1)\cdots(\forall x_n)(\forall y)(\forall z)[f\langle x_1, \dots, x_n, y\rangle \wedge f\langle x_1, \dots, x_n, z\rangle \to y = z].$$
The fact that ∗(dom f) = dom(∗f) follows by transferring the sentence $(\forall x_1)\cdots(\forall x_n)[\mathrm{dom}\, f\langle x_1, \dots, x_n\rangle \longleftrightarrow \mathbb{R}\langle f(x_1, \dots, x_n)\rangle]$. If B is the range of f, then range(∗f) is ∗B; this follows by transferring the sentences $(\forall x_1)\cdots(\forall x_n)[f(x_1, \dots, x_n) = f(x_1, \dots, x_n) \to B\langle f(x_1, \dots, x_n)\rangle]$ and $(\forall y)[B\langle y\rangle \to f(\psi_1(y), \dots, \psi_n(y)) = y]$, where the $\psi_i$ are appropriate Skolem functions. This last sentence can be abbreviated with the sentence $(\forall y \in B)(\exists x_1)\cdots(\exists x_n)[f(x_1, \dots, x_n) = y]$.

Proposition 1.6.3 For sets in $\mathbb{R}^n$,
(i) ∗∅ = ∅.
(ii) ∗(A ∪ B) = ∗A ∪ ∗B, ∗(A ∩ B) = ∗A ∩ ∗B, and (∗A)′ = ∗(A′).
(iii) For $A_i$, i ∈ I, a family of subsets of $\mathbb{R}^n$, $\bigcup_{i\in I} {}^*A_i \subseteq {}^*\big(\bigcup_{i\in I} A_i\big)$ and $\bigcap_{i\in I} {}^*A_i \supseteq {}^*\big(\bigcap_{i\in I} A_i\big)$.
Proof (i) Since $\chi_\emptyset$ is identically 0, so is ${}^*\chi_\emptyset = \chi_{{}^*\emptyset}$. (ii) This can be proved using characteristic functions. For example, $\chi_{A\cap B} = \chi_A \cdot \chi_B$, $\chi_{A\cup B} = \chi_A + \chi_B - \chi_A \cdot \chi_B$, and $\chi_{A'} = 1 - \chi_A$. Here, we are using the fact that the transform of a characteristic function is a characteristic function. For ∩, one can also transfer the following sentence: $(\forall x_1)\cdots(\forall x_n)[A\langle x_1, \dots, x_n\rangle \wedge B\langle x_1, \dots, x_n\rangle \longleftrightarrow A\cap B\langle x_1, \dots, x_n\rangle]$.
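The identities used in (ii) are pointwise statements about 0/1-valued functions. As a quick sanity check of the identities themselves (not of the transfer step), the following Python sketch verifies them on a small finite universe standing in for $\mathbb{R}^n$; the helper names `chi` and `identities_hold` are hypothetical.

```python
def chi(S):
    """Characteristic function of the finite set S: 1 on S, 0 off S."""
    return lambda x: 1 if x in S else 0

def identities_hold(universe, A, B):
    """Check the formulas for chi of A∩B, A∪B and the complement A'."""
    cA, cB = chi(A), chi(B)
    c_meet, c_join, c_comp = chi(A & B), chi(A | B), chi(universe - A)
    return all(
        c_meet(x) == cA(x) * cB(x)
        and c_join(x) == cA(x) + cB(x) - cA(x) * cB(x)
        and c_comp(x) == 1 - cA(x)
        for x in universe
    )

U = set(range(10))
print(identities_hold(U, {0, 2, 4, 6, 8}, {0, 1, 2, 3}))  # True
```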
To show ∗(A ∪ B) = ∗A ∪ ∗B for the case n = 1, transfer the following sentences; the proof for n > 1 is similar: $(\forall x)[A\langle x\rangle \to A\cup B\langle x\rangle]$, $(\forall x)[B\langle x\rangle \to A\cup B\langle x\rangle]$, $(\forall x)[A\cup B\langle x\rangle \wedge A'\langle x\rangle \to B\langle x\rangle]$. (iiia) For the case n = 1, given j ∈ I, $(\forall x)[x \in A_j \to x \in \bigcup_{i\in I} A_i]$, so this follows by transfer. What we have written is shorthand for $(\forall x)[A_j\langle x\rangle \to \bigcup_{i\in I} A_i\langle x\rangle]$. The proof for n > 1 is similar. (iiib) For the case n = 1, we have for each j in I the sentence $(\forall x)[\bigcap_{i\in I} A_i\langle x\rangle \to A_j\langle x\rangle]$, which we transfer. The proof for n > 1 is similar.

Recall that for a given ρ ∈ ∗R, we say that ρ is unlimited or infinite if |ρ| > n for all standard n ∈ N, ρ is limited or finite if |ρ| < n for some standard n ∈ N, and ρ is infinitesimal if |ρ| < 1/n for all standard n ∈ N. The number 0 is the only real infinitesimal.

Example 1.6.4 Note that $\bigcup_{n\in\mathbb{N}} {}^*[-n, n]$ is the set of finite or limited numbers in ∗R, while ${}^*\big(\bigcup_{n=1}^{\infty} [-n, n]\big) = {}^*\mathbb{R}$. On the other hand, $\bigcap_{n\in\mathbb{N}} {}^*\big[-\tfrac{1}{n}, \tfrac{1}{n}\big]$ is the set of all infinitesimal numbers in ∗R, while ${}^*\big(\bigcap_{n=1}^{\infty} \big[-\tfrac{1}{n}, \tfrac{1}{n}\big]\big) = {}^*\{0\} = \{0\}$.
Proposition 1.6.5 Let B denote a subset of $\mathbb{R}^n$. Then ${}^*B \cap \mathbb{R}^n = B$.

Proof We give the proof for n = 1. Since ∗B extends B, we have B ⊆ ∗B ∩ R. If r is a real number not in B, then the atomic sentence B′⟨r⟩ holds for R, so by transfer and Proposition 1.5.6, r ∈ ∗(B′) = (∗B)′; that is, r ∉ ∗B.

Definition 1.6.6 An n-tuple $\langle a_1, \dots, a_n\rangle$ is standard if each $a_i \in \mathbb{R}$.

Proposition 1.6.7 A number ρ ∈ ∗R is positive and unlimited if and only if 1/ρ is strictly positive and infinitesimal; ρ ∈ ∗R is negative and unlimited if and only if 1/ρ is strictly negative and infinitesimal.

Proof The preservation of positivity (negativity) under x ↦ 1/x follows by transfer. Moreover, the fact that for all standard natural numbers n, |x| > n iff |1/x| < 1/n follows by transfer of the sentence $(\forall x)(\forall m)[\mathbb{N}\langle m\rangle \wedge |x| > m \longleftrightarrow \mathbb{N}\langle m\rangle \wedge |1/x| < 1/m]$.

Theorem 1.6.8 The following properties hold for ∗R:
(i) Finite sums, differences, and products of limited numbers are limited.
(ii) Finite sums, differences, and products of infinitesimal numbers are infinitesimal.
(iii) The infinitesimal numbers form an ideal in the ring of limited numbers; i.e., the product of a limited and an infinitesimal number is infinitesimal.
(iv) The limited and the infinitesimal numbers form vector spaces over R.

Proof If |ρ| < n and |τ| < m for n, m ∈ N, then n + m + n·m bounds the absolute values of the sum, difference, and product of ρ and τ. Fix ρ and τ infinitesimal, and fix α limited in ∗R. Given any n ∈ N, |ρ| < 1/(2n) and |τ| < 1/(2n), so |ρ + τ| < 1/n. There is an m ∈ N such that |α| < m. Since |ρ| < 1/(m·n), |α·ρ| < m/(m·n) = 1/n. The rest is left to the reader.

Definition 1.6.9 We say that x and y are infinitesimally close or infinitely close if x − y is infinitesimal. Here one writes x ≃ y or x ≈ y. We say that x and y are finitely close if x − y is finite or limited. Here we write x ∼ y. (Both ≃ and ∼ are equivalence relations.) The equivalence class for ≃ containing x is called the monad of x and written m(x). That is, m(x) = {y ∈ ∗R : y ≃ x}. The equivalence class for ∼ containing x is called the galaxy of x and written G(x).

Remark 1.6.10 The monad of 0, m(0), is the set of infinitesimals; moreover, for all x ∈ ∗R, m(x) = x + m(0). Similarly, G(x) = {y ∈ ∗R : y ∼ x}. The galaxy of 0, G(0), is the set of limited elements of ∗R. It is also denoted by Fin(∗R). For each x ∈ ∗R, G(x) = x + G(0).

Proposition 1.6.11 Between any two monads is another monad; between any two galaxies is another galaxy.

Proof If x and y are in different monads, then (x + y)/2 is between x and y. Moreover, $\frac{x+y}{2} - y = \frac{x-y}{2}$ is not infinitesimal. Similarly, x is not in the monad of (x + y)/2. A similar proof works for galaxies.

Remark 1.6.12 We usually center monads at standard real numbers, and speak of the monad of r for r ∈ R.

Theorem 1.6.13 Every limited ρ ∈ ∗R is in the monad of a unique r ∈ R.

Proof Fix a limited ρ ∈ ∗R, and set A := {s ∈ R : s ≤ ρ}. Here, we have used ρ to define a subset of ordinary real numbers; it is nonempty and bounded above because |ρ| < n for some n ∈ N. Let r be the least upper bound in R of the set A. Assume |r − ρ| is not infinitesimal. Then for some n ∈ N, 1/n ≤ |r − ρ|.
In this case, if r < ρ, then r + 1/n ≤ ρ, so r + 1/n is still in A, and r is not the least upper bound of A. On the other hand, if r > ρ, then r − 1/n is an upper bound of A. This again contradicts the definition of r. It follows that r ≃ ρ. If we also have s ∈ R and s ≃ ρ, then r − s ≃ 0. Since 0 is the only infinitesimal in R, s = r.

Definition 1.6.14 If ρ is limited, then the unique real number r with ρ ≃ r is called the standard part of ρ. We write r = st(ρ) or r = ◦ρ. The mapping st : G(0) → R is called the standard part map.
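The ultrapower ∗R cannot be written down in code, but the behavior of st can be explored in a much smaller ordered ring that also contains infinitesimals. The Python sketch below uses a toy model chosen only for this illustration (it is not ∗R and does not satisfy the Transfer Principle): an element is a finite sum $\sum_k c_k\varepsilon^k$ with $k \in \mathbb{Z}$ and rational $c_k$, where ε is a formal positive infinitesimal, stored as a dict {k: c_k}. An element is limited when no negative power of ε occurs, infinitesimal when only positive powers occur, and its standard part is the coefficient of $\varepsilon^0$; for instance {-1: 1}, standing for 1/ε, is positive and unlimited. All helper names are hypothetical.

```python
from fractions import Fraction

# Toy model (an assumption of this sketch, NOT *R): a dict {k: c_k} stands for
# the finite sum  sum_k c_k * eps**k  (k in Z) with rational c_k, where eps is
# a formal positive infinitesimal.

def norm(p):
    """Drop zero coefficients and make every coefficient a Fraction."""
    return {k: Fraction(v) for k, v in p.items() if v != 0}

def add(p, q):
    out = dict(p)
    for k, v in q.items():
        out[k] = out.get(k, 0) + v
    return norm(out)

def mul(p, q):
    out = {}
    for i, a in p.items():
        for j, b in q.items():
            out[i + j] = out.get(i + j, 0) + a * b
    return norm(out)

def is_positive(p):
    """The lowest power of eps dominates, so its coefficient fixes the sign."""
    p = norm(p)
    return bool(p) and p[min(p)] > 0

def is_limited(p):
    return all(k >= 0 for k in norm(p))

def is_infinitesimal(p):
    return all(k > 0 for k in norm(p))

def st(p):
    """Standard part: the coefficient of eps**0 of a limited element."""
    if not is_limited(p):
        raise ValueError("an unlimited element has no standard part")
    return norm(p).get(0, Fraction(0))

eps = {1: 1}                     # a positive infinitesimal
rho = {0: Fraction(3, 2), 1: 5}  # 3/2 + 5*eps
tau = {0: -2, 2: 7}              # -2 + 7*eps**2
print(st(add(rho, tau)), st(mul(rho, tau)))  # -1/2 -3
```

The rules of Theorem 1.6.15 for sums and products, such as st(ρ + τ) = st(ρ) + st(τ), can be tried directly on such elements.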
The standard part map will be quite important in all later parts of this book.

Theorem 1.6.15 The standard part of a sum, difference, product, or quotient of two limited numbers is, respectively, the sum, difference, product, or quotient of the standard parts of those numbers, with the exception that a denominator must not be infinitesimal. If ρ ≤ τ, then ◦ρ ≤ ◦τ.

Proof Fix limited numbers r + ε and s + δ, where r and s are real numbers, and ε and δ are infinitesimal (possibly 0). Then
$$(r + \varepsilon) \pm (s + \delta) = (r \pm s) + (\varepsilon \pm \delta) \simeq r \pm s,$$
$$(r + \varepsilon)\cdot(s + \delta) = r\cdot s + r\cdot\delta + \varepsilon\cdot(s + \delta) \simeq r\cdot s.$$
To establish the rule for quotients, we assume that r > 0, and we note that for some n, m ∈ N, $\frac{1}{n} < r^2 - \frac{1}{m}$, and of course $\frac{1}{m} > |r\cdot\varepsilon| \simeq 0$, so $\frac{1}{n} < r^2 + \varepsilon\cdot r$, whence $n > \frac{1}{r^2 + \varepsilon\cdot r} > 0$. Since $\frac{1}{r^2 + \varepsilon\cdot r}$ is limited,
$$\frac{1}{r} - \frac{1}{r + \varepsilon} = \frac{\varepsilon}{r^2 + r\cdot\varepsilon} \simeq 0.$$
The proof for r < 0 is similar, and the rest follows from the product rule. If r + ε ≤ s + δ, then r ≤ s + (δ − ε) < s + 1/n for any n ∈ N. It follows that r ≤ s.

Corollary 1.6.16 The quotient G(0)/m(0) is isomorphic to the field R.

Proof The set m(0) is the kernel of the linear map st taking the vector space (over R) G(0) onto R; by Theorem 1.6.15, st is also multiplicative, so it is a ring homomorphism onto the field R.

Definition 1.6.17 For any subset A of R, we will write ∗A∞ to denote the set ∗A \ G(0).
Theorem 1.6.18 The only limited elements of ∗N are the standard natural numbers, so ∗N∞ = ∗N \ N. If A is an infinite subset of N, then ∗A contains arbitrarily large unlimited elements. In particular, ∗A ∩ ∗N∞ is not empty.

Proof If s ∈ ∗N and s is limited, then by definition, for some standard n ∈ N, s ≤ n. Now the transfer of the sentence $(\forall x)[\mathbb{N}\langle x\rangle \wedge x \le n \wedge x \ne 1 \wedge \cdots \wedge x \ne n-1 \to x = n]$ forces s to equal a standard natural number between 1 and n. For the second part, if A ⊆ N is infinite, then there is a Skolem function ψ such that $(\forall n)[\mathbb{N}\langle n\rangle \to \psi(n) \ge n \wedge A\langle \psi(n)\rangle]$. The transfer of this sentence says that there are arbitrarily large elements of ∗A.

Definition 1.6.19 The set ∗N is called the set of nonstandard natural numbers or hypernatural numbers. The extension ∗Z of the integers is called the set of nonstandard integers. (It is formed from ∗N as Z is formed from N.)

Remark 1.6.20 We have the following easy-to-establish facts.
(1) The ∗-transform of the greatest integer function [·] continues to work in the way described by the sentence $(\forall x)[\mathbb{R}\langle x\rangle \to \mathbb{Z}\langle [x]\rangle \wedge [x] \le x < [x] + 1]$.
(2) If $A = \{a_1, \dots, a_n\}$ is a finite set in R, then ∗A = A. This follows by transfer of the sentence $(\forall x)[A\langle x\rangle \wedge x \ne a_1 \wedge \cdots \wedge x \ne a_{n-1} \to x = a_n]$.
(3) The real numbers R are isomorphic to $({}^*\mathbb{Q} \cap G(0))/({}^*\mathbb{Q} \cap m(0))$.
(4) Not every subset of ∗R is the extension of a standard one. For example, N is not the extension of any subset A of N: if A is finite, then ∗A = A ≠ N, while if A is infinite, then ∗A contains unlimited elements.

Problem: Show that ∗cos is periodic with period 2π.
Answer: Transfer the sentence $(\forall x)[\mathbb{R}\langle x\rangle \to \cos(x) = \cos(x + 2\pi)]$. To show that no smaller positive number is a period, transfer the sentence $(\forall r \in (0, 2\pi))(\exists x \in \mathbb{R})[\cos(x + r) \ne \cos(x)]$. This is actually an abbreviation of the following sentence using a Skolem function ψ: $(\forall r)[\chi_{(0,2\pi)}(r) = 1 \to \cos(\psi(r) + r) \ne \cos(\psi(r))]$.

Problem: Show that a bound M on the range of a function f remains in effect for ∗f.
Answer: Transfer the sentence $(\forall x)[f(x) = f(x) \to |f(x)| \le M]$.

Problem: Show that the ∗-transform of the set {f > c} is the set {∗f > c}.
Answer: Let A be the set {f > c}. Transfer the sentence $(\forall x)[A\langle x\rangle \longleftrightarrow f(x) > c]$.
1.7 Sequences

Since a sequence is a function from N into R, it has an extension that maps ∗N into ∗R. We write $\langle s_n\rangle$ for the original sequence and $\langle {}^*s_n\rangle$ for its extension. Note that for all n ∈ N, ${}^*s_n = s_n$. The results in this section are due to Robinson [6].
Theorem 1.7.1 A sequence $\langle s_n\rangle$ has limit L if and only if for all η ∈ ∗N∞, ${}^*s_\eta \simeq L$. That is, $s_n \to L$ iff $L = \mathrm{st}({}^*s_\eta)$ for all η ∈ ∗N∞.

Proof Assume $s_n \to L$. Given an ε > 0 in R, there is a k ∈ N for which the sentence $(\forall n)[\mathbb{N}\langle n\rangle \wedge n \ge k \to |s_n - L| < \varepsilon]$ holds for R. It follows by transfer that for all η ∈ ∗N∞, $|{}^*s_\eta - L| < \varepsilon$. Since ε is arbitrary in $\mathbb{R}^+$, $|{}^*s_\eta - L| \simeq 0$ for all η ∈ ∗N∞. Now assume it is not true that $s_n \to L$. Then there is an ε > 0 and a Skolem function ψ : N → N such that the following sentence holds for R: $(\forall k)[\mathbb{N}\langle k\rangle \to \psi(k) \ge k \wedge |s_{\psi(k)} - L| \ge \varepsilon]$. It follows by transfer that there are unlimited η ∈ ∗N (for example, η = ∗ψ(κ) with κ ∈ ∗N∞) such that $|{}^*s_\eta - L| \ge \varepsilon$.

Example 1.7.2 The sequence $\langle 1/n : n \in \mathbb{N}\rangle$ becomes $\langle 1/n : n \in {}^*\mathbb{N}\rangle$. For each unlimited η, 1/η ≃ 0, so 1/n → 0.

Theorem 1.7.3 Assume $s_n \to L$ and $t_n \to M$. Then (i) $s_n + t_n \to L + M$, (ii) $s_n \cdot t_n \to L\cdot M$, (iii) $s_n/t_n \to L/M$ provided M ≠ 0.

Proof By assumption, for any unlimited η ∈ ∗N, ${}^*s_\eta \simeq L$ and ${}^*t_\eta \simeq M$, so ${}^*s_\eta + {}^*t_\eta \simeq L + M$ and ${}^*s_\eta \cdot {}^*t_\eta \simeq L\cdot M$; moreover, ${}^*s_\eta / {}^*t_\eta \simeq L/M$ provided M ≠ 0.
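Theorem 1.7.4 A sequence $\langle s_n\rangle$ is bounded if and only if for all η ∈ ∗N∞, ${}^*s_\eta$ is limited.

Proof If $\langle s_n\rangle$ is bounded, then there is an M > 0 such that $(\forall n)[\mathbb{N}\langle n\rangle \to |s_n| \le M]$ holds for R. Its transfer shows that for all η ∈ ∗N, ${}^*s_\eta$ is limited. If $\langle s_n\rangle$ is not bounded, then there is a function ψ : N → N such that for all n ∈ N, ψ(n) ≥ n and $|s_{\psi(n)}| \ge n$. By transfer, if η ∈ ∗N∞, then λ = ∗ψ(η) is unlimited and $|{}^*s_\lambda| \ge \eta$. Thus, ${}^*s_\lambda$ is unlimited.

Problem: Show that a sequence $\langle s_n\rangle$ is Cauchy if and only if for all η, γ ∈ ∗N∞, ${}^*s_\eta \simeq {}^*s_\gamma$.
Answer: Assume $\langle s_n\rangle$ is Cauchy. Given ε > 0 in R, there is a $k_\varepsilon$ such that
$$(\forall n)(\forall m)[\mathbb{N}\langle n\rangle \wedge \mathbb{N}\langle m\rangle \wedge n \ge k_\varepsilon \wedge m \ge k_\varepsilon \to |s_n - s_m| < \varepsilon].$$
By transfer we see that for η, γ ∈ ∗N∞, $|{}^*s_\eta - {}^*s_\gamma| < \varepsilon$. Since ε is arbitrary, ${}^*s_\eta - {}^*s_\gamma \simeq 0$. Now assume that $\langle s_n\rangle$ is not Cauchy. Then there is an ε > 0 in R and functions ϕ and ψ from N into N such that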
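$$(\forall k)[\mathbb{N}\langle k\rangle \to \varphi(k) \ge k \wedge \psi(k) \ge k \wedge |s_{\varphi(k)} - s_{\psi(k)}| \ge \varepsilon].$$
By transfer, there are η, γ ∈ ∗N∞ with $|{}^*s_\eta - {}^*s_\gamma| \ge \varepsilon$.

Problem: Show that a Cauchy sequence $\langle s_n\rangle$ must be bounded.
Answer: Suppose $\langle s_n\rangle$ is a sequence that is not bounded. Then for some Skolem function ψ, $(\forall k)[\mathbb{N}\langle k\rangle \to \mathbb{N}\langle\psi(k)\rangle \wedge \psi(k) > k \wedge |s_{\psi(k)} - s_k| \ge 1]$ holds in R. By transfer, if η ∈ ∗N∞, then λ = ∗ψ(η) > η and $|{}^*s_\eta - {}^*s_\lambda| \ge 1$, whence, by the previous problem, $\langle s_n\rangle$ is not Cauchy.

Problem: Suppose $s_n \to L$, $t_n \to M$, and $s_n \le t_n$ for all n. Show that L ≤ M.
Answer: Fix η ∈ ∗N∞. Note that by transfer ${}^*s_\eta \le {}^*t_\eta$, and take standard parts.

Problem: Assume $(s_n - 1)/(s_n + 1) \to 0$. Show that $s_n \to 1$.
Answer: For any η ∈ ∗N∞, there is an ε ≃ 0 such that ${}^*s_\eta - 1 = \varepsilon\,{}^*s_\eta + \varepsilon$. It follows that $(1 - \varepsilon)\,{}^*s_\eta = 1 + \varepsilon$, whence ${}^*s_\eta = (1 + \varepsilon)/(1 - \varepsilon) \simeq 1$.

Problem: Show that a sequence $\langle s_n\rangle$ converges if and only if it is Cauchy.
Answer: If $s_n \to L$, then for unlimited η and γ, ${}^*s_\eta \simeq L \simeq {}^*s_\gamma$. If $\langle s_n\rangle$ is Cauchy, pick η ∈ ∗N∞. Since $\langle s_n\rangle$ is bounded, ${}^*s_\eta$ is limited, so we may let $L = \mathrm{st}({}^*s_\eta)$. Now for any γ ∈ ∗N∞, ${}^*s_\gamma \simeq {}^*s_\eta \simeq L$.

Theorem 1.7.5 A real number L is a limit point of a sequence $\langle s_n\rangle$ if and only if there is an η ∈ ∗N∞ with ${}^*s_\eta \simeq L$.

Proof Assume that L is a limit point of $\langle s_n\rangle$. There is a function $\psi : \mathbb{R}^+ \times \mathbb{N} \to \mathbb{N}$ such that
$$(\forall \varepsilon)(\forall k)[\varepsilon > 0 \wedge \mathbb{N}\langle k\rangle \to \mathbb{N}\langle\psi(\varepsilon, k)\rangle \wedge \psi(\varepsilon, k) \ge k \wedge |s_{\psi(\varepsilon,k)} - L| < \varepsilon].$$
By transfer, if ε > 0 but ε ≃ 0 and η ∈ ∗N∞, then λ = ∗ψ(ε, η) ∈ ∗N∞ and ${}^*s_\lambda \simeq L$. Conversely, if L is not a limit point of $\langle s_n\rangle$, then there are an ε > 0 and a k ∈ N such that $(\forall n)[\mathbb{N}\langle n\rangle \wedge n \ge k \to |s_n - L| \ge \varepsilon]$ holds in R. It follows by transfer that for all η ∈ ∗N∞, $|{}^*s_\eta - L| \ge \varepsilon$.

Example 1.7.6 Let $s_n = (-1)^n(1 - 1/n)$. For any unlimited η, ${}^*s_\eta \simeq 1$ for η even and ${}^*s_\eta \simeq -1$ for η odd.

Theorem 1.7.7 (Bolzano-Weierstrass) Every bounded sequence has a limit point.

Proof If $\langle s_n\rangle$ is bounded and η ∈ ∗N∞, then ${}^*s_\eta$ is limited. Let $L = \mathrm{st}({}^*s_\eta)$. Now, there is an unlimited element of ∗N, namely η, such that ${}^*s_\eta \simeq L$, so L is a limit point by Theorem 1.7.5.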
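No finite computation can evaluate ∗s at an unlimited index, but the standard behavior that Example 1.7.6 encodes is easy to see numerically: along even indices the terms approach 1 and along odd indices they approach −1, the two limit points. A minimal Python sketch (the function name `s` simply mirrors the notation of the example):

```python
def s(n):
    """s_n = (-1)^n (1 - 1/n) from Example 1.7.6, for n >= 1."""
    return (-1) ** n * (1 - 1 / n)

for n in (10, 1_000, 1_000_000):          # even indices
    even, odd = s(n), s(n + 1)            # n is even, n + 1 is odd
    print(n, abs(even - 1), abs(odd + 1))  # both distances shrink like 1/n
```

The two values 1 and −1 approached here are exactly the standard parts of ∗s_η for unlimited even and odd η.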
1.8 Topology on the Reals

Recall that the monad m(a) of a real number a ∈ R consists of the points infinitesimally close to a in ∗R. Many of the arguments in this section generalize to first countable topological spaces, where the monad of a point x is the set m(x) := ∩{∗V : V a standard open neighborhood of x}. The compactness results of this section generalize to $\mathbb{R}^n$.

Theorem 1.8.1 Let A be a subset of R.
(i) A is open if and only if for all a ∈ A, m(a) ⊂ ∗A. (That is, points an infinitesimal distance away from a are still in ∗A.)
(ii) A is closed if and only if for all a ∈ A′, m(a) ∩ ∗A = ∅.

Proof Clearly, (ii) follows from (i), since (∗A)′ = ∗(A′) by Proposition 1.5.6. If A is open and a ∈ A, then for some δ > 0, $(\forall x)[\mathbb{R}\langle x\rangle \wedge |x - a| < \delta \to A\langle x\rangle]$ holds for R. If x ∈ ∗R and x ∈ m(a), i.e., x ≃ a, then |x − a| < δ, so by transfer, x ∈ ∗A. If A is not open, then there are an a ∈ A and a sequence $\langle s_n\rangle$ such that
$$(\forall n)[\mathbb{N}\langle n\rangle \to A'\langle s_n\rangle \wedge |s_n - a| < 1/n].$$
By transfer, for η ∈ ∗N∞, ${}^*s_\eta \simeq a$ and ${}^*s_\eta \in {}^*(A') = ({}^*A)'$, whence m(a) is not contained in ∗A.
Problem: Show that arbitrary unions and finite intersections of open sets are open, and arbitrary intersections and finite unions of closed sets are closed.
Answer: If $a \in A_1 \cap A_2 \cap \cdots \cap A_n$, all open, then $m(a) \subset {}^*A_i$ for i = 1, …, n, so $m(a) \subset {}^*A_1 \cap \cdots \cap {}^*A_n = {}^*(A_1 \cap \cdots \cap A_n)$. The rest is clear.

Theorem 1.8.2 A point c is an accumulation point of A ⊆ R if and only if there is an x ∈ ∗A with x ≠ c but x ≃ c.

Proof If c is an accumulation point of A, then there is a sequence $\langle s_n\rangle$ such that $(\forall n)[\mathbb{N}\langle n\rangle \to A\langle s_n\rangle \wedge 0 < |s_n - c| < 1/n]$. For the desired point, let $x = {}^*s_\eta$ for some η ∈ ∗N∞. If c is not an accumulation point of A, then there is an ε > 0 in R such that $(\forall x)[0 < |x - c| < \varepsilon \to A'\langle x\rangle]$, so $m(c) \cap ({}^*A \setminus \{c\}) = \emptyset$.

Theorem 1.8.3 The closure $\overline{A}$ of A ⊆ R is the set {x ∈ R : m(x) ∩ ∗A ≠ ∅}.

Proof If x ∈ A or x ∈ R is an accumulation point of A, then m(x) ∩ ∗A ≠ ∅. If m(x) ∩ ∗A ≠ ∅ and x ∉ A, then x is an accumulation point of A by Theorem 1.8.2.
Remark 1.8.4 The closure $\overline{A} = \mathrm{st}\big({}^*A \cap G(0)\big)$.
We next give Robinson's criterion for compactness. For our elementary setting, we need a standard-analysis result that is not needed in the general setup of nonstandard analysis. That result generalizes to $\mathbb{R}^n$. We should note that Robinson's Theorem is also valid for spaces that are not second countable.

Proposition 1.8.5 A set A ⊂ R is compact if and only if each covering of A by open intervals with rational end points has a finite subcovering.

Proof Fix an arbitrary open covering of A. Given a ∈ A, there are an open U in the covering and an open interval (α, β) with rational end points such that a ∈ (α, β) ⊂ U. This gives a covering of A by intervals with rational end points. If this covering has a finite subcover $\{(\alpha_i, \beta_i) : 1 \le i \le n\}$, then for each i, there is a set $U_i$ in the original cover with $(\alpha_i, \beta_i) \subseteq U_i$. Clearly, $A \subseteq \bigcup_{1\le i\le n} U_i$. The rest is clear.

Definition 1.8.6 We say that ρ ∈ ∗A is near-standard in A if ◦ρ = st(ρ) exists and belongs to A.

Theorem 1.8.7 (Robinson) A set A ⊂ R is compact if and only if for all ρ ∈ ∗A, there is an a ∈ A with ρ ≃ a. That is, every point of ∗A is near-standard in A.

Proof Assume A is compact but there is a ρ ∈ ∗A not in the monad of any standard point of A. Then for all a ∈ A, |ρ − a| is a non-infinitesimal number, so there is a $\delta_a > 0$ in R such that $|\rho - a| \ge \delta_a$. Since A is compact, the intervals $(a - \delta_a, a + \delta_a)$, a ∈ A, have a finite subcover, so there are a finite set $\{a_1, \dots, a_n\} \subset A$ and numbers $\{\delta_1, \dots, \delta_n\} \subset \mathbb{R}^+$ with $\delta_i = \delta_{a_i}$ such that the following sentence holds for R:
$$(\forall x)[A\langle x\rangle \wedge |x - a_1| \ge \delta_1 \wedge \cdots \wedge |x - a_{n-1}| \ge \delta_{n-1} \to |x - a_n| < \delta_n].$$
By transfer, $|\rho - a_n| < \delta_n$, which contradicts the choice of $\delta_n = \delta_{a_n}$. It follows that if A is compact, each ρ ∈ ∗A is near-standard in A. Now assume that A is not compact. By Proposition 1.8.5, there are sequences $\langle\alpha_n\rangle$ and $\langle\beta_n\rangle$ of rational numbers such that for each n, $\alpha_n < \beta_n$, and the open intervals $(\alpha_n, \beta_n)$ form a covering of A with no finite subcovering. For each n ∈ N, let ψ(n) be an element of A not in the first n intervals.
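Then we have
$$(\forall n)(\forall k)[\mathbb{N}\langle n\rangle \wedge \mathbb{N}\langle k\rangle \wedge k \le n \wedge \alpha_k < \psi(n) \to \beta_k \le \psi(n)],$$
$$(\forall n)(\forall k)[\mathbb{N}\langle n\rangle \wedge \mathbb{N}\langle k\rangle \wedge k \le n \wedge \beta_k > \psi(n) \to \alpha_k \ge \psi(n)].$$
By transfer, for η ∈ ∗N∞, the point ρ = ∗ψ(η) ∈ ∗A satisfies $\rho \notin {}^*(\alpha_k, \beta_k)$ for every standard k ∈ N. Since for each a ∈ A there is a k ∈ N with $a \in (\alpha_k, \beta_k)$, and hence $m(a) \subset {}^*(\alpha_k, \beta_k)$ by Theorem 1.8.1, ρ is not in the monad of any standard point of A.

Theorem 1.8.8 (Heine-Borel) A set A ⊂ R is compact if and only if it is closed and bounded.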
Proof Assume A is not closed. Then for some x ∈ A′, m(x) ∩ ∗A ≠ ∅, i.e., m(x) contains a point y ∈ ∗A. Since st(y) = x ∉ A, A is not compact. Assume A is not bounded; that is, for each n ∈ N, there is a point φ(n) ∈ A with |φ(n)| ≥ n. Then by transfer, ∗A contains an unlimited element, which is not in the monad of any standard point. Now assume that A is closed and that A is bounded by a constant M. Then $(\forall x)[A\langle x\rangle \to |x| \le M]$ holds in R. Given ρ ∈ ∗A, |ρ| ≤ M, so ρ has a standard part, call it r. Since m(r) ∩ ∗A contains ρ and A is closed, r ∈ A.

Remark 1.8.9 All of these notions are easily extended to $\mathbb{R}^n$. For example, we have $\langle x_1, \dots, x_n\rangle \simeq \langle y_1, \dots, y_n\rangle$ if and only if $x_i \simeq y_i$ for all i.

Problem: Does ◦x ≤ ◦y ⇒ x ≤ y?
Answer: No. If ε > 0 is infinitesimal, take x = ε and y = 0: then ◦x = 0 ≤ 0 = ◦y, but x > y. It is true that ◦x < ◦y ⇒ x < y. That is, if ρ = r + ε and τ = t + δ, with r < t in R and ε ≃ 0, δ ≃ 0, then t − r is not infinitesimal, so ρ = r + ε < τ = t + δ.

Problem: Show that if A and B are compact in R, so is A + B.
Answer: Given y ∈ ∗(A + B), there are α ∈ ∗A and β ∈ ∗B such that y = α + β. Since st(α) ∈ A and st(β) ∈ B by Theorem 1.8.7, st(y) = st(α) + st(β) exists and is in A + B.
1.9 Limits and Continuity

It is quite natural to treat limits and continuity using infinitesimals.

Theorem 1.9.1 Suppose a is an accumulation point of A and f : A → R. Then $\lim_{x\to a} f(x)$ exists and equals L ∈ R if and only if for all x ∈ ∗A with x ≃ a but x ≠ a, ∗f(x) ≃ L; that is, ${}^*f[(m(a) \cap {}^*A) \setminus \{a\}] \subseteq m(L)$.

Proof Assume that $\lim_{x\to a} f(x) = L$. Fix ε > 0 in R. Then there is a δ > 0 such that $(\forall x)[A\langle x\rangle \wedge 0 < |x - a| < \delta \to |f(x) - L| < \varepsilon]$ holds for R. By transfer, if x ∈ (m(a) ∩ ∗A) \ {a}, then |∗f(x) − L| < ε, and this is true for any ε > 0 in R, whence ∗f(x) ≃ L. Now assume that L is not the limit of f(x) as x → a. Then there are an ε > 0 in R and a sequence $\langle s_n\rangle$ such that $(\forall n)[\mathbb{N}\langle n\rangle \to A\langle s_n\rangle \wedge 0 < |s_n - a| < 1/n \wedge |f(s_n) - L| \ge \varepsilon]$ holds in R. By transfer, if η ∈ ∗N∞, then ${}^*s_\eta \in (m(a) \cap {}^*A) \setminus \{a\}$ but ${}^*f({}^*s_\eta) \notin m(L)$.
Theorem 1.9.2 If $\lim_{x\to a} f(x) = L$ and $\lim_{x\to a} g(x) = M$, then (i) $\lim_{x\to a}(f + g)(x) = L + M$, (ii) $\lim_{x\to a}(f\cdot g)(x) = L\cdot M$, (iii) $\lim_{x\to a}(f/g)(x) = L/M$ if M ≠ 0.

Proof The result follows by taking standard parts of ∗f(x) and ∗g(x) when x ∈ (m(a) ∩ ∗A) \ {a}.
Theorem 1.9.3 If f is defined on A, then f is continuous at a ∈ A if and only if for all x ∈ m(a) ∩ ∗A, ∗f(x) ≃ f(a). That is, if and only if for Δx = x − a ≃ 0, we have Δy = ∗f(x) − f(a) ≃ 0.

Proof The result follows from the nonstandard criterion for limits (Theorem 1.9.1).

Example 1.9.4 If f(x) = x² and Δx = x − a ≃ 0, then Δy = (a + Δx)² − a² = 2a·Δx + (Δx)² ≃ 0.

Corollary 1.9.5 The sum, product, and quotient of functions continuous at a are continuous at a, provided that for the quotient, the denominator does not vanish at a.

Theorem 1.9.6 (Intermediate Value Theorem) If f is continuous on [a, b] and f(a) < d < f(b), then there is a c ∈ (a, b) with f(c) = d. A similar result holds if f(b) < d < f(a).

Proof For each n ∈ N, partition [a, b] into steps of length (b − a)/n. There is a first time in going from a partition point a + (k − 1)(b − a)/n to a partition point a + k(b − a)/n that f crosses from below d to a level at or above d. This gives a sequence $\langle s_n\rangle$ such that
$$(\forall n)\Big[\mathbb{N}\langle n\rangle \to a \le s_n < b \wedge f(s_n) < d \wedge d \le f\Big(s_n + \frac{b-a}{n}\Big)\Big].$$
For η ∈ ∗N∞, ${}^*s_\eta$ is limited, and $c = \mathrm{st}({}^*s_\eta) \in [a, b]$. Moreover, since ${}^*s_\eta \simeq {}^*s_\eta + \frac{b-a}{\eta}$, we also have $c = \mathrm{st}\big({}^*s_\eta + \frac{b-a}{\eta}\big)$. By continuity, f(c) ≤ d and f(c) ≥ d, whence f(c) = d. For the rest, replace f with −f.
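The sequence ⟨s_n⟩ in this proof comes from a finite search at each stage n. The Python sketch below carries out that search for a given standard n; the function name `crossing_point` is hypothetical, floating-point arithmetic is used for illustration only, and the code assumes f(a) < d ≤ f(b). Increasing n mimics what the proof does with an unlimited η: s_n closes in on a point c with f(c) = d.

```python
import math

def crossing_point(f, a, b, d, n):
    """Return s_n: the left end of the first step [s, s + (b - a)/n] of the
    uniform partition of [a, b] on which f goes from below d to at or above d.
    Assumes f(a) < d <= f(b), so such a step exists."""
    h = (b - a) / n
    for k in range(1, n + 1):
        left, right = a + (k - 1) * h, a + k * h
        if f(left) < d <= f(right):
            return left
    raise ValueError("no crossing found; check that f(a) < d <= f(b)")

# f(x) = x^2 on [0, 2] with d = 2: the crossing points approach sqrt(2).
for n in (10, 1_000, 100_000):
    s_n = crossing_point(lambda x: x * x, 0.0, 2.0, 2.0, n)
    print(n, s_n, math.sqrt(2) - s_n)
```

Each s_n satisfies f(s_n) < d by construction and lies within one step (b − a)/n of a solution, which is the standard counterpart of ∗s_η ≃ c.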
Theorem 1.9.7 (Extreme Value Theorem) If f is continuous on [a, b], then there is a point c ∈ [a, b] with f(c) ≥ f(x) for all x ∈ [a, b].

Proof For each n ∈ N, construct the points $x_{n,k} = a + \frac{k}{n}\cdot(b - a)$, 0 ≤ k ≤ n. Choose ψ(n) so that $f(x_{n,\psi(n)}) \ge f(x_{n,k})$ for all k with 0 ≤ k ≤ n. Fix η ∈ ∗N∞. By transfer, ${}^*f({}^*x_{\eta,{}^*\psi(\eta)}) \ge {}^*f({}^*x_{\eta,k})$ for any k ∈ ∗Z with 0 ≤ k ≤ η. Moreover, $c = \mathrm{st}({}^*x_{\eta,{}^*\psi(\eta)}) \in [a, b]$. If d ∈ [a, b], then for the appropriate Skolem function ϕ determined by d, we have ${}^*\varphi(\eta) \in {}^*\mathbb{Z}$, $0 \le {}^*\varphi(\eta) \le \eta$, and
$$a \le {}^*x_{\eta,{}^*\varphi(\eta)} \le d < {}^*x_{\eta,{}^*\varphi(\eta)} + \frac{b-a}{\eta}.$$
Since f is continuous, $f(c) \simeq {}^*f({}^*x_{\eta,{}^*\psi(\eta)}) \ge {}^*f({}^*x_{\eta,{}^*\varphi(\eta)}) \simeq f(d)$. Since f(c) and f(d) are real, f(c) ≥ f(d).

Remark 1.9.8 With the general approach to nonstandard analysis that we will present in the next chapter, we would be able to simplify the above proof by using a function that picks the biggest element from any hyperfinite set.

Theorem 1.9.9 A function f is uniformly continuous on a set A if and only if for each x and y in ∗A with x ≃ y, ∗f(x) ≃ ∗f(y).

Proof Assume f is uniformly continuous on A. Given ε > 0, there is a δ > 0 so that $(\forall x)(\forall y)[A\langle x\rangle \wedge A\langle y\rangle \wedge |x - y| < \delta \to |f(x) - f(y)| < \varepsilon]$. Thus if x ≃ y in ∗A, then |∗f(x) − ∗f(y)| < ε. Since ε is arbitrary in $\mathbb{R}^+$, we have ∗f(x) ≃ ∗f(y). Now assume that f is not uniformly continuous on A. Then there are an ε > 0 and a pair of sequences $\langle a_n\rangle$ and $\langle b_n\rangle$ such that
$$(\forall n)[\mathbb{N}\langle n\rangle \to A\langle a_n\rangle \wedge A\langle b_n\rangle \wedge |a_n - b_n| < 1/n \wedge |f(a_n) - f(b_n)| \ge \varepsilon].$$
By transfer, for η ∈ ∗N∞, ${}^*a_\eta \simeq {}^*b_\eta$ but $|{}^*f({}^*a_\eta) - {}^*f({}^*b_\eta)| \ge \varepsilon$.

Example 1.9.10 The function f(x) = 1/x is continuous on (0, 1), since for a ∈ (0, 1) and h ≃ 0, 1/(a + h) ≃ 1/a. However, f is not uniformly continuous on (0, 1), since for η ∈ ∗N∞, 1/η ≃ 1/(2η) ≃ 0, but ∗f(1/η) = η and ∗f(1/(2η)) = 2η.

Theorem 1.9.11 If A ⊂ R is compact and f is continuous on A, then f is uniformly continuous on A.

Proof Fix x ≃ y in ∗A. Let a = st(x), which is in A by Theorem 1.8.7. Then a = st(y). By continuity, ∗f(x) ≃ f(a) ≃ ∗f(y).
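For Example 1.9.10, the sequences ⟨a_n⟩ and ⟨b_n⟩ in the proof of Theorem 1.9.9 can be written down explicitly: a_n = 1/n and b_n = 1/(2n) (for n ≥ 2, so that both lie in (0, 1)) satisfy |a_n − b_n| = 1/(2n) < 1/n, while |f(a_n) − f(b_n)| = n ≥ 1. The Python sketch below checks this with exact rational arithmetic; the helper name `witness_pair` is hypothetical.

```python
from fractions import Fraction

def f(x):
    """f(x) = 1/x on (0, 1); exact when x is a Fraction."""
    return 1 / x

def witness_pair(n):
    """a_n = 1/n and b_n = 1/(2n), the sequences behind Example 1.9.10."""
    return Fraction(1, n), Fraction(1, 2 * n)

for n in (2, 10, 1000, 10**6):
    a_n, b_n = witness_pair(n)
    # |a_n - b_n| = 1/(2n) < 1/n, yet f(b_n) - f(a_n) = 2n - n = n >= 1
    print(n, a_n - b_n, f(b_n) - f(a_n))
```

Transferring the corresponding sentence with ε = 1 gives exactly the unlimited pair 1/η and 1/(2η) used in the example.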
1.10 Differentiation

Theorem 1.10.1 Let f be defined on an interval [a, a + δ) (or (a − δ, a]) for some positive δ. Then f has a right-hand (left-hand) derivative at a if for all strictly positive (negative) h ≃ 0, (∗f(a + h) − f(a))/h is finite and has a standard part independent of h. The right-hand (left-hand) derivative is that standard part.
Proof This follows immediately from the nonstandard criterion for a limit.

As usual, we say that f has a derivative at a if f has both a right-hand and a left-hand derivative at a taking the same value, or if a is an end point of the domain of f and f has the appropriate one-sided derivative at a. In this case we write f′(a) for the derivative of f at a.

Theorem 1.10.2 Let f be defined on [a, b]. Then f has a continuous derivative on [a, b] if and only if for every c ∈ [a, b] and for all x, x′, y, y′ ∈ ∗[a, b] ∩ m(c) with x ≠ y and x′ ≠ y′, we have
$$\frac{{}^*f(x) - {}^*f(y)}{x - y} \simeq \frac{{}^*f(x') - {}^*f(y')}{x' - y'} \in G(0).$$
In this case,
$$f'(c) = \mathrm{st}\left(\frac{{}^*f(x) - {}^*f(y)}{x - y}\right).$$
Proof If f has a continuous derivative on [a, b] and for c ∈ [a, b] we have x < y in the monad of c, then by the transfer of the mean value theorem, there is a z with x < z < y such that $\frac{{}^*f(x) - {}^*f(y)}{x - y} = {}^*f'(z)$. Here a Skolem function gives z as the output for the given input x and y. Since f′ is continuous, ∗f′(z) ≃ f′(c), and the result follows. Assume now that the conditions for difference quotients hold. Fixing c ∈ [a, b] and letting y and y′ take the value c, we conclude that f′(c) exists. Since c is arbitrary, f′ exists on [a, b]. To show f′ is continuous, we again pick any c ∈ [a, b] and a point y ∈ m(c) ∩ ∗[a, b]; we must show that ∗f′(y) ≃ f′(c). Using an appropriate Skolem function and the Transfer Principle, we see that there are points x ≃ y and x′ ≃ c such that
$$f'(c) \simeq \frac{{}^*f(x') - f(c)}{x' - c} \quad\text{and}\quad {}^*f'(y) \simeq \frac{{}^*f(x) - {}^*f(y)}{x - y},$$
so f′(c) ≃ ∗f′(y).
Example 1.10.3 If f(x) = x², then for any limited x and nonzero Δx ≃ 0, Δy = (x + Δx)² − x² = 2x·Δx + (Δx)², so Δy/Δx = 2x + Δx ≃ 2x.

Remark 1.10.4 If f′(c) exists for c ∈ [a, b], then when Δx ≃ 0 but Δx ≠ 0, setting dy = f′(c)·Δx, we have
$$\Delta y = {}^*f(c + \Delta x) - f(c) = f'(c)\cdot\Delta x + \varepsilon\cdot\Delta x = dy + \varepsilon\cdot\Delta x,$$
where ε ≃ 0. If Δx = 0, this formula for Δy holds for any limited value of ε. In any case, since dy ≃ 0, we have Δy ≃ 0.
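Example 1.10.3 can be replayed mechanically for polynomials. The Python sketch below uses a toy model that is an assumption of this illustration, not ∗R: finite sums of powers of a single formal positive infinitesimal ε with rational coefficients, stored as dicts {k: c_k}. Taking Δx = ε, it computes Δy/Δx exactly and then its standard part (the ε⁰ coefficient). Unlike dual-number automatic differentiation, ε² is not set to 0 here, so the quotient keeps its infinitesimal error term, 2x + Δx for f(x) = x². The helper names are hypothetical.

```python
from fractions import Fraction

# Toy model (an assumption of this sketch, NOT *R): a dict {k: c_k} stands for
# sum_k c_k * eps**k with rational c_k and eps a formal positive infinitesimal.

def const(q):
    return {0: Fraction(q)} if q != 0 else {}

def add(p, q):
    out = dict(p)
    for k, v in q.items():
        out[k] = out.get(k, 0) + v
    return {k: v for k, v in out.items() if v != 0}

def mul(p, q):
    out = {}
    for i, a in p.items():
        for j, b in q.items():
            out[i + j] = out.get(i + j, 0) + a * b
    return {k: v for k, v in out.items() if v != 0}

def poly(coeffs, x):
    """Evaluate c_0 + c_1 x + ... + c_n x^n at x by Horner's rule."""
    result = {}
    for c in reversed(coeffs):
        result = add(mul(result, x), const(c))
    return result

def st(p):
    if any(k < 0 for k in p):
        raise ValueError("an unlimited element has no standard part")
    return p.get(0, Fraction(0))

def derivative_via_infinitesimal(coeffs, x0):
    """Form (f(x0 + dx) - f(x0)) / dx with dx = eps; return it and its st."""
    x = const(x0)
    x_plus_dx = add(x, {1: Fraction(1)})
    delta_y = add(poly(coeffs, x_plus_dx), mul(const(-1), poly(coeffs, x)))
    quotient = {k - 1: v for k, v in delta_y.items()}  # divide by eps
    return quotient, st(quotient)

# f(x) = x^2 at x = 3: the quotient is 6 + eps (that is, 2x + dx), with st = 6.
print(derivative_via_infinitesimal([0, 0, 1], 3))
```

Dividing by ε is just a shift of exponents, which is why this model suffices for difference quotients of polynomials; non-polynomial functions such as sin would need a richer model.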
Remark 1.10.5 A corollary of the last theorem is that f′ is continuous on [a, b] if and only if for every x ∈ ∗[a, b] (be it standard or nonstandard), if Δx ≃ 0, then Δy = dy + ε·Δx where ε ≃ 0.

The standard rules of differentiation are easily established using infinitesimals. For example, the chain rule has the following proof: If y = f(x) and x = g(t), with c = g(b), then when Δt ≃ 0, we have Δx ≃ 0, so
$$\Delta y = (f'(c) + \varepsilon)\,\Delta x = (f'(c) + \varepsilon)\cdot(g'(b) + \delta)\cdot\Delta t,$$
where ε ≃ 0 and δ ≃ 0. If Δx = 0 but Δt ≠ 0, then g′(b) + δ = 0. Since g′(b) is a standard number, g′(b) = 0. In any case, it follows that Δy/Δt ≃ f′(c)·g′(b).
1.11 Riemann Integration
The following treatment of Riemann integration is derived from the treatment in H.J. Keisler's book [3]. Further developments by this author that allow the results given here to be presented in a standard calculus course without using infinitesimals are outlined in [5]. Given a continuous function $f$ on $[a,b]$, we let $\overline{S}_a^b(f,P)$, $\underline{S}_a^b(f,P)$, and $S_a^b(f,P)$ denote the upper, lower, and ordinary Riemann sums for $f$ with respect to a partition $P$ of $[a,b]$. For an ordinary Riemann sum, we evaluate $f$ at the left endpoint of each interval. We work with partitions of $[a,b]$ that are determined by positive values of $\Delta x$. The partition points in $(a,b)$ are of the form $x_k = a + k\cdot\Delta x$ for $1 \le k \le n-1$. For the last interval $[x_{n-1}, b]$, we have $b - x_{n-1} \le \Delta x$. For each value of $\Delta x$, we denote the corresponding Riemann sums by $\overline{S}_a^b(f,\Delta x)$, $\underline{S}_a^b(f,\Delta x)$, and $S_a^b(f,\Delta x)$, respectively. These functions of $a$, $b$, and $\Delta x$ can be extended to ${}^*\mathbb{R}$. For a continuous function $f$ on $[a,b]$ and $\Delta x > 0$, let $M_i - m_i$ be the difference between the maximum and the minimum of $f$ on the $i$th partition interval $[x_{i-1}, x_i]$. Let $E(\Delta x) = \max_i (M_i - m_i)$. A property that is equivalent to the uniform continuity of $f$ on $[a,b]$ is the fact that $\lim_{\Delta x \to 0^+} E(\Delta x) = 0$. This fact allows a simple proof, both standard and nonstandard, that the Riemann integral of $f$ on $[a,b]$ exists.

Theorem 1.11.1 Let $f$ be a continuous function on $[a,b]$. If $\Delta x$ is a positive infinitesimal, then ${}^*\overline{S}_a^b(f,\Delta x) \approx {}^*\underline{S}_a^b(f,\Delta x)$.

Proof There are Skolem functions $\varphi$ and $\psi$ that, given a positive input $\Delta x$, give as output two points $\varphi(\Delta x)$ and $\psi(\Delta x)$ in $[a,b]$ with $|\varphi(\Delta x) - \psi(\Delta x)| \le \Delta x$ and
$$\overline{S}_a^b(f,\Delta x) - \underline{S}_a^b(f,\Delta x) \le \big[f(\psi(\Delta x)) - f(\varphi(\Delta x))\big]\cdot(b-a).$$
(To see this, find the first subinterval $I_i$ on which the difference $M_i - m_i$ between the maximum and the minimum of $f$ is largest. Let $\psi(\Delta x)$
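be a point at which the maximum $M_i$ is attained, and let $\varphi(\Delta x)$ be a point where the minimum $m_i$ is attained in that same subinterval.) Given a positive infinitesimal $\Delta x$, there is a point $c \in [a,b]$, namely the standard part of ${}^*\varphi(\Delta x)$, such that ${}^*\varphi(\Delta x) \approx c \approx {}^*\psi(\Delta x)$. Since $f$ is continuous at $c$, it follows that
$${}^*\overline{S}_a^b(f,\Delta x) - {}^*\underline{S}_a^b(f,\Delta x) \le \big[{}^*f({}^*\psi(\Delta x)) - {}^*f({}^*\varphi(\Delta x))\big]\cdot(b-a) \approx 0.$$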
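With an ordinary positive $\Delta x$ in place of an infinitesimal, the inequality $\overline{S}_a^b - \underline{S}_a^b \le E(\Delta x)\cdot(b-a)$ behind Theorem 1.11.1 can be checked numerically. The sketch below rests on one simplifying assumption: $f$ is increasing, so $M_i$ and $m_i$ are attained at the endpoints of each subinterval. In that case the left-endpoint sum coincides with the lower sum. The names `partition` and `sums_increasing` are hypothetical helpers.

```python
import math

def partition(a, b, dx):
    """Points a = x_0 < x_1 < ... < x_n = b with x_k = a + k*dx; the last interval may be shorter."""
    points = [a]
    k = 1
    while a + k * dx < b:
        points.append(a + k * dx)
        k += 1
    points.append(b)
    return points

def sums_increasing(f, a, b, dx):
    """Upper, lower, and left-endpoint sums and E(dx), assuming f is increasing on [a, b]."""
    points = partition(a, b, dx)
    upper = lower = left = 0.0
    E = 0.0                                  # E(dx) = max_i (M_i - m_i)
    for lo, hi in zip(points, points[1:]):
        width = hi - lo
        M, m = f(hi), f(lo)                  # max and min on [lo, hi], valid because f is increasing
        upper += M * width
        lower += m * width
        left += f(lo) * width                # ordinary (left-endpoint) Riemann sum
        E = max(E, M - m)
    return upper, lower, left, E

a, b = 0.0, 1.0
for dx in (0.1, 0.01, 0.001):
    U, L, S, E = sums_increasing(math.exp, a, b, dx)
    assert L <= S <= U
    assert U - L <= E * (b - a) + 1e-12
    print(dx, U - L, E)
```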
Corollary 1.11.2 If $f$ is continuous on $[a,b]$, then $f$ is Riemann integrable there, and for any positive infinitesimal $\Delta x$, $\int_a^b f(x)\,dx = \operatorname{st}[{}^*S_a^b(f,\Delta x)]$.

Proof For any partition $P$, we always have $\underline{S}_a^b(f,P) \le S_a^b(f,P) \le \overline{S}_a^b(f,P)$. If we assume that the area under the curve exists, then every upper sum is at least the area and every lower sum is at most the area. It follows that for $\Delta x \approx 0$, the corresponding upper and lower sums are infinitesimally close to the area, as is the Riemann sum ${}^*S_a^b(f,\Delta x)$. If we do not assume that the area exists, then we need the standard fact that for any pair of partitions $P_1$ and $P_2$ with common refinement $P$,
$$\underline{S}_a^b(f,P_1) \le \underline{S}_a^b(f,P) \le \overline{S}_a^b(f,P) \le \overline{S}_a^b(f,P_2).$$
Therefore, there is at least one real number that is below every upper sum and above every lower sum. Since for $\Delta x \approx 0$ the upper and lower sums are infinitesimally close, that number is unique (we denote it by $\int_a^b f(x)\,dx$). It equals $\operatorname{st}[{}^*S_a^b(f,\Delta x)]$ for any positive infinitesimal $\Delta x$.

We now proceed on a somewhat more informal level that can be justified by the full machinery of the next chapter.

Theorem 1.11.3 If $f$ is continuous on $[a,b]$ and $F(x) = \int_a^x f(t)\,dt$, then for every $c$ in $[a,b]$, $F'(c) = f(c)$.

Proof Given $c \in [a,b)$ and a positive $h \approx 0$,
$$\frac{{}^*F(c+h) - F(c)}{h} = \frac{1}{h}\int_c^{c+h} {}^*f(t)\,dt.$$
This value is between the maximum and minimum values of ${}^*f(t)$ on $[c, c+h]$, and so it is infinitesimally close to $f(c)$. A similar proof works for $c \in (a,b]$ and a negative $h \approx 0$.

Proposition 1.11.4 Given a partition by $\Delta x \approx 0$ and a corresponding sum of the form $S = \sum_i \varepsilon_i\cdot\Delta x_i$ with each $\varepsilon_i \approx 0$, the sum $S \approx 0$.
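Proof Let $\varepsilon = \max_i |\varepsilon_i|$. Then $|S| \le \sum_i |\varepsilon_i|\cdot\Delta x_i \le \varepsilon\cdot\sum_i \Delta x_i = \varepsilon\cdot(b-a) \approx 0$.

Theorem 1.11.5 (Keisler's Infinite Sum Theorem) Let $S$ be a standard quantity such that, for a continuous function $f$ on $[a,b]$ and a partition of ${}^*[a,b]$ by $\Delta x \approx 0$,
$$S = \sum_i S_i = \sum_i \big({}^*f(x_i)\cdot\Delta x_i + \varepsilon_i\cdot\Delta x_i\big)$$
with each $\varepsilon_i \approx 0$. Then $S = \int_a^b f(x)\,dx$.

Proof Since $\sum_i \varepsilon_i\cdot\Delta x_i \approx 0$, we have $S \approx \sum_i {}^*f(x_i)\cdot\Delta x_i \approx \int_a^b f(x)\,dx$. Since $S$ and $\int_a^b f(x)\,dx$ are real numbers, they are equal.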
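A standard example shows the shape of the hypothesis in Theorem 1.11.5. In the sketch below, the quantity $S = \int_0^1 t^2\,dt$ is split into the exact slice areas $S_i$ over $[x_i, x_i + \Delta x]$, using exact rational arithmetic and a hypothetical helper `slice_decomposition`. Each $S_i$ is then written as $f(x_i)\Delta x + \varepsilon_i\Delta x$ with $f(t) = t^2$. Here $\varepsilon_i = x_i\,\Delta x + (\Delta x)^2/3$, so $\max_i|\varepsilon_i| \to 0$ as $\Delta x \to 0$. The slices are computed from the integral itself, so this only illustrates the hypothesis. It does not prove the theorem.

```python
from fractions import Fraction

def slice_decomposition(n):
    """Split S = ∫_0^1 t^2 dt into slices of width dx = 1/n; return (sum of S_i, max |eps_i|)."""
    dx = Fraction(1, n)
    total = Fraction(0)
    max_eps = Fraction(0)
    for i in range(n):
        x = i * dx
        S_i = ((x + dx) ** 3 - x ** 3) / 3      # exact area of the i-th slice
        eps_i = (S_i - x ** 2 * dx) / dx        # S_i = f(x_i)*dx + eps_i*dx
        total += S_i
        max_eps = max(max_eps, abs(eps_i))
    return total, max_eps

for n in (10, 100, 1000):
    total, max_eps = slice_decomposition(n)
    assert total == Fraction(1, 3)
    print(n, float(max_eps))
```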
Remark 1.11.6 This is a nonstandard form of a simplified version of Duhamel's Principle (see [5, 7]). The standard form replaces the infinitesimal condition with the condition that for each $\Delta x > 0$ and each $i$, $|S_i - f(x_{i-1})\cdot\Delta x_i| \le E(\Delta x)\cdot\Delta x$, where $E(\Delta x)$ is a function of $\Delta x$ with limit 0 at $\Delta x = 0$.

Theorem 1.11.7 (Fundamental Theorem of Calculus) If $F$ has a continuous derivative $f$ on $[a,b]$, then $\int_a^b f(x)\,dx = F(b) - F(a)$.

Proof Let $y = F(x)$. For a partition of ${}^*[a,b]$ by $\Delta x \approx 0$, there are infinitesimals $\varepsilon_i$ (by Remark 1.10.5) such that
$$F(b) - F(a) = \sum_i \Delta y_i = \sum_i \big(dy_i + \varepsilon_i\cdot\Delta x_i\big) \approx \sum_i dy_i = \sum_i {}^*f(x_i)\cdot\Delta x_i \approx \int_a^b f(x)\,dx.$$
Since both ends are real numbers, they are equal.
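The two sums in the proof of Theorem 1.11.7 behave differently. The sum $\sum_i \Delta y_i$ telescopes to $F(b) - F(a)$ exactly, while $\sum_i dy_i = \sum_i f(x_i)\Delta x_i$ only approximates it. The floating-point sketch below uses the assumed example $F = \sin$, $f = \cos$ on $[0,1]$ and a hypothetical helper `ftc_check`.

```python
import math

def ftc_check(F, f, a, b, n):
    """Return (sum of Δy_i, sum of dy_i) for the partition of [a, b] into n equal pieces."""
    dx = (b - a) / n
    xs = [a + i * dx for i in range(n)] + [b]
    steps = list(zip(xs, xs[1:]))
    telescoped = sum(F(hi) - F(lo) for lo, hi in steps)         # Σ Δy_i, telescopes to F(b) - F(a)
    differentials = sum(f(lo) * (hi - lo) for lo, hi in steps)  # Σ dy_i = Σ f(x_i) Δx_i
    return telescoped, differentials

exact = math.sin(1.0) - math.sin(0.0)
for n in (10, 100, 1000):
    telescoped, differentials = ftc_check(math.sin, math.cos, 0.0, 1.0, n)
    print(n, telescoped - exact, differentials - exact)
```

Only the second difference depends on $n$. It plays the role of the term $\sum_i \varepsilon_i\cdot\Delta x_i$.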
Example 1.11.8 To compute the total force $F$ of water on the bottom half of a circular window, one must consider that, for a partition by horizontal strips, the maximum length $l$ is at the top of each strip while the maximum pressure $p$ is at the bottom. It does not follow, a priori, that upper and lower Riemann sums bound the quantity. By continuity, for a partition of the depth by an infinitesimal $\Delta y$, the force $F_i$ on a strip corresponding to the interval $[y_i, y_i + \Delta y]$ equals $l(y_i)\cdot p(y_i)\cdot\Delta y + \varepsilon_i\cdot\Delta y$, where $\varepsilon_i \approx 0$. Therefore, by Theorem 1.11.5, $F = \int (l\cdot p)\,dy$.

Here is one more application of our simple theory. It is a typical construction using nonstandard partitions to yield a standard continuous object. The proof is due to Robinson.

Theorem 1.11.9 (Cauchy-Peano Existence Theorem) Let $f$ be continuous on the rectangle $[x_0 - a, x_0 + a]\times[y_0 - b, y_0 + b]$ with $|f| \le M$ there. Let $c = \min(a, \frac{b}{M})$, and let $I_c = [x_0 - c, x_0 + c]$. Then there is a function $\Phi \in C^1(I_c)$ with $\Phi(x_0) = y_0$ and $\Phi'(x) = f(x, \Phi(x))$ for all $x \in I_c$.
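Proof We will establish the result for $I = [x_0, x_0 + c]$. Given $n \in \mathbb{N}$, let $x_k = x_0 + \frac{k}{n}c$ for $0 \le k \le n$. Let $\Phi_n$ be the polygonal path given by $\Phi_n(x_0) = y_0$ and, for $0 \le k \le n-1$ and $x_k < x \le x_{k+1}$,
$$\Phi_n(x) = \Phi_n(x_k) + f(x_k, \Phi_n(x_k))\cdot(x - x_k).$$
Here, we are writing $\Phi_n(x)$ for $\Phi(n,x)$. Now the following holds for $\mathbb{R}$:
$$(\forall n)(\forall x)(\forall z)\,[\mathbb{N}\,n \wedge I\,x \wedge I\,z \to \Phi_n(x_0) = y_0 \wedge |\Phi_n(x) - \Phi_n(z)| \le M\cdot|x - z|].$$
It follows by transfer that for each $n \in {}^*\mathbb{N}$ and each $x \in {}^*I$, $|{}^*\Phi_n(x) - y_0| \le M\cdot|x - x_0| \le M\cdot c \le b$. Fix $\eta \in {}^*\mathbb{N}_\infty$; then ${}^*\Phi_\eta$ is finite valued. For each $x \in I$, set $\Phi(x) = \operatorname{st}({}^*\Phi_\eta(x))$. We will show that $\Phi$ is the desired solution. To show that $\Phi$ is uniformly continuous, note that for $x, z \in I$,
$$|\Phi(x) - \Phi(z)| \approx |{}^*\Phi_\eta(x) - {}^*\Phi_\eta(z)| \le M\cdot|x - z|.$$
Since the two ends are standard, $|\Phi(x) - \Phi(z)| \le M\cdot|x - z|$. Moreover, for all $x \in {}^*I$,
$${}^*\Phi(x) \approx \Phi(\operatorname{st}(x)) \approx {}^*\Phi_\eta(\operatorname{st}(x)) \approx {}^*\Phi_\eta(x);$$
that is, ${}^*\Phi$ is uniformly infinitesimally close to ${}^*\Phi_\eta$. Now, given $x \in I$, there is a $k \in {}^*\mathbb{N}$ with $0 \le k \le \eta - 1$ and a corresponding $x_k$ with $x_k \le x \le x_{k+1}$, so that for $\Delta x = c/\eta$,
$$\Phi(x) \approx {}^*\Phi_\eta(x_k) = y_0 + \sum_{i=0}^{k-1} {}^*f(x_i, {}^*\Phi_\eta(x_i))\cdot\Delta x \approx y_0 + \sum_{i=0}^{k-1} {}^*f(x_i, {}^*\Phi(x_i))\cdot\Delta x \approx y_0 + \int_{x_0}^{x} f(t, \Phi(t))\,dt.$$
Therefore, $\Phi$ has a continuous derivative with $\Phi'(x) = f(x, \Phi(x))$ for all $x \in I$.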
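For a fixed finite $n$, the polygonal path $\Phi_n$ in the proof is Euler's method. The sketch below uses a hypothetical helper `euler_polygon` and the assumed example $f(x,y) = y$ with $x_0 = 0$, $y_0 = 1$, and $a = b = 1$. On $[-1,1]\times[0,2]$ we have $|f| \le M = 2$, so $c = \min(a, b/M) = 1/2$. At the vertices, the sketch checks the bounds $|\Phi_n(x) - y_0| \le b$ and $|\Phi_n(x) - \Phi_n(z)| \le M|x - z|$ used in the proof. It also prints $\Phi_n(x_0 + c)$ next to the exact solution $e^{1/2}$. The proof itself takes $n = \eta$ infinite; a finite $n$ only displays the construction.

```python
import math

def euler_polygon(f, x0, y0, c, n):
    """Vertices (x_k, Φ_n(x_k)), 0 <= k <= n, of the Euler polygon on [x0, x0 + c]."""
    xs = [x0 + k * c / n for k in range(n + 1)]
    ys = [y0]
    for k in range(n):
        # Φ_n(x_{k+1}) = Φ_n(x_k) + f(x_k, Φ_n(x_k)) * (x_{k+1} - x_k)
        ys.append(ys[k] + f(xs[k], ys[k]) * (xs[k + 1] - xs[k]))
    return xs, ys

def f(x, y):
    return y                                  # assumed example: y' = y

x0, y0, a, b, M = 0.0, 1.0, 1.0, 1.0, 2.0     # |f| <= M = 2 on [-1, 1] x [0, 2]
c = min(a, b / M)                             # c = 0.5

for n in (10, 100, 1000):
    xs, ys = euler_polygon(f, x0, y0, c, n)
    assert all(abs(y - y0) <= b for y in ys)
    # consecutive increments bounded by M * width, hence the Lipschitz bound at all vertices
    assert all(abs(ys[k + 1] - ys[k]) <= M * (xs[k + 1] - xs[k]) + 1e-12 for k in range(n))
    print(n, ys[-1], math.exp(c))
```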
References
1. R.M. Anderson, A non-standard representation for Brownian motion and Itô integration. Isr. J. Math. 25, 15–46 (1976)
2. A.E. Hurd, P.A. Loeb, An Introduction to Nonstandard Real Analysis (Academic Press, Orlando, 1985)
3. H.J. Keisler, Elementary Calculus: An Infinitesimal Approach, 1st edn. 1976, 2nd edn. 1986 (Prindle, Weber & Schmidt, Boston, 1976)
4. P.A. Loeb, Conversion from nonstandard to standard measure spaces and applications in probability theory. Trans. Am. Math. Soc. 211, 113–122 (1975)
5. P.A. Loeb, A lost theorem of calculus. Math. Intell. 24, 15–18 (2002)
6. A. Robinson, Non-Standard Analysis (North-Holland, Amsterdam, 1966)
7. A.E. Taylor, Advanced Calculus (Ginn & Company, Boston, 1955)
Chapter 2
An Introduction to General Nonstandard Analysis
Peter A. Loeb
2.1 Superstructures
In this chapter, we develop the general framework of nonstandard analysis and the logic needed for the transfer principle. We will begin each section with a brief summary for readers who want to postpone the technical details until a later reading. The summary will note any important definitions and results of the section that the reader should know before going on. For example, Definition 2.1.1 describing a superstructure and Remark 2.1.3 are important in this section. A reader who wants to get quickly to later applications may skip Sects. 2.5, 2.7, and 2.9. The reader who has read the first chapter of this book will appreciate that Skolem functions will no longer be needed to replace the existential quantifier. The results obtained in the last chapter using our simple transfer principle will still be valid, since the transfer principle used here extends that simple one. The outline of this chapter is similar to that of Chap. 2 of the author's book with Albert E. Hurd [4].

To work with general mathematical analysis, we need to consider sets, sets of sets, etc. All of these are constructed starting with a set of individuals. We think of an individual as an object different from a set. In particular, an individual contains no elements. We build our universe from the set $X$ of individuals using the power set operation $\mathcal{P}$. The set $X$ will always contain the natural numbers $\mathbb{N}$; usually it will contain $\mathbb{R}$.

Definition 2.1.1 Fix a set $X$ containing $\mathbb{N}$. Let $V_0(X) = X$, and for each $n \in \mathbb{N}$, let $V_n(X) = V_{n-1}(X) \cup \mathcal{P}(V_{n-1}(X))$. The superstructure over $X$ is the set $V(X) = \bigcup_{n=0}^{\infty} V_n(X)$. Entities in $X$ are said to be of rank 0, and for $n \ge 1$, entities in $V_n(X)\setminus V_{n-1}(X)$ are said to be of rank $n$.

Example 2.1.2 Individuals are of rank 0. The number 7 and the set $\{7\}$ are in $V_1(X)$. The number 7, the set $\{7\}$, and the set of all finite subsets of $\mathbb{N}$ are in $V_2(X)$. Note that $V_1(X) \in V_2(X)$ and $V_1(X) \subset V_2(X)$.

P.A. Loeb, Department of Mathematics, University of Illinois, 1409 West Green Street, Urbana, IL 61801, USA. e-mail: [email protected]
© Springer Science+Business Media Dordrecht 2015. P.A. Loeb and M.P.H. Wolff (eds.), Nonstandard Analysis for the Working Mathematician, DOI 10.1007/978-94-017-7327-0_2
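For intuition only, the levels $V_n(X)$ of Definition 2.1.1 can be generated for a tiny finite set of individuals. The sketch below models individuals by Python integers and sets by `frozenset`. The text requires $\mathbb{N} \subseteq X$; the finite $X$ used here is an assumption made purely to keep the computation small. The names `levels` and `rank` are hypothetical helpers.

```python
from itertools import combinations

def powerset(s):
    """All subsets of the finite set s, as frozensets (the operation P)."""
    items = list(s)
    return {frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)}

def levels(X, n_max):
    """[V_0(X), ..., V_{n_max}(X)] with V_0 = X and V_n = V_{n-1} ∪ P(V_{n-1})."""
    V = [set(X)]
    for _ in range(n_max):
        V.append(V[-1] | powerset(V[-1]))
    return V

def rank(entity, V):
    """Least n with entity in V_n; rank 0 means an individual."""
    for n, level in enumerate(V):
        if entity in level:
            return n
    raise ValueError("entity lies above the computed levels")

X = {1, 2}      # toy individuals; the text requires N ⊆ X, so a finite X is only an illustration
V = levels(X, 2)
print([len(level) for level in V])              # [2, 6, 66]
print(rank(frozenset({1}), V))                  # 1
print(frozenset(V[1]) in V[2], V[1] <= V[2])    # True True: V_1 ∈ V_2 and V_1 ⊆ V_2
```

The count 66 arises because the four subsets of $X$ lie in both $V_1$ and $\mathcal{P}(V_1)$: $6 + 64 - 4 = 66$.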
Remark 2.1.3 In this chapter's appendix, written by Horst Osswald, it will be shown that members of the set $X$ of individuals can be coded so that they contain no elements of $V(X)$. That is, if $b \in X$, there is no $a$ with $a \in b$. For example, the equivalence class of Cauchy sequences of rational numbers with limit 7 is in $V(X)$. It is not, however, the same as the object in $X$ we will call 7. Given a superstructure $V(X)$ and an entity $b$ in $V(X)$, we will assume that only entities $a$ in $V(X)$ can satisfy the relation $a \in b$. If we speak of an element $a$ with $a \in b$, then $a$ will automatically be in $V(X)$.

Definition 2.1.4 An ordered pair $\langle a, b\rangle$ is the set $\{\{a\}, \{a, b\}\}$.

Example 2.1.5 The ordered pair $\langle 3, \pi\rangle$ is the set $\{\{3\}, \{3, \pi\}\}$. The ordered pair $\langle 3, 3\rangle$ is the set $\{\{3\}\}$.

Definition 2.1.6 For $n \ge 1$, an ordered $n$-tuple $\langle x_1,\dots,x_n\rangle$ of entities $x_1,\dots,x_n$ is the set of ordered pairs $\{\langle 1, x_1\rangle,\dots,\langle n, x_n\rangle\}$. If $c_1,\dots,c_n$ are sets, we have $c_1\times\cdots\times c_n = \{\langle x_1,\dots,x_n\rangle : x_i \in c_i \text{ for } 1 \le i \le n\}$. Moreover, $c^n = c\times\cdots\times c$ ($n$ factors). For $n \ge 2$, an $n$-ary relation $P$ on $c_1\times\cdots\times c_n$ is a subset of $c_1\times\cdots\times c_n$. We identify a 1-tuple $\langle x_1\rangle$ with the corresponding element $x_1$, so a 1-ary relation on $c$ is a subset of $c$. Given a relation $P$ as above, there will be a $k$ such that each $c_i$ is in $V_k(X)$. The relation $P$ itself will then be in $V_{k+4}(X)$. Functions are relations with the usual restriction. Suppose $f$ is a function of $n$ variables, i.e., the domain consists of $n$-tuples $\langle x_1,\dots,x_n\rangle$ from $c_1\times\cdots\times c_n$, with each variable in the domain and the range taking values in $V_k(X)$. Then $f$ is an element of $V_{k+7}(X)$, since a typical element of $f$ is a 2-tuple of the form $\langle\langle x_1,\dots,x_n\rangle, x_{n+1}\rangle$.

Lemma 2.1.7 The $n$-tuples $\langle x_1,\dots,x_n\rangle$ and $\langle y_1,\dots,y_n\rangle$ are equal as sets if and only if $x_i = y_i$ for $i = 1,\dots,n$.

Proof The proof is left to the reader. (The only problem is if $x_i = i$.)
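Definitions 2.1.4 and 2.1.6 can be transcribed directly with `frozenset`, and Lemma 2.1.7 can be confirmed by brute force on a small domain. This is a finite check, not a proof. The domain includes the awkward cases $x_i = i$, where $\langle i, i\rangle = \{\{i\}\}$. In the sketch below, `pair`, `ntuple`, and `lemma_2_1_7_holds` are hypothetical names, the string `"pi"` is a stand-in individual for $\pi$, and the 1-tuple identification of Definition 2.1.6 is not modelled.

```python
from itertools import product

def pair(a, b):
    """Kuratowski ordered pair <a, b> = {{a}, {a, b}} (Definition 2.1.4)."""
    return frozenset({frozenset({a}), frozenset({a, b})})

def ntuple(*xs):
    """<x_1, ..., x_n> = {<1, x_1>, ..., <n, x_n>} (Definition 2.1.6)."""
    return frozenset(pair(i, x) for i, x in enumerate(xs, start=1))

# Example 2.1.5: <3, 3> collapses to {{3}}, and the order of entries matters.
assert pair(3, 3) == frozenset({frozenset({3})})
assert pair(3, "pi") != pair("pi", 3)

def lemma_2_1_7_holds(domain, n):
    """Check: <x_1..x_n> = <y_1..y_n> as sets iff x_i = y_i for every i, over domain^n."""
    tuples = list(product(domain, repeat=n))
    return all((ntuple(*xs) == ntuple(*ys)) == (xs == ys) for xs in tuples for ys in tuples)

print(lemma_2_1_7_holds([1, 2, 3], 3))   # True, including tuples with x_i = i
```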
2.2 Language for Superstructures
In this section we describe the construction of formal statements in a formal language $\mathcal{L}_X$ about a superstructure $V(X)$. Given $X$, the language $\mathcal{L}_X$ for the superstructure $V(X)$ over $X$ has the following symbols:
Connectives: ¬, ∧, ∨, →, ←→
Quantifiers: ∀, ∃
Parentheses: [, ], (, ), ⟨, ⟩
Constant Symbols: At least one name for each entity in $V(X)$.
Variable Symbols: A countable number of them will do.
Equality Symbol: =, denoting equality for elements of $X$ and set equality otherwise.
Set Membership: ∈.
We will not have terms in our language.
Definition 2.2.1 A formula of $\mathcal{L}_X$ is built up inductively with the following rules:
(a) If $x_1,\dots,x_n$, $x$, and $y$ are either constants or variables, then the following are formulas, called atomic formulas: $x \in y$, $x = y$; $\langle x_1,\dots,x_n\rangle \in y$; $\langle x_1,\dots,x_n\rangle = y$; $\langle\langle x_1,\dots,x_n\rangle, x\rangle \in y$; $\langle\langle x_1,\dots,x_n\rangle, x\rangle = y$.
(b) If $\Phi$ and $\Psi$ are formulas, so are $\neg\Phi$, $\Phi\wedge\Psi$, $\Phi\vee\Psi$, $\Phi\to\Psi$, and $\Phi\longleftrightarrow\Psi$.
(c) Suppose $x$ is a variable symbol, $y$ is either a variable symbol or a constant symbol, and $\Phi$ is a formula that does not already contain an expression of the form $(\forall x \in z)$, $(\exists x \in z)$, $(\forall y \in z)$, or $(\exists y \in z)$. Then $(\forall x \in y)\Phi$ and $(\exists x \in y)\Phi$ are formulas.

Definition 2.2.2 A variable $x$ is bound in a formula $\Phi$ if it occurs in $\Phi$ and every occurrence of $x$ lies within a part of $\Phi$ of the form $(\forall x \in z)\Psi$ or $(\exists x \in z)\Psi$. Here, $z$ may be a constant or a variable. A variable occurring in a formula $\Phi$ but not bound in $\Phi$ is called a free variable in $\Phi$. A sentence in $\mathcal{L}_X$ is a formula in which all variables are bound.
2.3 Interpretation of the Language for Superstructures
In this section we give the rules for interpreting the formal language $\mathcal{L}_X$. The rules are as follows:
(a) The atomic sentences $a \in b$, $\langle a_1,\dots,a_n\rangle \in b$, and $\langle\langle a_1,\dots,a_n\rangle, c\rangle \in b$ are true or hold in $V(X)$ if the entities corresponding to the names $a$, $\langle a_1,\dots,a_n\rangle$, or, respectively, $\langle\langle a_1,\dots,a_n\rangle, c\rangle$ belong to the object named by $b$. The atomic sentences $a = b$, $\langle a_1,\dots,a_n\rangle = b$, and $\langle\langle a_1,\dots,a_n\rangle, c\rangle = b$ are true or hold in $V(X)$ if the entities corresponding to the names $a$, $\langle a_1,\dots,a_n\rangle$, or, respectively, $\langle\langle a_1,\dots,a_n\rangle, c\rangle$ are identical to the object named by $b$.
(b) If $\Phi$ and $\Psi$ are sentences, then
(i) $\neg\Phi$ is true in $V(X)$ if $\Phi$ is not true (does not hold) in $V(X)$;
(ii) $\Phi\wedge\Psi$ is true in $V(X)$ if both $\Phi$ and $\Psi$ are true in $V(X)$;
(iii) $\Phi\vee\Psi$ is true in $V(X)$ if either $\Phi$ or $\Psi$ is true in $V(X)$;
(iv) $\Phi\to\Psi$ is true if either $\Phi$ is not true or $\Psi$ is true in $V(X)$;
(v) $\Phi\longleftrightarrow\Psi$ is true if $\Phi$ and $\Psi$ are either both true or both not true in $V(X)$.
(c) Let $\Phi = \Phi(x)$ be a formula in which either $x$ does not occur and all variables are bound, or $x$ is the only free variable. Given a constant $a$, we write $\Phi(a)$ for $\Phi$ with all occurrences of $x$ replaced by $a$. Let $b$ be a constant naming a set $\beta \in V(X)$.
(i) $(\forall x \in b)\Phi(x)$ is true in $V(X)$ if for all entities $\alpha \in \beta$, $\Phi(a)$ is true in $V(X)$, where $a$ is any constant naming $\alpha$.
(ii) $(\exists x \in b)\Phi(x)$ is true in $V(X)$ if there is an entity $\alpha \in \beta$ such that $\Phi(a)$ is true in $V(X)$, where $a$ is any constant naming $\alpha$.
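Rules (a)–(c) amount to a recursive evaluator. The sketch below is a hypothetical encoding and is not part of the text. Formulas are nested Python tuples, bounded quantifiers range over the elements of a finite Python set, and $\langle\langle x, y\rangle, z\rangle$ is modelled by a Python tuple rather than by Definition 2.1.6. Rule (c) substitutes a constant name for $x$; the evaluator instead binds $x$ to the entity in an environment, which gives the same truth value because every entity has a name. As an illustration, it checks the multiplicative-inverse sentence for the integers modulo 5 and modulo 6.

```python
def value(term, env):
    """Entity named by a term: ("const", entity), ("var", name), or ("tuple", t1, ..., tn)."""
    kind = term[0]
    if kind == "const":
        return term[1]
    if kind == "var":
        return env[term[1]]
    if kind == "tuple":                     # <t1, ..., tn>, modelled by a Python tuple
        return tuple(value(t, env) for t in term[1:])
    raise ValueError(f"unknown term {kind!r}")

def holds(phi, env=None):
    """Truth of a formula in a finite structure, following rules (a)-(c)."""
    env = {} if env is None else env
    op = phi[0]
    if op == "in":                          # rule (a), membership
        return value(phi[1], env) in value(phi[2], env)
    if op == "eq":                          # rule (a), identity
        return value(phi[1], env) == value(phi[2], env)
    if op == "not":                         # rule (b)(i)
        return not holds(phi[1], env)
    if op == "and":                         # rule (b)(ii)
        return holds(phi[1], env) and holds(phi[2], env)
    if op == "or":                          # rule (b)(iii)
        return holds(phi[1], env) or holds(phi[2], env)
    if op == "implies":                     # rule (b)(iv)
        return (not holds(phi[1], env)) or holds(phi[2], env)
    if op == "iff":                         # rule (b)(v)
        return holds(phi[1], env) == holds(phi[2], env)
    if op in ("forall", "exists"):          # rule (c): bounded quantifiers
        _, var, bound, body = phi
        beta = value(bound, env)            # the set named by the bound
        cases = (holds(body, {**env, var: alpha}) for alpha in beta)
        return all(cases) if op == "forall" else any(cases)
    raise ValueError(f"unknown connective {op!r}")

def inverse_sentence(n):
    """(∀x ∈ R)[¬(x = 0) → (∃y ∈ R)[<<x, y>, 1> ∈ P]] with R = Z_n and P the product relation."""
    R = frozenset(range(n))
    P = frozenset(((x, y), (x * y) % n) for x in R for y in R)
    x, y = ("var", "x"), ("var", "y")
    return ("forall", "x", ("const", R),
            ("implies", ("not", ("eq", x, ("const", 0))),
             ("exists", "y", ("const", R),
              ("in", ("tuple", ("tuple", x, y), ("const", 1)), ("const", P)))))

print(holds(inverse_sentence(5)), holds(inverse_sentence(6)))   # True False
```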
Remark 2.3.1 Although we do not have terms in our formal language, we will use $x_{n+1} = f(x_1,\dots,x_n)$ as shorthand for $\langle\langle x_1,\dots,x_n\rangle, x_{n+1}\rangle \in f$, and $y = f(x)$ as shorthand for $\langle x, y\rangle \in f$.

Example 2.3.2 Using $\mathbb{R}$ to denote the set of real numbers and $P$ to denote the product function, the sentence that says every nonzero real number has a multiplicative inverse has the following form:
$$(\forall x \in \mathbb{R})[\neg(x = 0) \to (\exists y \in \mathbb{R})[\langle\langle x, y\rangle, 1\rangle \in P]].$$
With our shorthand, this sentence has the form
$$(\forall x \in \mathbb{R})[\neg(x = 0) \to (\exists y \in \mathbb{R})[P(x, y) = 1]].$$
Using $S$ to denote the sum function, the distributive law has the form
$$(\forall x \in \mathbb{R})(\forall y \in \mathbb{R})(\forall z \in \mathbb{R})(\forall \alpha \in \mathbb{R})(\forall \beta \in \mathbb{R})(\forall \gamma \in \mathbb{R})(\forall \delta \in \mathbb{R})\,[[[S(y,z) = \alpha] \wedge [P(x,\alpha) = \beta] \wedge [P(x,y) = \gamma] \wedge [P(x,z) = \delta]] \to [S(\gamma,\delta) = \beta]].$$

Remark 2.3.3 Note that we are missing the composition of terms. We will usually use ordinary mathematical sentences employing terms. One should keep in mind, however, that these are shorthand for more complicated parts of sentences of $\mathcal{L}_X$.

Example 2.3.4 To say that a function $f$ defined on $\mathbb{R}$ is continuous at $a$, let $\mathbb{R}^+$ be the symbol used to denote the strictly positive real numbers. Let $\rho$ denote the distance function, i.e., $\rho(x,y) = |x - y|$, and let $I$ denote strict inequality, i.e., $\langle x, y\rangle \in I$ iff $x < y$. With $f(a) = b$, the sentence
$$(\forall \varepsilon \in \mathbb{R}^+)(\exists \delta \in \mathbb{R}^+)(\forall x \in \mathbb{R})(\forall \alpha \in \mathbb{R})(\forall \beta \in \mathbb{R})(\forall \gamma \in \mathbb{R})\,[[\rho(x,a) = \alpha \wedge \langle\alpha,\delta\rangle \in I \wedge f(x) = \beta \wedge f(a) = b \wedge \rho(\beta,b) = \gamma] \to [\langle\gamma,\varepsilon\rangle \in I]]$$
is abbreviated by the sentence
$$(\forall \varepsilon \in \mathbb{R}^+)(\exists \delta \in \mathbb{R}^+)(\forall x \in \mathbb{R})[|x - a| < \delta \to |f(x) - f(a)| < \varepsilon].$$

Problem: Write sentences in the language for the real numbers expressing the commutative and associative laws for addition.

Answer: Let $S$ denote the sum function. The commutative law for addition is expressed by
$$(\forall x \in \mathbb{R})(\forall y \in \mathbb{R})(\forall \alpha \in \mathbb{R})(\forall \beta \in \mathbb{R})[[[S(x,y) = \alpha] \wedge [S(y,x) = \beta]] \to [\alpha = \beta]].$$
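The associative law for addition is expressed by
$$(\forall x \in \mathbb{R})(\forall y \in \mathbb{R})(\forall z \in \mathbb{R})(\forall \alpha \in \mathbb{R})(\forall \beta \in \mathbb{R})(\forall \gamma \in \mathbb{R})(\forall \delta \in \mathbb{R})\,[[[S(y,z) = \alpha] \wedge [S(x,\alpha) = \beta] \wedge [S(x,y) = \gamma] \wedge [S(\gamma,z) = \delta]] \to [\beta = \delta]].$$

Problem: Write a sentence in the language for the real numbers saying that, for a given function $f$, $\lim_{x \to a,\, x \in A} f(x) = L$.

Answer: We use $\mathbb{R}^+$ to denote the strictly positive real numbers, $\rho$ to denote the distance function, i.e., $\rho(x,y) = |x - y|$, and $I$ to denote strict inequality, i.e., $\langle x, y\rangle \in I$ iff $x < y$. The sentence has the form
$$(\forall \varepsilon \in \mathbb{R}^+)(\exists \delta \in \mathbb{R}^+)(\forall x \in A)(\forall \alpha \in \mathbb{R})(\forall \beta \in \mathbb{R})(\forall \gamma \in \mathbb{R})\,[[\neg[x = a] \wedge \rho(x,a) = \alpha \wedge \langle\alpha,\delta\rangle \in I \wedge f(x) = \beta \wedge \rho(\beta,L) = \gamma] \to [\langle\gamma,\varepsilon\rangle \in I]].$$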
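In the relational encoding, the commutative and associative laws are statements about membership of triples $\langle\langle x, y\rangle, z\rangle$ in the sum relation $S$. The sketch below is a finite analogue, not a statement about $\mathbb{R}$. It encodes addition and subtraction modulo 4 as sets of Python triples `((x, y), z)`, an assumed stand-in for Definition 2.1.6. It then evaluates the two answers above by brute force over every quantified variable. The helper names are hypothetical.

```python
from itertools import product

def relation(op, R):
    """Encode a binary operation on R as the set of triples <<x, y>, op(x, y)>."""
    return {((x, y), op(x, y)) for x in R for y in R}

def commutative(S, R):
    # (∀x)(∀y)(∀α)(∀β)[[S(x,y) = α ∧ S(y,x) = β] → α = β]
    return all(not (((x, y), a) in S and ((y, x), b) in S) or a == b
               for x, y, a, b in product(R, repeat=4))

def associative(S, R):
    # (∀x)(∀y)(∀z)(∀α)(∀β)(∀γ)(∀δ)[[S(y,z) = α ∧ S(x,α) = β ∧ S(x,y) = γ ∧ S(γ,z) = δ] → β = δ]
    return all(not (((y, z), a) in S and ((x, a), b) in S
                    and ((x, y), c) in S and ((c, z), d) in S) or b == d
               for x, y, z, a, b, c, d in product(R, repeat=7))

R = range(4)
add = relation(lambda x, y: (x + y) % 4, R)
sub = relation(lambda x, y: (x - y) % 4, R)
print(commutative(add, R), associative(add, R))   # True True
print(commutative(sub, R), associative(sub, R))   # False False
```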
2.4 Monomorphisms and the Transfer Principle
Just as in Chap. 1, where we worked with $\mathbb{R}$ and ${}^*\mathbb{R}$, we will now work back and forth between $X$ and its nonstandard extension ${}^*X$. We will in fact work with the superstructures $V(X)$ and $V({}^*X)$ using a "$*$-mapping", which is a one-to-one mapping from $V(X)$ into, but not onto, $V({}^*X)$. The abstraction of the basic properties of the mapping $*$ originates with the work of Robinson and Zakon in [10]. For the moment, we write $Y$ for ${}^*X$. We assume $X$ and $Y$ are two sets of individuals. We associate the superstructure $V(X)$ and language $\mathcal{L}_X$ with $X$, and the superstructure $V(Y)$ and language $\mathcal{L}_Y$ with $Y$. Each constant symbol in $\mathcal{L}_X$ names something in $V(X)$, and there is at least one such symbol for each entity in $V(X)$. A similar statement is true for $Y$. We work with a one-to-one map $*$ from $V(X)$ into $V(Y)$. For each $a \in V(X)$, we write ${}^*a$ for $*(a)$. In this chapter, we will use the same symbol for ${}^*a$ and for its name. The main results and definitions in this section are Definitions 2.4.1 and 2.4.3, along with Remark 2.4.4, Theorem 2.4.5, and Example 2.4.8.

Definition 2.4.1 If $\Phi$ is a formula in $\mathcal{L}_X$, the $*$-transform of $\Phi$, ${}^*\Phi$, is the formula in $\mathcal{L}_Y$ obtained by replacing each constant $c$ in $\Phi$ with ${}^*c$.

Example 2.4.2 The $*$-transform of the sentence that says there is a multiplicative inverse in the real numbers is
$$(\forall x \in {}^*\mathbb{R})[\neg(x = {}^*0) \to (\exists y \in {}^*\mathbb{R})[\langle\langle x, y\rangle, {}^*1\rangle \in {}^*P]].$$

Definition 2.4.3 (Robinson–Zakon [10]) The injection $*$ from $V(X)$ into $V(Y)$ is called a monomorphism if the following conditions hold.
(i) ${}^*\emptyset = \emptyset$, where $\emptyset$ denotes the empty set.
(ii) If $a \in X$, then ${}^*a \in Y$.
(iii) If $a$ has rank $n$, then ${}^*a$ has rank $n$.
(iv) If $a \in {}^*V_n(X)$ for $n \ge 1$ and $b \in a$, then $b \in {}^*V_{n-1}(X)$.
(v) (Transfer Principle) For each sentence $\Phi$ in $\mathcal{L}_X$, if $\Phi$ holds in $V(X)$, then ${}^*\Phi$ holds in $V(Y)$.
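Definition 2.4.1 is purely syntactic: ${}^*\Phi$ is obtained by rewriting the constants of $\Phi$ and leaving variables, connectives, and quantifiers alone. The sketch below is a standalone illustration with several assumptions. Formulas are encoded as nested Python tuples, constants are represented by their names, and the name of ${}^*c$ is taken to be the string `"*" + c`. Under these assumptions it reproduces Example 2.4.2 from the inverse sentence of Example 2.3.2. It does not model the semantics of $V(Y)$, and `star_transform` and `show` are hypothetical names.

```python
def star_transform(phi, star):
    """Return *Φ: replace every constant c in the formula Φ by star(c) (Definition 2.4.1)."""
    if isinstance(phi, tuple):
        if phi[0] == "const":
            return ("const", star(phi[1]))
        return tuple(star_transform(part, star) for part in phi)
    return phi                                   # connective tags and variable names are unchanged

def show(phi):
    """Render a formula in the bracket style of Example 2.3.2."""
    op = phi[0]
    if op in ("const", "var"):
        return phi[1]
    if op == "tuple":
        return "⟨" + ", ".join(show(t) for t in phi[1:]) + "⟩"
    if op == "eq":
        return f"{show(phi[1])} = {show(phi[2])}"
    if op == "in":
        return f"{show(phi[1])} ∈ {show(phi[2])}"
    if op == "not":
        return f"¬({show(phi[1])})"
    if op == "implies":
        return f"{show(phi[1])} → {show(phi[2])}"
    if op in ("forall", "exists"):
        q = "∀" if op == "forall" else "∃"
        return f"({q}{phi[1]} ∈ {show(phi[2])})[{show(phi[3])}]"
    raise ValueError(f"unknown connective {op!r}")

x, y = ("var", "x"), ("var", "y")
inverse = ("forall", "x", ("const", "R"),
           ("implies", ("not", ("eq", x, ("const", "0"))),
            ("exists", "y", ("const", "R"),
             ("in", ("tuple", ("tuple", x, y), ("const", "1")), ("const", "P")))))

print(show(inverse))                                      # (∀x ∈ R)[¬(x = 0) → (∃y ∈ R)[⟨⟨x, y⟩, 1⟩ ∈ P]]
print(show(star_transform(inverse, lambda c: "*" + c)))   # (∀x ∈ *R)[¬(x = *0) → (∃y ∈ *R)[⟨⟨x, y⟩, *1⟩ ∈ *P]]
```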
Remark 2.4.4 It can be shown (the reader is invited to construct the proof) that Properties (i)–(iv) follow from the Transfer Principle, i.e., Property (v). We have listed them here as separate properties to help in understanding the notion of a monomorphism. We use them in the next section to help with the particular monomorphism constructed there. Property (iv) will be interpreted to say, among other things, that elements of "internal sets" are internal. Given Property (ii), we will assume from now on that ${}^*a = a$ for each individual $a \in X$; in particular, for $n \in \mathbb{N}$, ${}^*n = n$. That is, we will assume that $X \subseteq Y$. We will write $a$ for both $a \in X$ and ${}^*a \in Y$. The Transfer Principle has the following consequence.

Theorem 2.4.5 (Downward Transfer Principle) For each sentence $\Phi$ in $\mathcal{L}_X$, if ${}^*\Phi$ holds in $V(Y)$, then $\Phi$ holds in $V(X)$.

Proof If $\Phi$ did not hold in $V(X)$, then $\neg\Phi$ would hold there, so ${}^*(\neg\Phi) = \neg({}^*\Phi)$ would hold in $V(Y)$, contradicting the assumption that ${}^*\Phi$ holds in $V(Y)$.

We sometimes state the transfer principle as follows: $\Phi$ holds in $V(X)$ if and only if ${}^*\Phi$ holds in $V(Y)$. Recall that the domain of a binary relation $P$ is the set of all $x$ for which there is a $y$ with $\langle x, y\rangle \in P$. The range of $P$ is the set of all $y$ for which there is an $x$ with $\langle x, y\rangle \in P$.

Proposition 2.4.6 We have the following properties of the $*$-mapping.
(a) Let $a, b, a_1,\dots,a_n$ be fixed entities in $V(X)$. Then
(i) ${}^*\{a_1,\dots,a_n\} = \{{}^*a_1,\dots,{}^*a_n\}$;
(ii) ${}^*\langle a_1,\dots,a_n\rangle = \langle {}^*a_1,\dots,{}^*a_n\rangle$;
(iii) $a \in b$ iff ${}^*a \in {}^*b$;
(iv) $a = b$ iff ${}^*a = {}^*b$;
(v) $a \subseteq b$ iff ${}^*a \subseteq {}^*b$;
(vi) for $n \in \mathbb{N}$, ${}^*\big(\bigcup_{i=1}^n a_i\big) = \bigcup_{i=1}^n {}^*a_i$ and ${}^*\big(\bigcap_{i=1}^n a_i\big) = \bigcap_{i=1}^n {}^*a_i$;
(vii) ${}^*(a_1\times a_2\times\cdots\times a_n) = {}^*a_1\times{}^*a_2\times\cdots\times{}^*a_n$.
(b) If $P$ is a relation on $a_1\times a_2\times\cdots\times a_n$, then ${}^*P$ is a relation on ${}^*a_1\times{}^*a_2\times\cdots\times{}^*a_n$. Moreover, if $n = 2$ and $a$ and $b$ are the domain and range of $P$, then ${}^*a$ and ${}^*b$ are the domain and range of ${}^*P$.
(c) If $f$ is a function from $a$ into $b$, then ${}^*f$ is a function from ${}^*a$ into ${}^*b$ such that for all $c \in a$, ${}^*[f(c)] = {}^*f({}^*c)$. If $f$ maps $a$ onto $b$, then ${}^*f$ maps ${}^*a$ onto ${}^*b$. If $f$ is one-to-one (i.e., injective), so is ${}^*f$.
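Proof of Part a: (i) Let $s = \{a_1,\dots,a_n\}$, and transform the sentence $(\forall x \in s)[x = a_1 \vee\cdots\vee x = a_n]$. Also transform the sentences $a_1 \in s,\dots,a_n \in s$. (ii)
$$\begin{aligned} {}^*\langle a_1,\dots,a_n\rangle &= {}^*\{\{\{1\},\{1,a_1\}\},\dots,\{\{n\},\{n,a_n\}\}\} \\ &= \{{}^*\{\{1\},\{1,a_1\}\},\dots,{}^*\{\{n\},\{n,a_n\}\}\} \\ &= \{\{{}^*\{1\},{}^*\{1,a_1\}\},\dots,\{{}^*\{n\},{}^*\{n,a_n\}\}\} \\ &= \{\{\{1\},\{1,{}^*a_1\}\},\dots,\{\{n\},\{n,{}^*a_n\}\}\} = \langle {}^*a_1,\dots,{}^*a_n\rangle. \end{aligned}$$
(iii) and (iv) are clear. (v) Transform $(\forall x \in a)[x \in b]$. (vi) Left to the reader. (vii) We show ${}^*(a\times b) = {}^*a\times{}^*b$. Transform $(\forall z \in (a\times b))(\exists x \in a)(\exists y \in b)[\langle x, y\rangle = z]$ to show that ${}^*(a\times b) \subseteq {}^*a\times{}^*b$. Transform $(\forall x \in a)(\forall y \in b)(\exists z \in (a\times b))[\langle x, y\rangle = z]$ to show that ${}^*a\times{}^*b \subseteq {}^*(a\times b)$.

Proof of Part b: To show that ${}^*P$ is a relation on ${}^*a_1\times{}^*a_2\times\cdots\times{}^*a_n$, transform the sentence $(\forall x \in P)(\exists x_1 \in a_1)\cdots(\exists x_n \in a_n)[\langle x_1,\dots,x_n\rangle = x]$. Thus we know, for $n = 2$ and $a$ and $b$ the domain and range of $P$, that ${}^*a$ contains the domain of ${}^*P$ and ${}^*b$ contains the range of ${}^*P$. To show that ${}^*a$ and ${}^*b$ are the domain and range of ${}^*P$, transform $(\forall x \in a)(\exists y \in b)[\langle x, y\rangle \in P]$ and $(\forall y \in b)(\exists x \in a)[\langle x, y\rangle \in P]$.

Proof of Part c: Let $f$ be a function from $a$ into $b$. By the transform of $(\forall x \in a)(\forall y \in b)(\forall z \in b)[[\langle x, y\rangle \in f \wedge \langle x, z\rangle \in f] \to [y = z]]$ and Part b, ${}^*f$ is a function from ${}^*a$ into ${}^*b$. If $\langle c, d\rangle \in f$, then $\langle {}^*c, {}^*d\rangle \in {}^*f$, whence for all $c \in a$, ${}^*[f(c)] = {}^*f({}^*c)$. The rest is left to the reader.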
When $a \in X$, we will, as already noted, associate $a$ with ${}^*a$. In general, however, as the next example shows, we will have to be more careful. For this and the next example, we assume $*$ is a monomorphism between $V(\mathbb{R})$ and $V({}^*\mathbb{R})$.

Example 2.4.7 Let $\mathcal{I}$ denote the set of closed and bounded intervals in $\mathbb{R}$. Then $\mathcal{I} \in V_2(\mathbb{R})$. Moreover, the following sentences are true for $V(\mathbb{R})$:
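$$(\forall x \in \mathcal{I})(\exists a, b \in \mathbb{R})(\forall y \in \mathbb{R})[a \le y \le b \longleftrightarrow y \in x]$$
$$(\forall a, b \in \mathbb{R})[a \le b \to [(\exists x \in \mathcal{I})(\forall y \in \mathbb{R})[a \le y \le b \longleftrightarrow y \in x]]]$$
Therefore, ${}^*\mathcal{I}$ contains the extensions of standard intervals as well as new ones. For example, if $\varepsilon \approx 0$ is positive, then $[\varepsilon, 2\varepsilon]$ is in ${}^*\mathcal{I}$, but it is not the extension of any standard interval. Even for a standard interval $[a,b]$ with $a \ne b$, there are points in ${}^*[a,b]$ that are not in the original interval $[a,b]$. Thus we cannot directly associate $[a,b]$ and ${}^*[a,b]$.

Example 2.4.8 The Downward Transfer Principle can often be used to avoid arguments by contradiction. For example, to show that $s_n \to L$ if ${}^*s_\eta \approx L$ for all $\eta \in {}^*\mathbb{N}_\infty$, we note that for a given $\varepsilon > 0$, the $*$-transform of the sentence
$$\Phi = (\exists k \in \mathbb{N})(\forall n \in \mathbb{N})[n \ge k \to |s_n - L| < \varepsilon]$$
is true in $V({}^*\mathbb{R})$; just let $k \in {}^*\mathbb{N}_\infty$. It follows that $\Phi$ holds in $V(\mathbb{R})$.

Problem: Assume $*$ is a monomorphism between $V(\mathbb{R})$ and $V({}^*\mathbb{R})$, and use downward transfer to show that $f$ is uniformly continuous on $A$ if, for all $x \in {}^*A$ and all $y \in {}^*A$ with $x \approx y$, we have ${}^*f(x) \approx {}^*f(y)$.

Answer: Fix $\varepsilon > 0$. We want to show the truth of the sentence
$$\Phi = (\exists \delta \in \mathbb{R}^+)(\forall x \in A)(\forall y \in A)[|x - y| < \delta \to |f(x) - f(y)| < \varepsilon].$$
Now ${}^*\Phi$ is true for $V({}^*\mathbb{R})$; just take $\delta$ to be a positive infinitesimal. Therefore, $\Phi$ is true for $V(\mathbb{R})$.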
2.5 Ultrapower Construction of Superstructures and Monomorphisms
We now fix $V(X)$, an index set $I$, and an ultrafilter $\mathcal{U}$ on $I$. From these we will construct a new superstructure $V({}^*X)$ from $V(X)$, together with a corresponding monomorphism. We may have $I = \mathbb{N}$. We may even have $\mathcal{U}$ fixed at some $i_0 \in I$, i.e., $U \in \mathcal{U}$ if and only if $i_0 \in U$. If $\mathcal{U}$ is fixed in this way, we will see that we get nothing new. In the next section we will see that by assuming additional properties for $I$ and $\mathcal{U}$, we obtain a monomorphism with the desired properties for a nonstandard extension. An ultrafilter corresponds to a finitely additive measure on the power set of $I$ that takes only the values 0 and 1. Accordingly, we say a property holds almost everywhere, or a.e., if it holds on some $U \in \mathcal{U}$. Given $S \in V(X)$, we write $S^I$ for the set of all maps from $I$ into $S$. For each such map $a$, we write $a_i$ for $a(i)$. Two maps $a$ and $b$ are equivalent (with respect to $\mathcal{U}$), and we write $a =_{\mathcal{U}} b$, if $a_i = b_i$ a.e. (that is, $a_i = b_i$ for all $i$ in some $U \in \mathcal{U}$). The relation $=_{\mathcal{U}}$ is an equivalence relation. We write $\prod_{\mathcal{U}} S$ for the set of equivalence classes and $[a]$ for the equivalence class containing the mapping $a$. If $b \in V(X)$, we write $\bar{b}$ for the constant function $\bar{b}_i = b$. Let $V_{-1}(X)$ denote the empty set. The reader should at least note the following two definitions and the related remark.

Definition 2.5.1 The bounded ultrapower $\prod^0_{\mathcal{U}}V(X)$ of $V(X)$ is defined by setting $\prod^0_{\mathcal{U}}V(X) := \bigcup_{n=0}^{\infty}\prod_{\mathcal{U}}[V_n(X)\setminus V_{n-1}(X)]$. We denote by $e$ the mapping from $V(X)$ into $\prod^0_{\mathcal{U}}V(X)$ defined at each $b \in V(X)$ by $e(b) = [\bar{b}]$. When $[a], [b] \in \prod^0_{\mathcal{U}}V(X)$, we write $[a] \in_{\mathcal{U}} [b]$ if $a_i \in b_i$ a.e.

Remark 2.5.2 Recall that $\bar{5}$ is the constant function 5 on $I$, and $[\bar{5}]$ is the corresponding equivalence class. While it is true that $[\bar{5}] \in_{\mathcal{U}} e(\mathbb{N})$, the relation $\in_{\mathcal{U}}$ is not the set membership relation we want. We need another map, $M$, so that we can replace $e(\mathbb{N})$ in $\prod^0_{\mathcal{U}}V(X)$ with the set of numbers
$$M(e(\mathbb{N})) = \{[r] \in \textstyle\prod_{\mathcal{U}} X : r_i \in \mathbb{N} \text{ a.e.}\} = \{[r] \in \textstyle\prod_{\mathcal{U}} X : [r] \in_{\mathcal{U}} [\bar{\mathbb{N}}]\}.$$
Similarly, $e(\mathcal{P}(\mathbb{N})) = [\overline{\mathcal{P}(\mathbb{N})}]$ is the equivalence class formed from the sequence $\mathcal{P}(\mathbb{N}), \mathcal{P}(\mathbb{N}),\dots,\mathcal{P}(\mathbb{N}),\dots$. This, however, is not a collection of sets of numbers. While it is true that $[\bar{\mathbb{N}}] \in_{\mathcal{U}} [\overline{\mathcal{P}(\mathbb{N})}]$, this is not the set membership we want. Once we have used the map $M$ at level 1 to get true sets of numbers, we will then use $M$ to replace $[\overline{\mathcal{P}(\mathbb{N})}]$ with a true collection of sets of numbers, $M(e(\mathcal{P}(\mathbb{N})))$. In particular, the set ${}^*\mathbb{N} = M(e(\mathbb{N}))$, which we will think of as the set of nonstandard natural numbers, will be a member of the collection of sets $M(e(\mathcal{P}(\mathbb{N})))$ in the usual sense.

Definition 2.5.3 Let ${}^*X = \prod_{\mathcal{U}} X = \prod_{\mathcal{U}} V_0(X)$, and let $V({}^*X)$ be the associated superstructure built on the set of individuals ${}^*X$. The Mostowski Collapsing Function $M$ is a mapping of $\prod^0_{\mathcal{U}}V(X)$ into $V({}^*X)$ defined by induction on the level $n$ as follows:
(i) For each element $[a] \in \prod_{\mathcal{U}}V_0(X) = {}^*X$, $M([a]) = [a]$.
(ii) Given $[b] \in \prod_{\mathcal{U}}[V_n(X)\setminus V_{n-1}(X)]$, $n \ge 1$, we set
$$M([b]) = \Big\{M([a]) : [a] \in \bigcup_{k=0}^{n-1}\textstyle\prod_{\mathcal{U}}[V_k(X)\setminus V_{k-1}(X)] \text{ and } [a] \in_{\mathcal{U}} [b]\Big\}.$$
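The claim that a fixed ultrafilter gives nothing new can be checked on a toy example. The sketch below is an illustration built on assumptions, not the construction itself. $I$ is a three-point set, $\mathcal{U}$ is the ultrafilter fixed at a chosen $i_0$, $S$ is a four-element set, and maps $I \to S$ are Python tuples. The sketch groups all maps by $=_{\mathcal{U}}$ by brute force and confirms that every class contains a constant map $\bar{s}$, so here $e$ is onto $\prod_{\mathcal{U}} S$. A genuine nonstandard extension needs a free ultrafilter on an infinite $I$, which cannot be listed in this way. The helper names are hypothetical.

```python
from itertools import product

I = (0, 1, 2)          # toy finite index set (assumption)
i0 = 1                 # the ultrafilter is fixed at i0: U is in the ultrafilter iff i0 ∈ U

def in_U(U):
    """Membership in the fixed (principal) ultrafilter at i0."""
    return i0 in U

def equiv(a, b):
    """a =_U b iff {i : a_i = b_i} is in the ultrafilter."""
    return in_U({i for i in I if a[i] == b[i]})

def member_U(a, b):
    """[a] ∈_U [b] iff {i : a_i ∈ b_i} is in the ultrafilter (b is a map into sets)."""
    return in_U({i for i in I if a[i] in b[i]})

def class_representatives(maps):
    """One representative from each =_U equivalence class, found by brute force."""
    reps = []
    for a in maps:
        if not any(equiv(a, r) for r in reps):
            reps.append(a)
    return reps

S = (0, 1, 2, 3)
maps = list(product(S, repeat=len(I)))       # all maps I -> S, written as tuples (a_0, a_1, a_2)
reps = class_representatives(maps)
constant = {s: (s,) * len(I) for s in S}     # e(s) is the class of the constant map s̄

# "Nothing new": exactly |S| classes, and every map is equivalent to a constant map.
assert len(reps) == len(S)
assert all(any(equiv(a, constant[s]) for s in S) for a in maps)

# ∈_U also reduces to ordinary membership at the coordinate i0.
a, b = (0, 2, 0), ({1}, {2}, {1})
assert member_U(a, b) == (a[i0] in b[i0])
```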
We finish this section by showing that the map $* = M\circ e : V(X) \to V({}^*X)$ is a monomorphism. On a first reading, the reader may wish to skip the following technical proofs and go on to the next section.

Proposition 2.5.4 We have the following properties for $*$, $e$, and $M$:
(i) $e$ and $M$ are 1:1 maps, so $* = M\circ e$ is a 1:1 map of $V(X)$ into $V({}^*X)$.
(ii) $e$ maps $X$ into ${}^*X$, and the restriction $M|_{{}^*X}$ is the identity map (by definition).
(iii) $e(X) = [\bar{X}]$ (by definition), and $M([\bar{X}]) = {}^*X$, so $*(X) = {}^*X$.
(iv) If $\emptyset$ is the empty set, then for no $[a]$ is it true that $[a] \in_{\mathcal{U}} e(\emptyset) = [\bar{\emptyset}]$, so $*(\emptyset) = M(e(\emptyset)) = \emptyset$.
(v) $a \in b$ in $V(X)$ iff $e(a) \in_{\mathcal{U}} e(b)$; $[a] \in_{\mathcal{U}} [b]$ iff $M([a]) \in M([b])$.
(vi) For $n \ge 1$, $e$ maps $V_n(X)\setminus V_{n-1}(X)$ into $\prod_{\mathcal{U}}[V_n(X)\setminus V_{n-1}(X)]$, and $M$ maps $\prod_{\mathcal{U}}[V_n(X)\setminus V_{n-1}(X)]$ into $V_n({}^*X)\setminus V_{n-1}({}^*X)$. It follows that if $a$ has rank $n \ge 0$, then $*(a)$ has rank $n$.
(vii) If $M([b]) \in *(V_n(X))$ and $M([a]) \in M([b])$, then $M([a]) \in *(V_{n-1}(X))$.

Proof (i) If $a \ne b$, then $\bar{a}_i \ne \bar{b}_i$ for all $i \in I$, so $e(a) = [\bar{a}] \ne [\bar{b}] = e(b)$. Therefore, $e$ is 1:1. To show that $M$ is 1:1, assume $[a] \ne [b]$. The map $M$ is the identity on ${}^*X$ and sends every other element of $\prod^0_{\mathcal{U}}V(X)$ to a set. Hence, if either $[a]$ or $[b]$ is in ${}^*X$, then $M([a]) \ne M([b])$. Otherwise, there is a $U \in \mathcal{U}$ such that either $a_i\setminus b_i \ne \emptyset$ for all $i \in U$ or $b_i\setminus a_i \ne \emptyset$ for all $i \in U$; assume the latter. Then there is a $[c]$ such that $c_i \in b_i\setminus a_i$ a.e. Therefore, $M([c]) \in M([b])$ but $M([c]) \notin M([a])$, so $M([a]) \ne M([b])$. Thus $M$ is 1:1. (ii), (iii), (iv), and (v) are clear. (vi) By definition, if $[a] \in \prod_{\mathcal{U}}V_0(X) = {}^*X$, then $M([a]) = [a] \in V_0({}^*X) = {}^*X$. Fix $n \ge 1$, and assume that for each $k < n$, if $[b] \in \prod_{\mathcal{U}}[V_k(X)\setminus V_{k-1}(X)]$, then $M([b]) \in V_k({}^*X)\setminus V_{k-1}({}^*X)$. Given $[b] \in \prod_{\mathcal{U}}[V_n(X)\setminus V_{n-1}(X)]$ and $M([a]) \in M([b])$, we have $[a] \in \bigcup_{k=0}^{n-1}\prod_{\mathcal{U}}[V_k(X)\setminus V_{k-1}(X)]$, so by assumption, $M([a]) \in V_{n-1}({}^*X)$. Moreover, there is a $[c]$ such that for all $i \in I$, $c_i \in V_{n-1}(X)\setminus V_{n-2}(X)$ and $c_i \in b_i$. Therefore, $M([c]) \in V_{n-1}({}^*X)\setminus V_{n-2}({}^*X)$, and thus $M([b]) \in V_n({}^*X)\setminus V_{n-1}({}^*X)$. (vii) Assume $M([b]) \in *(V_n(X))$ and $M([a]) \in M([b])$. Then $[b] \in_{\mathcal{U}} e(V_n(X))$ and $[a] \in_{\mathcal{U}} [b]$. Therefore, there is a set $U \in \mathcal{U}$ such that for all $i \in U$, $a_i \in b_i \in V_n(X)$. For these same $i$, $a_i \in V_{n-1}(X)$, whence $[a] \in_{\mathcal{U}} e(V_{n-1}(X))$, so $M([a]) \in *(V_{n-1}(X))$.

Now, except for the transfer principle, we have shown that $* = M\circ e$ is a monomorphism from $V(X)$ into $V({}^*X)$. We will write ${}^*a$ for $*(a)$. The next proposition is used to establish the transfer principle.

Proposition 2.5.5 Let $[a], [a^1],\dots,[a^n], [b]$, and $[c]$ be elements of the bounded ultrapower $\prod^0_{\mathcal{U}}V(X)$; here, any or all of these may be of the form $e(d) = [\bar{d}]$.
(i) $M([a])$ ($=$ or $\in$) $M([c])$ iff $a_i$ ($=$ or $\in$) $c_i$ a.e.
(ii) $\{M([a^1]),\dots,M([a^n])\}$ ($=$ or $\in$) $M([c])$ iff $\{a^1_i,\dots,a^n_i\}$ ($=$ or $\in$) $c_i$ a.e.
(iii) $\langle M([a^1]),\dots,M([a^n])\rangle$ ($=$ or $\in$) $M([c])$ iff $\langle a^1_i,\dots,a^n_i\rangle$ ($=$ or $\in$) $c_i$ a.e.
(iv) $\langle\langle M([a^1]),\dots,M([a^n])\rangle, M([b])\rangle$ ($=$ or $\in$) $M([c])$ iff $\langle\langle a^1_i,\dots,a^n_i\rangle, b_i\rangle$ ($=$ or $\in$) $c_i$ a.e.

Proof We will assume that $n = 2$ for the proof of (ii) and (iii), and we write $[a]$ and $[b]$ for $[a^1]$ and $[a^2]$. (i) (for $=$) Since $M$ is 1:1, $M([a]) = M([c])$ iff $[a] = [c]$ iff $a_i = c_i$ a.e. (i) (for $\in$) $M([a]) \in M([c])$ iff $[a] \in_{\mathcal{U}} [c]$ iff $a_i \in c_i$ a.e. (ii) (for $=$) Assume $\{a_i, b_i\} = c_i$ a.e. Then
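$$M([c]) = \{M([y]) : y_i \in \{a_i, b_i\} \text{ a.e.}\} = \{M([y]) : y_i = a_i \text{ a.e.}\} \cup \{M([y]) : y_i = b_i \text{ a.e.}\} = \{M([a]), M([b])\}.$$
(The middle equality holds because $\mathcal{U}$ is an ultrafilter: if $\{i : y_i \in \{a_i, b_i\}\} \in \mathcal{U}$, then one of the sets $\{i : y_i = a_i\}$ and $\{i : y_i = b_i\}$ is in $\mathcal{U}$.) Conversely, if $M([c]) = \{M([a]), M([b])\}$, then $a_i \in c_i$ a.e. and $b_i \in c_i$ a.e. If $d_i \in c_i$ a.e., then either $d_i = a_i$ a.e. or $d_i = b_i$ a.e. That is, a.e., $c_i$ has no elements other than $a_i$ and $b_i$. Therefore, $c_i = \{a_i, b_i\}$ a.e.

(ii) (for $\in$) Choose representatives for $[a]$ and $[b]$, and let $d_i = \{a_i, b_i\}$ for all $i \in I$. Then $M([d]) = \{M([a]), M([b])\}$. Therefore, $\{a_i, b_i\} \in c_i$ a.e. iff $d_i \in c_i$ a.e. iff $M([d]) \in M([c])$ iff $\{M([a]), M([b])\} \in M([c])$.

(iii) (for $=$) Choose representatives for $[a]$ and $[b]$, and for all $i \in I$, let $d_i = \{\{1\},\{1,a_i\}\}$ and $e_i = \{\{2\},\{2,b_i\}\}$. By Part (ii), $M([d]) = \{\{M([\bar{1}])\},\{M([\bar{1}]), M([a])\}\} = \{\{1\},\{1, M([a])\}\}$, since $M([\bar{1}]) = 1$. A similar result holds for $M([e])$. Again by Part (ii), $\langle a_i, b_i\rangle = c_i$ a.e. iff $\{d_i, e_i\} = c_i$ a.e. iff $\{M([d]), M([e])\} = M([c])$ iff $\langle M([a]), M([b])\rangle = M([c])$.

(iii) (for $\in$) This is the same as (ii) (for $\in$), with $\{\ \}$ replaced by $\langle\ \rangle$.

(iv) Apply (iii) to $\langle d_i, b_i\rangle$ and $\langle M([d]), M([b])\rangle$, where $d_i = \langle a^1_i,\dots,a^n_i\rangle$ a.e., so that $M([d]) = \langle M([a^1]),\dots,M([a^n])\rangle$.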
Notation. Let $\Phi(x_1,\dots,x_n)$ be a formula with variables $x_1,\dots,x_n$, each either free in $\Phi$ or not appearing in $\Phi$, and let $c_1,\dots,c_n$ be constants. Then $\Phi(c_1,\dots,c_n)$ is $\Phi$ with each $x_i$ appearing in $\Phi$ replaced by $c_i$. To establish the transfer principle for $*$, we need a theorem of Łoś (pronounced "Wash").

Theorem 2.5.6 (Łoś) Let $\Phi(x_1,\dots,x_n)$ be a formula in the language $\mathcal{L}_X$, with $x_1,\dots,x_n$ a set of variables containing all of the free variables in $\Phi$. Fix $[a^1],\dots,[a^n]$ in $\prod^0_{\mathcal{U}}V(X)$. Then ${}^*\Phi(M([a^1]),\dots,M([a^n]))$ holds in $V({}^*X)$ iff $\Phi(a^1_i,\dots,a^n_i)$ holds in $V(X)$ for almost all $i \in I$.

Remark 2.5.7 Note that the previous proposition has established the result for atomic formulas. For example, if $\Phi(x,y)$ is the formula $x \in y$, then $M([a]) \in M([c])$ iff $a_i \in c_i$ a.e. If $\Phi(x)$ is the formula $x \in \mathbb{N}$, then $M([a]) \in {}^*\mathbb{N} = M([\bar{\mathbb{N}}])$ iff $a_i \in \mathbb{N}$ a.e. If $\Phi(x_1,\dots,x_n,y)$ is the formula $\{x_1,\dots,x_n,5,A\} \in y$, where $A$ is a standard set, then $\{M([a^1]),\dots,M([a^n]),M([\bar{5}]),M([\bar{A}])\} \in M([c])$ iff $\{a^1_i,\dots,a^n_i,5,A\} \in c_i$ a.e.

Proof of Łoś' Theorem. The proof is by an induction argument similar to the induction used in the construction of formulas. Remember, we are establishing an equivalence: one sentence holds for $V({}^*X)$ iff related sentences indexed by $i \in I$ hold a.e. for $V(X)$.

(1) The previous proposition has established the equivalence for atomic formulas.

(2) Assume the equivalence has been established for the formulas $\Phi(x_1,\dots,x_n)$ and $\Psi(x_1,\dots,x_n)$. Given this assumption, we establish the equivalence for $\neg\Phi$ and $\Phi\wedge\Psi$. This will also establish the equivalence for $\Phi\vee\Psi = \neg(\neg\Phi\wedge\neg\Psi)$, $\Phi\to\Psi = \neg\Phi\vee\Psi$, and $\Phi\longleftrightarrow\Psi = [\Phi\to\Psi]\wedge[\Psi\to\Phi]$. For $\neg\Phi$, we have
$${}^*(\neg\Phi)(M([a^1]),\dots,M([a^n])) = \neg({}^*\Phi)(M([a^1]),\dots,M([a^n])).$$
The latter is true in $V({}^*X)$ iff $\{i \in I : \Phi(a^1_i,\dots,a^n_i) \text{ holds}\} \notin \mathcal{U}$ iff $\{i \in I : \neg\Phi(a^1_i,\dots,a^n_i) \text{ holds}\} \in \mathcal{U}$, since $\mathcal{U}$ is an ultrafilter. (Note that we need an equivalence here, not just the implication from "$\Phi$ holds a.e." to "${}^*\Phi$ holds".) For $\Phi\wedge\Psi$, note that the following are equivalent:
${}^*(\Phi\wedge\Psi)(M([a^1]),\dots,M([a^n]))$;
${}^*\Phi(M([a^1]),\dots,M([a^n])) \wedge {}^*\Psi(M([a^1]),\dots,M([a^n]))$;
$\{i : \Phi(a^1_i,\dots,a^n_i) \text{ holds}\} \in \mathcal{U}$ and $\{i : \Psi(a^1_i,\dots,a^n_i) \text{ holds}\} \in \mathcal{U}$;
$\{i : \Phi(a^1_i,\dots,a^n_i) \text{ holds}\} \cap \{i : \Psi(a^1_i,\dots,a^n_i) \text{ holds}\} \in \mathcal{U}$;
$\{i : (\Phi\wedge\Psi)(a^1_i,\dots,a^n_i) \text{ holds}\} \in \mathcal{U}$.
We have used the facts that $\mathcal{U}$ is closed under finite intersections and that a superset of a set in $\mathcal{U}$ is in $\mathcal{U}$.

(3) Assume we have the equivalence for $\Phi(x_1,\dots,x_n,z,y)$, where $d$ is a constant and $z$ is a variable that either does not appear in $\Phi$ or is free in $\Phi$. We will establish the result for $(\exists y \in d)\Phi$ and $(\exists y \in z)\Phi$. This will give the result, since $(\forall y \in d)\Phi = \neg(\exists y \in d)\neg\Phi$ and $(\forall y \in z)\Phi = \neg(\exists y \in z)\neg\Phi$. For $(\exists y \in z)\Phi$, we must fix $[a^1],\dots,[a^n],[c] \in \prod^0_{\mathcal{U}}V(X)$ and then show that
$$(\exists y \in M([c]))\,{}^*\Phi(M([a^1]),\dots,M([a^n]),M([c]),y)$$
holds in $V({}^*X)$ if and only if $(\exists y \in c_i)\,\Phi(a^1_i,\dots,a^n_i,c_i,y)$ holds in $V(X)$ for almost all $i \in I$. (The proof for $(\exists y \in d)\Phi$ is a special case of this, in which we replace $M([c])$ with $M([\bar{d}])$ and $c_i$ with $d$; i.e., the constant sequence $\bar{d}$ replaces $c$.) Assume that $(\exists y \in M([c]))\,{}^*\Phi(M([a^1]),\dots,M([a^n]),M([c]),y)$ holds in $V({}^*X)$. Every element of $M([c])$ has the form $M([a])$, so there is an $M([a])$ such that
$$[M([a]) \in M([c])] \wedge [{}^*\Phi(M([a^1]),\dots,M([a^n]),M([c]),M([a]))]$$
holds in $V({}^*X)$. Hence $\{i \in I : (a_i \in c_i) \wedge \Phi(a^1_i,\dots,a^n_i,c_i,a_i) \text{ holds}\} \in \mathcal{U}$. Therefore, the larger set $\{i \in I : (\exists y \in c_i)\,\Phi(a^1_i,\dots,a^n_i,c_i,y) \text{ holds}\}$ is in $\mathcal{U}$.
2 An Introduction to General Nonstandard Analysis
To show the converse, assume there is a set U_0 ∈ 𝒰 such that U_0 = {i ∈ I : (∃y ∈ c_i) Φ(a_i^1, …, a_i^n, c_i, y) holds}. For each i ∈ U_0, pick an element a_i ∈ c_i such that Φ(a_i^1, …, a_i^n, c_i, a_i) holds. Since the c_i all lie in some fixed V_k(X), there is a U_1 ∈ 𝒰 with U_1 ⊆ U_0 such that, for some m ∈ N and all i ∈ U_1, a_i ∈ V_m(X)\V_{m−1}(X). Choose any α ∈ V_m(X)\V_{m−1}(X), and set a_i = α for i ∉ U_1. Then for this sequence a = ⟨a_i : i ∈ I⟩,
{i ∈ I : (a_i ∈ c_i) ∧ Φ(a_i^1, …, a_i^n, c_i, a_i) holds} ⊇ U_1 ∈ 𝒰.
Therefore, [M([a]) ∈ M([c])] ∧ [∗Φ(M([a^1]), …, M([a^n]), M([c]), M([a]))] holds in V(∗X). It follows that (∃y ∈ M([c])) ∗Φ(M([a^1]), …, M([a^n]), M([c]), y) holds in V(∗X).
(4) The theorem now follows by induction.

Theorem 2.5.8 The map ∗ : V(X) → V(∗X) defined by ∗ = M ∘ e is a monomorphism.
Proof It only remains to show that if Φ is a sentence in L_X that is true for V(X), then ∗Φ is true for V(∗X). Since Φ is a sentence, it has no free variables, only constants and bound variables. Since Φ is true in V(X), it holds for all i ∈ I, so by Łoś' Theorem ∗Φ is true for V(∗X).
Problem: Recall that given [a], [b] ∈ ⁰∏_𝒰 V(X), we write [a] ∈_𝒰 [b] if a_i ∈ b_i a.e. Show that this relation is well defined, that is, that it is independent of the choice of representatives from [a] and [b].
Answer: Assume ⟨a_i⟩ and ⟨a′_i⟩ represent [a], and ⟨b_i⟩ and ⟨b′_i⟩ represent [b]. Also assume there is a set U ∈ 𝒰 such that a_i ∈ b_i for all i ∈ U. We know that there are sets V and W in 𝒰 such that a_i = a′_i for all i ∈ V, while b_i = b′_i for all i ∈ W. Now U ∩ V ∩ W is in 𝒰, and for all i in this set, a′_i ∈ b′_i.
Problem: Given a mapping a from I into V_n(X), show that for some k ≤ n, a_i ∈ V_k(X)\V_{k−1}(X) a.e. (with the convention V_{−1}(X) := ∅).
Answer: For each k ≤ n, let I_k = {i ∈ I : a_i ∈ V_k(X)\V_{k−1}(X)}. Then I is the disjoint finite union of the sets I_k, 0 ≤ k ≤ n, so one and only one of the sets I_k is in 𝒰: if none were, their complements would all be in 𝒰, and so would the intersection of those complements, which is empty.
2.6 Special Index Sets Yielding Enlargements
We have shown how to construct a monomorphism ∗ using a superstructure V(X), an index set I, and an ultrafilter 𝒰. To get more, however, we need additional assumptions. The construction of an enlargement, starting with the material after Example 2.6.5 and ending with Theorem 2.6.7, can be omitted on first reading.
First we note that if 𝒰 is not free, then 𝒰 is fixed at some i_0 ∈ I; i.e., the singleton set {i_0} ∈ 𝒰. In this case, every sequence a is equivalent to the constant sequence with value a_{i_0}, since a_i = a_{i_0} a.e. Therefore, ∗X = X, and we get nothing new. We assume from now on that the ultrafilter 𝒰 is free; i.e., for every i ∈ I, the set {i} ∉ 𝒰.
Definition 2.6.1 All entities in V(X) and entities in V(∗X) of the form ∗b for b ∈ V(X) are called standard.
Example 2.6.2 The sets [0, 1] and ∗[0, 1] are standard entities, even though ∗[0, 1] contains nonstandard numbers.
Note that in interpreting a sentence ∗Φ, the only entities α that arise are related by a finite ∈-chain to a standard entity ∗b. Such an entity α is of the form M([a]). There are, however, entities in V(∗X) that are not of this form. An example is the set N. We will show that if ⟨A_i : i ∈ I⟩ is a sequence of sets such that, for each j ∈ N, j ∈ A_i for all i in some element of the ultrafilter (so that every j ∈ N belongs to M([A])), then M([A]) contains unlimited natural numbers.
We now consider "hyperfinite" sets; these are extremely important for the applications of nonstandard analysis.
Definition 2.6.3 For each A ∈ V(X)\X, let P_F(A) denote the set of finite subsets of A. The collection of hyperfinite or ∗-finite sets in V(∗X) is ⋃_{n=0}^∞ ∗P_F(V_n(X)) = ⋃{∗P_F(A) : A ∈ V(X)\X}.
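The remark that a non-free ultrafilter gives nothing new has a finite counterpart worth seeing concretely: on a finite index set every ultrafilter is fixed, so free ultrafilters require an infinite I. The brute-force sketch below (the helper names are hypothetical, and the three-element index set is an arbitrary choice) enumerates all families of subsets of I = {0, 1, 2}, keeps those satisfying the ultrafilter axioms, and reports the index at which each one is fixed.

```python
from itertools import chain, combinations

def powerset(s):
    """All subsets of the finite collection s, as frozensets."""
    items = list(s)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

def is_ultrafilter(family, index_set):
    """Check the ultrafilter axioms for a family of subsets of a finite index set."""
    I = frozenset(index_set)
    subsets = powerset(I)
    if I not in family or frozenset() in family:
        return False
    # Closed under finite intersections.
    if any((A & B) not in family for A in family for B in family):
        return False
    # Closed under supersets.
    if any(A <= B and B not in family for A in family for B in subsets):
        return False
    # Ultra: each subset or its complement belongs to the family.
    return all(A in family or (I - A) in family for A in subsets)

def ultrafilters(index_set):
    """Brute-force list of all ultrafilters on a small finite index set."""
    return [f for f in powerset(powerset(index_set))
            if is_ultrafilter(f, index_set)]

def fixed_index(u):
    """Return i0 if the singleton {i0} belongs to u, else None."""
    singletons = [A for A in u if len(A) == 1]
    return next(iter(singletons[0])) if singletons else None

I = {0, 1, 2}
found = ultrafilters(I)
print(len(found))                             # 3
print(sorted(fixed_index(u) for u in found))  # [0, 1, 2]
```

Every ultrafilter found contains a singleton, which is exactly the situation in which every sequence is a.e. equal to a constant one.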
Definition 2.6.4 Given a superstructure V(X) and a monomorphism ∗, we say that V(∗X) is an enlargement of V(X) if for each set A ∈ V(X) there is a set B ∈ ∗P_F(A) such that ∗a ∈ B for every a ∈ A; i.e., B contains all of the standard entities in ∗A. It follows from the transfer principle that if A is not a finite set, then there are elements of ∗A that are not in B.
Example 2.6.5 For η ∈ ∗N_∞, the "initial segment" {1, 2, …, η} is a hyperfinite set; it contains N.
Fix V(X). We now construct an index set J and an ultrafilter 𝒰 on J such that the corresponding superstructure V(∗X) is an enlargement of V(X). Since X is an infinite set (containing N), it will follow that ∗X ≠ X.
Let J be the collection of all nonempty finite sets belonging to V(X). Note that each element of J is in V_n(X) for some n. For each a ∈ J, let J_a = {b ∈ J : a ⊆ b}. Let F be the collection of all subsets A of J for which there is an a ∈ J with J_a ⊆ A.
Proposition 2.6.6 The collection F is a free filter on J.
Proof For each A ∈ F there is an a ∈ J with a ∈ J_a ⊆ A, so A ≠ ∅. Given A and B in F and a, b ∈ J with J_a ⊆ A and J_b ⊆ B, we have J_a ∩ J_b = J_{a∪b} ⊆ A ∩ B; moreover, any subset of J containing a set in F is in F. Therefore, F is a filter. To show that F is free, fix a ∈ J and find some b ∈ J with a ∩ b = ∅. Then J_b itself is in F, and the set a is not in J_b.
Now use Zorn's Lemma to obtain a free ultrafilter 𝒰 on J such that 𝒰 ⊇ F. Note that for each a ∈ J, J_a ∈ 𝒰.
Theorem 2.6.7 If V(∗X) is constructed from V(X) using J and 𝒰, then it is an enlargement of V(X).
Proof Let A be a nonempty set belonging to V(X). Choose an a_0 ∈ A. Define a map φ : J → P_F(A) by setting φ_a = (a ∩ A) ∪ {a_0} for each a ∈ J. Since A ∈ V_m(X) for some m ∈ N, there are an n ∈ N and a set U_0 ∈ 𝒰 such that φ_a has rank n for all a ∈ U_0. Choose any a_1 ∈ U_0, and replace φ_a with φ_{a_1} for a ∉ U_0. Let B = M([φ]). Then B ∈ ∗P_F(A), since φ_a ∈ P_F(A) for all a ∈ J. Fix c ∈ A. We must show that ∗c ∈ B. Since the singleton {c} is a finite subset of A, J_{{c}} ∈ 𝒰, so c ∈ φ_a for all a ∈ J_{{c}} ∩ U_0, that is, a.e. Therefore, ∗c ∈ B.
Definition 2.6.8 A binary relation P is concurrent or finitely satisfiable on a set A contained in its domain if for each n ∈ N and each finite set {x_1, …, x_n} ⊆ A there is a y in the range of P such that ⟨x_i, y⟩ ∈ P for 1 ≤ i ≤ n. A relation P is concurrent if it is concurrent on all of its domain.
Example 2.6.9 The relations ≤ on N and ⊆ on P_F(N) are concurrent relations.
Concurrent relations are of interest because of the following property.
Theorem 2.6.10 Given X and a monomorphism ∗, the following are equivalent:
(i) V(∗X) is an enlargement of V(X).
(ii) Given any concurrent relation P ∈ V(X), there is an element c in the range of ∗P such that ⟨∗a, c⟩ ∈ ∗P for every a in the domain of P.
Proof (i→ii) Let A be the domain of P, and let B ⊆ ∗A be a hyperfinite set containing ∗a for each a ∈ A. By transfer of the concurrency condition for P, there is a c in the range of ∗P such that ⟨b, c⟩ ∈ ∗P for each b ∈ B; in particular, ⟨∗a, c⟩ ∈ ∗P for each a ∈ A.
(ii→i) Fix a set A ∈ V(X). Since ⊆ on P_F(A) is a concurrent relation, there is a hyperfinite set B ∈ ∗P_F(A) that contains the extension ∗F of every finite subset F of A. In particular, for each a ∈ A, ∗{a} = {∗a} ⊆ B, so ∗a ∈ B.
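Example 2.6.9 can be made concrete: for ≤ on N a common witness for a finite set {x_1, …, x_n} is its maximum, and for ⊆ on P_F(N) it is the union. The sketch below checks these witnesses on small finite samples; the ranges and helper names are illustrative assumptions, and a finite check only illustrates the definition rather than proving concurrency.

```python
from itertools import combinations

def witness_le(xs):
    """For <= on N, the maximum of a finite set is a common upper bound."""
    return max(xs)

def witness_subset(sets):
    """For ⊆ on finite subsets of N, the union of finitely many finite sets
    is a finite common superset."""
    return frozenset().union(*sets)

def check_concurrent(related, witness, finite_families):
    """Check that witness(F) is related to every x in F, for each finite F supplied."""
    return all(all(related(x, witness(F)) for x in F) for F in finite_families)

# Finite sets {x_1, ..., x_n} of natural numbers with n <= 3, drawn from 0..9.
families_nums = [c for r in range(1, 4) for c in combinations(range(10), r)]
print(check_concurrent(lambda x, y: x <= y, witness_le, families_nums))      # True

# Finite families of finite subsets of {0, 1, 2, 3}; on frozensets, <= means ⊆.
small_sets = [frozenset(c) for r in range(3) for c in combinations(range(4), r)]
families_sets = [c for r in range(1, 4) for c in combinations(small_sets, r)]
print(check_concurrent(lambda x, y: x <= y, witness_subset, families_sets))  # True
```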
Corollary 2.6.11 If A is an infinite set in V (X ) and V (∗ X ) is an enlargement, then there is a nonstandard b ∈ ∗ A.
Proof The relation ≠ is concurrent on A.
Remark 2.6.12 The results of the last chapter are valid for an enlargement of a superstructure V(X) when X contains R.
Problem: Assume {O_α : α ∈ A} is an open covering of S ⊆ R with no finite subcovering. Show that there is a point b ∈ ∗S such that b is not in the monad of any x ∈ S.
Answer: The relation P such that ⟨O_α, y⟩ ∈ P if y ∈ S\O_α is concurrent (because no finitely many O_α cover S), so there is a b ∈ ∗S such that b ∈ ∗S\∗O_α for all α ∈ A. If x ∈ S, then x ∈ O_α for some α, and so m(x) ⊆ ∗O_α, since O_α is open. It follows that b ∉ m(x) for any x ∈ S.
Problem: Let 𝒜 be a collection of sets with 𝒜 ∈ V(X). Assume 𝒜 has the finite intersection property; that is, the intersection of any finite number of elements of 𝒜 is not empty. Suppose that V(∗X) is an enlargement. Show that the intersection μ(𝒜) := ⋂_{a∈𝒜} ∗a (the monad of 𝒜) is not empty.
Answer: Take a hyperfinite B ⊆ ∗𝒜 with ∗a ∈ B for each a ∈ 𝒜. By transfer of the finite intersection property, there is a y ∈ ⋂_{b∈B} b ⊆ ⋂_{a∈𝒜} ∗a.
2.7 A Result in Infinite Graph Theory
As an application of the existence of an enlargement, we give an easy proof of a result of de Bruijn and Erdős [3].
Definition 2.7.1 A graph (A, E) consists of a set A of "vertices" and a binary, symmetric relation E ⊆ A × A. If ⟨x, y⟩ ∈ E, we say that x and y are connected by an edge. The graph (A, E) is k-colorable if there is a map f : A → {1, …, k} (the set of k colors) such that if ⟨a, b⟩ ∈ E, then f(a) ≠ f(b). If B ⊆ A and E|B is the restriction of the relation E to B × B, then (B, E|B) is called a subgraph of (A, E). The cardinality of a graph (A, E) is that of the set A.
Theorem 2.7.2 (de Bruijn and Erdős) If each finite subgraph of an infinite graph is k-colorable, then the graph itself is k-colorable.
Proof Let (A, E) be the infinite graph, and let 𝓕 denote the set of all finite subsets of A. The following sentence, given here in shorthand, holds in the original superstructure:
(∀F ∈ 𝓕)(∃g : F → {1, …, k})(∀x, y ∈ F)[⟨x, y⟩ ∈ E → g(x) ≠ g(y)].
Let B be a hyperfinite subset of ∗A such that ∗a ∈ B for all a ∈ A. Then B ∈ ∗𝓕, so there is a g : B → {1, …, k} such that for all x, y ∈ B, if ⟨x, y⟩ ∈ ∗E, then g(x) ≠ g(y). In particular, for all a, b ∈ A, if ⟨a, b⟩ ∈ E, then ⟨∗a, ∗b⟩ ∈ ∗E, so g(∗a) ≠ g(∗b). Let f denote the restriction of g to A; that is, f(a) = g(∗a). Then f is a k-coloring of A.
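The hypothesis of Theorem 2.7.2 concerns finite subgraphs only, and for a finite graph k-colorability can be decided by exhaustive search. A minimal backtracking sketch follows (the function name k_coloring is ours, not the text's), applied to the 5-cycle, which is 3-colorable but, being an odd cycle, not 2-colorable.

```python
def k_coloring(vertices, edges, k):
    """Return a map vertex -> color in {1, ..., k} such that adjacent vertices get
    different colors, or None if the finite graph is not k-colorable."""
    vertices = list(vertices)
    neighbors = {v: set() for v in vertices}
    for u, v in edges:  # E is symmetric, so record both directions
        neighbors[u].add(v)
        neighbors[v].add(u)
    coloring = {}

    def extend(i):
        # Try to color vertices[i:], given the colors already assigned.
        if i == len(vertices):
            return True
        v = vertices[i]
        for color in range(1, k + 1):
            if all(coloring.get(w) != color for w in neighbors[v]):
                coloring[v] = color
                if extend(i + 1):
                    return True
                del coloring[v]  # backtrack
        return False

    return dict(coloring) if extend(0) else None

cycle5 = [(i, (i + 1) % 5) for i in range(5)]
print(k_coloring(range(5), cycle5, 2))  # None: an odd cycle is not 2-colorable
print(k_coloring(range(5), cycle5, 3))  # {0: 1, 1: 2, 2: 1, 3: 2, 4: 3}
```

The search is exponential in the worst case, which is harmless for small examples; the nonstandard proof sidesteps any search over infinitely many vertices.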
2.8 Internal and External Sets
In this section we make the important distinction between internal and external objects in V(∗X). We also establish some additional properties that will be used in applications of nonstandard analysis. The reader may wish to skip over the proofs of Theorems 2.8.4 and 2.8.11 on first reading.
Definition 2.8.1 An entity a in V(∗X) is called internal if a ∈ ∗b for some standard set b ∈ V(X). All other entities in V(∗X) are called external.
This means that the internal entities are the elements of standard entities. Of course, if b ∈ V(X) is a set, then b ∈ V_{n+1}(X) for some n, so b ⊆ V_n(X); hence if a ∈ ∗b, then a ∈ ∗V_n(X). Thus internal entities are elements of ∗V_n(X) for some n. On the other hand, suppose that b is internal but not a standard entity, and that a ∈ b. Then b ∈ ∗V_{n+1}(X) for some n, so by transfer of the sentence (∀y ∈ V_{n+1}(X))(∀x ∈ y)[x ∈ V_n(X)], it follows that a ∈ ∗V_n(X), whence a is internal. In interpreting the transfer ∗Φ of a sentence Φ, only constants naming standard objects are used, so only internal entities come up in the interpretation. If a ∈ ∗b, then we can obtain information about a by transferring sentences of the form (∀x ∈ b)[⋯]. If a is external, however, then the transfer principle does not yield information about a in this way.
Example 2.8.2 To show that the set N is external in V(∗X), let us suppose the contrary, which means that N ∈ ∗P(N). Then the set ∗N\N = ∗N_∞ is also internal by transfer of the sentence (∀A ∈ P(N))[N\A ∈ P(N)]. It follows by transfer of the sentence (∀A ∈ P(N))[A ≠ ∅ → (∃m ∈ A)(∀k ∈ A)[m ≤ k]] that there is a first element m in ∗N_∞. In this case, m − 1 is the last element of N, which is impossible. This contradiction is a proof that N is external.
For applications of the transfer principle, it is clearly important to know when an entity in V(∗X) is internal. We have already given the proof of the following criterion.
Theorem 2.8.3 The set of internal elements of V(∗X) is the set ⋃_{n=0}^∞ ∗V_n(X).
This result is not much help in determining when a set is internal. The next result is considerably more useful. Recall that L_{∗X} is the language for V(∗X). A formula in L_{∗X} is called standard if all of its constants are names of standard entities; it is called internal if all of its constants are names of internal entities.
Theorem 2.8.4 (Keisler's Internal Definition Principle) Let Φ(x) be an internal formula in L_{∗X} for which x is the only free variable. Let A be an internal set. Then the set {a ∈ A : Φ(a) holds in V(∗X)} is internal.
Proof Let c_1, …, c_n be the constants in Φ(x), and write Φ(x) as Φ(c_1, …, c_n, x). Fix k ∈ N so that A, c_1, …, c_n are all in ∗V_k(X). The sentence in L_X
(∀x_1, …, x_n, y ∈ V_k(X))(∃z ∈ V_{k+1}(X))(∀x ∈ V_k(X))[x ∈ z ←→ [x ∈ y ∧ Φ(x_1, …, x_n, x)]]
holds in V(X). Its transfer, applied with x_i = c_i and y = A, says that {a ∈ A : Φ(a) holds} ∈ ∗V_{k+1}(X).
Theorem 2.8.5 If A and B are internal, so are A ∪ B, A ∩ B, A\B, and A × B.
Proof For ∪, assume that A, B ∈ ∗V_{n+1}(X) and transfer
(∀W, Y ∈ V_{n+1}(X))(∃Z ∈ V_{n+1}(X))(∀x ∈ V_n(X))[x ∈ Z ←→ x ∈ W ∨ x ∈ Y].
For ∩, replace ∨ with ∧. For \, replace x ∈ W ∨ x ∈ Y with x ∈ W ∧ x ∉ Y. The proof for × is left to the reader.
We have already shown that N and ∗N_∞ are external. As a consequence, we have the following result, where Z denotes the integers.
Theorem 2.8.6 In an enlargement of a superstructure V(X) with R ⊆ X, the sets N, ∗N_∞, R, Z, ∗Z_∞, ∗R_∞, and m(0) are all external.
Proof It follows from the fact that N and ∗N_∞ are external that Z and ∗Z_∞ are also external. Since Z = R ∩ ∗Z, R is external. Since ∗N_∞ = ∗R_∞ ∩ ∗N, ∗R_∞ is external. Since for x ≠ 0, x ∈ m(0) iff 1/x ∈ ∗R_∞, m(0) is external.
Remark 2.8.7 One should not think that external entities are "bad" in any sense. They are just not the subject of the transfer principle. A review of the last chapter shows the utility of such external objects as m(0), ∗N_∞, and the standard part map.
Problem: Recall that P_F denotes the finite power set operation. Show that, in general, P_F(∗A) ⊆ ∗P_F(A), but ∗P(A) ⊆ P(∗A).
Answer: A truly finite subset of ∗A is internal, so it is in ∗P_F(A). If, for example, the set has three elements, then the following sentence is true by transfer:
(∀x ∈ ∗A)(∀y ∈ ∗A)(∀z ∈ ∗A)(∃w ∈ ∗P_F(A))[{x, y, z} = w].
Thus P_F(∗A) ⊆ ∗P_F(A). Not every hyperfinite set is finite, so the inclusion does not go the other way. If S ∈ ∗P(A), then by the transfer principle, s ∈ ∗A for all s ∈ S,
so S ∈ P(∗A). Since external subsets of ∗A are not in ∗P(A), we can only say that ∗P(A) ⊆ P(∗A).
Problem: It was shown in the last chapter that (∗R, ∗+, ∗·, ∗<) is an ordered field extension of the real number system. Use the transfer principle to show that it is complete with respect to internal sets; that is, if a nonempty internal set has an upper bound in ∗R, then it has a least upper bound in ∗R.
Problem: Use the transfer principle to show the following about the ordered field (∗R, ∗+, ∗·, ∗<):
(1) Each nonempty internal set B ⊆ ∗N has a smallest element with respect to ∗<.
(2) (Internal Transfinite Induction) For each internal B ⊆ ∗N, the following implication holds: [1 ∈ B and ∀x(x ∈ B → x + 1 ∈ B)] → B = ∗N.
Here is an additional property of monomorphisms that allows one to use the transfer principle for certain internal sets.
Definition 2.8.8 A monomorphism ∗ is comprehensive if for any sets C, D in V(X) and any map h : C → ∗D, there is an internal map H : ∗C → ∗D such that H(∗a) = h(a) for each a ∈ C. A monomorphism ∗ is called denumerably comprehensive if it is comprehensive with the restriction that C must be a countable set in V(X).
Proposition 2.8.9 A monomorphism ∗ is denumerably comprehensive if and only if any ordinary sequence {A_n : n ∈ N} from an internal, but not necessarily standard, set S can be extended to an internal sequence {A_n : n ∈ ∗N} in S.
Proof Suppose ∗ is a denumerably comprehensive monomorphism. Let C = N, and let {A_n : n ∈ N} be an ordinary sequence of elements taken from an internal set S. Recall that each A_n must then be internal. For some k, we have S ∈ ∗V_k(X), so there is an internal sequence {A_n : n ∈ ∗N} that extends the original sequence and takes values in ∗V_k(X). If the internal set {n ∈ ∗N : A_n ∉ S} is nonempty, it has a first element n_0, which must be unlimited. By setting A_n = A_{n_0−1} for all n ≥ n_0, we may assume that A_n ∈ S for all n ∈ ∗N.
For the converse, we note that if C is countably infinite, then there is an enumeration {c_n : n ∈ N} of C. That enumeration has a nonstandard extension {c_n : n ∈ ∗N}. Setting A_n := h(c_n) ∈ ∗D for n ∈ N and extending to an internal sequence {A_n : n ∈ ∗N} in ∗D, the composition c_n ↦ n ↦ A_n, n ∈ ∗N, gives the desired extension of the mapping c_n ↦ n ↦ A_n, n ∈ N.
Remark 2.8.10 Starting with [5] and then [1], the property of denumerable comprehensiveness has played a central role in the application of nonstandard analysis to measure theory.
The next result applies in particular to the enlargement defined in Theorem 2.6.7.
Theorem 2.8.11 Monomorphisms constructed using ultrapowers are comprehensive.
Proof Let 𝒰 be the ultrafilter on the index set I from which the ultrapower is built. Fix sets C, D in V(X) and a map h : C → ∗D. For each element M([b]) ∈ ∗D, let S(M([b])) be a representative element of the equivalence class [b]. We may assume that S(M([b]))(i) ∈ D for all i ∈ I. Now for each i ∈ I, let k_i be the mapping from C into D given by k_i = {⟨a, S(h(a))(i)⟩ : a ∈ C}. There are an n ∈ N and a set U in the ultrafilter such that k_i has rank n for all i ∈ U. Pick some i_0 ∈ U, and let k_i = k_{i_0} for i ∉ U. Let [k] denote the equivalence class containing k. For each i ∈ I, k_i is a function from C into D, so H = M([k]) is an internal function from ∗C into ∗D. We need only show that H(∗α) = h(α) for all α ∈ C. Fix α ∈ C and recall that ∗α = M([ᾱ]) and that h(α) is an element M([b]) ∈ ∗D. By definition, H(∗α) = M([c]), where, for almost all i, c_i = k_i(α) = S(h(α))(i) = b_i; so H(∗α) = h(α).
Here is an important principle for the applications of nonstandard analysis. It uses the externality of R and N in V(∗X) to obtain information about internal subsets of ∗R.
Theorem 2.8.12 (Spillover Principle) Let A be an internal subset of ∗R.
(i) If A contains all standard natural numbers, then A contains all elements of ∗N less than some unlimited natural number.
(ii) If A contains all unlimited natural numbers, then A contains all elements of ∗N greater than some standard natural number.
(iii) If A contains the positive infinitesimals, then A contains all elements of ∗R_+ smaller than some standard positive real number.
(iv) Assume that for each unlimited natural number H there exists an unlimited natural number K < H such that K ∈ A. Then A contains a standard natural number.
Proof (i) Since A ∩ ∗N is internal, we cannot have A ∩ ∗N = N. If ∗N\A is not empty, it has a first element, which must be unlimited.
(ii) Here, ∗N\A contains only standard numbers, so it is bounded above by any unlimited integer; if it is nonempty, it therefore has a least upper bound M, which must be limited. Let n be the first integer strictly bigger than M.
It follows that A contains all elements of ∗N greater than n.
(iii) Let B be the set {n ∈ ∗N : x ∈ A for every x ∈ ∗R with 0 < x < 1/n}. Then B is internal and contains ∗N_∞, so by (ii) it contains some standard m; hence A contains every x ∈ ∗R_+ with x < 1/m.
(iv) By assumption, the internal set A ∩ ∗N is nonempty, so it has a smallest element k. If k were unlimited, the assumption applied to H = k would give an unlimited K < k with K ∈ A, contradicting the minimality of k. Hence k is limited, i.e., a standard natural number.
Here is a very useful corollary of the Spillover Principle. Abraham Robinson said he wanted this principle written on his tombstone, but it isn't.
Theorem 2.8.13 (Robinson's Sequential Lemma) Let ⟨s_n : n ∈ ∗N⟩ be an internal sequence in ∗R such that s_n ≈ 0 for all n ∈ N. Then for some η ∈ ∗N_∞, s_n ≈ 0 for all n ≤ η.
Proof Let A = {n ∈ ∗N : |s_n| < 1/n}. Then A is internal and contains N, so by the Spillover Principle it contains all n ∈ ∗N with n ≤ η for some unlimited η. Therefore, s_n ≈ 0 for all unlimited n ≤ η, and we already know that s_n ≈ 0 for all limited n.
Remark 2.8.14 This fact does not go the other way: s_n = 1/n is infinitesimal for all unlimited n, but it is not infinitesimal for any limited n.
Robinson used his lemma in [8] to verify his construction of Banach limits. That construction, given below, has been extended to vector-valued sequences by the author and Horst Osswald in [6].
Definition 2.8.15 Let ℓ^∞ denote the set of standard bounded real-valued sequences. A linear map L : ℓ^∞ → R is called a Banach limit if for each σ ∈ ℓ^∞, L(σ) is a value between lim inf σ and lim sup σ, and L(σ) = L(T(σ)), where T(σ)(n) = σ(n + 1).
Robinson's nonstandard construction of a Banach limit picks η ∈ ∗N_∞ and sets

$$
L_\eta(\sigma) = \operatorname{st}\left(\frac{1}{\eta}\sum_{n=1}^{\eta} {}^*\sigma(n)\right).
$$
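Here, Σ_{n=1}^{η} is the nonstandard extension, evaluated at η, of the usual summation operator that sums a sequence from 1 to k.
Theorem 2.8.16 For any η ∈ ∗N_∞, L_η is a Banach limit.
Proof The map L_η is clearly linear. The sum of any finite set of limited numbers divided by an unlimited integer is infinitesimal. Moreover, given any ε > 0, a bounded sequence, after a finite number of terms, takes values between its lim inf − ε and its lim sup + ε. Therefore, for all limited m ∈ N,

$$
\begin{aligned}
\left|\frac{1}{\eta}\sum_{n=1}^{\eta}{}^*\sigma(n)-\frac{1}{\eta-m}\sum_{n=m+1}^{\eta}{}^*\sigma(n)\right|
&\le \left|\frac{1}{\eta}\sum_{n=1}^{\eta}{}^*\sigma(n)-\frac{1}{\eta-m}\sum_{n=1}^{\eta}{}^*\sigma(n)\right|
+\left|\frac{1}{\eta-m}\sum_{n=1}^{m}{}^*\sigma(n)\right|\\
&= \frac{m}{\eta-m}\left|\frac{1}{\eta}\sum_{n=1}^{\eta}{}^*\sigma(n)\right|
+\left|\frac{1}{\eta-m}\sum_{n=1}^{m}{}^*\sigma(n)\right| \approx 0.
\end{aligned}
$$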
Choosing m so that, by transfer, ∗σ(n) lies between lim inf σ − ε and lim sup σ + ε for all n > m, it follows that for any σ ∈ ℓ^∞ and any ε > 0, lim inf(σ) − ε ≤ L_η(σ) ≤ lim sup(σ) + ε, whence the inequality is true with ε = 0. Finally, shifting a bounded sequence σ to the left by 1 changes (1/η) Σ_{n=1}^{η} ∗σ(n) by (∗σ(η + 1) − ∗σ(1))/η, which is only an infinitesimal amount, and thus does not change the value of L_η(σ).
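The averaging behind L_η has a familiar finite shadow: Cesàro means (1/N) Σ_{n=1}^{N} σ(n) for a large standard N. The sketch below is only that analogue, under the assumption that a large N stands in for the unlimited η (taking the standard part has no finite counterpart). It computes exact averages for σ(n) = (−1)^n and checks that the shift T changes the sum by σ(N + 1) − σ(1), so the average moves by at most 2 sup|σ|/N, mirroring the infinitesimal change in the proof.

```python
from fractions import Fraction

def average(sigma, N):
    """(1/N) * (sigma(1) + ... + sigma(N)), computed exactly."""
    return Fraction(sum(sigma(n) for n in range(1, N + 1)), N)

def shift(sigma):
    """T(sigma)(n) = sigma(n + 1)."""
    return lambda n: sigma(n + 1)

def sigma(n):
    """A bounded sequence with lim inf = -1 and lim sup = 1."""
    return (-1) ** n

for N in (10, 101, 1000):
    a, b = average(sigma, N), average(shift(sigma), N)
    # The shift changes the sum by sigma(N + 1) - sigma(1), so the averages
    # differ by at most 2 / N here (sup |sigma| = 1).
    print(N, a, b, abs(a - b) <= Fraction(2, N))
```

For N = 101 the two averages are −1/101 and 1/101: the gap is as large as the bound allows, yet it still shrinks like 1/N.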
2.9 Saturation
We will need one more general property of monomorphisms when dealing with mathematical objects such as Banach spaces and topological spaces. It is a useful "compactness" property. Fix an uncountable cardinal number κ. Let us say that a set A is κ-small if its cardinal number satisfies the inequality card(A) < κ. Recall that a binary relation P is concurrent or finitely satisfiable on a set A contained in its domain if for each n ∈ N and each finite set {x_1, …, x_n} ⊆ A there is a y in the range of P such that ⟨x_i, y⟩ ∈ P for 1 ≤ i ≤ n.
Definition 2.9.1 A nonstandard superstructure V(∗X) is κ-saturated if for each internal binary relation P ∈ V(∗X) and each κ-small subset A (internal or external) of the domain of P such that P is concurrent on A, there is an element b in the range of P such that ⟨a, b⟩ ∈ P for each a ∈ A.
For κ-saturation, the cardinal number κ should be determined by V(X); it must not depend on objects in V(∗X), because these can change depending on the monomorphism. It is sufficient, therefore, to fix κ as the first cardinal number larger than the cardinality of the original superstructure V(X). One then says that V(∗X) is polysaturated. Horst Osswald presents a construction of polysaturated models in the appendix to this chapter (the next section). A construction of saturated models can also be found in Stroyan and Luxemburg's book [12].
Theorem 2.9.2 Suppose the nonstandard superstructure V(∗X) is κ-saturated and A is a set in V(X) such that card(A) < κ. Then there is a hyperfinite set B such that ∗a ∈ B for each a ∈ A. If D is a set in V(X) and h : A → ∗D, then there is an internal map H : ∗A → ∗D such that H(∗a) = h(a) for each a ∈ A. In particular, if V(∗X) is polysaturated, then it is comprehensive and an enlargement.
Proof The binary relation P such that, for each c ∈ ∗A and each hyperfinite subset B of ∗A, ⟨c, B⟩ ∈ P if and only if c ∈ B, is internal and concurrent on the set {∗a : a ∈ A}.
Let Q be the binary relation on the set of internal maps with domain contained in ∗A and range contained in ∗D such that ⟨g, g′⟩ ∈ Q if and only if g′ is an extension of g. Then Q is internal and concurrent on the following set of maps with singleton domains: {∗a ↦ h(a) : a ∈ A}. An internal map G extending all of these maps can be further extended by mapping any c ∈ ∗A outside the domain of G to a fixed element of ∗D.
Remark 2.9.3 In applications, one often starts with a topological space (X, 𝒯) and works with just a κ-saturated enlargement, where κ > card(𝒯).
Theorem 2.9.4 A nonstandard superstructure V(∗X) is κ-saturated if and only if, for each internal set C and every (internal or external) κ-small collection 𝓑 of internal subsets of C such that 𝓑 has the finite intersection property, there is an a ∈ ⋂{B : B ∈ 𝓑}.
Proof Assume first that V(∗X) is κ-saturated, that 𝓑 ⊆ ∗P(C) is κ-small, and that 𝓑 has the finite intersection property. Then the relation P := {⟨B, a⟩ | a ∈ B ∈ ∗P(C)} is internal and concurrent on 𝓑 ⊆ domain(P). By the assumption, there exists an a such that ⟨B, a⟩ ∈ P for each B ∈ 𝓑.
Now assume that P is an internal relation, and that P is concurrent on a κ-small set A ⊆ domain(P). For each a ∈ A set S_a := {b | ⟨a, b⟩ ∈ P}, an internal subset of the range of P. Then 𝓑 := {S_a | a ∈ A} is κ-small and has the finite intersection property. By the assumption, there is a b ∈ ⋂𝓑. It follows that ⟨a, b⟩ ∈ P for each a ∈ A.
Theorem 2.9.5 For V(∗X), ℵ_1-saturation is equivalent to being denumerably comprehensive.
Proof First, assume that V(∗X) is at least ℵ_1-saturated. Fix an internal set S, and let ⟨a_n : n ∈ N⟩ be an ordinary sequence of elements from S. We must show that this sequence can be extended to an internal sequence ⟨a_n : n ∈ ∗N⟩ in S. For each n ∈ N, let B_n be the collection of internal maps F from ∗N into S such that F(i) = a_i for all i ≤ n. Then B_n is internal. Moreover, 𝓑 := {B_n | n ∈ N} is countable, hence ℵ_1-small, and has the finite intersection property. Therefore, there exists an internal F : ∗N → S with F ∈ ⋂𝓑, whence F(n) = a_n for each n ∈ N. The proof of the converse is left to the reader.
In the following we assume V(∗X) to be κ-saturated, where κ ≥ ℵ_1.
Proposition 2.9.6 If A is an infinite but κ-small set, then A is external.
Proof Assume that A is internal. Then {⟨a, b⟩ | a, b ∈ A and a ≠ b} is internal and concurrent on A. Therefore, there exists an element b ∈ A such that a ≠ b for all a ∈ A, which is absurd (take a = b).
Proposition 2.9.7 Let κ > card(V(X)) and let A be a set in V(X). Let ∗[A] := {∗a : a ∈ A}.
Then ∗[A] ⊆ ∗A, and ∗[A] ≠ ∗A if and only if A is infinite.
Proof If a ∈ A, then by transfer, ∗a ∈ ∗A. Thus ∗[A] ⊆ ∗A. Suppose A is infinite. Since ∗[A] is infinite and κ-small, ∗[A] is external by the previous result. Since ∗A is internal, ∗[A] ≠ ∗A. On the other hand, if A is finite, say A = {a_1, …, a_k}, then ∗A = {∗a_1, …, ∗a_k} = ∗[A].
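The proof of Theorem 2.9.4 translates concurrency of P on A into the finite intersection property of the sections S_a = {b : ⟨a, b⟩ ∈ P}. For finite relations this translation can be checked mechanically. In the sketch below (hypothetical helper names), ≤ on {0, …, 4} is concurrent and its sections have the finite intersection property, while < fails both because 4 has no witness. For finite A this only illustrates the correspondence; saturation is needed precisely when A is infinite.

```python
from itertools import combinations

def concurrent_on(P, A):
    """True if every nonempty finite subset of A has a common witness b with (a, b) in P."""
    witnesses = {b for (_, b) in P}  # the range of P
    return all(
        any(all((a, b) in P for a in F) for b in witnesses)
        for r in range(1, len(A) + 1)
        for F in combinations(A, r)
    )

def sections(P, A):
    """The sets S_a = {b : (a, b) in P}, one for each a in A."""
    return [frozenset(b for (x, b) in P if x == a) for a in A]

def has_fip(family):
    """True if every nonempty finite subfamily has nonempty intersection."""
    fam = list(family)
    return all(
        frozenset.intersection(*F)
        for r in range(1, len(fam) + 1)
        for F in combinations(fam, r)
    )

U = range(5)
A = list(U)
le = {(a, b) for a in U for b in U if a <= b}
lt = {(a, b) for a in U for b in U if a < b}

print(concurrent_on(le, A), has_fip(sections(le, A)))  # True True
print(concurrent_on(lt, A), has_fip(sections(lt, A)))  # False False
```

A common witness b for a finite F ⊆ A is exactly an element of ⋂_{a∈F} S_a, which is why the two tests always agree.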
Part (i) of the next result shows that each κ-small cover of an internal set A by internal sets contains a finite subcover of A; Part (ii) implies that the standard part of an internal finitely additive measure on an internal algebra is always σ-additive:
Proposition 2.9.8 Fix an internal set C.
(i) Let A ⊆ C be internal, and let 𝓑 be a κ-small subset of ∗P(C) such that A ⊆ ⋃𝓑. Then there exist finitely many B_1, …, B_k ∈ 𝓑 with A ⊆ B_1 ∪ ⋯ ∪ B_k.
(ii) If (A_k)_{k∈N} is a strictly increasing sequence of internal subsets of C, then ⋃_{k∈N} A_k is external; if it is strictly decreasing, then ⋂_{k∈N} A_k is external.
Proof (i) Assume that the assertion fails. Then the internal relation R := {⟨B, a⟩ | a ∈ A, a ∉ B ∈ ∗P(C)} is concurrent on 𝓑. Since V(∗X) is κ-saturated, there is an a ∈ A with a ∉ B for all B ∈ 𝓑, contradicting A ⊆ ⋃𝓑.
(ii) Assume that (A_k) is strictly increasing and that B := ⋃_{k∈N} A_k is internal. Since N is κ-small, by (i) there exists an m ∈ N such that B ⊆ A_1 ∪ ⋯ ∪ A_m = A_m ⊊ A_{m+1} ⊆ B, which is a contradiction. The decreasing case follows by passing to the complements C\A_k, using De Morgan's laws and the fact that the complement in C of an external subset of C is external.
Corollary 2.9.9 The sets N, ∗N_∞, ∗R_∞, and m(0) are external.
Proof We have seen that N is external. This also follows from the fact that N is an infinite κ-small set. It now follows that ∗N_∞ = ⋂_{k∈N} {m ∈ ∗N | k < m} is external. With a similar proof, it follows that ∗R_∞ and m(0) are external.
Problem: Assume V(∗X) is ℵ_1-saturated. Also assume that A is an internal set such that N ∩ A is infinite. Show that for each unlimited natural number M there exists an unlimited natural number K with K ≤ M and K ∈ A.
Answer: Fix an unlimited M ∈ ∗N. For each n ∈ N, let b_n := {k ∈ ∗N | n ≤ k ≤ M and k ∈ A}. The b_n are internal and, since N ∩ A is infinite, have the finite intersection property, so there is a K ∈ ⋂_{n∈N} b_n. It follows that K ∈ ∗N_∞ ∩ A and K ≤ M.
We conclude this section by listing the numbers of the major tools needed for applications. They are the following: Part (v) of 2.4.3, 2.4.5, 2.6.4, 2.8.4, 2.8.8, 2.8.9, 2.8.12, 2.9.1. For further background, there is the initiating work of [9] and then [7]. Also we note the text of Vakil [13], which employs a weaker form of nonstandard analysis discussed in this book's preface.
Appendix: Nonstandard Models
Horst Osswald
Mathematisches Institut der Universität München, Theresienstr. 39, D–80333 München, Germany
As promised in Sect. 2.9, we shall now show that for each superstructure V of cardinality κ there exist a κ⁺-saturated superstructure W and a monomorphism ∗ from V into W. (κ⁺ is the smallest cardinality greater than κ.) By the way, in model theory a monomorphism is often called an elementary embedding.
The proof is elementary. This means that the demanding existence theorem for countably incomplete κ⁺-good ultrafilters is not used, as it is, for example, in the proof of Theorem 6.1.8 in the book of Chang and Keisler [2]. The idea of the proof is similar to the proof in algebra that each field k has an algebraically closed extension K. In the algebraic proof one has to find roots of polynomials; here we have to find elements in the intersection of families of internal sets. Roughly speaking, in algebra one constructs a suitable transfinite increasing chain (k_α)_{α<γ} of fields k_α with k_0 = k such that, if α is a successor ordinal, then k_α contains a root of each irreducible polynomial over k_{α−1}. If α is a limit ordinal, then k_α is the union of the preceding fields. The union of all the k_α provides an algebraically closed extension of k. Here we construct a transfinite elementary chain; in the successor step we apply the compactness theorem, which says that for each superstructure and each family of internal sets in the superstructure having the finite intersection property, there exists a superstructure, with the same formal properties, in which the intersection of the whole family is nonempty. In the limit case we take the so-called elementary limit of the preceding structures. In brief, we are going to prove the following theorem.
Theorem 2.9.10 Fix a superstructure V(X) over X of cardinality κ. There exists a superstructure V(Y) over a set Y such that the following two principles hold:
(T) (The Transfer Principle) There exists a monomorphism ∗ from V(X) into V(Y) with Y = ∗X.
(PS) (Polysaturation) V(∗X) is κ⁺-saturated.
The proof of this theorem, essentially adopted from the first edition of this book, slightly modifies Sacks' corresponding proof for first-order logic [11].
Models
In order to work with long chains of superstructures and monomorphisms between them, it is convenient to consider only the internal part of a superstructure, which we will call a model. Let $\mathbb N_0$ denote the nonnegative integers. A sequence $V := (V_n)_{n\in\mathbb N_0}$ is called a weak model if $V_0 \neq \emptyset$ and $V_n \subset \mathcal P(V_0 \cup \dots \cup V_{n-1})$ for each $n \ge 1$. The entities in $V_0$ are called the individuals in $V$. The entities in $V_n$ with $n \ge 1$ are called the sets in $V$.

Individuals and sets must be treated differently in the proof of Theorem 2.9.10. We therefore think of an individual as an object different from a set; in particular, it is different from the empty set. Moreover, since we are not interested in the elements of an individual, we assume that individuals do not contain any elements. Because of the extensionality axiom, "global" individuals do not exist in Zermelo–Fraenkel (ZF) set theory. Many books on polysaturated models presuppose the existence of such "global" individuals. Here we suggest using "local" individuals instead. Local
P.A. Loeb
individuals are individuals relative to the model according to conditions (M 1) and (M 2) below. These local individuals are sufficient to lay the foundation for the theory of polysaturated models, and they exist in ZF set theory.

A weak model $V = (V_n)_{n\in\mathbb N_0}$ is called a model if
(M 1) $V_0 \cap \bigcup_{1\le n} V_n = \emptyset$, which means that individuals in a model are different from sets within the model;
(M 2) $b \cap \bigcup_{n\in\mathbb N_0} V_n = \emptyset$ for each $b \in V_0$, which means that individuals in a model are empty relative to the model.

A model $V := (V_n)_{n\in\mathbb N_0}$ is called a standard model if $V_n = \mathcal P(V_0 \cup \dots \cup V_{n-1})$ for each $n \ge 1$. Recall that, if $V$ is a standard model and $X := V_0$, then $V(X) = \bigcup_{n\in\mathbb N_0} V_n$ has been called a superstructure over $X$. If $V$ is a standard model, an easy induction shows that $V_n(X) = X \cup V_n$ for each $n \in \mathbb N_0$. Here, as in 2.2.1, $V_0(X) := X$ and $V_{n+1}(X) := V_n(X) \cup \mathcal P(V_n(X))$. Thus, for a standard model $V$, $V_n(X)$ results from $V_n$ by adding the set $X$ of individuals.

Let $S := (S_n)_{n\in\mathbb N_0}$ be a sequence of sets. We use the following notation:
$$S_{<\infty} := \bigcup_{n\in\mathbb N_0} S_n \quad\text{and}\quad S_{\le n} := S_0 \cup \dots \cup S_n.$$
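The finite levels of a standard model can be computed directly. The following Python sketch is only an illustration under stated assumptions. Individuals are modelled by a hypothetical `Individual` class, so they are never equal to a set and have no elements. Sets are frozensets, and $X$ is finite. The sketch builds the levels and checks (M 1), (M 2) and the identity $V_n(X) = X \cup V_n$ for $n \le 2$.

```python
from dataclasses import dataclass
from itertools import chain, combinations


@dataclass(frozen=True)
class Individual:
    """Hypothetical stand-in for an individual: never equal to a set, has no elements."""
    name: str


def elements(x):
    """Elements of x relative to the model; individuals are empty, as (M 2) requires."""
    return frozenset() if isinstance(x, Individual) else x


def powerset(s):
    """All subsets of the finite set s, as frozensets."""
    items = list(s)
    subsets = chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))
    return frozenset(frozenset(c) for c in subsets)


def standard_levels(X, n_max):
    """[V_0, ..., V_{n_max}] of the standard model over X: V_n = P(V_0 ∪ ... ∪ V_{n-1})."""
    levels = [frozenset(X)]
    for _ in range(n_max):
        levels.append(powerset(frozenset().union(*levels)))
    return levels


def superstructure_levels(X, n_max):
    """[V_0(X), ..., V_{n_max}(X)] with V_0(X) = X and V_{n+1}(X) = V_n(X) ∪ P(V_n(X))."""
    out = [frozenset(X)]
    for _ in range(n_max):
        out.append(out[-1] | powerset(out[-1]))
    return out


def is_model(levels):
    """Check (M 1) and (M 2) on a finite initial segment of levels."""
    individuals = levels[0]
    sets = frozenset().union(*levels[1:])
    everything = individuals | sets
    m1 = individuals.isdisjoint(sets)
    m2 = all(elements(b).isdisjoint(everything) for b in individuals)
    return m1 and m2


X = {Individual("a"), Individual("b")}
V = standard_levels(X, 2)
S = superstructure_levels(X, 2)
print([len(level) for level in V])                  # [2, 4, 64]
print(all(S[n] == V[0] | V[n] for n in range(3)))  # True
```

Only very small levels are feasible in this form. With two individuals, $V_{\le 2} = V_0 \cup V_2$ already has 66 elements, so $V_3$ has $2^{66}$ elements.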
From Weak Models to Models
As was promised in Remark 2.1.3, we will now show that the individuals of a weak model $V$ can be renamed so that $V$ becomes a model. Let us call a (weak) model $V := (V_n)_{n\in\mathbb N_0}$ a (weak) model over $X$ if $V_0 = X$.

Proposition 2.9.11 For each set $Y \neq \emptyset$ there exists a bijection $c$ from $Y$ onto a set $X$ such that each weak model over $X$ is already a model over $X$.

Proof Set $\tilde Y := \{\{b\} \mid b \in Y\}$. Then $\operatorname{card}(\tilde Y) = \operatorname{card}(Y)$. Define by induction
$$Y_0 := \tilde Y,\qquad Y_{n+1} := \mathcal P(Y_n),\qquad Y_\infty := \bigcup_{n\in\mathbb N_0} Y_n,$$
and
$$X := \{\{\{\{y\}, Y_\infty\}\} \mid y \in Y\}.$$
Now define $c(y) := \{\{\{y\}, Y_\infty\}\}$ and rename the elements $y$ of $Y$ by $c(y) \in X$. Notice that
(1) $Y_\infty$ is an infinite set, $\operatorname{card}(a) = 1$ and $\operatorname{card}(\{a, Y_\infty\}) = 2$ for each $a \in \tilde Y \cup X$.
It follows that
(2) the function $i : a \mapsto \{\{a, Y_\infty\}\}$ defines a bijection from $\tilde Y$ onto $X$.
Moreover, $Y_0 = \tilde Y \notin X$. Since $\emptyset \in Y_n$ for each $n \ge 1$, $Y_n \notin X$ for each $n \in \mathbb N \cup \{\infty\}$. Let $V := (V_n)_{n\in\mathbb N_0}$ be a weak model over $X$. By induction over $n \in \mathbb N_0$, we see that $Y_n \notin V_k$ for each $n \in \mathbb N_0$ and each $k \in \{0, \dots, n\}$. It follows that
(3) $Y_\infty \notin V_k$ for each $k \in \mathbb N_0$.
Now we prove that $V$ is a model over $X$:
(4) We first show that $X \cap \bigcup_{n\ge1} V_n = \emptyset$. Assume that $k \ge 1$ and $\{\{a, Y_\infty\}\} \in V_k$ for some $a \in \tilde Y$. Then $\{a, Y_\infty\} \in V_i$ for some $i < k$. By (1), $\{a, Y_\infty\} \notin X = V_0$, so $i \ge 1$. Therefore, there exists a $j < i$ with $Y_\infty \in V_j$, contradicting (3).
(5) Finally, we prove that $b \cap \bigcup_{n\in\mathbb N_0} V_n = \emptyset$ for each $b \in X$. Fix $b = \{\{a, Y_\infty\}\} \in X$. Assume that there exist $n \in \mathbb N_0$ and $x \in V_n$ with $x \in b$. Then $x = \{a, Y_\infty\}$. By (1), $x \notin X = V_0$; thus $n \ge 1$. It follows that there exists a $k < n$ with $Y_\infty \in V_k$, contradicting (3).

Which objects we take as individuals and which as sets depends on the theory we have in mind. If we are not interested in the shape of the elements of a mathematical entity, then this entity can be chosen as an individual. For example, in arbitrary Banach spaces the elements may be individuals, but in the Lebesgue $L^p$-spaces the elements are functions, thus sets. In general, the real numbers should be individuals. If, however, we want to study real numbers as equivalence classes of Cauchy sequences of rationals, then real numbers become sets, and the rationals may be chosen to be individuals.

Languages for Models
Monomorphisms between superstructures are stronger than the usual injective homomorphisms or homeomorphic embeddings. A monomorphism not only preserves the algebraic or topological structure, it preserves every mathematical property.
In order to make the phrase “mathematical property” precise, we need a mathematical language strong enough to formalize every mathematical statement. The introduction of this language and its interpretation is similar to the approach at the beginning of this chapter. However, to keep this section self-contained, we repeat the necessary basic facts in a form suitable for the proof of our main result. Given a model V := (Vn )n∈N0 , the alphabet of the language LV has the following symbols: Logical symbols: ∨, ¬, ∃, =, ∈. Variables: A countable number of them will do. Parameters: The elements of V<∞ are the parameters in LV . Auxiliary symbols: Parentheses “( ”,“ )”, point “·” and comma “,”.
We assume that all these symbols are pairwise distinct. The sentences in $L_V$ are built up inductively by the following rules:
(a) If $a, b \in V_{<\infty}$, then $(a \in b)$ and $(a = b)$ are sentences in $L_V$.

Remark 2.9.12 In the sentence $a \in b$ the term $a$ may be a set. In particular, $a$ may be an $n$-tuple of elements in $V_{<\infty}$ (see Part (4) of Proposition 2.9.14 and the proof of Part (10) of that proposition). In that case $b$ would be an $n$-placed relation.

(b) If $A$ and $B$ are sentences in $L_V$, then $(A \vee B)$ and $(\neg A)$ are sentences in $L_V$.
(c) Let $A$ be a sentence in $L_V$ and let $a, b$ be parameters in $L_V$. If $x$ is a variable not occurring in $A$, then $(\exists x \in a\, A_b(x))$ is a sentence in $L_V$. Here $A_b(x)$ is the string of symbols of the alphabet of $L_V$ that results from $A$ by replacing each occurrence of $b$ in $A$ with $x$.
(d) The set of sentences of $L_V$ is the smallest set having the properties (a), (b) and (c).

A formula in $L_V$ results from a sentence in $L_V$ by replacing some parameter with a variable $x$ not occurring in the sentence. We then say that $x$ is free in the formula, and we write $A(x)$ to indicate that $x$ is free in the formula $A$. As usual, we use the following abbreviations:
$(A \to B)$ for $((\neg A) \vee B)$,
$(A \wedge B)$ for $(\neg((\neg A) \vee (\neg B)))$,
$(A \leftrightarrow B)$ for $((A \to B) \wedge (B \to A))$,
$(\forall x \in a\, A_b(x))$ for $(\neg(\exists x \in a\,(\neg A_b(x))))$.
In order to save parentheses, we agree on the following order of binding strength. $\neg, \exists, \forall$ bind more strongly than $\wedge$; $\wedge$ binds more strongly than $\vee$; $\vee$ binds more strongly than $\to$; and $\to$ binds more strongly than $\leftrightarrow$. The relation "binds more strongly" is transitive. Moreover, we use the shorthand
$\exists x_1, \dots, x_k \in a\, A$ for $\exists x_1 \in a \dots \exists x_k \in a\, A$,
$\forall x_1, \dots, x_k \in a\, A$ for $\forall x_1 \in a \dots \forall x_k \in a\, A$.
Interpretation of the Language
Fix a model $V := (V_n)_{n\in\mathbb N_0}$. Here is the truth predicate for sentences in $L_V$:
(a) Fix $a, b \in V_{<\infty}$.
(i) The sentence $a = b$ is true in $V$ if $a = b$ in the sense of common set theory.
(ii) The sentence $a \in b$ is true in $V$ if $a \in b$ in the sense of common set theory. By the definition of models, $a \in b$ can never become true in $V$ if $b$ is an individual of $V$.
(b) Let $A, B$ be sentences of $L_V$.
(i) $\neg A$ is true in $V$ if $A$ is not true in $V$.
(ii) $A \vee B$ is true in $V$ if $A$ is true in $V$ or $B$ is true in $V$.
(iii) $\exists x \in a\, A(x)$ is true in $V$ if there exists a $c \in a$ such that $A(x)(c)$ is true in $V$. Here the sentence $A(x)(c)$ results from the formula $A(x)$ by replacing $x$ with $c$.

Following a common practice in model theory, we will write $V \models A$ to denote that $A$ is true in $V$. We conclude this section with a remark concerning individuals.

Remark 2.9.13 If $\emptyset$ is a set in the model $V$, then $a$ is an individual in $V$ iff $a \neq \emptyset$ and $V \models \neg\exists x\,(x \in a)$. (Here $\exists x\,(x \in a)$ is shorthand for the cumbersome formula $\exists x \in a\,(x \in a)$.) In particular, when we assume that the positive integers are individuals in $V$, we mean that they are coded so that they never appear as sets in the model.
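The inductive definition of sentences and the truth predicate translate almost clause by clause into a recursive evaluator over a finite model. This is a sketch, not a faithful implementation of the text. The class names (`Var`, `Mem`, `Eq`, `Or`, `Not`, `Exists`) are our own. Parameters are arbitrary Python values, frozensets play the role of sets, and every other value is treated as an individual. Bound variables are handled through an environment instead of the literal string substitution used in the text. The derived connectives are built exactly as in the abbreviations above.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Mem:          # (s ∈ t)
    left: Any
    right: Any


@dataclass(frozen=True)
class Eq:           # (s = t)
    left: Any
    right: Any


@dataclass(frozen=True)
class Or:           # (A ∨ B)
    left: Any
    right: Any


@dataclass(frozen=True)
class Not:          # (¬A)
    body: Any


@dataclass(frozen=True)
class Exists:       # (∃x ∈ bound A)
    var: Var
    bound: Any
    body: Any


def members(x):
    """Elements relative to the model: only frozensets are sets, anything else is an individual."""
    return x if isinstance(x, frozenset) else frozenset()


def holds(phi, env=None):
    """Truth predicate V |= phi following clauses (a) and (b); env assigns values to bound variables."""
    env = {} if env is None else env

    def val(t):
        return env[t] if isinstance(t, Var) else t

    if isinstance(phi, Mem):
        return val(phi.left) in members(val(phi.right))
    if isinstance(phi, Eq):
        return val(phi.left) == val(phi.right)
    if isinstance(phi, Or):
        return holds(phi.left, env) or holds(phi.right, env)
    if isinstance(phi, Not):
        return not holds(phi.body, env)
    if isinstance(phi, Exists):
        return any(holds(phi.body, {**env, phi.var: c}) for c in members(val(phi.bound)))
    raise TypeError(f"not a sentence: {phi!r}")


# The abbreviations of the text, built from ∨, ¬ and bounded ∃ only.
def Implies(a, b):
    return Or(Not(a), b)


def And(a, b):
    return Not(Or(Not(a), Not(b)))


def Iff(a, b):
    return And(Implies(a, b), Implies(b, a))


def Forall(x, bound, body):
    return Not(Exists(x, bound, Not(body)))


a, b = "a", "b"                 # two individuals
A = frozenset({a, b})           # a set in the model
x = Var("x")
print(holds(Forall(x, A, Or(Eq(x, a), Eq(x, b)))))   # True
```

As in clause (a)(ii), `Mem(s, t)` is false whenever `t` is an individual, because `members` returns the empty set for individuals.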
Models Closed Under Definition
We will now study models having nice closure properties. A model $V := (V_n)_{n\in\mathbb N_0}$ is called closed under definition if for each formula $A(x)$ in $L_V$ and each $n \in \mathbb N_0$
$$\{a \in V_{\le n} \mid V \models A(x)(a)\} \in V_{n+1}.$$

Proposition 2.9.14 Suppose that $V$ is closed under definition and the positive integers are individuals in $V$, coded according to Proposition 2.9.11. Fix $n \ge 1$. Then
(1) $V_1 \subseteq \dots \subseteq V_n \subseteq \dots$; thus $V_{\le n} = V_0 \cup V_n$.
(2) If $F$ is a finite subset of $V_{\le n-1}$, then $F \in V_n$.
(3) If $a, b \in V_{\le n-1}$, then $\langle a, b\rangle := \{\{a, b\}, \{b\}\} \in V_{n+1}$.
(4) Fix $a_1, \dots, a_k \in V_{\le n-1}$. Then $(a_1, \dots, a_k) := \{\langle 1, a_1\rangle, \dots, \langle k, a_k\rangle\} \in V_{n+2}$.
(5) Fix $a_1, \dots, a_k \in V_n$. Then $a_1 \cup \dots \cup a_k \in V_n$ and $a_1 \cap \dots \cap a_k \in V_n$.
(6) If $a, b \in V_n$, then $a \setminus b \in V_n$.
(7) $\emptyset \in V_n$.
(8) $V_{\le n-1} \in V_n$.
(9) $V_{n-1} \in V_n$.
(10) Fix $a_1, \dots, a_k \in V_n$. Then $a_1 \times \dots \times a_k = \{(\alpha_1, \dots, \alpha_k) \mid \alpha_1 \in a_1 \wedge \dots \wedge \alpha_k \in a_k\} \in V_{n+3}$.
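In the simplest case of a standard model, level membership can be checked with a rank function. A set $x$ lies in $V_n$ ($n \ge 1$) exactly when every element of $x$ lies in $V_{\le n-1}$. The sketch below uses hypothetical helper names and a toy `Individual` class standing in for individuals, including the positive integers used as indices. It encodes the pair $\langle a, b\rangle = \{\{a, b\}, \{b\}\}$ of Part (3) and the tuples of Part (4), and confirms the level bounds of those parts on small examples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Individual:
    """Hypothetical individual (a positive integer or any other urelement)."""
    name: object


def rank(x):
    """Least n with x in V_{<=n} of a standard model: 0 for individuals,
    otherwise 1 + the largest rank of an element (so rank(∅) = 1)."""
    if isinstance(x, Individual):
        return 0
    return max((rank(e) + 1 for e in x), default=1)


def pair(a, b):
    """<a, b> := {{a, b}, {b}} as in Proposition 2.9.14 (3)."""
    return frozenset({frozenset({a, b}), frozenset({b})})


def unpair(p):
    """Recover (a, b) from <a, b>; the second coordinate is the element of the singleton."""
    comps = list(p)
    if len(comps) == 1:                    # a == b, so <a, b> = {{b}}
        (b,) = comps[0]
        return b, b
    single = next(c for c in comps if len(c) == 1)
    double = next(c for c in comps if len(c) == 2)
    (b,) = single
    (a,) = double - single
    return a, b


def tup(*items):
    """(a_1, ..., a_k) := {<1, a_1>, ..., <k, a_k>} with the indices coded as individuals."""
    return frozenset(pair(Individual(i), a) for i, a in enumerate(items, start=1))


a, b = Individual("a"), Individual("b")      # a, b ∈ V_0, so n = 1 in (3) and (4)
print(rank(pair(a, b)), rank(tup(a, b, a)))  # 2 3
```

The order of the components of $\langle a, b\rangle$ follows the text, so `unpair` reads the second coordinate off the singleton component.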
Proof (1) Let $c \in V_n$. Then
$$c = \{a \in V_{\le n} \mid V \models a \in c\} = \{a \in V_{\le n} \mid V \models (x \in c)(a)\} \in V_{n+1}.$$
(2) Let $F = \{a_1, \dots, a_k\}$. Then
$$F = \{a \in V_{\le n-1} \mid V \models a = a_1 \vee \dots \vee a = a_k\} \in V_n.$$
(3) and (4) These follow from (2).
(5) Since $a_1, \dots, a_k \subseteq V_{\le n-1}$,
$$a_1 \cup \dots \cup a_k = \{a \in V_{\le n-1} \mid V \models a \in a_1 \vee \dots \vee a \in a_k\} \in V_n.$$
The proof for "$\cap$" is similar.
(6) The proof of (6) is similar to the proof of (5).
(7) This is true, because
$$\emptyset = \{a \in V_{\le n-1} \mid V \models \neg\, a = a\} \in V_n.$$
(8) $V_{\le n-1} = \{a \in V_{\le n-1} \mid V \models a = a\} \in V_n$.
(9) By (8), $V_0 = V_{\le 0} \in V_1$. Let $n > 1$. Then, by (1), (6) and (8), $V_{n-1} = V_{\le n-1} \setminus V_0 \in V_n$.
(10) Using (4), we obtain
$$a_1 \times \dots \times a_k = \{a \in V_{\le n+2} \mid V \models \exists x_1 \in a_1 \dots \exists x_k \in a_k\,(a = (x_1, \dots, x_k))\} \in V_{n+3},$$
where we have used the following abbreviations:
$a = (x_1, \dots, x_k)$ for $\forall x \in V_{\le n+1}\,(x \in a \leftrightarrow x = \langle 1, x_1\rangle \vee \dots \vee x = \langle k, x_k\rangle)$,
$x = \langle i, x_i\rangle$ for $\forall z \in V_{\le n}\,(z \in x \leftrightarrow z = \{i, x_i\} \vee z = \{x_i\})$,
$z = \{i, x_i\}$ for $\forall y \in V_{\le n-1}\,(y \in z \leftrightarrow y = i \vee y = x_i)$,
$z = \{x_i\}$ for $\forall y \in V_{\le n-1}\,(y \in z \leftrightarrow y = x_i)$.

Elementary Embeddings
Fix a model $V = (V_n)_{n\in\mathbb N_0}$ closed under definition and a second model $W = (W_n)_{n\in\mathbb N_0}$. Suppose $*$ is a mapping from $V_{<\infty}$ into $W_{<\infty}$. Recall that, if $A$ is a sentence (formula) in $L_V$, then the $*$-transform ${}^*A$ of $A$ is the sentence (formula) in $L_W$ obtained by replacing each parameter $a$ in $A$ by $*(a)$. We will write ${}^*a$ instead of $*(a)$. A mapping $* : V_{<\infty} \to W_{<\infty}$ is called an elementary embedding from $V$ into $W$ if (E 1) and (E 2) are true:
(E 1) For each $n \in \mathbb N_0$, ${}^*V_n = W_n$. It follows that $W_n \in W_{<\infty}$.
(E 2) (Transfer Principle) For each sentence $A$ in $L_V$, $V \models A \iff W \models {}^*A$.

Proposition 2.9.15 Let $*$ be an elementary embedding from $V$ into $W$ and assume that the positive integers are individuals in $V$, coded according to Proposition 2.9.11. We obtain for each $n \in \mathbb N_0$:
(1) The restriction of $*$ to $V_n$ maps $V_n$ into $W_n$, and $W_n$ is a set in $W$.
(2) The map $*$ is injective and a homomorphism, that is, for $a, b \in V_{<\infty}$, $a \in b \iff {}^*a \in {}^*b$.
(3) ${}^*(V_{\le n}) = W_{\le n} \in W_{n+1}$.
(4) $W$ is closed under definition.
(5) Let $A(x)$ be a formula in $L_V$. Then
$${}^*\{a \in V_{\le n} \mid V \models A(x)(a)\} = \{b \in W_{\le n} \mid W \models ({}^*A(x))(b)\}.$$
(6) If $\{a_1, \dots, a_k\}$ is a finite subset of $V_{<\infty}$, then ${}^*\{a_1, \dots, a_k\} = \{{}^*a_1, \dots, {}^*a_k\}$; in particular, ${}^*\emptyset = \emptyset$. It follows that ${}^*(a_1, \dots, a_k) = ({}^*a_1, \dots, {}^*a_k)$ for all $a_1, \dots, a_k \in V_{<\infty}$.
(7) Let $a_1, \dots, a_k, a$ be sets in $V$. Let $f$ be a set in $V$ and a mapping from $a_1 \times \dots \times a_k$ into $a$. Then ${}^*f$ is a mapping from ${}^*a_1 \times \dots \times {}^*a_k$ into ${}^*a$ such that for all $(b_1, \dots, b_k) \in a_1 \times \dots \times a_k$,
$${}^*(f(b_1, \dots, b_k)) = {}^*f({}^*b_1, \dots, {}^*b_k).$$
If, in addition, $f$ is surjective onto $a$, injective or bijective, then ${}^*f$ is surjective onto ${}^*a$, injective or bijective, respectively.

Proof (1) Since $V$ is closed under definition, $V_n \in V_{n+1}$ and $V_n \neq \emptyset$ for all $n \in \mathbb N_0$. Let $a \in V_n$. Since $V \models a \in V_n$, $W \models {}^*a \in W_n$. Therefore, $W_n$ is a set in $W$ and ${}^*a \in W_n$.
(2) Let $a, b \in V_{<\infty}$. Then
$$a = b\ (a \in b) \iff V \models a = b\ (a \in b) \iff W \models {}^*a = {}^*b\ ({}^*a \in {}^*b) \iff {}^*a = {}^*b\ ({}^*a \in {}^*b).$$
(3) If $n = 0$, the result follows from (E 1). Let $n \ge 1$. Set
$$A := \forall x \in V_{\le n}\,(x \in V_0 \vee x \in V_n) \wedge \forall x \in V_0\,(x \in V_{\le n}) \wedge \forall x \in V_n\,(x \in V_{\le n}).$$
Since $A$ is true in $V$, ${}^*A$ is true in $W$. It follows that ${}^*V_{\le n} = W_0 \cup W_n$. The sentence $\forall x \in V_k\,(x \in V_{k+1})$ is true in $V$ for each $k \ge 1$. Therefore, its $*$-image is true in $W$ for all $k \ge 1$, and hence, by (E 1), $W_{\le n} = W_0 \cup W_n$.
(4) Let $A(x)$ be a formula in $L_W$. There is an $m \in \mathbb N_0$ such that the parameters $b_1, \dots, b_k$ occurring in $A(x)$ belong to $W_{\le m}$. Since $V$ is closed under definition,
$$V \models \forall x_1, \dots, x_k \in V_{\le m}\,\exists z \in V_{n+1}\,\forall x \in V_{\le n}\,(x \in z \leftrightarrow A(x, x_1, \dots, x_k)),$$
where $A(x, x_1, \dots, x_k)$ results from $A(x)$ by replacing $b_i$ with $x_i$, $i = 1, \dots, k$. We may assume that $x_1, \dots, x_k$ do not occur in $A(x)$ and are pairwise different. Since ${}^*V_{\le n} = W_{\le n}$ and ${}^*V_n = W_n$, we obtain
$$W \models \forall x_1, \dots, x_k \in W_{\le m}\,\exists z \in W_{n+1}\,\forall x \in W_{\le n}\,(x \in z \leftrightarrow A(x, x_1, \dots, x_k)).$$
It follows that for $b_1, \dots, b_k$ there is a $B \in W_{n+1}$ such that, for $a \in W_{\le n}$, $a \in B$ iff $W \models A(x)(a)$. Therefore, $B = \{a \in W_{\le n} \mid W \models A(x)(a)\}$.
(5) Define $B := \{a \in V_{\le n} \mid V \models A(x)(a)\}$. Then $B \in V_{n+1}$ and $V \models \forall x \in V_{\le n}\,(x \in B \leftrightarrow A(x))$. Since $W \models \forall x \in W_{\le n}\,(x \in {}^*B \leftrightarrow {}^*A(x))$ and ${}^*B \in W_{n+1}$, we obtain
$${}^*B = \{b \in W_{\le n} \mid W \models {}^*A(x)(b)\}.$$
(6) Let $\{a_1, \dots, a_k\} \subseteq V_{\le n}$. Then, by (1), $\{{}^*a_1, \dots, {}^*a_k\} \subseteq W_{\le n}$ and, by (5),
$${}^*\{a_1, \dots, a_k\} = {}^*\{a \in V_{\le n} \mid V \models a = a_1 \vee \dots \vee a = a_k\} = \{b \in W_{\le n} \mid W \models b = {}^*a_1 \vee \dots \vee b = {}^*a_k\} = \{{}^*a_1, \dots, {}^*a_k\}.$$
Moreover, we obtain for each $n \ge 1$
$${}^*\emptyset = {}^*\{a \in V_{\le n-1} \mid V \models \neg\, a = a\} = \{b \in W_{\le n-1} \mid W \models \neg\, b = b\} = \emptyset.$$
(7) We identify the $k$-placed functions $f$ with the 2-placed relations whose second argument is uniquely determined by the first argument, which is a $k$-tuple. Therefore, if $f$ is a $k$-placed function, we can use $x_{k+1} = f(x_1, \dots, x_k)$ as shorthand for $((x_1, \dots, x_k), x_{k+1}) \in f$. We may assume that there exists an $n \ge 1$ such that $a_1, \dots, a_k, a, f \in V_n$. Then $V \models A_1 \wedge A_2 \wedge A_3$, where
$A_1 := \forall x_1 \in a_1 \dots \forall x_k \in a_k\,\exists y \in a\,(((x_1, \dots, x_k), y) \in f)$,
$A_2 := \forall x_1, \dots, x_k, y \in V_{\le n}\,(((x_1, \dots, x_k), y) \in f \to x_1 \in a_1 \wedge \dots \wedge x_k \in a_k \wedge y \in a)$,
$A_3 := \forall x_1, \dots, x_k, y, y' \in V_{\le n}\,(((x_1, \dots, x_k), y) \in f \wedge ((x_1, \dots, x_k), y') \in f \to y = y')$.
$A_1$ means that $\operatorname{domain}(f) \supseteq a_1 \times \dots \times a_k$. $A_2$ means that $\operatorname{range}(f) \subseteq a$ and $\operatorname{domain}(f) \subseteq a_1 \times \dots \times a_k$. $A_3$ means that $f$ is a function. Since $V \models A_1 \wedge A_2 \wedge A_3$, also $W \models {}^*A_1 \wedge {}^*A_2 \wedge {}^*A_3$. Hence ${}^*f$ is a mapping from ${}^*a_1 \times \dots \times {}^*a_k$ into ${}^*a$. Assume that $f(b_1, \dots, b_k) = b$. Then $((b_1, \dots, b_k), b) \in f$, and hence, by (2) and (6), $(({}^*b_1, \dots, {}^*b_k), {}^*b) \in {}^*f$. Therefore, ${}^*f({}^*b_1, \dots, {}^*b_k) = {}^*b = {}^*(f(b_1, \dots, b_k))$. The other parts of assertion (7) can be proved in a similar way.

Since $*$ is injective, we may identify each individual $a$ in $V$ with the individual ${}^*a$ in $W$.
Saturated Models
Let us call a non-empty family $B$ of sets deep if $B$ has the finite intersection property, that is, the intersection of any finitely many sets in $B$ is non-empty. Let $\gamma$ be an uncountable cardinal. A model $V = (V_n)_{n\in\mathbb N_0}$ which is closed under definition is called $\gamma$-saturated if for each $n \ge 1$
$$\bigcap B \neq \emptyset \quad\text{for each deep } B \subseteq V_n \text{ with } \operatorname{card}(B) < \gamma.$$
It follows that the internal sets in a γ-saturated model can be treated as though they were γ-compact (see Sect. 2.9).
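A standard example shows what saturation adds. For illustration only, we assume that $\mathbb N \subseteq X$ and that $*$ embeds the standard model over $X$ into a $\gamma$-saturated model. The family of tails of $\mathbb N$ is deep, but its intersection is empty:

```latex
T_n := \{\, m \in \mathbb{N} : m \ge n \,\}, \qquad
T_{n_1} \cap \dots \cap T_{n_k} = T_{\max\{n_1,\dots,n_k\}} \neq \emptyset, \qquad
\bigcap_{n \in \mathbb{N}} T_n = \emptyset .
```

By transfer, every finite intersection ${}^*T_{n_1} \cap \dots \cap {}^*T_{n_k} = {}^*T_{\max\{n_1,\dots,n_k\}}$ is non-empty. So $\{{}^*T_n \mid n \in \mathbb N\}$ is deep, and its cardinality is $\aleph_0 < \gamma$. Saturation therefore yields an $N \in \bigcap_n {}^*T_n$, that is, an $N \in {}^*\mathbb N$ with $N \ge n$ for every standard $n$: an unlimited hypernatural number.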
From Pre-models to Models
In order to treat ultrapowers and limits of elementary chains in a common framework, we introduce the notion of a pre-model. A triple $((P_n)_{n\in\mathbb N_0}, \sim, E)$ is called a pre-model if the following conditions are true (here $P_{1\le} := \bigcup_{1\le n} P_n$):
(PM 1) $P_0 \neq \emptyset$ and $P_0 \cap \bigcup_{n\ge1} P_n = \emptyset$.
(PM 2) $\sim$ is an equivalence relation on $P_{<\infty}$ such that for all $a, b \in P_{<\infty}$ with $a \sim b$ and all $n \in \mathbb N_0$: $a \in P_n \Rightarrow b \in P_n$.
(PM 3) The relation $E$ is a subset of $P_{<\infty} \times P_{1\le}$ such that for $a, a', b, b' \in P_{<\infty}$ with $a \sim a'$ and $b \sim b'$: $a\,E\,b \Rightarrow a'\,E\,b'$.
(PM 4) Transitivity. If $n \ge 1$ and $a \in P_n$, then $b\,E\,a \Rightarrow b \in P_{\le n-1}$.
(PM 5) Extensionality. Fix $a, b \in P_{1\le}$. Then $a \sim b$ if $(c\,E\,a \Leftrightarrow c\,E\,b)$ for all $c \in P_{<\infty}$.

Later on we shall be concerned with two important examples of pre-models. Fix a pre-model $P = ((P_n)_{n\in\mathbb N_0}, \sim, E)$. Using the Mostowski collapsing function (see Definition 2.5.3), we define inductively on $n \in \mathbb N_0$ sets $V_n$ and their elements $\hat a^n$ such that $(V_n)_{n\in\mathbb N_0}$ becomes a model generated by $P$. For each $a \in P_0$ set $\hat a^0 := \{b \in P_0 \mid b \sim a\}$. According to Proposition 2.9.11, we can rename each $\hat a^0$ (and continue to denote the new object by $\hat a^0$). The renaming can be chosen to be a bijection of $\{\hat a^0 \mid a \in P_0\}$ onto the set of new objects, such that each weak model over $V_0 := \{\hat a^0 \mid a \in P_0\}$ is a model over $V_0$.
Now let $n \ge 1$ and assume that $\hat a^k$ is already defined for each $k < n$ and each $a \in P_k$. Moreover, assume that $V_k := \{\hat a^k \mid a \in P_k\}$ for each $k < n$. If $a \in P_n$, set
$$\hat a^n := \{\hat c^k \mid k < n,\ c \in P_k \text{ and } c\,E\,a\} \quad\text{and}\quad V_n := \{\hat a^n \mid a \in P_n\}.$$
Since $P_0 \neq \emptyset$, we have $V_0 \neq \emptyset$. Since $\hat a^n \subset V_{\le n-1}$ for $n \ge 1$, we see that $V_n \subset \mathcal P(V_{\le n-1})$. It follows that $V := (V_n)_{n\in\mathbb N_0}$ is a weak model over $V_0$ and hence, by the choice of $V_0$, a model over $V_0$. We say that the model $V$ is generated by $((P_n)_{n\in\mathbb N_0}, \sim, E)$.

Proposition 2.9.16 Fix $n, m \in \mathbb N_0$, $a \in P_n$ and $b \in P_m$. Then
$$\hat a^n = \hat b^m \iff a \sim b.$$

Proof We prove this result by induction on $\mu_{m,n} := \max\{n, m\}$. Let $\mu_{m,n} = 0$. Then, by the definition of $\hat a^0$ and since the renaming is a bijection, $\hat a^0 = \hat b^0 \iff a \sim b$. For the induction step let $\mu_{m,n} > 0$.
Assume that $\hat a^n = \hat b^m$. Since $V_0 \cap V_{1\le} = \emptyset$, we have $1 \le n, m$. Suppose that $c\,E\,a$ holds. By (PM 4), there exists an $i < n$ with $c \in P_i$. It follows that $\hat c^i \in \hat a^n = \hat b^m$. Therefore, there exist a $j < m$ and an element $d \in P_j$ with $d\,E\,b$ and $\hat d^j = \hat c^i$. Since $\mu_{i,j} < \mu_{m,n}$, the induction hypothesis yields $d \sim c$. From (PM 3) it follows that $c\,E\,b$ holds. Therefore, $c\,E\,a$ implies $c\,E\,b$, and vice versa. By (PM 5), $a \sim b$.
Now assume that $a \sim b$. Since $m > 0$ or $n > 0$, (PM 1) and (PM 2) give $1 \le m, n$. In order to show that $\hat a^n = \hat b^m$, let $\hat c^i \in \hat a^n$ with $i < n$, $c \in P_i$ and $c\,E\,a$. By (PM 3), $c\,E\,b$. By (PM 4), there exists a $j < m$ with $c \in P_j$. It follows that $\hat c^j \in \hat b^m$. Since $\mu_{i,j} < \mu_{m,n}$, the induction hypothesis yields $\hat c^i = \hat c^j \in \hat b^m$. This proves that $\hat a^n \subseteq \hat b^m$. The proof of $\hat a^n \supseteq \hat b^m$ is similar.

In view of the previous proposition, we may define $\hat a := \hat a^n$ for each $a \in P_{<\infty}$, where $n$ is any index with $a \in P_n$. We obtain the following

Corollary 2.9.17 Fix $a, b \in P_{<\infty}$. Then
(1) $\hat a = \hat b \iff a \sim b$.
(2) $\hat a \in \hat b \iff a\,E\,b$.

Proof We only need to prove (2). Assume that $\hat a \in \hat b$. Since $V$ is a model, $\hat b$ is a set in $V$. Therefore, there exist an $i \in \mathbb N_0$ and an $a' \in P_i$ with $a'\,E\,b$ and $\widehat{a'} = \hat a$. By (1), $a' \sim a$. By (PM 3), $a\,E\,b$. Now assume that $a\,E\,b$. Since $b \in P_n$ for some $n \ge 1$, by (PM 4) there exists an $i < n$ with $a \in P_i$. By the definition of $\hat b$, $\hat a \in \hat b$.
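For a finite pre-model, the collapse $a \mapsto \hat a$ can be computed level by level exactly as in the definition above. The sketch below is illustrative only. `Atom` is a hypothetical wrapper playing the role of the renamed individuals $\hat a^0$, and $\sim$ is given by a labelling of the elements. The small pre-model in the example was chosen by hand to satisfy (PM 1)-(PM 5). The accompanying checks test both parts of Corollary 2.9.17 on it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    """Individual standing for the ∼-class of an element of P_0 (the renamed â^0)."""
    cls: frozenset


def collapse(levels, label, E):
    """Collapse a finite pre-model level by level.

    levels: [P_0, P_1, ..., P_N], finite sets
    label:  dict with a ∼ b  iff  label[a] == label[b]
    E:      set of pairs (c, a), meaning c E a
    Returns the dict a -> â.
    """
    hat = {}
    for a in levels[0]:
        hat[a] = Atom(frozenset(b for b in levels[0] if label[b] == label[a]))
    for n in range(1, len(levels)):
        for a in levels[n]:
            # â^n := {ĉ^k | k < n, c ∈ P_k and c E a}
            hat[a] = frozenset(hat[c] for k in range(n) for c in levels[k] if (c, a) in E)
    return hat


# A small pre-model: q ∼ q' in P_0, and t ∼ t' in P_1 by extensionality (PM 5).
P0, P1, P2 = {"p", "q", "q'"}, {"s", "t", "t'", "e"}, {"u"}
label = {"p": "p", "q": "q", "q'": "q", "s": "s", "t": "t", "t'": "t", "e": "e", "u": "u"}
E = {("p", "s"), ("q", "t"), ("q'", "t"), ("q", "t'"), ("q'", "t'"),
     ("s", "u"), ("t", "u"), ("t'", "u")}
hat = collapse([P0, P1, P2], label, E)
print(hat["t"] == hat["t'"], len(hat["u"]), hat["e"] == frozenset())  # True 2 True
```

The element `e` has no $E$-predecessors, so it collapses to the empty set. The element `u` has three predecessors, but two of them are equivalent, so $\hat u$ has two elements.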
Ultrapowers
The aim of this section is the proof of the compactness theorem. We assume that the reader is familiar with filters and ultrafilters (see Sect. 1.2). First we will construct a pre-model generating the ultrapower of a model $V$. Fix a non-empty set $I$, an ultrafilter $D$ on $I$ and a model $V = (V_n)_{n\in\mathbb N_0}$ which is closed under definition. We shall use the following abbreviations. If $(a_i)_{i\in I}$ is an $I$-sequence, then we will write $(a_i)$ instead of $(a_i)_{i\in I}$. If $F(i)$ is an assertion about elements $i$ of $I$, then we shall write "$F(i)$ a.e." instead of $\{i \in I \mid F(i)\} \in D$. For each $n \in \mathbb N_0$ set $F_n := \{(a_i) \mid a_i \in V_n \text{ a.e.}\}$. On $F_{<\infty} = \bigcup_{n\in\mathbb N_0} F_n$ we define a relation $\sim$ by setting
$$(a_i) \sim (b_i) :\iff a_i = b_i \text{ a.e.},$$
and we define a relation $E \subseteq F_{<\infty} \times F_{1\le}$ by setting
$$(a_i)\,E\,(b_i) :\iff a_i \in b_i \text{ a.e.}$$
Using the ultrafilter properties one can easily prove the following

Lemma 2.9.18 The triple $((F_n)_{n\in\mathbb N_0}, \sim, E)$ is a pre-model.

The model generated by $((F_n)_{n\in\mathbb N_0}, \sim, E)$ is called the $D$-ultrapower of $V$ and is denoted by $\prod_D(V)$. Now we shall show that the $D$-ultrapower of $V$ has the same formal properties as $V$. Let $A$ be a sentence (formula) in $L_{\prod_D(V)}$. By $(A_i)_{i\in I}$ we denote one of the $I$-sequences of sentences (formulas) in $L_V$ that result from $A$ by replacing each parameter $\widehat{(a_i)}$ in $A$ by $a_i$. Notice that, if $(\tilde A_i)_{i\in I}$ is another result, then $\tilde A_i = A_i$ a.e.

Theorem 2.9.19 (Theorem of Łoś) Fix a sentence $A$ in $L_{\prod_D(V)}$. Then
$$\prod\nolimits_D(V) \models A \iff V \models A_i \text{ a.e.}$$

Proof By induction over the definition of the sentences in $L_{\prod_D(V)}$.
(1)(a) Let $A$ be the sentence $(\widehat{(a_i)} \in \widehat{(b_i)})$. Then, by Corollary 2.9.17 (2),
$$\prod\nolimits_D(V) \models \widehat{(a_i)} \in \widehat{(b_i)} \iff (a_i)\,E\,(b_i) \iff a_i \in b_i \text{ a.e.} \iff V \models A_i \text{ a.e.}$$
(b) If $A = (\widehat{(a_i)} = \widehat{(b_i)})$, then the proof is similar to the proof under (a).
(2)(a) Let $A = B \vee C$ be a sentence of $L_{\prod_D(V)}$, and assume that the assertion is true for $B$ and $C$.
Suppose that $\prod_D(V) \models A$. Then $\prod_D(V) \models B$ or $\prod_D(V) \models C$. By the induction hypothesis, $V \models B_i$ a.e. or $V \models C_i$ a.e. Therefore, $V \models B_i \vee C_i$ a.e. The result follows, since $A_i$ equals $B_i \vee C_i$ a.e. Conversely, suppose that $V \models B_i \vee C_i$ a.e. Since $D$ is an ultrafilter, $V \models B_i$ a.e. or $V \models C_i$ a.e. By the induction hypothesis, $\prod_D(V) \models B$ or $\prod_D(V) \models C$; thus $\prod_D(V) \models B \vee C$.
(b) Let $A = \neg B$. Assume that $\prod_D(V) \models A$, thus $\prod_D(V) \not\models B$. By the induction hypothesis, $\{i \in I \mid V \models B_i\} \notin D$. Since $D$ is an ultrafilter, $\{i \in I \mid V \not\models B_i\} \in D$, thus $V \models \neg B_i$ a.e. Conversely, assume that $V \models \neg B_i$ a.e. Since $\emptyset \notin D$, $\{i \in I \mid V \models B_i\} \notin D$. The induction hypothesis implies $\prod_D(V) \models \neg B$.
(c) Let $A = \exists x \in \widehat{(a_i)}\,B(x)$. If $\prod_D(V) \models A$, then there is a $\widehat{(b_i)} \in \widehat{(a_i)}$ with $\prod_D(V) \models B(x)(\widehat{(b_i)})$. By the induction hypothesis, $V \models B(x)_i(b_i)$ a.e. Since $b_i \in a_i$ a.e., $V \models A_i$ a.e.
Conversely, suppose that $V \models A_i$ a.e., that is,
$$Y := \{i \in I \mid V \models \exists x \in a_i\,B(x)_i\} \in D.$$
We choose an $I$-sequence $(b_i)$ with $V \models b_i \in a_i \wedge B(x)_i(b_i)$ for each $i \in Y$, and set $b_i = \emptyset$ if $i \notin Y$. Since $V \models B(x)_i(b_i)$ a.e., the induction hypothesis yields $\prod_D(V) \models B(x)(\widehat{(b_i)})$. Since $b_i \in a_i$ a.e., $\widehat{(b_i)} \in \widehat{(a_i)}$. It follows that $\prod_D(V) \models A$.

An important consequence of Łoś' theorem is the existence of an elementary embedding from $V$ into the ultrapower $\prod_D(V)$ of $V$. For each $b \in V_{<\infty}$ let $(b)$ be the constant $I$-sequence $(b_i)$ with $b_i = b$ for each $i \in I$. The mapping $* : V_{<\infty} \to \prod_D(V)_{<\infty}$ is defined by setting
$${}^*b := \widehat{(b)}.$$

Corollary 2.9.20 The function $*$ is an elementary embedding from $V$ into $\prod_D(V)$.

Proof We have to check the properties (E 1) and (E 2).
(E 1) Since $V_n \in V_{n+1}$, $(V_n) \in F_{n+1}$; thus ${}^*V_n = \widehat{(V_n)} \in \prod_D(V)_{<\infty}$. By Corollary 2.9.17 (2), and since $V_n$ is a set in $V$, we obtain
$$\widehat{(a_i)} \in \widehat{(V_n)} \iff (a_i)\,E\,(V_n) \iff \widehat{(a_i)} \in \prod\nolimits_D(V)_n.$$
This proves that ${}^*V_n = \prod_D(V)_n$.
(E 2) Let $A$ be a sentence in $L_V$. Then ${}^*A$ is a sentence in $L_{\prod_D(V)}$ and $({}^*A)_i = A$ a.e. By the theorem of Łoś, we obtain
$$V \models A \iff V \models ({}^*A)_i \text{ a.e.} \iff \prod\nolimits_D(V) \models {}^*A.$$
Recall from Proposition 2.9.15 (4) that the model $\prod_D(V)$ is closed under definition. Using the ultrapower construction given above, we can prove the compactness theorem. The proof is the same as in first order model theory (see [2]).
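Steps (2)(a) and (2)(b) in the proof of Łoś' theorem use only two properties of the ultrafilter $D$. First, a union of two sets lies in $D$ iff one of the sets does. Second, exactly one of $S$ and $I \setminus S$ lies in $D$. The Python sketch below checks these laws by brute force on a small index set; the helper names are hypothetical. On a finite index set every ultrafilter is principal, and the corresponding ultrapower is isomorphic to $V$. So this is only a sanity check of the notation "a.e.", not a construction of a nonstandard model.

```python
from itertools import chain, combinations


def subsets(I):
    """All subsets of the finite set I, as frozensets."""
    items = list(I)
    return [frozenset(c) for c in chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))]


def principal_ultrafilter(I, j):
    """D_j := {S ⊆ I | j ∈ S}; on a finite I every ultrafilter is of this form."""
    return {S for S in subsets(I) if j in S}


def is_ultrafilter(D, I):
    """Filter axioms plus: for every S ⊆ I exactly one of S and I - S belongs to D."""
    I = frozenset(I)
    subs = subsets(I)
    return (I in D and frozenset() not in D
            and all((S & T) in D for S in D for T in D)
            and all(T in D for S in D for T in subs if S <= T)
            and all((S in D) != ((I - S) in D) for S in subs))


def ae(D, I, pred):
    """'pred(i) a.e.' in the notation of the text: {i ∈ I | pred(i)} ∈ D."""
    return frozenset(i for i in I if pred(i)) in D


I = range(4)
D = principal_ultrafilter(I, 2)
a, b = [0, 1, 5, 3], [9, 9, 5, 9]      # two I-sequences that agree only at i = 2
print(is_ultrafilter(D, I), ae(D, I, lambda i: a[i] == b[i]))   # True True
```

For the filter $\{I\}$, which is not an ultrafilter, the union law fails. For example, $\{0\} \cup (I \setminus \{0\}) = I$ belongs to it while neither part does. This is exactly where step (2)(a) would break down.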
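Proposition 2.9.21 (The Compactness Theorem) Let $V = (V_n)_{n\in\mathbb N_0}$ be a model which is closed under definition, let $n \ge 1$ and let $\mathcal B$ be a deep subset of $V_n$. Then there exist a model $W = (W_n)_{n\in\mathbb N_0}$ and an elementary embedding $*$ from $V$ into $W$ such that $\bigcap {}^*[\mathcal B] \neq \emptyset$. (Here ${}^*[\mathcal B] = \{{}^*A \mid A \in \mathcal B\}$.)

Proof We may assume that $\mathcal B$ is closed under finite intersections. Let $I$ be the set of finite non-empty subsets of $\mathcal B$. Since $\mathcal B$ is deep, for each $i \in I$ there exists an $a_i \in V_{\le n-1}$ such that $a_i \in \bigcap i$. For each set $A \in \mathcal B$ define
$$\tilde A := \{i \in I \mid a_i \in A\} \subset I, \quad\text{and}\quad G := \{\tilde A \mid A \in \mathcal B\} \subseteq \mathcal P(I).$$
Then $\emptyset \notin G$, because $\{A\} \in \tilde A$ for each $A \in \mathcal B$. Since $\mathcal B \neq \emptyset$, $G \neq \emptyset$. Now let $A, B \in \mathcal B$. Then $\widetilde{A \cap B} \in G$, and it is easy to see that $\widetilde{A \cap B} \subset \tilde A \cap \tilde B$. It follows that $F(G) := \{S \subseteq I \mid \exists C \in G\,(C \subseteq S)\}$ is a filter. Fix an ultrafilter $D \supseteq F(G)$ on $I$. Let $\prod_D(V)$ be the $D$-ultrapower of $V$ and let $* : V_{<\infty} \to \prod_D(V)_{<\infty}$ be the elementary embedding introduced previously. Since $a_i \in V_{\le n-1}$ and $D$ is an ultrafilter, $(a_i) \in F_{\le n-1}$; thus $\widehat{(a_i)} \in \prod_D(V)_{\le n-1}$. In order to show that $\widehat{(a_i)} \in \bigcap {}^*[\mathcal B]$, fix $A \in \mathcal B$. Then
$$\{i \in I \mid V \models a_i \in A\} = \{i \in I \mid a_i \in A\} = \tilde A \in G \subset F(G) \subset D.$$
By the theorem of Łoś, $\prod_D(V) \models \widehat{(a_i)} \in \widehat{(A)} = {}^*A$.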
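The index-set construction of this proof can be run on a finite toy family of sets of integers. Write $\tilde A := \{i \in I \mid a_i \in A\}$, where $I$ is the set of finite non-empty subfamilies and $a_i \in \bigcap i$. For a finite deep family the conclusion of the compactness theorem is trivial. So the sketch below only checks the two facts the proof relies on: $\{A\} \in \tilde A$, whence $\emptyset \notin G$, and $\widetilde{A \cap C} \subseteq \tilde A \cap \tilde C$, which makes $F(G)$ a filter. The helper names and the choice of witnesses as minima are our own.

```python
from itertools import combinations


def close_under_intersections(family):
    """Smallest family containing `family` and closed under finite intersections."""
    fam = set(family)
    changed = True
    while changed:
        changed = False
        for S in list(fam):
            for T in list(fam):
                if (S & T) not in fam:
                    fam.add(S & T)
                    changed = True
    return frozenset(fam)


def index_sets(family):
    """I = finite non-empty subfamilies i, a witness a_i ∈ ⋂ i, and Ã = {i ∈ I | a_i ∈ A}."""
    fam = list(family)
    I = [frozenset(c) for r in range(1, len(fam) + 1) for c in combinations(fam, r)]
    witness = {i: min(frozenset.intersection(*i)) for i in I}   # needs the family to be deep
    tilde = {A: frozenset(i for i in I if witness[i] in A) for A in fam}
    return I, witness, tilde


B = close_under_intersections([frozenset({1, 2, 3, 4}), frozenset({2, 3, 4, 5}), frozenset({3, 4, 6})])
I, witness, tilde = index_sets(B)
print(len(B), len(I))   # 5 31
```

In the infinite case, the point of the construction is that no single witness works for all of $\mathcal B$. The ultrafilter $D \supseteq F(G)$ glues the witnesses $a_i$ together into one element $\widehat{(a_i)}$ of the ultrapower.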
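Elementary Chains and Their Elementary Limits
Here we extend the notions "elementary chain" and "elementary limit" to our notion of a model. Let $\lambda$ be an ordinal number different from 0 and let $(V^\alpha)_{\alpha<\lambda}$ be a $\lambda$-sequence of models $V^\alpha = (V^\alpha_n)_{n\in\mathbb N_0}$ which are closed under definition. Then the pair $((V^\alpha)_{\alpha<\lambda}, (*_{\alpha\beta})_{\alpha\le\beta<\lambda})$ is called an elementary chain if for each $\alpha \le \beta \le \gamma < \lambda$:
(EC 1) $*_{\alpha\beta}$ is an elementary embedding from $V^\alpha$ into $V^\beta$.
(EC 2) $*_{\alpha\alpha}(a) = a$ for each $a \in V^\alpha_{<\infty}$.
(EC 3) $*_{\beta\gamma} \circ *_{\alpha\beta} = *_{\alpha\gamma}$.
Starting from an elementary chain $\mathcal C$, we define a pre-model such that the generated model becomes the elementary limit of $\mathcal C$. To this end fix an elementary chain $((V^\alpha)_{\alpha<\lambda}, (*_{\alpha\beta})_{\alpha\le\beta<\lambda})$. We define for each $n \in \mathbb N_0$
$$P_n := \{(a, \alpha) \mid \alpha < \lambda \text{ and } a \in V^\alpha_n\}.$$
Two elements $(a, \alpha), (b, \beta) \in P_{<\infty} = \bigcup_{n\in\mathbb N_0} P_n$ are called equivalent if there exists a $\gamma < \lambda$ with $\alpha, \beta \le \gamma$ such that $*_{\alpha\gamma}(a) = *_{\beta\gamma}(b)$, in which case we shall write $(a, \alpha) \sim (b, \beta)$.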
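For $(a, \alpha) \in P_{<\infty}$ and $(b, \beta) \in P_{1\le}$ we set $(a, \alpha)\,E\,(b, \beta)$ if there exists a $\gamma < \lambda$ with $\alpha, \beta \le \gamma$ such that $*_{\alpha\gamma}(a) \in *_{\beta\gamma}(b)$. The simple proof of the next result is left to the reader:

Lemma 2.9.22 $((P_n)_{n\in\mathbb N_0}, \sim, E)$ is a pre-model.

The model generated by the pre-model $((P_n)_{n\in\mathbb N_0}, \sim, E)$ is called the elementary limit of the elementary chain $((V^\alpha)_{\alpha<\lambda}, (*_{\alpha\beta})_{\alpha\le\beta<\lambda})$ and is denoted by $V^\lambda = (V^\lambda_n)_{n\in\mathbb N_0}$. We shall now collect some immediate consequences of the construction of the elementary limit (see Corollary 2.9.17 (1) and (2)).

Proposition 2.9.23 Fix an elementary chain $((V^\alpha)_{\alpha<\lambda}, (*_{\alpha\beta})_{\alpha\le\beta<\lambda})$ and let $V^\lambda$ be its elementary limit. Fix $n \in \mathbb N_0$. Then
(1) $V^\lambda_n = \{\widehat{(a,\alpha)} \mid (a, \alpha) \in P_n\} = \{\widehat{(a,\alpha)} \mid \alpha < \lambda \text{ and } a \in V^\alpha_n\}$.
Fix $\alpha, \beta < \lambda$, $a \in V^\alpha_{<\infty}$ and $b \in V^\beta_{<\infty}$.
(2) $\widehat{(a,\alpha)} = \widehat{(b,\beta)}$ iff there is a $\gamma$ with $\alpha, \beta \le \gamma < \lambda$ and $*_{\alpha\gamma}(a) = *_{\beta\gamma}(b)$. It follows that $\widehat{(a,\alpha)} = \widehat{(*_{\alpha\beta}(a), \beta)}$ if $\alpha \le \beta < \lambda$.
(3) $\widehat{(a,\alpha)} \in \widehat{(b,\beta)}$ iff there is a $\gamma < \lambda$ with $\alpha, \beta \le \gamma$ and $*_{\alpha\gamma}(a) \in *_{\beta\gamma}(b)$.
(4) $\widehat{(V^\alpha_n, \alpha)} = \widehat{(V^\beta_n, \beta)}$, because for each $\gamma$ with $\alpha, \beta \le \gamma < \lambda$ we have $*_{\alpha\gamma}(V^\alpha_n) = V^\gamma_n = *_{\beta\gamma}(V^\beta_n)$.

Definition 2.9.24 Fix an elementary chain $((V^\alpha)_{\alpha<\lambda}, (*_{\alpha\beta})_{\alpha\le\beta<\lambda})$ and its elementary limit $V^\lambda$. For each $\alpha \le \lambda$ define a mapping $*_{\alpha\lambda} : V^\alpha_{<\infty} \to V^\lambda_{<\infty}$ by setting $*_{\alpha\lambda}(a) := \widehat{(a,\alpha)}$ for $\alpha < \lambda$ and $*_{\lambda\lambda}(\widehat{(a,\alpha)}) := \widehat{(a,\alpha)}$.

Proposition 2.9.25 Fix $\alpha < \lambda$. Then $*_{\alpha\lambda}$ is an elementary embedding from $V^\alpha$ into $V^\lambda$. It follows that $((V^\alpha)_{\alpha<\lambda+1}, (*_{\alpha\beta})_{\alpha\le\beta<\lambda+1})$ is an elementary chain.

Proof To prove (E 1), fix $n \in \mathbb N_0$. We have to show that $V^\lambda_n = *_{\alpha\lambda}(V^\alpha_n)$.
"$\subseteq$" Fix $x \in V^\lambda_n$. Then there exist $\beta < \lambda$ and $b \in V^\beta_n$ with $x = \widehat{(b,\beta)}$. By (3) and (4),
$$x = \widehat{(b,\beta)} \in \widehat{(V^\beta_n, \beta)} = \widehat{(V^\alpha_n, \alpha)} = *_{\alpha\lambda}(V^\alpha_n).$$
"$\supseteq$" Fix $x \in *_{\alpha\lambda}(V^\alpha_n) = \widehat{(V^\alpha_n, \alpha)}$. Since $V^\alpha_n$ is a set in $V^\alpha$, $\widehat{(V^\alpha_n, \alpha)}$ is a set in $V^\lambda$. Therefore, there exist $\beta < \lambda$ and $b \in V^\beta_{<\infty}$ with $x = \widehat{(b,\beta)}$. By Corollary 2.9.17 (2), $(b,\beta)\,E\,(V^\alpha_n, \alpha)$. Thus there exists a $\gamma \ge \alpha, \beta$ with $*_{\beta\gamma}(b) \in *_{\alpha\gamma}(V^\alpha_n) = V^\gamma_n$. We obtain $x = \widehat{(b,\beta)} = \widehat{(*_{\beta\gamma}(b), \gamma)} \in V^\lambda_n$.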
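This proves (E 1). To prove (E 2), we show by induction on the definition of the sentences $A$ in $\bigcup_{\alpha<\lambda} L_{V^\alpha}$ that, if $A$ is in $L_{V^\alpha}$, then
$$V^\alpha \models A \iff V^\lambda \models {}^{*_{\alpha\lambda}}A.$$
(1)(a) Let $A = (a \in b)$. Then
$$V^\alpha \models A \iff a \in b \iff (a,\alpha)\,E\,(b,\alpha) \iff \widehat{(a,\alpha)} \in \widehat{(b,\alpha)} \iff *_{\alpha\lambda}(a) \in *_{\alpha\lambda}(b) \iff V^\lambda \models {}^{*_{\alpha\lambda}}A.$$
The second "$\Leftarrow$" of the previous computation can be seen as follows. If $(a,\alpha)\,E\,(b,\alpha)$, then $*_{\alpha\gamma}(a) \in *_{\alpha\gamma}(b)$ for some $\gamma \ge \alpha$, and thus $a \in b$ by Proposition 2.9.15 (2).
(b) If $A = (a = b)$, then the proof is similar to the proof of (1)(a).
(2) If $A = (B \vee C)$ or $A = \neg B$, then the assertion follows immediately from the induction hypothesis.
Let $A = \exists x \in a\,B(x)$. Assume that $V^\alpha \models A$. Then there is a $b \in a$ with $V^\alpha \models B(x)(b)$. By the induction hypothesis, $V^\lambda \models ({}^{*_{\alpha\lambda}}B(x))(\widehat{(b,\alpha)})$. Since $\widehat{(b,\alpha)} \in \widehat{(a,\alpha)}$, $V^\lambda \models {}^{*_{\alpha\lambda}}A$.
Conversely, assume that $V^\lambda \models {}^{*_{\alpha\lambda}}A$. Then $V^\lambda \models ({}^{*_{\alpha\lambda}}B(x))(\widehat{(b,\delta)})$ for some $\widehat{(b,\delta)} \in \widehat{(a,\alpha)}$. Set $\gamma := \max\{\alpha, \delta\}$. Since $\widehat{(*_{\delta\gamma}(b), \gamma)} = \widehat{(b,\delta)}$,
$$V^\lambda \models {}^{*_{\gamma\lambda}}\big({}^{*_{\alpha\gamma}}B(x)\big)(*_{\delta\gamma}(b)).$$
By the induction hypothesis, $V^\gamma \models ({}^{*_{\alpha\gamma}}B(x))(*_{\delta\gamma}(b))$. By Proposition 2.9.23 (3) and Proposition 2.9.15 (2), $*_{\delta\gamma}(b) \in *_{\alpha\gamma}(a)$. We obtain $V^\gamma \models \exists x \in *_{\alpha\gamma}(a)\ {}^{*_{\alpha\gamma}}B(x)$, that is, $V^\gamma \models {}^{*_{\alpha\gamma}}(\exists x \in a\,B(x))$. The parameters of $\exists x \in a\,B(x)$ belong to $V^\alpha_{<\infty}$. Hence (E 2) for $*_{\alpha\gamma}$ yields $V^\alpha \models \exists x \in a\,B(x)$.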
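The equivalence relation defining the elementary limit asks for some $\gamma \ge \alpha, \beta$. However, since every $*_{\gamma\delta}$ is injective (Proposition 2.9.15 (2)) and (EC 3) holds, it is enough to test $\gamma = \max\{\alpha, \beta\}$. To see this, put $\mu := \max\{\alpha, \beta\}$. If $*_{\alpha\gamma}(a) = *_{\beta\gamma}(b)$ for some $\gamma \ge \mu$, then (EC 3) gives $*_{\mu\gamma}(*_{\alpha\mu}(a)) = *_{\mu\gamma}(*_{\beta\mu}(b))$, and injectivity yields $*_{\alpha\mu}(a) = *_{\beta\mu}(b)$. The Python sketch below checks this on a hypothetical toy chain of injective maps between finite sets with $\lambda$ finite; it is not a chain of models.

```python
def make_chain(lam):
    """Toy chain of injective maps indexed by 0 <= α < lam (hypothetical, not a chain of
    models): stage α is {0, ..., α} and *_{αβ}(x) = x + (β - α)."""
    stages = {al: frozenset(range(al + 1)) for al in range(lam)}

    def star(al, be):
        assert al <= be
        return lambda x: x + (be - al)

    return stages, star


def equivalent(p, q, lam, star):
    """(a, α) ∼ (b, β): some γ with α, β <= γ < lam and *_{αγ}(a) = *_{βγ}(b)."""
    (a, al), (b, be) = p, q
    return any(star(al, g)(a) == star(be, g)(b) for g in range(max(al, be), lam))


def equivalent_at_max(p, q, star):
    """Test only γ = max{α, β}; enough because each *_{γδ} is injective and (EC 3) holds."""
    (a, al), (b, be) = p, q
    g = max(al, be)
    return star(al, g)(a) == star(be, g)(b)


lam = 6
stages, star = make_chain(lam)
P = [(x, al) for al in range(lam) for x in stages[al]]
print(all(equivalent(p, q, lam, star) == equivalent_at_max(p, q, star) for p in P for q in P))  # True
```

The same argument applies to the relation $E$. Since elementary embeddings are homomorphisms for $\in$ as well, $(a,\alpha)\,E\,(b,\beta)$ can also be decided at $\gamma = \max\{\alpha, \beta\}$.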
Existence of Polysaturated Nonstandard Models
In order to finish the proof of Theorem 2.9.10, we need one more lemma.

Lemma 2.9.26 Fix a model $W := (W_n)_{n\in\mathbb N_0}$ that is closed under definition. Then there exist a model $U$ and an elementary embedding $*$ from $W$ into $U$ such that $\bigcap {}^*[B] \neq \emptyset$ for each $n \ge 1$ and each deep $B \subseteq W_n$.

Proof There exist a cardinal number $\theta$ and a listing $(B_\alpha)_{\alpha<\theta}$ of all deep sets $B_\alpha$ such that $B_\alpha \subseteq W_n$ for some $n \ge 1$. By transfinite recursion, we define an elementary chain $((W^\alpha)_{\alpha<\theta}, (*_{\alpha\beta})_{\alpha\le\beta<\theta})$ in the following way. Set $W^0 := W$ and $*_{00} := \mathrm{identity}_{W_{<\infty}}$.
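Assume that $\lambda < \theta$ and that $W^\alpha$, $*_{\alpha\beta}$ are already defined for each $\alpha \le \beta < \lambda$ such that the following conditions $(1, \lambda)$ and $(2, \lambda)$ hold:
$(1, \lambda)$ $((W^\alpha)_{\alpha<\lambda}, (*_{\alpha\beta})_{\alpha\le\beta<\lambda})$ is an elementary chain;
$(2, \lambda)$ $\bigcap *_{0\alpha}[B_\beta] \neq \emptyset$ for each $\beta < \alpha < \lambda$.
In order to construct $W^\lambda$ and $*_{\alpha\lambda}$ for each $\alpha \le \lambda$, we have to consider two cases.
Case 1: $\lambda = \gamma + 1$ is a successor ordinal. Since $B_\gamma$ is deep and $*_{0\gamma}$ is an elementary embedding from $W^0$ into $W^\gamma$, $*_{0\gamma}[B_\gamma]$ is deep. By the compactness theorem, there exist a model $W^\lambda$ and an elementary embedding $*_{\gamma\lambda}$ from $W^\gamma$ into $W^\lambda$ such that $\bigcap *_{\gamma\lambda}[*_{0\gamma}[B_\gamma]] \neq \emptyset$. We now define for each $\alpha \le \gamma$
$$*_{\alpha\lambda} := *_{\gamma\lambda} \circ *_{\alpha\gamma} \quad\text{and}\quad *_{\lambda\lambda} := \mathrm{identity}_{W^\lambda_{<\infty}}.$$
Notice that $(1, \lambda+1)$ and $(2, \lambda+1)$ are true.
Case 2: $\lambda$ is a limit ordinal. Let $W^\lambda$ be the elementary limit of $((W^\alpha)_{\alpha<\lambda}, (*_{\alpha\beta})_{\alpha\le\beta<\lambda})$. For each $\alpha < \lambda$, the elementary embedding $*_{\alpha\lambda}$ from $W^\alpha$ into $W^\lambda$ is defined in Definition 2.9.24. Set $*_{\lambda\lambda} := \mathrm{identity}_{W^\lambda_{<\infty}}$. Notice that $(1, \lambda+1)$ and $(2, \lambda+1)$ are true.
We thus obtain an elementary chain $((W^\lambda)_{\lambda<\theta}, (*_{\alpha\lambda})_{\alpha\le\lambda<\theta})$. Let $U := W^\theta$ be its elementary limit and set $* := *_{0\theta}$. By Proposition 2.9.25, $*$ is an elementary embedding from $W$ into $U$. It is easy to check that $\bigcap {}^*[B] \neq \emptyset$ if $B$ is a deep subset of $W_n$ with $n \ge 1$.

Now we are able to finish the proof of Theorem 2.9.10. Fix a superstructure $V(X)$ of cardinality $\kappa$. Let $V = (V_n)_{n\in\mathbb N_0}$ be the standard model over $X$. Then $V(X) = V_{<\infty}$. Let $\kappa^+$ be the smallest cardinal number greater than $\kappa$. Then $\kappa^+$ is a regular cardinal, that is, for each $\rho < \kappa^+$ and each $\rho$-sequence $(\alpha_\beta)_{\beta<\rho}$ in $\kappa^+$, $\sup_{\beta<\rho}\alpha_\beta < \kappa^+$. By transfinite recursion, we again construct an elementary chain $((V^\alpha)_{\alpha<\kappa^+}, (*_{\alpha\beta})_{\alpha\le\beta<\kappa^+})$. Set $V^0 := V$ and $*_{00} := \mathrm{identity}_{V_{<\infty}}$. Fix an ordinal $\lambda$ strictly between 0 and $\kappa^+$. Assume that there already exists an elementary chain $((V^\alpha)_{\alpha<\lambda}, (*_{\alpha\beta})_{\alpha\le\beta<\lambda})$ such that for $\alpha, \beta < \lambda$ with $\alpha < \beta$, each $n \ge 1$ and each deep $B \subseteq V^\alpha_n$,
$$\bigcap *_{\alpha\beta}[B] \neq \emptyset. \tag{$\diamondsuit\lambda$}$$
We now define $V^\lambda$ and $*_{\alpha\lambda}$ for each $\alpha \le \lambda$. First assume that $\lambda = \gamma + 1$ is a successor ordinal.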
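Then, by Lemma 2.9.26, there exist a model $V^\lambda$ and an elementary embedding $*_{\gamma\lambda}$ from $V^\gamma$ into $V^\lambda$ such that $\bigcap *_{\gamma\lambda}[B] \neq \emptyset$ for each $n \ge 1$ and each deep $B \subseteq V^\gamma_n$. For each $\alpha < \lambda$ set
$$*_{\alpha\lambda} := *_{\gamma\lambda} \circ *_{\alpha\gamma} \quad\text{and}\quad *_{\lambda\lambda} := \mathrm{identity}_{V^\lambda_{<\infty}}.$$
Note that $((V^\alpha)_{\alpha<\lambda+1}, (*_{\alpha\beta})_{\alpha\le\beta<\lambda+1})$ is an elementary chain such that $\diamondsuit(\lambda+1)$ is true. Now assume that $\lambda$ is a limit ordinal. Let $V^\lambda$ be the elementary limit of $((V^\alpha)_{\alpha<\lambda}, (*_{\alpha\beta})_{\alpha\le\beta<\lambda})$. For $\alpha \le \lambda$ let $*_{\alpha\lambda}$ be the elementary embedding from $V^\alpha$ into $V^\lambda$ defined in Definition 2.9.24. Notice that $((V^\alpha)_{\alpha<\lambda+1}, (*_{\alpha\beta})_{\alpha\le\beta<\lambda+1})$ is an elementary chain and $\diamondsuit(\lambda+1)$ is true.

Let $W := V^{\kappa^+}$ be the elementary limit of $((V^\alpha)_{\alpha<\kappa^+}, (*_{\alpha\beta})_{\alpha\le\beta<\kappa^+})$ and set $* := *_{0\kappa^+}$. Then $*$ is an elementary embedding from $V$ into $W$. To prove that $W$ is $\kappa^+$-saturated, fix $n \ge 1$ and a deep set $B \subseteq W_n$ with $\operatorname{card}(B) < \kappa^+$. Since $\kappa^+$ is a regular cardinal, there exists a $\delta < \kappa^+$ such that
$$B \subseteq \{\widehat{(b,\beta)} \mid \beta < \delta \text{ and } b \in V^\beta_n\}.$$
Set
$$B' := \{*_{\beta\delta}(b) \mid \beta < \delta,\ b \in V^\beta_n \text{ and } \widehat{(b,\beta)} \in B\}.$$
Notice that $*_{\delta\kappa^+}[B'] = B$ and that $B'$ is deep. From the construction of $((V^\alpha)_{\alpha<\kappa^+}, (*_{\alpha\beta})_{\alpha\le\beta<\kappa^+})$ it follows that $\bigcap *_{\delta(\delta+1)}[B'] \neq \emptyset$. Since $*_{(\delta+1)\kappa^+}$ is an elementary embedding from $V^{\delta+1}$ into $V^{\kappa^+} = W$, (EC 3) gives
$$\bigcap B = \bigcap *_{\delta\kappa^+}[B'] = \bigcap *_{(\delta+1)\kappa^+}\big[*_{\delta(\delta+1)}[B']\big] \neq \emptyset.$$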
Now set $Y := W_0$. Then $Y = {}^*V_0 = {}^*X$, and $V(Y)$ is a superstructure over $Y$ with $W_{<\infty} \subseteq V(Y)$. Finally, we prove that the internal entities in $V(Y)$ coincide with the elements of $W_{<\infty}$. Let $B \in V(Y)$ be internal. Then there exists an $A \in V(X) \setminus X = V_{1\le}$ with $B \in {}^*A$. Since ${}^*A \in W_{1\le}$, we obtain $B \in W_{<\infty}$. Now assume that $B \in W_{<\infty}$. Then $B \in W_n = {}^*V_n$ for some $n \in \mathbb{N}_0$. This proves that $B$ is internal.

We end this appendix with an important application of Theorem 2.9.10.

Theorem 2.9.27 Fix an internal set $A$ in $V({}^*X)$, a set $I$ in $V(X)$ and a mapping $f : I \to A$. Then there exists an internal mapping $F : {}^*I \to A$ with $f(i) = F({}^*i)$ for all $i \in I$.

Since the mapping $*$ is injective, we may identify $i \in I$ with ${}^*i \in {}^*I$; thus $F$ is an internal extension of $f$.

Proof If $I$ is standard finite, then $f$ is internal; thus $f$ is its own internal extension (see Proposition 2.9.14 Part (2)). Therefore, we may assume that $I$ is infinite; in particular $I \neq \emptyset$, so $A \neq \emptyset$. Let $\mathcal{E}(I)$ denote the set of all finite subsets of $I$. For all $E \in \mathcal{E}(I)$ define
$$B_E := \{ F : {}^*I \to A \mid F \text{ is internal and } \forall i \in E\ f(i) = F({}^*i) \}.$$
P.A. Loeb
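Note that $B_E$ is internal for each $E \in \mathcal{E}(I)$, and that $B_E \neq \emptyset$, since for the finite set $E$ there exists an internal $F : {}^*I \to A$ such that $f(i) = F({}^*i)$ for all $i \in E$. Moreover, $B_{E_1} \cap \dots \cap B_{E_k} = B_{E_1 \cup \dots \cup E_k} \neq \emptyset$ for all $E_1, \dots, E_k \in \mathcal{E}(I)$. Therefore, $\{B_E \mid E \in \mathcal{E}(I)\}$ has the finite intersection property. Since its cardinality is smaller than $\kappa^+$, saturation yields an $F$ with $F \in B_E$ for all $E \in \mathcal{E}(I)$. This $F$ is an internal extension of $f$.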
Chapter 3
Topology and Measure Theory
Peter A. Loeb
P.A. Loeb, Department of Mathematics, University of Illinois, 1409 West Green Street, Urbana, IL 61801, USA. © Springer Science+Business Media Dordrecht 2015. P.A. Loeb and M.P.H. Wolff (eds.), Nonstandard Analysis for the Working Mathematician, DOI 10.1007/978-94-017-7327-0_3

3.1 Metric and Topological Spaces
We begin this chapter by showing that nonstandard analysis simplifies many of the ideas in the study of metric and topological spaces. Most of the initial results on topology can be found in Robinson's book [34]. After some introductory material, we will present a few more recent applications of nonstandard analysis to topology. The chapter concludes with a quick introduction to the applications of nonstandard analysis in measure and probability theory. For this purpose, we assume the reader is familiar with the Carathéodory Extension Theorem. A full introduction to this rich theory, starting from first principles, is given in Horst Osswald's chapters in this book.

We will use the phrase “in a nonstandard extension” to mean a superstructure that is a nonstandard extension of the standard structure containing the initial spaces with which we are working. We will assume that it is at least an enlargement. The standard structure will contain the real numbers. We will write ∗N∞ for ∗N \ N. The property that our nonstandard model is an enlargement means that for any standard set A in the standard superstructure, there is a hyperfinite set F with {∗a : a ∈ A} ⊆ F ⊆ ∗A. We will often use another principle that is valid for a denumerably comprehensive or ℵ1-saturated nonstandard model; in particular, it is valid for an ultrapower model: any ordinary sequence (i.e., with index set N) in an internal set S is the initial segment of an internal sequence with index set ∗N taking values in S.

With metrics and topologies, one generalizes the notion of “closeness” available in the real and complex number systems. One generalization is provided by metric spaces (X, ρ). Recall that a metric ρ on the set X is a nonnegative real-valued function on X × X satisfying the following: for all x, y, z in X, ρ(x, x) = 0, ρ(x, y) = ρ(y, x), ρ(x, z) ≤ ρ(x, y) + ρ(y, z), and if ρ(x, y) = 0, then x = y. If we do not have the last property, that is, if we may have ρ(x, y) = 0 for some x ≠ y, then ρ is called a semimetric. If X is a vector space, the metric on X is often given by a norm. Recall that this is a nonnegative function x ↦ ‖x‖ on X such that for every scalar α, ‖αx‖ = |α| ‖x‖ (whence ‖0‖ = ‖0 + 0‖ = ‖2 · 0‖ = 2‖0‖, so ‖0‖ = 0), ‖x + y‖ ≤ ‖x‖ + ‖y‖, and ‖x‖ > 0 if x ≠ 0. If the last property does not hold, that is, if we may have an x ≠ 0 for which ‖x‖ = 0, then we use the term seminorm. A norm on a vector space X produces a metric given by ρ(x, y) = ‖x − y‖ for all x, y in X. If the norm is only a seminorm, the corresponding metric is then a semimetric. Of course, a seminorm becomes a norm on the space of equivalence classes where x is equivalent to y when ‖x − y‖ = 0. The resulting norm is well-defined since |‖x‖ − ‖y‖| ≤ ‖x − y‖.

Example 3.1.1 An important seminorm in integration theory is given by the mapping f ↦ ∫ |f| on the space of integrable functions. It is used, for example, to define L1 spaces. In the case of a unit mass at a point x ∈ X, the integrable functions are the real-valued functions on X and the seminorm is given by f ↦ |f(x)|. As x ranges over X, these seminorms produce the topology of pointwise convergence. An important seminorm defined on the space of continuous real-valued or complex-valued functions on a topological space (X, T) replaces a single point with a compact set K and maps f to max_{x∈K} |f(x)|. As K ranges over the compact subsets of X, these seminorms produce the topology of uniform convergence on compact subsets of X.

For the rest of this section we will speak of a metric space, leaving for the reader the generalization to semimetric spaces. In a metric space, the open ball with center x and radius r > 0 is denoted by B(x, r) := {y ∈ X : ρ(x, y) < r}. Notice that for points y, the larger the number of balls centered at x that also contain y, the closer y is to x. In a nonstandard extension of a metric space, we define the monad of a point x ∈ X by setting
$$\operatorname{monad}(x) = \mu(x) := \bigcap_{r} {}^*B(x, r) = \{ y \in {}^*X : {}^*\rho(x, y) \approx 0 \},$$
where the intersection is over all positive standard values of r . We use this monad in the same way we use monads on the real line. For example, a standard set O is called open if for each x ∈ O, μ(x) ⊂ ∗ O. There are settings where a metric will not capture the notion we want; we need a topological space. We can talk about topological spaces using a base at each point in essentially the same way that we talk about metric spaces using balls. Definition 3.1.2 Fix a nonempty set X . A neighborhood filter base at a point x ∈ X is a nonempty collection Bx of nonempty subsets of X such that ∀U, V ∈ Bx , ∃W ∈ Bx such that x ∈ W ⊆ U ∩ V.
3 Topology and Measure Theory
A subset O of X is an open set with respect to the collection {Bx : x ∈ X} if for each x ∈ O there is a U ∈ Bx with x ∈ U ⊆ O. If the elements of Bx are open sets, then the collection Bx is called an open base at x. We will assume in what follows that X and a neighborhood filter base Bx at each point x ∈ X are given. Notice that in general, we do not require that elements of Bx are open sets. Nevertheless, we still have the following result.

Theorem 3.1.3 The open subsets of X form a topology on X. That is, they include X itself and the empty set, and they are stable with respect to the operations of taking finite intersections and arbitrary unions.

Problem: Prove Theorem 3.1.3.

Example 3.1.4 An example of a base not given by balls in a metric space is the base at points for the topology of pointwise convergence of real-valued functions on an uncountable set X. Here, each point is a function f, and an element of the base specifies a finite number of points x1, . . . , xn in X and an ε > 0. A function g is in the open base set given by these parameters if |g(xi) − f(xi)| < ε for 1 ≤ i ≤ n. To see that the condition for a neighborhood filter base is met, simply take two such sets for a given f, take the union of the two sets of points in X and the smaller of the two ε's. This gives an open base set contained in the two initial ones.

Definition 3.1.5 Given x ∈ X, the monad of x is
$$\operatorname{monad}(x) = \mu(x) := \bigcap_{U \in B_x} {}^*U.$$
As with balls in a metric space, we will indicate that y ∈ μ(x) by writing y ≈ x. The nearstandard points of ∗X are the points in the monad of some standard point of X.

Remark 3.1.6 Since any finite intersection of elements of Bx contains another element of Bx, there is a W ∈ ∗Bx with W ⊆ μ(x). In a metric space, one would take a ball of infinitesimal radius.

Example 3.1.7 For pointwise convergence on [0, 1], the monad of a real-valued function f consists of all internal ∗R-valued functions g on ∗[0, 1] such that g(x) ≈ f(x) at each standard x ∈ [0, 1].

Proposition 3.1.8 A set O ⊆ X is open if and only if for each x ∈ O, μ(x) ⊆ ∗O.

Proof First we note that for each U ∈ Bx, μ(x) ⊆ ∗U. Thus, if O is open and x ∈ O, then choosing U ∈ Bx with x ∈ U ⊆ O gives μ(x) ⊆ ∗U ⊆ ∗O. On the other hand, if μ(x) ⊆ ∗O, then by Remark 3.1.6 there is a W ∈ ∗Bx with W ⊆ μ(x) ⊆ ∗O, and so “∃W ∈ Bx with W ⊆ O” must also be true for the standard superstructure by downward transfer.
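The intersection step in Example 3.1.4 can be checked on concrete data. The following is a minimal Python sketch, assuming real-valued functions are modeled as Python callables and the finitely many chosen points as a frozenset of floats; the names `BasicNbhd` and `refine_both` are hypothetical helpers for illustration, not notation from the text. A basic set is determined by its center f, finitely many points and ε, and the set built from the union of the points and the smaller ε lies inside both given sets.

```python
from typing import Callable, Iterable

RealFn = Callable[[float], float]


class BasicNbhd:
    """Hypothetical helper: the basic set {g : |g(x) - f(x)| < eps for x in points}."""

    def __init__(self, center: RealFn, points: Iterable[float], eps: float) -> None:
        self.center = center
        self.points = frozenset(points)
        if eps <= 0 or not self.points:
            raise ValueError("need eps > 0 and a nonempty finite set of points")
        self.eps = eps

    def contains(self, g: RealFn) -> bool:
        # Membership only looks at the finitely many chosen points.
        return all(abs(g(x) - self.center(x)) < self.eps for x in self.points)


def refine_both(u: BasicNbhd, v: BasicNbhd) -> BasicNbhd:
    """A basic set inside both u and v: union of the point sets, smaller eps."""
    if u.center is not v.center:
        raise ValueError("both basic sets must be centered at the same f")
    return BasicNbhd(u.center, u.points | v.points, min(u.eps, v.eps))


# Usage: two basic sets around f(x) = x**2 and a basic set inside both.
f = lambda x: x * x
u = BasicNbhd(f, [0.0, 0.5], 0.1)
v = BasicNbhd(f, [1.0], 0.05)
w = refine_both(u, v)
g = lambda x: x * x + 0.01  # close to f at all three points
print(w.contains(g), u.contains(g), v.contains(g))  # True True True
```

Any g accepted by `w` satisfies the stricter bound at every point of either original set, which is exactly why the refined set is contained in both.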
Problem: Show that the elements of the neighborhood filter base for pointwise convergence of real-valued functions on [0, 1] are open sets.

Answer: Let the basic open set U ∈ Bf be given by a finite number of points r1, . . . , rn in the interval [0, 1] and an ε > 0. That is, g ∈ U if |g(ri) − f(ri)| < ε for 1 ≤ i ≤ n. Given such a g, find δ > 0 such that |g(ri) − f(ri)| + δ < ε for each i. By the triangle inequality, the basic open set V ∈ Bg determined by r1, . . . , rn and δ is contained in U.

Problem: Let T be the collection of all open sets in X. Use monads to show that T has the following properties: (i) The space X and the empty set ∅ are open. (ii) Finite intersections and arbitrary unions of open sets are open.

Answer: If x ∈ X, then μ(x) ⊆ ∗X, and there is no x ∈ ∅. Part (ii) follows from the fact that for a finite collection of sets O1, . . . , On, we have ∗O1 ∩ · · · ∩ ∗On = ∗(O1 ∩ · · · ∩ On), and for an arbitrary collection of sets Oα, the nonstandard extension of each member, ∗Oα, is contained in the nonstandard extension of the union of the members.

Different bases, such as open balls or open rectangles in the plane, give rise to the same topology just as they can give rise to the same monads of standard points. Given a topology T, one can define a maximal base and the monad for the topology at each x ∈ X by setting
$$B_x = \{U \in T : x \in U\}, \qquad \mu(x) = \bigcap_{U \in T,\ x \in U} {}^*U.$$
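The answer to the first problem above is constructive. With gap := max_i |g(ri) − f(ri)| < ε, any δ with 0 < δ < ε − gap satisfies |g(ri) − f(ri)| + δ < ε; the sketch below picks δ = (ε − gap)/2, which is our choice, not the text's. It is a minimal Python illustration assuming functions are modeled as callables on floats; `inner_radius` and `in_basic_set` are hypothetical names.

```python
from typing import Callable, Sequence

RealFn = Callable[[float], float]


def in_basic_set(center: RealFn, h: RealFn, points: Sequence[float], eps: float) -> bool:
    """h lies in {k : |k(r) - center(r)| < eps for every chosen point r}."""
    return all(abs(h(r) - center(r)) < eps for r in points)


def inner_radius(f: RealFn, g: RealFn, points: Sequence[float], eps: float) -> float:
    """Return delta > 0 with |g(r) - f(r)| + delta < eps at every chosen point r."""
    gap = max(abs(g(r) - f(r)) for r in points)
    if gap >= eps:
        raise ValueError("g is not in the basic open set around f")
    # Half of the remaining slack keeps the inequality strict.
    return (eps - gap) / 2


# Usage: f = 0, g(x) = 0.3x, points 0, 1/2, 1 and eps = 0.5 give gap 0.3.
f = lambda x: 0.0
g = lambda x: 0.3 * x
pts = [0.0, 0.5, 1.0]
delta = inner_radius(f, g, pts, 0.5)
h = lambda x: 0.3 * x + 0.09  # within delta of g at every point
print(in_basic_set(g, h, pts, delta), in_basic_set(f, h, pts, 0.5))  # True True
```

The second `True` is the triangle inequality at work: |h(r) − f(r)| ≤ |h(r) − g(r)| + |g(r) − f(r)| < δ + gap < ε.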
Fix an open base Bx at each x ∈ X . Given A ⊆ X , and x ∈ X , we say that x is a point of closure of A if for each U ∈ Bx , U ∩ A = ∅, or what is the same thing, ¯ μ(x) ∩ ∗ A = ∅. We write A¯ for the set of points of closure of A. Clearly, A ⊆ A. ¯ A set A is called closed if A = A. It is easy to establish the following result; the proofs are left as an exercise. Proposition 3.1.9 A set A is closed if and only if its complement X \A is open. Proposition 3.1.10 The set X and the empty set ∅ are closed. Moreover, finite unions and arbitrary intersections of closed sets are closed. Recall that If D ⊆ X and D¯ = X , then we say that D is a dense subset of X . If X contains a countable dense subset, we say that X is separable. For example, the rational numbers are dense in R. Definition 3.1.11 The interior of a set E is the set of all x for which there is a set U ∈ Bx with U ⊆ E. Equivalently, it is the set of all x such that μ(x) ⊆ ∗ E. We write E ◦ for the interior.
3.2 Continuous Mappings
Now assume that (X, S) and (Y, T) are two topological spaces.

Definition 3.2.1 A function f from X into Y is continuous at x ∈ X if for each V ∈ T with f(x) ∈ V, there is a U ∈ S with x ∈ U such that f[U] ⊆ V. Equivalently, we want
$${}^*f[\operatorname{monad}_S(x)] \subseteq \operatorname{monad}_T(f(x)),$$
that is, z ≈ x ⇒ ∗f(z) ≈ f(x). We say that f is continuous or continuous on X if it is continuous at each x ∈ X.

Problem: Show these standard and nonstandard conditions are equivalent.

Answer: If f is continuous at x, then ∗f maps μ(x) into ∗V for each open V ∈ B_{f(x)}, so ∗f maps μ(x) into μ(f(x)). Conversely, if the nonstandard condition holds, then for any V ∈ B_{f(x)}, there is a W ∈ ∗Bx with W ⊆ μ(x), and ∗f maps W into ∗V, so downward transfer gives the desired result.

Theorem 3.2.2 A function f from X into Y is continuous on X if and only if for each open set O contained in Y, f⁻¹[O] is open in X. A function f from A ⊆ X into Y is continuous at each point of A if and only if for each V ∈ T, there is a U ∈ S such that f⁻¹[V] ∩ A = U ∩ A.

Proof Exercise.

Recall that if there is a one-to-one map from (X, S) onto (Y, T) that is continuous and has a continuous inverse, then we say these two spaces are topologically equivalent or homeomorphic.
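For finite spaces, the pointwise condition of Definition 3.2.1 and the preimage criterion of Theorem 3.2.2 can both be evaluated directly, so their agreement can be observed on examples. The sketch below is a minimal Python illustration, assuming topologies are given as sets of frozensets and maps as dictionaries; the function names are hypothetical.

```python
from typing import Dict, FrozenSet, Set

Topology = Set[FrozenSet[int]]
Map = Dict[int, int]  # a finite function given by its table of values


def preimage(f: Map, V: FrozenSet[int]) -> FrozenSet[int]:
    return frozenset(x for x, y in f.items() if y in V)


def continuous_at(f: Map, x: int, opens_X: Topology, opens_Y: Topology) -> bool:
    """Definition 3.2.1: each open V containing f(x) has an open U containing x with f[U] in V."""
    return all(
        any(x in U and all(f[u] in V for u in U) for U in opens_X)
        for V in opens_Y
        if f[x] in V
    )


def continuous(f: Map, opens_X: Topology, opens_Y: Topology) -> bool:
    """Theorem 3.2.2: preimages of open sets are open."""
    return all(preimage(f, V) in opens_X for V in opens_Y)


# Usage: X = {1, 2, 3} with open sets {}, {1}, {1, 2}, X; Y = {0, 1} with open sets {}, {1}, Y.
X, Y = frozenset({1, 2, 3}), frozenset({0, 1})
opens_X = {frozenset(), frozenset({1}), frozenset({1, 2}), X}
opens_Y = {frozenset(), frozenset({1}), Y}
f2 = {1: 0, 2: 1, 3: 0}
print(continuous(f2, opens_X, opens_Y))  # False: the preimage of {1} is {2}
print([continuous_at(f2, x, opens_X, opens_Y) for x in (1, 2, 3)])  # [True, False, True]
```

The map fails to be continuous exactly at the point 2, which is also where the preimage criterion breaks down.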
3.3 Convergence
Assume we are given a topological space X and an open base Bx at each x ∈ X. (For a metric space, the open base at x is the set of open balls centered at x.) Recall that a sequence xn converges to a point x ∈ X if it is eventually in each U ∈ Bx; that is, there is an m ∈ N such that xn ∈ U for all n ≥ m.

Proposition 3.3.1 A sequence xn converges to a point x ∈ X if and only if x_H ≈ x for every H ∈ ∗N∞.

Proof Exercise.

Recall that a point x is a cluster point of a sequence xn if the sequence is frequently in every U ∈ Bx; that is, for every U ∈ Bx and every m ∈ N, there is an n ≥ m with xn ∈ U.
Proposition 3.3.2 A point x is a cluster point of a sequence xn if and only if there is an H ∈ ∗N∞ with x_H ≈ x.

Proof Exercise.

A generalization of sequential convergence, used even in such simple settings as Riemann integration theory, replaces the natural numbers with a directed set. A directed set D is a set supplied with a transitive relation ≼ such that for each a and b in D, a ≼ a, b ≼ b, and there is a c ∈ D with a ≼ c and b ≼ c. Partitions play the role of the directed set in Riemann integration theory. The generalization of a sequence is given by a net. This is a function a ↦ x_a from a directed set D into a given topological space X. It converges to x ∈ X if for all U ∈ Bx, there is a c ∈ D such that x_a ∈ U for all a ≽ c; that is, the net is eventually in U. A point x is a cluster point of a net x_a if for each U ∈ Bx and each c ∈ D, there is an a ≽ c with x_a ∈ U; that is, the net is frequently in U. The notion of a “subnet” is fairly complicated (see [23]).

Problem: Show that if D is a standard directed set, then there is a c ∈ ∗D with ∗a ≼ c for each standard a ∈ D.

Answer: Embed the set {∗a : a ∈ D} in a hyperfinite subset F of ∗D. By induction, every finite subset of D has an upper bound in D, so by transfer the hyperfinite set F has an upper bound c ∈ ∗D.

Problem: Give the nonstandard criterion for a net to converge and for a point to be a cluster point of a net.

Answer: The conditions are similar to the conditions for sequences.
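The remark that partitions form the directed set of Riemann integration can be made concrete. In the sketch below a partition of [0, 1] is a finite set of exact rational cut points, p ≼ q means that q refines p, the union of two partitions is a common upper bound, and the net is p ↦ (left Riemann sum over p). It is a minimal Python illustration under these modeling assumptions; the function names are hypothetical.

```python
from fractions import Fraction
from typing import Callable, FrozenSet

Partition = FrozenSet[Fraction]  # finite set of cut points of [0, 1], containing 0 and 1


def precedes(p: Partition, q: Partition) -> bool:
    """The direction p ≼ q: q refines p, i.e. q contains every cut point of p."""
    return p <= q


def upper_bound(p: Partition, q: Partition) -> Partition:
    """The common refinement p ∪ q satisfies p ≼ p ∪ q and q ≼ p ∪ q."""
    return p | q


def uniform(n: int) -> Partition:
    return frozenset(Fraction(k, n) for k in range(n + 1))


def left_sum(f: Callable[[Fraction], Fraction], p: Partition) -> Fraction:
    """Left-endpoint Riemann sum of f over the partition p."""
    cuts = sorted(p)
    return sum((f(a) * (b - a) for a, b in zip(cuts, cuts[1:])), Fraction(0))


# Usage: the net p -> left_sum(f, p) for f(x) = x.
f = lambda x: x
c = upper_bound(uniform(2), uniform(3))
print([str(t) for t in sorted(c)])  # ['0', '1/3', '1/2', '2/3', '1']
print(left_sum(f, c))  # 13/36
```

Exact fractions are used so that the sums along finer partitions can be compared without rounding; for uniform partitions with n pieces the sum equals 1/2 − 1/(2n).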
3.4 More on Topologies
One often has two topologies, T1 and T2, where every set open for the first is open for the second; i.e., T1 ⊆ T2. We say T1 is weaker or coarser than T2 and T2 is stronger or finer than T1. It is clear that the weaker the topology, i.e., the fewer open sets there are, the larger the monads will be. The collection consisting of just X and the empty set is the weakest possible topology on X. The full power set of X is the strongest possible topology, and it contains any given collection of sets. It follows that any collection C of subsets of X generates a topology, namely, the intersection of all topologies containing the collection. This is the weakest topology containing C.

We also have the notion of a base for a topology rather than a base at each point. A base B for a topology is a collection of open sets such that for each open set O and each x ∈ O, there is a U ∈ B with x ∈ U ⊆ O. Given a base, it is easy to see that a set is open if and only if it is a union of sets from the base. It is also easy to see that B is a base if and only if for each x ∈ X, {U ∈ B : x ∈ U} is an open base at x. This open base at x may not be the one you would choose. For example, the disks in the plane with rational centers and rational radii form a base, but a point with an irrational coordinate is not the center of any disk in this base.
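For a finite set X, the weakest topology containing a collection C can be computed directly instead of as an intersection of all topologies: first close C under finite intersections (these sets form a base), then take all unions. The sketch below is a minimal Python illustration that relies on X being finite, since arbitrary unions cannot be enumerated otherwise; `generated_topology` is a hypothetical name.

```python
from typing import FrozenSet, Iterable, Set


def generated_topology(X: FrozenSet[int], C: Iterable[FrozenSet[int]]) -> Set[FrozenSet[int]]:
    """Weakest topology on the finite set X that contains every set in C."""
    # Step 1: all finite intersections of members of C (the empty intersection is X).
    base = {X}
    for S in C:
        base |= {B & S for B in base}
    # Step 2: all unions of such intersections (the empty union is the empty set).
    opens = {frozenset()}
    for B in base:
        opens |= {O | B for O in opens}
    return opens


# Usage: the topology on {1, 2, 3} generated by {1, 2} and {2, 3}.
X = frozenset({1, 2, 3})
T = generated_topology(X, [frozenset({1, 2}), frozenset({2, 3})])
print(sorted(sorted(U) for U in T))  # [[], [1, 2], [1, 2, 3], [2], [2, 3]]
```

With C empty the function returns {∅, X}, the weakest possible topology mentioned above.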
There are two topologies on the nonstandard extension ∗X of a topological space that are important in the literature. The first, called the Q-topology by Robinson, has a base consisting of the sets in ∗T. That is, each internal open set is open in this topology, but in general, there are external sets that are also open in this topology. The second important topology, called the S-topology by Robinson, has base B = {∗U : U ∈ T}. Notice that if x is standard and y ∈ μ(x), then every S-open set that contains x also contains y. Thus, in general, the S-topology is not Hausdorff. Recall that a topological space is called a Hausdorff space if distinct points are contained in disjoint open sets, or, what is the same thing, distinct standard points have disjoint monads.

A topological space is said to satisfy the first axiom of countability if each point has a countable base Bx. A topological space is said to satisfy the second axiom of countability if there is a countable base for the topology. In a metric space, balls of radii 1/n form a base at each point. There may, however, be no countable base for the topology. In Euclidean n-space, balls with rational centers and rational radii form a countable base for the topology.

For a Hausdorff space, we have the important standard part map st : ns(∗X) → X, defined on the set ns(∗X) of nearstandard points as follows: for each x ∈ X and each y ∈ μ(x), set st(y) = ◦y = x. Even for a non-Hausdorff space, we can define the standard part of a set B ⊆ ∗X by setting st(B) := {x ∈ X : μ(x) ∩ B ≠ ∅}.

Theorem 3.4.1 (Luxemburg [30]) Assume that card(T) < κ and we are working with a κ-saturated enlargement. Then for each internal set B ⊆ ∗X, st(B) is closed.

Proof Assume that z ∈ X is a point of closure of st(B); we must show that z ∈ st(B), i.e., μ(z) ∩ B ≠ ∅. Given U ∈ T with z ∈ U, there is an x ∈ U ∩ st(B). Since x ∈ st(B) and U is an open set containing x, there is a y ∈ μ(x) ∩ B ⊆ ∗U ∩ B. Thus ∗U ∩ B ≠ ∅ for each open U containing z. Since a finite intersection of open sets containing z is again such a set, the family {∗U ∩ B : U ∈ T, z ∈ U} of internal sets has the finite intersection property, and it has fewer than κ members. It follows from saturation that μ(z) ∩ B ≠ ∅.
Note that for the last proof, one can use just ℵ1 -saturation if there is a countable base Bx at each point x ∈ X .
3.5 Compact Spaces
Recall that an open covering of a set A ⊆ X is a collection of open sets in X such that each point of A is in at least one of the open sets. A subset A ⊆ X is called compact if every open cover has a finite subcover. Of course, A may be all of X. By De Morgan's laws, X is compact if and only if every family of closed sets with the finite intersection property (this means that every finite subfamily has a nonempty intersection) itself has a nonempty intersection. The nonstandard extension of a finite set {a1, . . . , an} is just the corresponding finite set {∗a1, . . . , ∗an}. Compactness generalizes finiteness, and Robinson's criterion for compactness makes this clear: the nonstandard extension ∗A of a compact set A contains more than the points ∗a with a ∈ A, but every point of ∗A is infinitely close to one of them.

Theorem 3.5.1 (Robinson [34]) A set A ⊆ X is compact if and only if for each y ∈ ∗A, there is an x ∈ A with y ∈ μ(x). In particular, X is compact if and only if every point of ∗X is nearstandard.

Proof Assume A is compact but there is a y ∈ ∗A not in the monad of any x ∈ A. Then each x ∈ A is contained in an open set Ox with y ∉ ∗Ox. The family {Ox : x ∈ A} covers A and therefore has a finite subcover {O1, . . . , On}. Now since A ⊆ O1 ∪ · · · ∪ On, we have ∗A ⊆ ∗O1 ∪ · · · ∪ ∗On. Since y ∉ ∗Oi for 1 ≤ i ≤ n, y ∉ ∗A. This contradiction shows that if A is compact, then every point of ∗A is in the monad of some point of A.

Now assume that A is not compact, i.e., there is a collection U = {Oδ : δ ∈ D} of open sets that covers A but no finite subcollection of which covers A. Let B be a hyperfinite collection in ∗U with ∗Oδ ∈ B for each δ ∈ D. By transfer, no hyperfinite subcollection of ∗U covers ∗A, so there is a y ∈ ∗A such that y ∉ V for each V ∈ B. This point y is not in the monad of any point in A, since for each x ∈ A, there is a δ with x ∈ Oδ but y ∉ ∗Oδ, whence y ∉ μ(x).

Recall that a set in a metric space is bounded if it is contained in a ball B(a, R) about some point of the space.

Theorem 3.5.2 (Heine-Borel) A compact subset A of a metric space is closed and bounded. A closed and bounded subset of Rn is compact.

Proof Exercise.

Recall that a topological space (X, T) is called regular if for each x ∈ X, the singleton set {x} is closed, and for each closed set C with x ∉ C, there are disjoint open sets U and V with x ∈ U and C ⊆ V. That is, singleton sets are closed, and for each x and open U containing x there is an open set W with x ∈ W ⊆ W̄ ⊆ U.

Theorem 3.5.3 (Luxemburg [30]) Assume that we are working in a κ-saturated enlargement, and let (X, T) be a regular space with card(T) < κ.
If B is an internal set of nearstandard points in ∗X, then st(B) is compact.

Proof By Theorem 3.4.1, C := st(B) is closed in X. Let {Fα} be a collection of closed subsets of C such that every finite subcollection has a nonempty intersection. For each finite subcollection Fα1, . . . , Fαn there is a b ∈ B with x := st(b) ∈ Fα1 ∩ · · · ∩ Fαn; whence, if U is an open subset of X with Ū ∩ Fαi = ∅ for i = 1, . . . , n, then b ∉ ∗U, since otherwise every open V containing x would satisfy b ∈ ∗V ∩ ∗U, so V ∩ U ≠ ∅, and thus x ∈ Ū. By saturation, there is a b0 ∈ B such that for each finite subcollection of the family {Fα} and each open set U whose closure misses each set in that finite subcollection, b0 ∉ ∗U. Fix x ∈ X with b0 ∈ μ(x). We then have x ∈ Fα for each α, since if there is an α with x ∉ Fα, then by regularity there is an open neighborhood U of x with Ū ∩ Fα = ∅, whence b0 ∉ ∗U. But this contradicts the assumption that b0 ∈ μ(x).
Problem: Complete the following alternative proof of Theorem 3.5.3: Let C := st(B), and fix y ∈ ∗C. If U is a standard open set with y ∈ ∗U, then since ∗U ∩ ∗C ≠ ∅, U ∩ C ≠ ∅. If x ∈ U ∩ C, then by the definition of C, ∃b ∈ B with b ∈ μ(x) ⊆ ∗U. Thus, for each standard open U with y ∈ ∗U, there is a b ∈ B ∩ ∗U. By saturation, ∃b0 ∈ B with b0 ∈ ∗U for each standard open U such that y ∈ ∗U. Fix x ∈ C with b0 ∈ μ(x). You must show that y ∈ μ(x).

Answer: If y ∉ μ(x), then there is an open set V with x ∈ V and y ∉ ∗V; by regularity there is an open set U with x ∈ U ⊆ Ū ⊆ V. It follows that x ∈ U and y ∈ ∗(X \ Ū), whence b0 ∈ ∗U but b0 ∈ ∗(X \ Ū). This is a contradiction.

Theorem 3.5.4 If K ⊆ X is compact and f : K → Y is continuous at each point of K, then the image f[K] is compact in Y.

Proof Exercise.

Corollary 3.5.5 A real-valued function continuous on a compact set K achieves a maximum and a minimum value at points of K.

Suppose that (X, T) is a topological space and B is a base for the topology. It is easy to see that X is compact if any covering by sets from B has a finite subcovering.

Theorem 3.5.6 Let X be a topological space. If X is compact then every sequence in X has a cluster point. The converse holds if X satisfies the second axiom of countability, that is, if X has a countable base for the topology.

Proof Suppose xn is a sequence in X, and X is compact. Fix H ∈ ∗N∞. By Theorem 3.5.1, there is an x ∈ X with x_H ∈ μ(x), and by Proposition 3.3.2, x is a cluster point of the sequence. For the converse, if X has a countable base, one needs only show that every decreasing sequence of nonempty closed sets has a nonempty intersection. Choosing a point in each of the closed sets produces a sequence; any cluster point is in the intersection of the sequence of closed sets.

Recall that the S-topology is the topology on ∗X with base {∗U : U ∈ T}. Salbany and Todorov [35] have shown that the following fact is important when discussing compactifications.
Theorem 3.5.7 The S-topology makes ∗ X compact. Proof Exercise. Problem: The setting of this problem is the d-dimensional normed vector space Rd supplied with a norm that is not necessarily the Euclidean norm. Let B(0, 2) denote the closed ball of radius 2 about the origin 0 in Rd . Let K (s) denote the maximum number of points one can “pack” into B(0, 2) when one point is at 0 and the distance between pairs of distinct points is at least s. It is known, for example, that for the Euclidean norm in the plane R2 , K (1) = 19 (see [5, 32]). Prove or disprove the following conjecture: For any δ > 0 in R, K (1 − δ) ≥ K (1) + 1. (The result turned out to be crucial in work on the Besicovitch Covering Theorem in [15].)
Answer: Consider what happens when δ ≈ 0.

Problem: The “mushroom space” is an example of a nonregular space where things “go wrong”. Here, X is the unit square {(x, y) : 0 ≤ x ≤ 1, 0 ≤ y ≤ 1}. The topology is generated by Euclidean neighborhoods except at the points (x, 0). There, a typical neighborhood consists of (x, 0) together with the set {(ζ, y) : (x − ζ)² + y² < r², y > 0} for some r > 0. The restriction of this topology to the set L = {(x, 0) : 0 ≤ x ≤ 1} is the discrete topology, i.e., every subset of L is open in L. The point (1/2, 0) cannot be separated by disjoint open sets from its complement {(x, 0) : x ≠ 1/2, 0 ≤ x ≤ 1} in L, which is closed. Let B be the set {(ζ, ε) : ζ ∈ ∗[0, 1]}, where ε is a positive infinitesimal. Show that B provides a counterexample to Theorem 3.5.3.
3.6 Product Spaces
The topology of pointwise convergence on [0, 1] can be generalized as follows. Instead of [0, 1], we take an arbitrary index set I. Instead of associating the real line with each α ∈ I, we let Xα be a topological space. For the development here, we assume that there is some set S in our standard superstructure containing all of these topological spaces Xα; that is, there is an upper bound to the level at which they occur in the superstructure. Now the point set ∏_{α∈I} Xα is the set of all functions f on I with f(α) ∈ Xα for each α ∈ I. The monad of such an element f consists of all internal g ∈ ∗∏_{α∈I} Xα with g(α) ≈ f(α) for each standard α ∈ I. Such a g is a mapping on ∗I with g(β) ∈ ∗Xβ for each β ∈ ∗I, but the values of g at nonstandard indices are not relevant in determining whether g is in the monad of f. The space ∏_{α∈I} Xα is called a product space, and the topology is called the product topology.

Theorem 3.6.1 The product of Hausdorff spaces is Hausdorff.

Proof Exercise.

The nonstandard proof of the following important theorem is considerably simpler than standard proofs in the literature.

Theorem 3.6.2 (Tychonoff) The product of compact spaces is compact.

Proof Let X = ∏_{α∈I} Xα and g ∈ ∗X. For each standard α ∈ I, since Xα is compact, Theorem 3.5.1 gives an xα ∈ Xα with g(α) ≈ xα. (The xα's are unique if the spaces Xα are Hausdorff.) The function f with f(α) = xα for each α ∈ I is in X, and g ∈ μ(f). Since g ∈ ∗X was arbitrary, X is compact by Theorem 3.5.1.
3.7 Relative Topologies
Let X be a space with a topology, and let A be a subset of X. Recall that one can restrict the topology on X to A by calling a subset of A open if it is the intersection of A with an open subset of X. Thus, if Bx is a base of open sets for the original topology and x ∈ A, the collection {U ∩ A : U ∈ Bx} is a base at x for the restricted topology. If X is a metric space, the restriction of the metric to pairs of points in A gives the appropriate metric on A. In all of this, one just ignores points outside of A. One does have to recall that an open subset of A in the restricted topology, e.g., A itself, may not be open in all of X. One also speaks of the relative topology on A.

Problem: Describe the relatively open subsets of A in terms of monads for the topology on X.
3.8 Uniform Continuity and Uniform Spaces
So far we have discussed ways to determine closeness to any given point of a space. If the space is supplied with a norm, a metric, or even a family of semimetrics, then we can consider in a “uniform way” closeness for all pairs of points in the space. We start with a simple extension of results that have already been established for the real line.

Theorem 3.8.1 A map f from a set A contained in a metric space (X, d) into a metric space (Y, ρ) is uniformly continuous on A if and only if ∗f(x) ≈ ∗f(y) for all x, y ∈ ∗A with x ≈ y.

Proof Assume x ≈ y ⇒ ∗f(x) ≈ ∗f(y). Pick ε > 0 in R. Then the sentence
$$(\exists \delta \in \mathbb{R}^+)(\forall x, y \in A)\,[d(x, y) < \delta \Rightarrow \rho(f(x), f(y)) < \varepsilon]$$
holds for the nonstandard extension of the structure (any positive infinitesimal δ witnesses it) and therefore, by transfer, for the original structure. The converse proof is similar to the proof for R.

Theorem 3.8.2 A continuous map f from a compact set A contained in a metric space (X, d) into a metric space (Y, ρ) is uniformly continuous on A.

Proof Assume x ≈ y in ∗A. Since A is compact, Theorem 3.5.1 gives a z ∈ A with x ≈ z, and then also y ≈ z. By the continuity of f at z, ∗f(x) ≈ f(z) ≈ ∗f(y).
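A short worked example of Theorems 3.8.1 and 3.8.2 may help; the function f(x) = x² is our choice of illustration, not one taken from the text. On A = [0, 1], for x, y ∈ ∗[0, 1] with x ≈ y,
$$|x^2 - y^2| = |x + y|\,|x - y| \le 2\,|x - y| \approx 0,$$
so f is uniformly continuous on [0, 1], as Theorem 3.8.2 also predicts. On A = R the criterion of Theorem 3.8.1 fails: for H ∈ ∗N∞, the points x = H and y = H + 1/H satisfy x ≈ y, but
$$y^2 - x^2 = \left(H + \tfrac{1}{H}\right)^2 - H^2 = 2 + \tfrac{1}{H^2} \approx 2,$$
so f is not uniformly continuous on R.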
To amplify the notion of uniform closeness, we now turn to the topic of uniform spaces. The rest of this section is compiled from a draft by Manfred Wolff. Let 𝒫 be a set of semimetrics on a nonempty set X. For each ρ ∈ 𝒫 and for every ε > 0 set B_{ε,ρ} = {(x, y) ∈ X × X : ρ(x, y) < ε}. The collection of finite intersections of these sets forms the base of a filter 𝒰 = 𝒰(𝒫) on X × X. That is, given ρ ∈ 𝒫 and ε > 0, since ρ(x, x) = 0 for every x, the set B_{ε,ρ} contains the diagonal, so B_{ε,ρ} ≠ ∅, and the same holds for finite intersections of such sets. Moreover, the family of finite intersections of the sets B_{ε,ρ} is, by definition, stable under the operation of taking finite intersections. The collection 𝒰(𝒫) is the collection of supersets of sets in the filter base.
Given any filter 𝒰 on X × X, for each U ∈ 𝒰, we will write U⁻¹ for the set {(y, x) ∈ X × X : (x, y) ∈ U}, and we will write U ◦ W for the set {(x, z) ∈ X × X : ∃y ∈ X with (x, y) ∈ W and (y, z) ∈ U}. We let Δ denote the diagonal {(x, x) : x ∈ X}. Different sets of semimetrics may define the same filter 𝒰(𝒫). The following definition characterizes such filters.

Definition 3.8.3 Let 𝒰 be a filter on X × X satisfying the following properties:
(a) U ∈ 𝒰 ⇒ Δ ⊆ U.
(b) U ∈ 𝒰 ⇒ U⁻¹ ∈ 𝒰.
(c) U ∈ 𝒰 ⇒ ∃V ∈ 𝒰 with V ◦ V⁻¹ ⊆ U.
Then 𝒰 is called a uniformity, and the pair (X, 𝒰) is called a uniform space.

Problem: Let 𝒫 be a set of semimetrics on a nonempty set X. Show that the filter 𝒰(𝒫) is a uniformity.

Remark 3.8.4 It can be shown that for every uniformity 𝒱 there is a set 𝒫 of semimetrics such that 𝒱 = 𝒰(𝒫).

The monad of a filter is the intersection of the nonstandard extensions of sets in a generating filter base. For 𝒰 = 𝒰(𝒫), the monad is
$$\mu_{\mathcal U} = \{(x, y) \in {}^*X \times {}^*X : {}^*\rho(x, y) \approx 0 \text{ for all standard } \rho \in \mathcal P\}.$$
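Properties (a) to (c) of Definition 3.8.3 can be checked mechanically for a basic set U = B_{ε,ρ} when X is finite. The sketch below is a minimal Python illustration, assuming relations are stored as frozensets of pairs; the semimetric ρ(x, y) = |x mod 5 − y mod 5| on {0, . . . , 9} is our own example (it is not a metric, since ρ(0, 5) = 0), and the helper names are hypothetical. For (c) it uses V = B_{ε/2,ρ}, the choice suggested by the triangle inequality. This checks one basic set of one example and is not a proof of the Problem above.

```python
from itertools import product
from typing import Callable, FrozenSet, Iterable, Tuple

Relation = FrozenSet[Tuple[int, int]]


def ball_relation(X: Iterable[int], rho: Callable[[int, int], float], eps: float) -> Relation:
    """B_{eps,rho} = {(x, y) in X x X : rho(x, y) < eps}."""
    pts = list(X)
    return frozenset((x, y) for x, y in product(pts, pts) if rho(x, y) < eps)


def inverse(U: Relation) -> Relation:
    """U^{-1} = {(y, x) : (x, y) in U}."""
    return frozenset((y, x) for x, y in U)


def compose(U: Relation, W: Relation) -> Relation:
    """U o W = {(x, z) : there is a y with (x, y) in W and (y, z) in U}."""
    return frozenset((x, z) for x, y in W for y2, z in U if y == y2)


# Usage: a semimetric (not a metric) on X = {0, ..., 9}.
X = range(10)
rho = lambda x, y: abs(x % 5 - y % 5)
U = ball_relation(X, rho, 3)
V = ball_relation(X, rho, 1.5)  # half the radius, as the triangle inequality suggests
diagonal = frozenset((x, x) for x in X)
print(diagonal <= U, inverse(U) == U, compose(V, inverse(V)) <= U)  # True True True
```

The three printed values correspond to properties (a), (b) and (c) for this particular U.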
The monad μ_𝒰 is an equivalence relation on ∗X. That is, for each x, y, z in ∗X and each standard ρ ∈ 𝒫, ∗ρ(x, x) = 0; ∗ρ(x, y) ≈ 0 ⇒ ∗ρ(y, x) ≈ 0; and ∗ρ(x, y) ≈ 0 and ∗ρ(y, z) ≈ 0 ⇒ ∗ρ(x, z) ≈ 0.

Problem: Let X be a nonempty set, and let 𝒲 be a filter on X × X. Recall that the monad of 𝒲 is the set {(x, y) ∈ ∗X × ∗X : ∀W ∈ 𝒲, (x, y) ∈ ∗W}. Show that 𝒲 is a uniformity if and only if its monad is an equivalence relation on ∗X.

With every uniformity 𝒰 on X there is an associated topology T_𝒰. That is, for each x ∈ X, the neighborhood filter base consists of the sets B_{x,U} = {y ∈ X : (x, y) ∈ U}, with one such set for each U ∈ 𝒰. The corresponding monad at x is
$$\mu(x) = \{y \in {}^*X : (x, y) \in {}^*U \text{ for all standard } U \in \mathcal U\}.$$
If 𝒰 = 𝒰(𝒫), then
$$\mu(x) = \{y \in {}^*X : {}^*\rho(x, y) \approx 0 \text{ for all standard } \rho \in \mathcal P\}.$$
For the rest of this section, we assume that $\mathcal U = \mathcal U(\mathcal P)$ and that topological notions refer to $T_{\mathcal U}$. The proof of the following result is left to the reader.
3 Topology and Measure Theory
Proposition 3.8.5 The topology $T_{\mathcal U}$ is Hausdorff if and only if $\bigcap_{U \in \mathcal U} U = \Delta$.
Example 3.8.6 The seminorms described in Example 3.1.1 produce, via $\rho(f, g) = p(f - g)$ for each seminorm $p$, semimetrics and hence a uniformity and a corresponding topology. Particular examples are the topology of pointwise convergence and the topology of uniform convergence on compact sets on a space of continuous functions.
Problem: For every continuous function $f$ on the unit interval $[0, 1]$, define the semimetric $\rho_f(x, y) = |f(x) - f(y)|$. Show that this set of semimetrics defines the same uniformity as the single metric $d(x, y) = |x - y|$.
Let $f$ be a mapping from a set $X$ into a set $Y$. We write $f \times f$ for the mapping $X \times X \to Y \times Y$ given by $(f \times f)(u, v) = (f(u), f(v))$.
Definition 3.8.7 Let $(X, \mathcal U)$ and $(Y, \mathcal V)$ be uniform spaces. A mapping $f : X \to Y$ is called uniformly continuous if for every $V \in \mathcal V$ there exists $U \in \mathcal U$ with $(f \times f)(U) \subseteq V$.
Problem: Show that $f$ is uniformly continuous if and only if $({}^*f \times {}^*f)(\mu_{\mathcal U}) \subseteq \mu_{\mathcal V}$, i.e., if and only if ${}^*f(x) \simeq {}^*f(y)$ whenever $x \simeq y$.
Problem: Show that a uniformly continuous mapping is continuous for the corresponding topologies.
Problem: Let $(X, T)$ be a compact Hausdorff space, and let $C(X)$ be the space of continuous real-valued functions on $X$. (1) For every $f \in C(X)$, let $\rho_f(x, y) = |f(x) - f(y)|$. Show that the set $\mathcal P = \{\rho_f : f \in C(X)\}$ induces a uniformity $\mathcal U$ that in turn induces the given topology $T$. (2) Let $\mathcal V$ be an arbitrary uniformity on $X$ inducing the given topology $T$, and let $(Y, \mathcal W)$ be another uniform space. Show that every continuous mapping $f : (X, \mathcal V) \to (Y, \mathcal W)$ is uniformly continuous. Hint: Recall the proof of Theorem 3.8.2. (3) Let $\mathcal V$ be an arbitrary uniformity on $X$ inducing the given topology. Show that $\mathcal V = \mathcal U$, the uniformity of part (1). Hint: Consider the identity $I$ on $X$, and apply (2) in both directions, that is, to $I$ and to $I^{-1}$. We write $\mathcal U_T$ for this unique uniformity.
Definition 3.8.8 Let $(X, \mathcal U)$ be a uniform space. (a) A filter $\mathcal C$ on $X$ is called a Cauchy filter if for every $U \in \mathcal U$ there exists $C \in \mathcal C$ such that $C \times C \subseteq U$.
(b) Let $A$ be an upward directed set and $(x_\alpha)_{\alpha \in A}$ a net in $X$. The net is called a Cauchy net if the filter of sections of the net is a Cauchy filter.
Problem: Prove the following assertions: (a) $\mathcal C$ is a Cauchy filter if and only if $x \simeq y$ for all $x, y$ in the monad $\mu_{\mathcal C}$ of $\mathcal C$. (b) $(x_\alpha)_{\alpha \in A}$ is a Cauchy net if and only if $x_\alpha \simeq x_\beta$ for all $\alpha, \beta$ that are infinitely large, that is, greater than every standard element of $A$. (c) Every convergent filter is a Cauchy filter.
(d) If $\mathcal C$ is a Cauchy filter in $(X, \mathcal U)$ and $f$ is a uniformly continuous mapping from $(X, \mathcal U)$ to another uniform space $(Y, \mathcal V)$, then the filter $f(\mathcal C)$ is a Cauchy filter in $(Y, \mathcal V)$. (e) Let $(X, T)$ be a compact Hausdorff space equipped with its unique uniformity $\mathcal U_T$. Show that every Cauchy filter is convergent.
Definition 3.8.9 A uniform space $(X, \mathcal U)$ is called complete if every Cauchy filter is convergent.
Remark 3.8.10 For every Hausdorff uniform space $(X, \mathcal U)$ there exist a complete Hausdorff uniform space $(Y, \mathcal V)$ and an injective mapping $f$ from $X$ onto a dense subset of $Y$ such that $f$ is uniformly continuous and its inverse, defined on $f(X)$, is uniformly continuous as well. The space $(Y, \mathcal V)$ is uniquely determined up to the obvious isomorphisms and is called the completion of $(X, \mathcal U)$. A construction of the completion via nonstandard analysis may be found in [16].
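The nonstandard criterion in the Problem following Definition 3.8.7 (${}^*f(x) \simeq {}^*f(y)$ whenever $x \simeq y$) has a finite shadow that is easy to compute. The sketch below uses an illustrative function of our own choosing, $f(x) = x^2$ on $\mathbb R$ with the metric $|x - y|$, and the points $x = N$, $y = N + 1/N$: as $N$ grows, $|x - y| = 1/N$ shrinks while $f(y) - f(x) = 2 + 1/N^2$ stays above $2$. Read with $N = \eta$ unlimited, this exhibits $x \simeq y$ with ${}^*f(x) \not\simeq {}^*f(y)$, so this $f$ is continuous but not uniformly continuous. Exact rational arithmetic avoids rounding questions.

```python
from fractions import Fraction

def f(x):
    """The illustrative map f(x) = x**2 on the real line."""
    return x * x

def gap(N):
    """Return (|x - y|, |f(y) - f(x)|) for x = N and y = N + 1/N, exactly."""
    x = Fraction(N)
    y = x + Fraction(1, N)
    return abs(x - y), abs(f(y) - f(x))

for N in (10, 1000, 10 ** 6):
    dx, df = gap(N)
    # dx = 1/N shrinks, while df = 2 + 1/N**2 never drops below 2
    print(N, float(dx), float(df))
```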
3.9 Nonstandard Hulls
If one takes the finite (i.e., limited) nonstandard rational numbers modulo the infinitesimal ones, i.e., $[{}^*\mathbb Q \cap \mathrm{Fin}({}^*\mathbb R)]/[{}^*\mathbb Q \cap \mu(0)]$, one obtains the real numbers $\mathbb R$. A similar construction, applied to infinite dimensional spaces, produces spaces that are new. These are the subject of this section. The development of these new spaces was initiated by W.A.J. Luxemburg [30] and continued by C.W. Henson and L.C. Moore [18]. Throughout this section, we assume that we are working with an $\aleph_1$-saturated enlargement of our standard structure.
Let $(X, \rho)$ be a metric space. The set $\mathrm{Fin}({}^*X)$ consists of the finite, that is, limited elements of ${}^*X$: these are the elements $y \in {}^*X$ for which ${}^*\rho(y, x)$ is limited for some, and therefore every, $x \in X$. The relation $\simeq$ is an equivalence relation on $\mathrm{Fin}({}^*X)$. Let $\widehat X$ denote the set of equivalence classes $\{\mu(x) : x \in \mathrm{Fin}({}^*X)\}$, where, of course, $\mu(x) = \{y \in {}^*X : y \simeq x\}$. Note that $x$ need not be a standard point. The space $\widehat X$ becomes a metric space when supplied with the metric $\hat\rho$, where $\hat\rho(\mu(x), \mu(y)) = \operatorname{st}({}^*\rho(x, y))$. It is easy to see that $\hat\rho$ is well defined and is a metric. For example, $\hat\rho(\mu(x), \mu(y)) = 0$ iff $x \simeq y$ iff $\mu(x) = \mu(y)$.
Definition 3.9.1 The space $(\widehat X, \hat\rho)$ is called the nonstandard hull of $(X, \rho)$.
Theorem 3.9.2 It follows from $\aleph_1$-saturation that $(\widehat X, \hat\rho)$ is complete.
Proof Let $\mu(a_i)$, $i \in \mathbb N$, be a Cauchy sequence in $\widehat X$, and pick a sequence $\langle a_i : i \in \mathbb N\rangle$ of representatives. For each $k \in \mathbb N$ there is $n(k)$ such that if $i \ge n(k)$ and $j \ge n(k)$, then $\hat\rho(\mu(a_i), \mu(a_j)) < 1/k$, whence ${}^*\rho(a_i, a_j) < 1/k$. We may assume that $n(k + 1) > n(k)$ for all $k$. Now use $\aleph_1$-saturation to extend the sequence $\langle a_i : i \in \mathbb N\rangle$ to an internal sequence $\langle a_i : i \in {}^*\mathbb N\rangle$. It follows from the Spillover Principle that for each $k \in \mathbb N$ one may pick an infinite integer $\eta(k)$ such that ${}^*\rho(a_i, a_j) < 1/k$ whenever $n(k) \le i \le \eta(k)$ and $n(k) \le j \le \eta(k)$. We may assume that the $\eta(k)$ are decreasing. Extend the sequence $\langle \eta(k) : k \in \mathbb N\rangle$ to ${}^*\mathbb N$. Again, by spillover, there is an infinite integer $\gamma$ with $n(k) \le \gamma \le \eta(k)$ for all $k \in \mathbb N$. It follows that for all $k \in \mathbb N$ and all standard $i \ge n(k)$, ${}^*\rho(a_i, a_\gamma) < 1/k$. In particular, $a_\gamma$ is limited, so $\mu(a_\gamma) \in \widehat X$, and $\mu(a_i) \to \mu(a_\gamma)$.
The mapping $x \mapsto \mu({}^*x)$ is clearly an isometry (a distance-preserving map, which must therefore be one-to-one) of $(X, \rho)$ into $(\widehat X, \hat\rho)$. If we start with a Banach space $(X, \|\cdot\|)$, the resulting standard Banach space $\widehat X$, built from the limited elements of ${}^*X$, is called the nonstandard hull of $X$. Here, the norm $\widehat{\|\cdot\|}$ on $\widehat X$ is defined by setting $\widehat{\|\mu(x)\|} = \operatorname{st}({}^*\|x\|)$. This construction is equivalent to the “Banach Space Ultrapower” construction, but one has the internal structure to help in the development. In general, there are new elements in $\widehat X$. For example, if $X = \ell^2$, then the function on ${}^*\mathbb N$ that is $1$ at some fixed $\eta \in {}^*\mathbb N_\infty$ and $0$ elsewhere has norm $1$, but it is not in the monad of any standard sequence. If $X = \ell^1$, then the function that is $1/\eta$ on $\{n \in {}^*\mathbb N : 1 \le n \le \eta\}$ and $0$ elsewhere has norm $1$ and is not in the monad of any standard sequence.
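The two new elements of $\widehat X$ just described can be probed with finite truncations, with a finite $N$ standing in for $\eta$. In the sketch below, the comparison sequence $x_n = 2^{-n}$ is a hypothetical stand-in for an arbitrary standard sequence. It computes the $\ell^2$ distance from the unit vector $e_N$ to $x$ and the $\ell^1$ distance from the vector equal to $1/N$ on $\{1, \dots, N\}$ to $x$. Both distances stay near a positive constant as $N$ grows (approximately $\sqrt{\|x\|_2^2 + 1} = \sqrt{4/3}$ and $\|x\|_1 + 1 = 2$, respectively), which is the finite shadow of the fact that the corresponding points of ${}^*X$ are not infinitely close to this standard sequence.

```python
import math

def x_seq(n):
    """Hypothetical standard sequence x_n = 2**(-n) (n >= 1), which lies in both l^1 and l^2.

    math.ldexp is used so that very small values underflow quietly to 0.0.
    """
    return math.ldexp(1.0, -n)

def l2_distance_to_unit_vector(N, length):
    """||e_N - x||_2, with e_N equal to 1 at index N and 0 elsewhere; x truncated at `length`."""
    total = 0.0
    for n in range(1, length + 1):
        d = (1.0 if n == N else 0.0) - x_seq(n)
        total += d * d
    return math.sqrt(total)

def l1_distance_to_flat_vector(N, length):
    """||v_N - x||_1, where v_N equals 1/N at indices 1..N and 0 elsewhere."""
    total = 0.0
    for n in range(1, length + 1):
        v = 1.0 / N if n <= N else 0.0
        total += abs(v - x_seq(n))
    return total

for N in (10, 100, 10_000):
    length = N + 60  # beyond this index the tail of x is below 2**-60
    print(N, l2_distance_to_unit_vector(N, length), l1_distance_to_flat_vector(N, length))
```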
3.10 Compactifications
A treatment of general compactifications, based on [19], will be given in the chapter of this book by Matt Insall, Małgorzata Marciniak, and the author of this chapter. It uses the ideas of Salbany and Todorov [35], though related ideas have been in the nonstandard analysis literature since the initial work of Robinson [33]. The basic construction fixes a regular topological space $(X, T)$ and gives ${}^*X$ the S-topology, i.e., the topology generated by the base consisting of the extensions of standard open sets. The S-topology makes ${}^*X$ into a compact space. Using equivalence classes of remote points to form the additional points of the compactification, one obtains a large class of compactifications, including end-compactifications and the Stone-Čech compactification. The space need only be regular for the construction to work.
3.11 The Base and Antibase Operators
In this section, we discuss part of the joint work of the author with Jürgen Bliedtner of Frankfurt University [8]. The section deals with base operators and the topologies these operators generate. Applications in [8] not discussed here include upper and lower densities, approximate continuity, and a new construction of liftings of bounded measurable functions in the sense of [20]. We work with a set $X$ and an algebra $\mathcal A$ of subsets of $X$; we may have $\mathcal A = \mathcal P(X)$. Given set mappings $s, t : \mathcal A \to \mathcal P(X)$, we write $s \le t$ if $s(A) \subseteq t(A)$ for each $A \in \mathcal A$. For this section, we fix a nonstandard enlargement of a structure containing $X$ and the real numbers. The two important operators we consider have very simple definitions; we use $A^c$ to denote the complement of a set $A$.
Definition 3.11.1 A base operator, or just a base, on an algebra $\mathcal A$ of subsets of $X$ is a mapping $b : \mathcal A \to \mathcal P(X)$ such that $b(\emptyset) = \emptyset$, $b(X) = X$, and $b(A \cup B) = b(A) \cup b(B)$ for all $A, B \in \mathcal A$. We call $e$ an antibase operator, or just an antibase, on $\mathcal A$ if $e : \mathcal A \to \mathcal P(X)$ satisfies $e(\emptyset) = \emptyset$, $e(X) = X$, and $e(A \cap B) = e(A) \cap e(B)$ for all $A, B \in \mathcal A$. The antibase $e$ constructed from a base $b$ by setting $e(A) = (b(A^c))^c$ is called the antibase associated with $b$. There is a similar definition of the base associated with an antibase. For each base operator $b$, there is a corresponding topology, the $b$-topology, which we discuss below (before and in Proposition 3.11.10).
Problem: Show that a base $b$ and an antibase $e$ are increasing set mappings, and that $e \le b$ if they are associated.
Example 3.11.2 An example of an antibase is the mapping $e$ on $\mathbb R^n$ that sends each Lebesgue measurable set $A$ to its points of density with respect to ball differentiation using Lebesgue measure $\lambda$. That is,
$x \in e(A) \iff \lim_{r \to 0^+} \frac{\lambda(A \cap B(x, r))}{\lambda(B(x, r))} = 1$.
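For $n = 1$ the limit in Example 3.11.2 can be evaluated directly for an interval. The sketch below takes the illustrative set $A = [0, 1]$ (our choice), for which $\lambda(A \cap B(x, r))$ is just the length of an interval intersection, and evaluates the density ratio at shrinking radii. The ratio is $1$ at interior points, $1/2$ at the endpoints, and $0$ outside $A$, so for this set $e(A)$ is the open interval $(0, 1)$.

```python
def overlap(a, b, c, d):
    """Length of the intersection of the intervals [a, b] and [c, d]."""
    return max(0.0, min(b, d) - max(a, c))

def density_ratio(x, r, a=0.0, b=1.0):
    """lambda(A intersect B(x, r)) / lambda(B(x, r)) for A = [a, b] and the ball B(x, r) = [x - r, x + r]."""
    return overlap(a, b, x - r, x + r) / (2 * r)

for x in (0.5, 0.0, 1.0, 1.5):
    print(x, [round(density_ratio(x, 10.0 ** (-k)), 6) for k in range(1, 7)])
```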
The associated topology is the density topology.
Example 3.11.3 A base operator $b : \mathcal P(X) \to \mathcal P(X)$ occurs in potential theory when $X$ is an open domain in $\mathbb R^n$ or, more generally, in a potential theoretic setting. For any set $A \subseteq X$, the base $b(A)$ is defined by setting $b(A) := \{x \in X : A \text{ is not thin at } x\}$. The complement of the Lebesgue spine, for example, is thin at the tip of the spine. The associated topology is the fine topology. This is the coarsest topology refining the usual one for which all finite superharmonic functions are continuous.
Definition 3.11.4 We call any mapping $E : X \to \mathcal P({}^*X) \setminus \{\emptyset\}$ a base generating function. We write $E_x$ for the nonempty (possibly external) set that is the image of $x$. The operators $b$ and $e$ on $\mathcal P(X)$ given by $b(A) := \{x \in X : E_x \cap {}^*A \neq \emptyset\}$ and $e(A) := \{x \in X : E_x \subseteq {}^*A\}$ are called the base and antibase on $\mathcal P(X)$ generated by $E$.
Example 3.11.5 A simple example of a base generating function is obtained by starting with a topological space $(X, T)$ and letting $E_x$ be the monad of $x$ for each
$x \in X$. In this case, $e(A)$ is the interior of $A$ and $b(A)$ is the closure of $A$ for each $A \subseteq X$.
Remark 3.11.6 Given a base generating function, it is easy to check that $b$ and $e$ are in fact a base and an antibase on the algebra $\mathcal P(X)$, associated with each other. As we now show, every base and associated antibase on an algebra $\mathcal A$ can be obtained from a base generating function. This gives an automatic and easy extension of these operators to the full power set of $X$.
Theorem 3.11.7 Fix an algebra $\mathcal A$ of subsets of $X$ together with a base operator $b$ and the corresponding antibase operator $e$ on $\mathcal A$. For each $x \in X$, set
$E_x := \bigcap \{{}^*A : A \in \mathcal A,\ x \in e(A)\}$.
The mapping $E$ is a base generating function on $X$, and the base and antibase generated by $E$ on the full power set of $X$ are extensions of $b$ and $e$.
Proof If $x \in e(A)$ and $x \in e(B)$, then $x \in e(A \cap B)$, so $A \cap B \neq \emptyset$ since $e(\emptyset) = \emptyset$. Since we are working in an enlargement, it follows that $E_x \neq \emptyset$. If $x \in e(A)$, then by definition $E_x \subseteq {}^*A$. Conversely, if $E_x \subseteq {}^*A$, then there is an internal set $B \in {}^*\mathcal A$ with $B \subseteq E_x$ and $x \in {}^*e(B)$; since $e$ is increasing, ${}^*e(B) \subseteq {}^*e(A)$, whence $x \in e(A)$. It follows that for each $A \in \mathcal A$, $e(A) = \{x \in X : E_x \subseteq {}^*A\}$. Moreover, for each $A \in \mathcal A$, $x \in b(A) \Leftrightarrow x \notin e(A^c) \Leftrightarrow E_x \cap {}^*A \neq \emptyset$.
Remark 3.11.8 If the antibase $e$ is originally defined on $\mathcal A$ using a base generating function $F$, then $F_x \subseteq E_x$ for each $x \in X$. The theorem shows that both $F$ and $E$ generate the same base $b$ and antibase $e$ on $\mathcal A$. We call $E$ the base generating function associated with $b$ and $e$ on $\mathcal A$.
Example 3.11.9 For the example of measure differentiation, for each $x \in X$, the set $E_x$ is the intersection of all sets ${}^*A$ such that $x$ is a point of density of $A$. This is a subset of the monad of $x$.
Given a base generating function $E$, we can extend the operators $b$ and $e$ to bounded real-valued functions $f$ by setting
$b_E(f)(x) := \sup\{\operatorname{st}({}^*f(z)) : z \in E_x\}, \quad e_E(f)(x) := \inf\{\operatorname{st}({}^*f(z)) : z \in E_x\}$.
In a potential theoretic setting, B. Fuglede [14], modifying a result of Doob, found the following way to extend the base operator $b$ to nonnegative bounded functions:
$b(f)(x) = \sup\{t \ge 0 : x \in b(\{f \ge t\})\}$.
Problem: Show that for each set $A \subseteq X$, $1_{b(A)} = b_E(1_A)$ and $1_{e(A)} = e_E(1_A)$.
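Definition 3.11.4, Remark 3.11.6, and the Problem above have an exact finite analogue in which ${}^*X$ is replaced by $X$ itself, so that each $E_x$ is an ordinary nonempty subset of $X$ and no standard part is needed. The sketch below implements only that finite analogue; the set $X$ and the map `E` are hypothetical. It generates $b$ and $e$ from $E$ and checks, over all subsets, that $b$ is a base, that $e$ is the antibase associated with $b$, that $e \le b$, and that $1_{b(A)} = b_E(1_A)$ and $1_{e(A)} = e_E(1_A)$.

```python
from itertools import combinations

# Finite analogue only: *X is replaced by X, so each E[x] is an ordinary subset of X.
X = frozenset(range(5))
E = {0: {0, 1}, 1: {1}, 2: {1, 3}, 3: {3}, 4: {2, 4}}   # hypothetical, all nonempty

def b(A):
    """Base generated by E: the points x whose E_x meets A."""
    return frozenset(x for x in X if E[x] & A)

def e(A):
    """Antibase generated by E: the points x whose E_x lies inside A."""
    return frozenset(x for x in X if E[x] <= A)

def b_E(f, x):
    """Finite analogue of b_E(f)(x) = sup{f(z) : z in E_x} (no standard part needed)."""
    return max(f(z) for z in E[x])

def e_E(f, x):
    """Finite analogue of e_E(f)(x) = inf{f(z) : z in E_x}."""
    return min(f(z) for z in E[x])

def subsets(S):
    """All subsets of the finite set S, as frozensets."""
    S = sorted(S)
    return [frozenset(c) for k in range(len(S) + 1) for c in combinations(S, k)]

def check_all():
    empty = frozenset()
    assert b(empty) == empty and b(X) == X
    assert e(empty) == empty and e(X) == X
    for A in subsets(X):
        assert e(A) == X - b(X - A)        # e is the antibase associated with b
        assert e(A) <= b(A)                # e <= b
        indicator = lambda z, A=A: 1 if z in A else 0
        for x in X:
            assert b_E(indicator, x) == (1 if x in b(A) else 0)   # 1_{b(A)} = b_E(1_A)
            assert e_E(indicator, x) == (1 if x in e(A) else 0)   # 1_{e(A)} = e_E(1_A)
        for B in subsets(X):
            assert b(A | B) == b(A) | b(B)   # b preserves finite unions
            assert e(A & B) == e(A) & e(B)   # e preserves finite intersections
    return True

print(check_all())   # True
```

The nonemptiness of each `E[x]` is exactly what makes $b(X) = X$ and $e(\emptyset) = \emptyset$ hold, mirroring the requirement $E_x \neq \emptyset$ in Definition 3.11.4.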
We now fix an algebra $\mathcal A$ of subsets of $X$ together with a base operator $b$ and the associated antibase operator $e$ on $\mathcal A$. We let $E$ be the associated base generating function, and we also use $b$ and $e$ to denote the extensions of $b$ and $e$ to the full power set of $X$. The $b$-topology, described in [29], is given by the following characterization: a set $O \subseteq X$ is open in the $b$-topology iff $O \subseteq e(O)$. Equivalently, a set $F \subseteq X$ is closed in the $b$-topology iff $b(F) \subseteq F$. It is easy to see that this definition does in fact produce a topology on $X$.
Proposition 3.11.10 A set $O \subseteq X$ is open in the $b$-topology iff $E_x \subseteq {}^*O$ for each $x \in O$.
Proof Assume $O$ is open in the $b$-topology. If $x \in O$, then $x \in e(O)$, so $E_x \subseteq {}^*O$. Conversely, assume that $E_y \subseteq {}^*O$ for each $y \in O$. Then for any given $x \in O$, there is an internal set $B \in {}^*\mathcal A$ with $B \subseteq E_x \subseteq {}^*O$ and $x \in {}^*e(B) \subseteq {}^*e(O)$. It follows that $x \in e(O)$. This shows that $O \subseteq e(O)$, i.e., $O$ is open in the $b$-topology.
A base operator $b$ is called strong on $\mathcal A$ if $b(b(A)) \subseteq b(A)$ for all $A \in \mathcal A$, or equivalently, $e(e(A)) \supseteq e(A)$ for all $A \in \mathcal A$. For measure differentiation, topology, and potential theory, the base operators described above are strong.
Theorem 3.11.11 Assume that $b$ is a strong base operator on $\mathcal A$. Then for each $x \in X$, the monad of $x$ in the $b$-topology is $E_x \cup \{x\}$.
Proof Fix $x \in X$. By Proposition 3.11.10,
$E_x \cup \{x\} = \bigcap\{{}^*A \cup \{x\} : A \in \mathcal A,\ x \in e(A)\} \subseteq \bigcap\{{}^*O : O \text{ open},\ x \in O\}$.
We must show that if $b$ is strong, the above containment can be replaced with equality. Given $A \in \mathcal A$ with $x \in e(A)$, the set $A \cap e(A)$ is an open subset of $A$ since $e(A \cap e(A)) = e(A) \cap e(e(A)) = e(A) \supseteq A \cap e(A)$. Moreover, as we have shown, $e(A \cap e(A)) = e(A)$, so $x \in e(A \cap e(A))$. It follows that $E_x \subseteq {}^*(A \cap e(A))$, whence, by Proposition 3.11.10, $(A \cap e(A)) \cup \{x\}$ is an open neighborhood of $x$. Since ${}^*((A \cap e(A)) \cup \{x\}) \subseteq {}^*A \cup \{x\}$, the reverse containment follows.
With additional structure, the base and antibase operators are called upper and lower densities, and the $b$-topology is called an abstract density topology (see Sect. 6E of [29]). See [8] for applications of nonstandard analysis in this setting. Our observations, as applied to the density monad, generalize results in Proposition IV.4 and its preceding remark that were established by Wattenberg in [40] for measure differentiation using intervals on $[0, 1]$. In [12], Eifrig used a rapid ultrafilter and a mapping $x \mapsto B_x$, where $B_x$ is an internal subset of our set $E_x$, in a paper establishing a lifting for measurable sets. The existence of a rapid ultrafilter can be shown using the Continuum Hypothesis or Martin's Axiom. In [8], the author and Jürgen Bliedtner used the mapping $x \mapsto E_x$ together with basic nonstandard analysis to establish the existence of a strong lifting for the space of bounded measurable functions on a measure space with finite measure.
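Theorem 3.11.11 can also be watched in a finite analogue, again with ${}^*X$ replaced by $X$; the maps `E_strong` and `E_weak` below are hypothetical. Reading $z \in E_x$ as an arrow $x \to z$, $b(A)$ is the set of points with an arrow into $A$, and in this finite setting $b$ is strong exactly when the arrow relation is transitive. The sketch enumerates the $b$-open sets ($O \subseteq e(O)$), intersects all open sets containing $x$ (the finite stand-in for the monad of $x$), and compares the result with $E_x \cup \{x\}$. The two agree for the strong base; for the non-strong one, the containment from the proof is strict at the point $0$.

```python
from itertools import combinations

# Finite analogue only: *X is replaced by X, and each E[x] is a nonempty subset of X.
X = frozenset(range(4))

def subsets(S):
    """All subsets of the finite set S, as frozensets."""
    S = sorted(S)
    return [frozenset(c) for k in range(len(S) + 1) for c in combinations(S, k)]

def make_ops(E):
    """Base b and antibase e generated by the finite map E."""
    def b(A):
        return frozenset(x for x in X if E[x] & A)
    def e(A):
        return frozenset(x for x in X if E[x] <= A)
    return b, e

def is_strong(E):
    """Check b(b(A)) <= b(A) for every A in P(X)."""
    b, _ = make_ops(E)
    return all(b(b(A)) <= b(A) for A in subsets(X))

def smallest_open_nbhd(E, x):
    """Intersection of all b-open sets O (those with O <= e(O)) that contain x."""
    _, e = make_ops(E)
    result = X
    for O in subsets(X):
        if x in O and O <= e(O):
            result &= O
    return result

# Hypothetical examples: E_strong encodes a transitive relation; E_weak does not (0 -> 1 -> 2).
E_strong = {0: {1, 2}, 1: {2}, 2: {2}, 3: {3}}
E_weak = {0: {1}, 1: {2}, 2: {2}, 3: {3}}

for name, E in (("strong", E_strong), ("weak", E_weak)):
    print(name, is_strong(E))
    for x in sorted(X):
        print("   x =", x, sorted(smallest_open_nbhd(E, x)), sorted(E[x] | {x}))
```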
3.12 Measure and Probability Theory
For the rest of this chapter, we assume familiarity with standard measure theory. We give a quick overview of the original construction from [24] of standard measure spaces on nonstandard models (now called “Loeb spaces”). A full introduction to this theory, starting from first principles, is given in Horst Osswald's chapters in this book, and we use his notation for the measure spaces considered here. The principal device used in nonstandard measure theory is $\aleph_1$-saturation; we always assume that we are working with an $\aleph_1$-saturated enlargement of a standard structure containing the real numbers $\mathbb R$. We include two applications from the literature not discussed in other chapters of this book. The first is a special case of an approach by the author and J. Bliedtner in [7] to the martingale convergence theorem; the second deals with a construction of representing measures in potential theory initiated in [25] and [26].
Working in an $\aleph_1$-saturated enlargement, we can construct a hyperfinite set $\Omega$ as the set of elementary outcomes of a conceptual experiment in the “nonstandard world”. For coin tossing, for example, $\Omega$ can be the set of internal sequences of $-1$'s and $1$'s of length $\eta \in {}^*\mathbb N_\infty$. Given such a hyperfinite $\Omega$, we can let $\mathcal C$ consist of all internal subsets of $\Omega$. The collection $\mathcal C$ is an internal $\sigma$-algebra, but it is also an algebra in the ordinary sense. Suppose $P$ is an internal probability measure on $(\Omega, \mathcal C)$. For the coin tossing experiment, for example, each internal set $A$ with internal cardinality $|A|$ would be given the probability $P(A) = |A|/2^\eta$ in ${}^*[0, 1]$. In the general case, we can form a finitely additive real-valued measure ${}^\circ P$ on $(\Omega, \mathcal C)$ with values in the real interval $[0, 1]$ by setting ${}^\circ P(A) = \operatorname{st}(P(A))$. The question is: “Can we extend ${}^\circ P$ to a countably additive measure on $\sigma(\mathcal C)$, i.e., the $\sigma$-algebra generated by $\mathcal C$?” This is the question we consider next. We start with an arbitrary internal measure space $(\Omega, \mathcal C, \mu)$.
The internal measure $\mu$ does not have to take only limited values; it may even take the value $+\infty$. When $\Omega$ is a hyperfinite set, however, one usually takes $\mathcal C$ to be the collection of all internal subsets of $\Omega$, and $\mu$ is usually an internal probability measure. In general, $\mathcal C$ is an internal $\sigma$-algebra as well as an algebra in the usual sense in $\Omega$. Let ${}^\circ\mu$ be the finitely additive, extended real-valued measure on $(\Omega, \mathcal C)$ defined at each $A \in \mathcal C$ by setting ${}^\circ\mu(A) = \operatorname{st}(\mu(A))$ when $\mu(A)$ is limited, and ${}^\circ\mu(A) = +\infty$ otherwise. Let $\sigma(\mathcal C)$ denote the $\sigma$-algebra, in the ordinary sense, generated by $\mathcal C$. The collection $\sigma(\mathcal C)$ is, in general, an external collection of subsets of $\Omega$. Using the Carathéodory Extension Theorem, we can extend ${}^\circ\mu$ to a measure $\mu_L$ defined on the measure completion $L_\mu(\mathcal C)$ of $\sigma(\mathcal C)$, and thus obtain a standard measure space $(\Omega, L_\mu(\mathcal C), \mu_L)$ on $\Omega$. To apply this technique, we first note that when a sequence $\langle A_i : i \in \mathbb N\rangle$, indexed by the ordinary natural numbers, consists of pairwise disjoint elements of $\mathcal C$ and the union $A$ is also in $\mathcal C$, then $A$ is actually a finite union, since all but finitely many of the $A_i$'s are empty. This follows from Proposition 2.9.8 (i), but here is the simple proof: Using $\aleph_1$-saturation, we extend the sequence $\langle A_i : i \in \mathbb N\rangle$ to an internal sequence $\langle A_i : i \in {}^*\mathbb N\rangle$; the set $\{m \in {}^*\mathbb N : A \subseteq \bigcup_{1 \le i \le m} A_i\}$ is internal and contains ${}^*\mathbb N_\infty$ (for unlimited $m$, every standard index $i$ satisfies $i \le m$), so it must contain some standard natural number since ${}^*\mathbb N_\infty$ is external.
It now follows that ${}^\circ\mu(A) = \sum_{i \in \mathbb N} {}^\circ\mu(A_i)$. By the Carathéodory Extension Theorem, the finitely additive measure ${}^\circ\mu$ has a $\sigma$-additive extension $\mu_L$ defined on the completion $L_\mu(\mathcal C)$ of the $\sigma$-algebra $\sigma(\mathcal C)$. Moreover, if $\mu(\Omega)$ is limited, this extension is unique. Ward Henson [17] has shown that the extension is still unique if the internal measure $\mu$ is just ${}^*\mathbb R$-valued, and his proof also works if $\mu$ takes the value $+\infty$. For the rest of this chapter, we shall assume that $\mu(\Omega)$ is limited, so that $(\Omega, L_\mu(\mathcal C), \mu_L)$ is an ordinary finite measure space formed on the internal set $\Omega$.
Fix $E$ in $L_\mu(\mathcal C)$. It is well known that for any $\varepsilon > 0$ in $\mathbb R$, there are sets $A \in \mathcal C_\delta$ and $B \in \mathcal C_\sigma$ with $A \subseteq E \subseteq B$ such that $\mu_L(B \setminus A) < \varepsilon$. By saturation, we may assume that $A$ and $B$ are actually sets in $\mathcal C$. To see this, suppose that $\langle A_n : n \in \mathbb N\rangle$ is a decreasing sequence in $\mathcal C$ with limit $A$. We may extend the sequence to an internal decreasing sequence indexed by ${}^*\mathbb N$ and choose an unlimited integer $\gamma$ such that $\mu_L(A_\gamma) = \lim_{n \in \mathbb N} \mu_L(A_n)$; then $A_\gamma \in \mathcal C$, $A_\gamma \subseteq A$, and $\mu_L(A \setminus A_\gamma) = 0$. A similar proof works for $B$. We leave it as an exercise to show that, by saturation, there is a set $C \in \mathcal C$ such that the symmetric difference $E \,\triangle\, C$ is a null set in $L_\mu(\mathcal C)$. These set approximation properties characterize $L_\mu(\mathcal C)$ and have had many important applications. They have also been used in the literature to define $L_\mu(\mathcal C)$.
Given a ${}^*\mathbb R$-valued function $f$ on $\Omega$, we set ${}^\circ f(x) = \operatorname{st}(f(x))$ if $f(x)$ is limited in ${}^*\mathbb R$. We set ${}^\circ f(x) = +\infty$ if $f(x)$ is positive and unlimited, and ${}^\circ f(x) = -\infty$ if $f(x)$ is negative and unlimited. It follows from the set approximation properties that a function $g : \Omega \to \mathbb R \cup \{+\infty, -\infty\}$ is measurable with respect to $L_\mu(\mathcal C)$ if and only if there is an internally $\mathcal C$-measurable function $f$ such that ${}^\circ f = g$ $\mu_L$-a.e. The function $f$ is called a lifting of $g$. If such an $f$ takes only limited values, then $g$ is integrable, and by the set approximation properties
${}^\circ\!\left(\int_\Omega f \, d\mu\right) = \int_\Omega {}^\circ f \, d\mu_L = \int_\Omega g \, d\mu_L. \qquad (3.1)$
It follows from the definition of the standard integral that Eq. (3.1) also holds when
$\int_\Omega [\,|f| - (|f| \wedge \eta)\,] \, d\mu \simeq 0 \quad \text{for all } \eta \in {}^*\mathbb N_\infty. \qquad (3.2)$
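Condition (3.2) cannot simply be dropped. A standard illustration, sketched below with a finite $N$ standing in for an unlimited hyperfinite size, uses the uniform probability on $\{1, \dots, N\}$ and the function $f$ equal to $N$ at one point and $0$ elsewhere; the choices of $N$, of this spike function, and of the threshold $\eta = \lfloor\sqrt N\rfloor$ are ours. The internal integral of $f$ is always $1$, yet $f = 0$ off a set of measure $1/N$, so for unlimited $N$ we get ${}^\circ f = 0$ $\mu_L$-a.e. and $\int {}^\circ f \, d\mu_L = 0 \neq 1$. Correspondingly, the integral in (3.2) equals $1 - \eta/N$, which is close to $1$, not infinitesimal, when $\eta$ is unlimited but $\eta/N \simeq 0$.

```python
import math
from fractions import Fraction

def spike_integrals(N, eta):
    """Uniform probability on {1, ..., N}; f = N at the point 1 and 0 elsewhere.

    Returns, exactly, the integral of f and the integral of |f| - min(|f|, eta)
    appearing in condition (3.2).
    """
    weight = Fraction(1, N)                  # mass of each point
    f = lambda k: N if k == 1 else 0
    total = sum(f(k) * weight for k in range(1, N + 1))
    tail = sum((abs(f(k)) - min(abs(f(k)), eta)) * weight for k in range(1, N + 1))
    return total, tail

for N in (100, 10_000):
    eta = math.isqrt(N)                      # eta grows with N, but eta / N -> 0
    total, tail = spike_integrals(N, eta)
    print(N, total, tail)                    # total is always 1; tail = 1 - eta/N
```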
Condition (3.2) is a simple expression of the condition called S-integrability in the literature (see [2] and Chap. 6 of this book). Conversely, if we know that $g$ is integrable, then it is easy to see that for a sufficiently small $\eta \in {}^*\mathbb N_\infty$, replacing $f$ with the function $f \cdot \chi_{\{|f| \le \eta\}}$ gives an S-integrable lifting of $g$. That is, with this replacement, the internal integral of $f$ is limited and Eq. (3.1) holds.
Suppose ${}^*X$ is the nonstandard extension of a compact Hausdorff space $X$ and $\mu$ is a limited-valued internal measure on the internal Baire sets $\mathcal C$ in ${}^*X$. Then for each Borel set $B \subseteq X$, the set
$\widetilde B = \bigcup\{\operatorname{monad}(y) : y \in B\}$ is in $L_\mu(\mathcal C)$. A Borel measure $\nu$ on $X$ can be defined by setting $\nu(B) = \mu_L(\widetilde B)$ for each Borel set $B \subseteq X$. The measure $\nu$ restricted to the Baire sets is the standard part of the internal measure $\mu$ with respect to the weak$^*$ topology on Baire measures. By Robinson's criterion for convergence and clustering given in Sect. 3.3, this correspondence between $\nu$ and $\mu$ yields a nonstandard approach to the weak convergence of measures (see [3, 27]). In particular, one can obtain Lebesgue measure on an interval $[a, b]$ as follows: start with a hyperfinite partition of ${}^*[a, b]$ into intervals of the same length, put internal counting measure, weighted by that common length, on the left endpoints, and note that this is the measure used to produce internal Riemann sums; the standard part of this internal measure is Lebesgue measure. (Also see [2].)
When used in probability theory, the above general construction allows one to tackle problems about continuous parameter stochastic processes using the combinatorial tools available for discrete parameter processes. Examples are the author's construction of Poisson processes in [24] and Anderson's [2] representation of Brownian motion and the Itô integral, discussed in Chap. 7 of this book. As demonstrated by the work of Keisler [22], Fajardo and Keisler [13], Sun [36, 37], and others, these measure spaces have special closure properties not shared even by Lebesgue measure spaces. The above construction has yielded new standard-analysis results by many researchers in areas such as probability theory, potential theory, number theory, mathematical economics, and mathematical physics.
Here are six examples: Keisler's [21] new existence theorem for stochastic differential equations; Perkins' [31] award-winning research (Rollo Davidson Prize in Probability Theory) on the theory of local time; Arkeryd's [4] results on gas kinetics; Cutland and Ng's work [10] on the Wiener sphere and Wiener measure, discussed in Horst Osswald's chapters in this book; Renling Jin's work on number theory, discussed in his chapter in this book; and Sun's work in probability and economics, discussed in his contribution to this book. We also note that Fields medalist Terence Tao has used Loeb measures in his recent multifaceted work [6, 38, 39]. We finish this chapter with two other applications from the literature.
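The construction of Lebesgue measure from left endpoints, described above, has a finite shadow that is easy to compute. In the sketch below, $N$ equal subintervals of $[a, b]$ play the role of the hyperfinite partition, each left endpoint carries mass $(b - a)/N$, and integrating against this discrete measure is exactly a left Riemann sum; the integrands and the test interval $[0.25, 0.75)$ are our own illustrative choices. As $N$ grows, the integrals approach the Lebesgue integrals and the mass of an interval approaches its length.

```python
import math

def left_endpoint_measure(a, b, N):
    """Left endpoints of N equal subintervals of [a, b], each carrying mass (b - a) / N."""
    h = (b - a) / N
    return [a + k * h for k in range(N)], h

def integrate(g, a, b, N):
    """Integral of g against the left-endpoint measure, i.e. the left Riemann sum."""
    points, h = left_endpoint_measure(a, b, N)
    return sum(g(t) for t in points) * h

def measure_of_interval(c, d, a, b, N):
    """Mass that the left-endpoint measure assigns to the interval [c, d)."""
    points, h = left_endpoint_measure(a, b, N)
    return sum(h for t in points if c <= t < d)

for N in (10, 1_000, 100_000):
    print(N, integrate(math.sin, 0.0, math.pi, N), measure_of_interval(0.25, 0.75, 0.0, 1.0, N))
```

The weight $(b - a)/N$ is essential: unweighted counting measure would assign unlimited mass to ${}^*[a, b]$ when $N$ is unlimited.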
3.12.1 The Martingale Convergence Theorem
Our first application is a special case of the approach to the martingale convergence theorem by J. Bliedtner and the author in [7]. We start with a standard probability space $(\Omega, \mathcal E, P)$ and an increasing sequence of $\sigma$-algebras $\langle \mathcal E_n : n \in \mathbb N\rangle$ with $\mathcal E = \sigma(\bigcup_{n \in \mathbb N} \mathcal E_n)$. We let $\mathcal S_n$ and $\mathcal S$ denote, respectively, the collections of all finite signed measures on $(\Omega, \mathcal E_n)$ and $(\Omega, \mathcal E)$. We use the notation $\nu|\mathcal E_n$ for the restriction to $\mathcal E_n$ of a measure $\nu$ defined on a larger $\sigma$-algebra. Recall that an $L^1$-bounded martingale $\langle h_n : n \in \mathbb N\rangle$ is a sequence of functions on $\Omega$ such that for each $n \in \mathbb N$,
(i) $h_n$ is $\mathcal E_n$-measurable,
(ii) $\mu_n \in \mathcal S_n$, where $\mu_n(A) := \int_A h_n \, dP$ for all $A \in \mathcal E_n$,
(iii) $\mu_m|\mathcal E_n = \mu_n$ for all $m > n$, and
(iv) $\sup_n \int_\Omega |h_n| \, dP < +\infty$.
The general technique established in [7] yields an easy proof of the following convergence theorem of Andersen and Jessen [1].
Theorem 3.12.1 (Andersen-Jessen) Fix $\mu \in \mathcal S$. For each $n \in \mathbb N$, let $\mu_n = \mu|\mathcal E_n$, and let $h_n$ be the Radon-Nikodym derivative of the nonsingular part of $\mu_n$ with respect to $P$ on $\mathcal E_n$. Then the $L^1$-bounded martingale $\langle h_n : n \in \mathbb N\rangle$ converges $P$-almost everywhere to the Radon-Nikodym derivative of the nonsingular part of $\mu$ with respect to $P$. That is,
$\frac{d\mu}{dP}(\omega) = \lim_n h_n(\omega) = \lim_n \frac{d\mu_n}{dP}(\omega)$
for $P$-almost every $\omega \in \Omega$.
For the proof, see Theorem 3.2 of [7]. Not all martingales are generated by a measure, but nonstandard measure theory provides a way to use Andersen and Jessen's result to establish Doob's [11] convergence theorem for $L^1$-bounded martingales. Fix an $L^1$-bounded martingale $\langle h_n : n \in \mathbb N\rangle$. We let $({}^*\Omega, \mathcal B_\infty, P_L)$ denote the standard, complete probability space formed from the internal space $({}^*\Omega, {}^*\mathcal E, {}^*P)$ using the construction discussed above. Similarly, for each $n \in {}^*\mathbb N$, we let $({}^*\Omega, \mathcal B_n, \nu_n)$ denote the standard space formed from $({}^*\Omega, {}^*\mathcal E_n, {}^*\mu_n)$. Fix an index $\eta \in {}^*\mathbb N_\infty$. The $\sigma$-algebra $\mathcal B = \sigma(\bigcup_{n \in \mathbb N} \mathcal B_n)$ generated by the collection $\{\mathcal B_n : n \in \mathbb N\}$ is a subset of $\mathcal B_\eta$. That is, for each $n \in \mathbb N$, $\mathcal B_n \subseteq \mathcal B \subseteq \mathcal B_\eta \subseteq \mathcal B_\infty$. Let $\nu = \nu_\eta|\mathcal B$. Given $n \in \mathbb N$, we have $\nu_n = \nu|\mathcal B_n$, since for each $A$ in the generating algebra ${}^*\mathcal E_n$, $\nu_n(A) = {}^\circ({}^*\mu_n(A)) = {}^\circ({}^*\mu_\eta(A)) = \nu(A)$. Moreover, for each $A \in {}^*\mathcal E_n$,
$\nu_n(A) = {}^\circ({}^*\mu_n(A)) = {}^\circ\!\left(\int_A {}^*h_n \, d\,{}^*P\right) = \int_A {}^\circ({}^*h_n) \, dP_L$,
since
$\lim_{m \to \infty} \int_\Omega [\,|h_n| - (|h_n| \wedge m)\,] \, dP = 0$.
That is, ${}^*h_n$ is an S-integrable lifting of ${}^\circ({}^*h_n)$. Since the extension of $\nu_n$ from ${}^*\mathcal E_n$ to $\mathcal B_n$ is unique, $\nu_n \ll P_L$, and the Radon-Nikodym derivative is $d\nu_n/dP_L = {}^\circ({}^*h_n)$. By the Andersen-Jessen Theorem, ${}^\circ({}^*h_n) \to d\nu/dP_L$ $P_L$-a.e. on ${}^*\Omega$. The reader can now show, using Egoroff's theorem, that the original sequence $\langle h_n \rangle$ converges $P$-a.e. on $\Omega$ (alternatively, see Proposition 3.4 in [7]). We remark that the limit is a function $f$ such that ${}^\circ({}^*f) = d\nu/dP_L$ on ${}^*\Omega \setminus U$, where $U$ is a set of $P_L$-measure $0$ in $\mathcal B_\infty$.
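A concrete instance of Theorem 3.12.1 can be computed on $\Omega = [0, 1)$ with Lebesgue measure $P$ and the dyadic filtration, in which $\mathcal E_n$ is generated by the intervals $[k 2^{-n}, (k + 1) 2^{-n})$. The sketch assumes the hypothetical measure $\mu = \omega^2\, d\omega + \delta_{1/3}$, whose nonsingular part has density $\omega^2$. Every nonempty set in $\mathcal E_n$ has positive $P$-measure, so $\mu_n = \mu|\mathcal E_n$ is absolutely continuous on $\mathcal E_n$, and $h_n = d\mu_n/dP$ equals $\mu(I)/P(I)$ on each dyadic atom $I$. The values $h_n(\omega)$ converge to $\omega^2$ away from $1/3$, while the point mass only inflates $h_n$ on the atoms containing $1/3$, whose $P$-measure tends to $0$.

```python
import math

POINT = 1.0 / 3.0     # location of the hypothetical point mass (never a dyadic endpoint)
MASS = 1.0            # size of the point mass

def atom(omega, n):
    """Dyadic interval [k/2**n, (k+1)/2**n) containing omega in [0, 1)."""
    k = math.floor(omega * 2 ** n)
    return k / 2 ** n, (k + 1) / 2 ** n

def h(omega, n):
    """h_n(omega) = mu(I) / P(I) on the atom I of E_n containing omega."""
    a, b = atom(omega, n)
    mean_of_square = (a * a + a * b + b * b) / 3.0   # (1 / (b - a)) * integral of t**2 over [a, b]
    singular = MASS / (b - a) if a <= POINT < b else 0.0
    return mean_of_square + singular

for n in (5, 10, 20, 40):
    print(n, h(0.7, n), h(POINT, n))
```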
3.12.2 Representing Measures in Potential Theory
For our second example, we work with harmonic functions on the unit disk in the complex plane $\mathbb C$. Let $D_r$ denote the open disk $\{z \in \mathbb C : |z| < r\}$, and let $D = D_1$. Let $C_r$ be the circle $\{z \in \mathbb C : |z| = r\}$, and let $C = C_1$. All measures we consider will be Borel measures. Let $P(z, x)$ be the Poisson kernel $(|z|^2 - |x|^2)/|z - x|^2$, and let $x_0$ denote the origin. The space $H_1$ consisting of all positive harmonic functions on $D$ taking the value $1$ at $x_0$ is convex and compact with respect to the topology of uniform convergence on compact subsets of $D$, i.e., the ucc topology. It is well known that every continuous function on $C$ has a harmonic extension to $D$ and that not every harmonic function on $D$ is obtained in this way. On the other hand, by the Riesz-Herglotz theorem, for each $h \in H_1$ there is a probability measure $\nu_h$ on $C$ such that
$h = \int_C P(z, \cdot) \, \nu_h(dz)$.
The mapping $z \mapsto P(z, \cdot)$ from $C$ into $H_1$ (with the ucc topology) is a homeomorphism onto its image. We may therefore think of $\nu_h$ as a measure either on $C$ or on the collection of harmonic functions $\{P(z, \cdot) : z \in C\}$. The latter point of view is that of Martin boundary theory and Choquet theory. The simplest realization of Choquet theory deals with a triangle: each point inside or on a triangle is represented by a unique set of nonnegative weights, summing to $1$, on the extreme points of the triangle, i.e., on its vertices. For the compact, convex set $H_1$, the extreme points are the functions $\{P(z, \cdot) : z \in C\}$, and each $h \in H_1$ is represented by a unique probability measure $\nu_h$ on this set. While the usual construction of $\nu_h$ is simple for the disk, it does not generalize without going to an ideal boundary. The measure theory discussed above does yield a generalizable construction of $\nu_h$, by extracting a measure from the function $h$ that would otherwise be lost at the boundary of $D$. We give a brief description of that construction; more details can be found in the original work [25, 26] from which this example is taken.
First, we recall that for each circle $C_r$ and each point $x \in D_r$, there is, by the Riesz Representation Theorem, a Borel measure $\mu^r_x$, called a harmonic measure, that gives the value at $x$ of the harmonic extension of any continuous function on $C_r$. Moreover, normalized Lebesgue measure on $C_r$ is the harmonic measure $\mu^r_{x_0}$ with respect to the origin $x_0$. Given $h \in H_1$, the measures $h \cdot \mu^r_{x_0}$, $0 < r < 1$, are probability measures, and $\nu_h$ is their weak$^*$ limit as the radius $r$ tends to $1$. This construction of $\nu_h$ does not work for more general domains and potential theories, but the following modification is valid in these more general settings. Fix $h \in H_1$. For each $r < 1$, let $\{A^r_i\}$ be a partition of $C_r$ into intervals (arcs), and choose $y^r_i \in A^r_i$. Let $\delta_{y^r_i}$ denote unit mass at the point $y^r_i$. The net of measures
$\sum_i h(y^r_i)\, \mu^r_{x_0}(A^r_i) \cdot \delta_{y^r_i}$
converges in the weak$^*$ topology to the measure $\nu_h$ on $C$. The direction for this net is given by letting $r$ tend to $1$ and refining the partitions $\{A^r_i\}$. To see that $\nu_h$ is in fact the weak$^*$ limit of this net, note that the integral of any continuous function $f$ with respect to one of these measures with support in $C_r$ is a Riemann sum approximation to the integral of $f$ with respect to the measure $h \cdot \mu^r_{x_0}$.
Instead of a finite combination of measures concentrated on points of $D$, we want a combination of point masses on the function space $[0, +\infty]^D$. Given $r$ and a partition $\{A^r_i\}$ of $C_r$, the function $x \mapsto \mu^r_x(A^r_i)$ is a harmonic function on $D_r$. It is the solution of the Dirichlet problem for the function that is $1$ on $A^r_i$ and $0$ on the rest of $C_r$. When we divide by $\mu^r_{x_0}(A^r_i)$, the new function is equal to $1$ at the origin $x_0$. Let $\delta^r_i$ be unit mass at the function that is equal to $\mu^r_x(A^r_i)/\mu^r_{x_0}(A^r_i)$ for $x \in D_r$ and is identically $0$ on and outside $C_r$; the point mass $\delta^r_i$ is a measure on the function space $[0, +\infty]^D$ supplied with the product topology. (On the set of positive harmonic functions on $D_r$ taking the value $1$ at $x_0$, the product topology restricts to the ucc topology.) Again, the measures
$\sum_i h(y^r_i)\, \mu^r_{x_0}(A^r_i) \cdot \delta^r_i$
have $\nu_h$ as a weak$^*$ limit as $r$ approaches $1$ and the partitions $\{A^r_i\}$ are refined. The limit measure lives on the set $\{P(z, \cdot) : z \in C\} \subset H_1 \subset [0, +\infty]^D$.
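The Riemann-sum description above can be tested numerically in the disk. The sketch rests on assumptions we add for illustration: the function $h = P(1, \cdot)$, i.e., $h(x) = (1 - |x|^2)/|1 - x|^2$, which lies in $H_1$ and has $\nu_h = \delta_1$; the standard Poisson formula for the harmonic measure of $D_r$, $d\mu^r_x(y) = \frac{r^2 - |x|^2}{|y - x|^2} \frac{d\theta}{2\pi}$ for $y = r e^{i\theta}$; and a partition of $C_r$ into $N$ equal arcs with each $y_i$ at the midpoint of its arc, where $\mu^r_x(A_i)$ is approximated by the density at $y_i$ times $1/N$. It checks that the weights $h(y_i)\mu^r_{x_0}(A_i)$ sum to about $h(x_0) = 1$, that $\sum_i h(y_i)\, \mu^r_x(A_i) \approx h(x)$, and that the weights concentrate near the point $1$ as $r$ tends to $1$, as expected from $\nu_h = \delta_1$.

```python
import cmath
import math

def h(x):
    """h = P(1, .), i.e. h(x) = (1 - |x|^2) / |1 - x|^2: positive harmonic on D, h(0) = 1."""
    return (1 - abs(x) ** 2) / abs(1 - x) ** 2

def harmonic_density(y, x, r):
    """Density of mu_x^r on C_r with respect to d(theta)/(2 pi), where y = r exp(i theta)."""
    return (r * r - abs(x) ** 2) / abs(y - x) ** 2

def arcs(r, N):
    """Midpoints y_i (and their angles) of N equal arcs A_i partitioning C_r."""
    thetas = [-math.pi + (i + 0.5) * 2 * math.pi / N for i in range(N)]
    return [r * cmath.exp(1j * t) for t in thetas], thetas

def weights(r, N):
    """w_i = h(y_i) * mu_{x0}^r(A_i); for equal arcs mu_{x0}^r(A_i) = 1/N."""
    ys, thetas = arcs(r, N)
    return [h(y) / N for y in ys], thetas

def represented_value(x, r, N):
    """sum_i h(y_i) * mu_x^r(A_i), with mu_x^r(A_i) approximated by density(y_i) / N."""
    ys, _ = arcs(r, N)
    return sum(h(y) * harmonic_density(y, x, r) / N for y in ys)

def mass_near_one(r, N, delta=0.3):
    """Total weight carried by the arcs whose midpoint angle satisfies |theta| < delta."""
    w, thetas = weights(r, N)
    return sum(wi for wi, t in zip(w, thetas) if abs(t) < delta)

x = 0.3 + 0.2j
for r in (0.9, 0.99):
    w, _ = weights(r, 4000)
    print(r, sum(w), represented_value(x, r, 4000), h(x), mass_near_one(r, 4000))
```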
This construction of $\nu_h$ continues to work in quite general potential theoretic settings (see [26, 28]), and it does not use the Martin boundary. The proof that this construction works was obtained in [26] by interpreting the construction of $\nu_h$ in [25] as a limit result in the sense of [3, 27]. The construction of $\nu_h$ in [25] was the first application, after coin tossing, of the general measure theory we have described above. In [25], the measurability of the standard part map was used to replace the standardizations of internal measures with standard measures on the standard compact set $H_1$. Here, specialized to the case of the disk $D$, is that construction of $\nu_h$. We start with a circle $C_r \subset {}^*D$ with $r \simeq 1$ and an interval partition $\{A^r_i\}$ so fine that every standard harmonic function has infinitesimal variation on each set $A^r_i$. Suppressing the superscript $r$, we have
$\forall x \in D, \quad h(x) = \int_{C_r} {}^*h(y) \, d\mu_x(y) \simeq \sum_i {}^*h(y_i)\, \mu_{x_0}(A_i) \, \frac{\mu_x(A_i)}{\mu_{x_0}(A_i)}.$
The family of weights ${}^*h(y_i)\, \mu_{x_0}(A_i)$ is made into an ordinary probability measure $\mu_h$ using the general measure theory described above. The measure $\mu_h$ is supported by the set of nonstandard harmonic functions $\mu_x(A_i)/\mu_{x_0}(A_i)$. This is a set of positive, internal harmonic functions on $D_r$, each taking the value $1$ at $x_0$. The mapping $S$ on this set of functions given by the formula $S(g)(x) = {}^\circ(g(x))$ for all $x \in D$ is the standard part mapping with respect to $H_1$ supplied with the ucc topology. The measurability of $S$, established for this special case in [25] and generalized
in [3, 27], allows one to project the measure μh onto H1 . The process preserves affine combinations of harmonic functions and yields representing measures, so by a corollary of a result by Cartier, Fell, and Meyer (see [25]), the final measure is the unique representing measure νh on the extreme points {P(z, ·) : z ∈ C} of H1 .
References
1. E.S. Andersen, B. Jessen, Some limit theorems on set-functions. Danske Vid. Selsk. Mat.-Fys. Medd. 25(5), 1–8 (1948)
2. R.M. Anderson, A nonstandard representation of Brownian motion and Itô integration. Isr. J. Math. 25, 15–46 (1976)
3. R.M. Anderson, S. Rashid, A nonstandard characterization of weak convergence. Proc. Am. Math. Soc. 69, 327–332 (1978)
4. L. Arkeryd, Loeb solutions of the Boltzmann equation. Arch. Ration. Mech. Anal. 86, 85–97 (1984)
5. P.T. Bateman, P. Erdős, Geometrical extrema suggested by a lemma of Besicovitch. Am. Math. Mon. 58, 306–314 (1951)
6. V. Bergelson, T. Tao, Multiple recurrence in quasirandom groups. Geom. Funct. Anal. 24, 1–48 (2014)
7. J. Bliedtner, P.A. Loeb, A reduction technique for limit theorems in analysis and probability theory. Arkiv för Matematik 30(1), 25–43 (1992)
8. J. Bliedtner, P.A. Loeb, The optimal differentiation basis and liftings of $L^\infty$. Trans. Am. Math. Soc. 352, 4693–4710 (2000)
9. C. Constantinescu, A. Cornea, Ideale Ränder Riemannscher Flächen (Springer, Berlin, 1963)
10. N.J. Cutland, S.-A. Ng, The Wiener sphere and Wiener measure. Ann. Probab. 21(1), 1–13 (1993)
11. J.L. Doob, Stochastic Processes (Wiley, New York, 1953)
12. B. Eifrig, Ein Nicht-Standard-Beweis für die Existenz eines Liftings, in Measure Theory Oberwolfach 1975, ed. by A. Bellow. Lecture Notes in Mathematics, vol. 541 (Springer, Berlin, 1976), pp. 133–135
13. S. Fajardo, H.J. Keisler, Existence theorems in probability theory. Adv. Math. 118, 134–175 (1996)
14. B. Fuglede, Remarks on fine continuity and the base operation in potential theory. Math. Ann. 210, 207–212 (1974)
15. Z. Füredi, P.A. Loeb, On the best constant for the Besicovitch covering theorem. Proc. Am. Math. Soc. 121(4), 1063–1073 (1994)
16. C.W. Henson, The nonstandard hulls of a uniform space. Pac. J. Math. 43, 115–137 (1972)
17. C.W. Henson, Unbounded Loeb measures. Proc. Am. Math. Soc. 74, 143–150 (1979)
18. C.W. Henson, L.C. Moore, Nonstandard analysis and the theory of Banach spaces, in Nonstandard Analysis: Recent Developments. Lecture Notes in Mathematics, vol. 983 (Springer, Berlin, 1983)
19. M. Insall, P.A. Loeb, M. Marciniak, End compactifications and general compactifications. J. Log. Anal. 6(7), 1–16 (2014)
20. A. Ionescu-Tulcea, C. Ionescu-Tulcea, Topics in the Theory of Lifting (Springer, Berlin, 1969)
21. H.J. Keisler, An infinitesimal approach to stochastic analysis. Mem. Am. Math. Soc. 48, 297 (1984)
22. H.J. Keisler, Infinitesimals in probability theory, in Nonstandard Analysis and Its Applications, ed. by N.J. Cutland (Cambridge University Press, Cambridge, 1988), pp. 106–139
23. J.L. Kelley, General Topology (Van Nostrand, New York, 1955)
24. P.A. Loeb, Conversion from nonstandard to standard measure spaces and applications in probability theory. Trans. Am. Math. Soc. 211, 113–122 (1975)
25. P.A. Loeb, Applications of nonstandard analysis to ideal boundaries in potential theory. Isr. J. Math. 25, 154–187 (1976)
26. P.A. Loeb, A generalization of the Riesz-Herglotz theorem on representing measures. Proc. Am. Math. Soc. 71(1), 65–68 (1978)
27. P.A. Loeb, Weak limits of measures and the standard part map. Proc. Am. Math. Soc. 77(1), 128–135 (1979)
28. P.A. Loeb, A construction of representing measures for elliptic and parabolic differential equations. Math. Ann. 260, 51–56 (1982)
29. J. Lukeš, J. Malý, L. Zajíček, Fine Topological Methods in Real Analysis and Potential Theory. Lecture Notes in Mathematics, vol. 1189 (Springer, Berlin, 1986)
30. W.A.J. Luxemburg, A general theory of monads, in Applications of Model Theory to Algebra, Analysis, and Probability, ed. by W.A.J. Luxemburg (Holt, Rinehart, and Winston, New York, 1969)
31. E.A. Perkins, A global intrinsic characterization of Brownian local time. Ann. Probab. 9, 800–817 (1981)
32. E.F. Reifenberg, A problem on circles. Math. Gaz. 32, 290–292 (1948)
33. A. Robinson, Compactification of groups and rings and nonstandard analysis. J. Symb. Log. 34, 576–588 (1969)
34. A. Robinson, Non-standard Analysis (North-Holland, Amsterdam, 1966)
35. S. Salbany, T. Todorov, Nonstandard analysis in topology: nonstandard and standard compactifications. J. Symb. Log. 65, 1836–1840 (2000)
36. Y.N. Sun, Integration of correspondences on Loeb spaces. Trans. Am. Math. Soc. 349, 129–153 (1997)
37. Y.N. Sun, A theory of hyperfinite processes: the complete removal of individual uncertainty via exact LLN. J. Math. Econ. 29, 419–503 (1998)
38. T. Tao, T. Ziegler, The inverse conjecture for the Gowers norm over finite fields in low characteristic. Ann. Comb. 16, 121–188 (2012)
39. T. Tao, Hilbert's Fifth Problem and Related Topics. Graduate Studies in Mathematics, vol. 153 (American Mathematical Society, Providence, 2014)
40. F. Wattenberg, Nonstandard measure theory: avoiding pathological sets. Trans. Am. Math. Soc. 250, 357–368 (1979)
Part II
Functional Analysis
Chapter 4
Banach Spaces and Linear Operators
Manfred P.H. Wolff
4.1 Introduction
In this chapter we deal with old and new applications of nonstandard analysis to the theory of Banach spaces and linear operators. In particular we consider the structure theory of Banach spaces, basic operator theory, strongly continuous semigroups of operators, approximation theory of operators and their spectra, and the Fixed Point Property. To include interesting examples of nonstandard functional analysis in this chapter we must assume that the reader is familiar with the basics of Banach spaces and operator theory. Non-experts in this field can, however, profit from this chapter by looking at the elementary applications with which we begin every section. Moreover, we refer those who want to learn functional analysis and nonstandard analysis simultaneously to the book [45].
Nonstandard functional analysis is comparable to the theory of ultraproducts of Banach spaces and operators. More precisely, in a broad sense an ultraproduct of different Banach spaces corresponds to the nonstandard hull of an internal Banach space. However, the reader should notice that nonstandard analysis enables us to prove important properties of nonstandard hulls easily with the use of the Transfer Principle, which is not available within the framework of the theory of ultraproducts. Moreover, the structure of internal Banach spaces and internal operators is much richer than that of ultraproducts, and this helps to make the proofs for nonstandard hulls quite easy.
The application of nonstandard analysis to functional analysis has three different aspects: First, it helps to find and prove new standard theorems and to solve standard problems. Secondly, it establishes a deep analysis of the nonstandard hulls of given standard objects and their relations to those objects. Thirdly, it is a subject of interest in its own right.
M.P.H. Wolff, Mathematisches Institut der Universität Tübingen, Auf der Morgenstelle 10, 72076 Tübingen, Germany, e-mail: [email protected]
© Springer Science+Business Media Dordrecht 2015
P.A. Loeb and M.P.H. Wolff (eds.), Nonstandard Analysis for the Working Mathematician, DOI 10.1007/978-94-017-7327-0_4
For the basics of nonstandard analysis we refer to Chap. 2 and for the basics of topology to Chap. 3.
4.2 Basic Nonstandard Analysis of Normed Spaces
4.2.1 Internal Normed Spaces and Their Nonstandard Hull
In the following let $V(X)=\bigcup_{n\ge0}V_n(X)$ be the full superstructure over an appropriate infinite set $X$ containing $\mathbb C$ and containing in addition the normed linear spaces we want to consider. We always use nonstandard extensions $V({}^*X)$ of $V(X)$ which are $\kappa$-saturated for some $\kappa>\operatorname{card}(V(X))$ (see Definition 2.9.1 and the following remark). Such enlargements are called polysaturated. Moreover, we always consider vector spaces over $\mathbb K=\mathbb C$ unless explicitly stated otherwise.
Let $\mathcal F$ be a subset of $V_n(X)$ for some $n\ge1$ and assume that all $F\in\mathcal F$ are normed linear spaces (Banach spaces, resp.). Then every $G\in{}^*\mathcal F$ is an internal element (see Definition 2.8.1). The Transfer Principle (Definition 2.4.3 (v)) implies that such a $G$ is a (nonstandard) normed linear space (Banach space, resp.). More precisely, this means that $G$ is an internal vector space over ${}^*\mathbb K$, equipped with an internal function $\|\cdot\|:G\to{}^*\mathbb R$ satisfying the usual axioms of a norm. When all $F\in\mathcal F$ are Banach spaces then $G$ is complete in the following sense: If $(x_n)_{n\in{}^*\mathbb N}$ is an internal Cauchy sequence then there exists $z\in G$ such that for every $\varepsilon>0$ in ${}^*\mathbb R$ there exists $n(\varepsilon)\in{}^*\mathbb N$ with $\|z-x_n\|<\varepsilon$ for all $n\in{}^*\mathbb N$, $n\ge n(\varepsilon)$. However, we do not make use of this notion of convergence in the sequel.
If $E$ is an internal normed linear space in $V({}^*X)$ then we consider its finite part $\mathrm{Fin}(E)=\{x\in E:\|x\|\in\mathrm{Fin}({}^*\mathbb R)\}$, where $\mathrm{Fin}({}^*\mathbb R)=\{t\in{}^*\mathbb R: t\text{ is nearstandard}\}$ is the finite part of ${}^*\mathbb R$ (see Sect. 1.6). $\mathrm{Fin}(E)$ is obviously an external vector space over $\mathbb K$, and $E_0=\{x\in E:\|x\|\approx0\}$ is a subspace. By $x\approx y$ iff $x-y\in E_0$ (iff $\|x-y\|\approx0$) an equivalence relation is defined on the whole space $E$ which is compatible with the external linear operations on $E$ viewed as a vector space over $\mathbb K$. The quotient space $\mathrm{Fin}(E)/E_0=\widehat E$ is called the nonstandard hull of $E$. By $q(\hat x):={}^\circ\|x\|$ a norm on $\widehat E$ is uniquely defined.
The following proposition is an immediate application of Theorem 3.9.2.
Proposition 4.2.1 The nonstandard hull is always complete with respect to $q$.
Remark 4.2.2 Let $E$ be a standard normed linear space of our superstructure. Then, by abuse of our previous definition, we often denote the nonstandard hull $\widehat{{}^*E}$ of the internal space ${}^*E$ by $\widehat E$ and call it the nonstandard hull of $E$ if no confusion can arise. Moreover, in this case we denote the set $({}^*E)_0$ by $\mu_{\|\cdot\|}(0)$ or simply $\mu(0)$, whenever the context allows this simplification. This notation is consistent with that in the first three chapters, since $\mu(0)$ is nothing but the monad of 0 with respect to the norm topology.
From now on we shall write $\|\hat x\|$ in place of $q(\hat x)$, hoping that no confusion will be generated thereby.
If $F$ is a standard normed space then by $F\to\mathrm{Fin}({}^*F)\to\widehat F$ (defined by $x\mapsto{}^*x\mapsto\widehat{{}^*x}$) the space $F$ is isometrically embedded into $\widehat F$ in a canonical manner. So $F$ can and will be identified with a subspace of $\widehat F$. Its closure $\overline F$ then is the completion of $F$. Note that if $F$ is finite dimensional then $\widehat F=F$, since the unit ball of $F$ is compact; cf. Corollary 4.2.11.
The following relations between ${}^*E$ and ${}^*E'$ are indispensable in nonstandard functional analysis:
Proposition 4.2.3 Let $E$ be a standard normed space.
(i) Let $x\in{}^*E$ be arbitrary. Then $x\in\mathrm{Fin}(E)$ if and only if $\langle x,\xi\rangle\in\mathrm{Fin}(\mathbb K)$ holds for all $\xi\in\mathrm{Fin}(E')$.
(ii) Let $\xi\in{}^*E'$ be arbitrary. Then $\xi\in\mathrm{Fin}(E')$ if and only if $\langle x,\xi\rangle\in\mathrm{Fin}(\mathbb K)$ holds for all $x\in\mathrm{Fin}(E)$.
(iii) $x\in{}^*E$ is nearstandard if and only if there exists a standard $y\in E$ with $\langle x,\xi\rangle\approx\langle{}^*y,\xi\rangle$ for all $\xi\in\mathrm{Fin}(E')$.
(iv) $\xi\in{}^*E'$ is nearstandard if and only if there exists a standard $\psi\in E'$ with $\langle x,\xi\rangle\approx\langle x,{}^*\psi\rangle$ for all $x\in\mathrm{Fin}(E)$.
Proof (i) If $x\in\mathrm{Fin}(E)$, then there exists a standard $r$ such that $\|x\|\le r$. If $\xi\in\mathrm{Fin}(E')$ then there exists a standard $s$ such that $\|\xi\|\le s$. By the Transfer Principle we get $|\langle x,\xi\rangle|\le sr$. If $x$ is not in $\mathrm{Fin}(E)$, then $\|x\|=r\approx\infty$. The formula $\|x\|=\sup\{|\langle x,\xi\rangle|:\|\xi\|\le1\}$ and the Transfer Principle imply that for $\varepsilon>0$ there exists $\xi\in{}^*B(E')$ ($B(E')$ denotes the dual unit ball) such that $r-\varepsilon\le|\langle x,\xi\rangle|\le r$. The assertion follows. The remainder of the proof is left as an exercise.
Problem 4.2.4 Prove the remainder of the proposition. Hint: use the formulas $\|x\|=\sup\{|\langle x,\xi\rangle|:\|\xi\|\le1\}$, and so on, and the Transfer Principle.
As a further basic lemma we will prove an assertion on the internal (${}^*$-)linear independence of vectors.
Lemma 4.2.5 Let $E$ be an internal Banach space and let $\hat y_1,\dots,\hat y_n$ be linearly independent in $\widehat E$. Then $y_1,\dots,y_n$ are internally linearly independent in $E$.
Proof Suppose on the contrary that $\sum_{k=1}^n\alpha_ky_k=0$ where at least one of the $\alpha_j$ is not 0. Set $\beta:=\max(|\alpha_1|,\dots,|\alpha_n|)$. (Notice that this number exists by the Transfer Principle.) Then $\sum_{k=1}^n\gamma_ky_k=0$, where the $\gamma_k:=\alpha_k/\beta$ are of absolute value less than or equal to 1 and at least one of them is of absolute value 1. But then $\sum_{k=1}^n{}^\circ\gamma_k\,\hat y_k=0$, a contradiction.
Now let $E$ be an internal normed space and $A\subset E$ an internal set $\ne\emptyset$. Then by the Transfer Principle we can define the distance of an element $y$ to $A$ by $d(y,A)=\inf\{\|y-x\|:x\in A\}$ and we conclude
(i) $d(y,A)=0$ iff $y\in\bar A$ (the internal closure of $A$ in $E$),
(ii) for every $\varepsilon>0$ ($\varepsilon\approx0$ is allowed) there exists $z\in A$ such that $d(y,A)\le\|y-z\|<d(y,A)+\varepsilon$.
The proof of the next proposition is taken from standard functional analysis. However, its consequence Corollary 4.2.7 for the nonstandard hull of an internal Banach space is interesting in itself, since it does not hold for arbitrary closed subspaces of the nonstandard hull.
Proposition 4.2.6 (F. Riesz) Let $H$ be a closed internal subspace of the internal normed space $E$, and assume $H\ne E$. Then to every $y\in E\setminus H$ there exists $x$ in the internal span of $H$ and $y$ such that $\|x\|=1$ and $d(x,H)\approx1$.
Proof $y\in E\setminus H$ implies $z:=y/\|y\|\notin H$, hence $1\ge\alpha:=d(z,H)\ne0$. Choose $\eta\approx0$, $\eta>0$ arbitrarily. Then there exists $u\in H$ satisfying $\alpha\le\|z-u\|<\alpha(1+\eta)$. But then $x=\frac{z-u}{\|z-u\|}$ has the desired properties. For if $v\in H$, then
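$$\|x-v\|=\frac{1}{\|z-u\|}\,\Bigl\|z-\underbrace{\bigl(u+\|z-u\|\,v\bigr)}_{\in H}\Bigr\|\ge\frac{\alpha}{\|z-u\|}\ge\frac{1}{1+\eta},$$
whence $1\ge d(x,H)\ge\frac{1}{1+\eta}\approx1$.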
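In finite dimensions the infimum in this proof is attained, so one may take $\eta=0$ and $u$ a nearest point of $H$; this is the situation of Problem 4.2.8 (2). The Python sketch below is our own illustration, with an example space and helper names that are assumptions: it works in $\mathbb R^2$ with the sup norm, $H=\operatorname{span}\{(1,1)\}$ and $y=(1,0)$. It shows that $y/\|y\|$ itself is only at distance $\frac12$ from $H$, whereas the vector $x=(y-u)/\|y-u\|$ built as in the proof is at distance 1. Distances to the line are found by ternary search, which is legitimate because $t\mapsto\|z-th\|$ is convex.

```python
def sup_norm(v):
    """The sup norm on R^n."""
    return max(abs(c) for c in v)


def line_distance(z, h, left=-100.0, right=100.0, iters=200):
    """Return (t, d) with d = min_t ||z - t h||_inf, found by ternary search.

    Valid because t -> ||z - t h|| is convex; assumes a minimizer lies in [left, right].
    """
    def f(t):
        return sup_norm([zk - t * hk for zk, hk in zip(z, h)])

    for _ in range(iters):
        m1 = left + (right - left) / 3
        m2 = right - (right - left) / 3
        if f(m1) <= f(m2):
            right = m2   # for convex f some minimizer lies in [left, m2]
        else:
            left = m1    # otherwise some minimizer lies in [m1, right]
    t = (left + right) / 2
    return t, f(t)


def riesz_vector(y, h):
    """x = (y - u) / ||y - u|| with u = t h a nearest point of span{h} to y."""
    t, _ = line_distance(y, h)
    diff = [yk - t * hk for yk, hk in zip(y, h)]
    size = sup_norm(diff)
    return [c / size for c in diff]


y, h = [1.0, 0.0], [1.0, 1.0]
_, dist_y = line_distance(y, h)   # d(y, H) = 1/2 although ||y|| = 1
x = riesz_vector(y, h)            # approximately (1, -1)
_, dist_x = line_distance(x, h)   # d(x, H) = 1
```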
Corollary 4.2.7 Let $F$ be an internal subspace of the internal Banach space $E$. Then to every $\hat y\in\widehat E\setminus\widehat F$ there exists an element $\hat x$ of norm 1 in the linear hull of $\hat y$ and $\widehat F$ such that $d(\hat x,\widehat F)=1$.
Problem 4.2.8 (1) Work out the proof of the corollary. (2) Prove the following standard theorem: Let $E$ be a (standard) normed linear space and let $F$ be a finite dimensional subspace. Then to every $y\notin F$ there exists an $x$ of norm 1 in the linear hull of $y$ and $F$ such that $d(x,F)=1$.
The standard version of the foregoing proposition is an immediate consequence of the Transfer Principle:
Corollary 4.2.9 (standard) Let $F$ be a standard normed linear space and let $H$ be a closed linear subspace. Then to every (standard) $\varepsilon$ with $0<\varepsilon<1$ and to every $y\in F\setminus H$ there exists $x$ in the linear hull of $H$ and $y$ such that $\|x\|=1$ and $d(x,H)>\varepsilon$.
The next proposition characterizes the internal normed linear spaces of standard finite dimension over ${}^*\mathbb C$.
Proposition 4.2.10 Let $E$ be an internal normed linear space. Then $\widehat E$ is locally compact iff $E$ is internally linearly isomorphic to ${}^*\mathbb C^n$ for some standard $n$.
Proof Assume first of all that $E$ is not internally linearly isomorphic to ${}^*\mathbb C^n$ for any standard $n$. By induction on $n$ we construct a sequence $(y_n)$ such that $\|y_n\|=1$ and $d(y_n,H_{n-1})>1/2$, where $H_{n-1}$ is the internal vector space spanned by $y_1,\dots,y_{n-1}$. The construction is possible by Proposition 4.2.6, and by assumption it does not stop. It follows that $\|\hat y_m-\hat y_n\|\ge\frac12$ for $m\ne n$, and thus $\widehat E$ cannot be locally compact.
Now assume that $E$ is internally linearly isomorphic to ${}^*\mathbb C^n$ for some standard $n$. By Lemma 4.2.5, $\dim(\widehat E)\le n$. Proposition 4.2.6 implies that $\dim(\widehat E)=n$. Hence $\widehat E$ is norm isomorphic to $\mathbb C^n$ equipped with an appropriate norm, which is known to be locally compact.
Corollary 4.2.11 Let $E$ be a standard Banach space. Then $E$ is finite dimensional if and only if $\widehat E=E$.
Problem 4.2.12 Work out the proof.
We need the following fact about the cardinality of hyperfinite sets.
Lemma 4.2.13 Let $M$ be a hyperfinite set of internal cardinality $N\in{}^*\mathbb N\setminus\mathbb N$. Then $M$ is (externally) uncountable.
Proof By transfer, there exists an internal bijection $f$, say, from $M$ onto $P:=\{k/N:0\le k\le N-1\}\subset{}^*[0,1]$, where $[0,1]$ denotes the unit interval of $\mathbb R$. Also by transfer, the mapping $g:{}^*[0,1]\to P$ defined by $g(x)=\max\{k/N\in P:k/N\le x\}$ is internal. Its restriction to the external subset $[0,1]$ is injective, since distinct standard reals differ by more than $1/N$. Hence $f^{-1}\circ g$ maps $[0,1]$ injectively into $M$, and the assertion follows.
Corollary 4.2.14 Let $E$ be an internal normed linear space. If $E$ has internal dimension $n$ in the standard natural numbers $\mathbb N$, then $\widehat E\cong\mathbb C^n$; otherwise $\widehat E$ is nonseparable.
Proof The first part of our assertion follows directly from Proposition 4.2.10. If $E$ does not have internal dimension $n\in\mathbb N$, then there exists an infinitely large $N\in{}^*\mathbb N$ with $\dim(E)\ge N$. By induction on $n$ and transfer, we construct an internal sequence $(y_n)$ of normalized vectors satisfying $\|y_n-y_m\|>\frac12$ for $m\ne n$, $m,n\le N$. But then $\|\hat y_n-\hat y_m\|\ge1/2$ holds for all $n\ne m$, $m,n\le N$, so $M:=\{\hat y_n:n\le N\}$ is a $\frac12$-separated subset of the unit sphere of $\widehat E$. By Lemma 4.2.13, $M$ is uncountable, and hence $\widehat E$ is nonseparable.
Remark 4.2.15 The counterpart in the theory of ultraproducts says that an ultraproduct of a family $(E_k)$ of normed linear spaces is of dimension $n$ iff almost all $E_k$ are of that dimension. If it is infinite dimensional then it is nonseparable.
Before we present some examples we want to show how to construct points that are not nearstandard:
Lemma 4.2.16 Let $E$ be a standard normed vector space and let $(x_n)_n$ be a bounded sequence in $E$ satisfying $\|x_m-x_n\|\ge\alpha>0$ for all $m\ne n$. If $N\approx\infty$ then $x_N$ is not nearstandard, i.e. it is a remote point.
Proof Assume that $x_N$ is nearstandard to $y\in E$, i.e. $\|x_N-y\|\approx0$. Let $n\in\mathbb N$ be arbitrary. Then the formula $\exists k>n\,(\|x_k-y\|<1/n)$ holds in the nonstandard extension, hence in the standard world by the Downward Transfer Principle. So we get a subsequence $(x_{k_n})_n$ with $\|x_{k_m}-x_{k_n}\|<2/n$ for $m\ge n$, a contradiction to our hypothesis.
We now give some examples.
Example 4.2.17 Let $E=\ell=\{x\in\mathbb C^{\mathbb N}:x_n=0\text{ for all }n\ge n(x)\}$ be the vector space of all $\mathbb C$-valued sequences with only finitely many coordinates $\ne0$.
(1) Set $\ell_\infty=\ell$, equipped with the supremum norm $\|x\|=\sup\{|x_n|:n\in\mathbb N\}$. We denote by $e_n$ the sequence $(\delta_{k,n})_{k\in\mathbb N}$, where $\delta_{k,n}$ is the Kronecker symbol. Let now $E_n$ be the subspace spanned by $\{e_1,\dots,e_n\}$. Then $E_n=\{x:x=\sum_{k=1}^nx_ke_k\}$. By the Transfer Principle we obtain internal hyperfinite dimensional subspaces $E_N$ of ${}^*E$ for each infinitely large $N\in{}^*\mathbb N$. The sequence $u=\sum_{k=1}^Ne_k$ is an internal element of $E_N$ of norm 1.
(a) Take $f\in l^\infty(\mathbb N)$ and set $\tilde f={}^*f\cdot u$. Then $\tilde f$ is in $E_N$, and the mapping $U:l^\infty(\mathbb N)\ni f\mapsto\widehat{\tilde f}\in\widehat{E_N}\subset\widehat{\ell_\infty}$ is a linear isometric embedding of $l^\infty(\mathbb N)$ into $\widehat{\ell_\infty}$.
(b) Let $C([0,1])$ be the space of all complex valued continuous functions on the unit interval, and let $n\in\mathbb N$ be arbitrary. Consider the finite partition $\{\frac kn:0\le k\le n\}$ and for $f\in C([0,1])$ set
$$f_n(k)=\begin{cases}f\bigl(\frac kn\bigr), & k\le n,\\ 0, & k>n.\end{cases}$$
Then $f_n\in\ell$. The Transfer Principle gives an element $f_N\in E_N$. The mapping $U:C([0,1])\to\widehat{E_N}\subset\widehat{\ell_\infty}$, given by $f\mapsto\widehat{f_N}$, is a linear isometric embedding of $C([0,1])$ into $\widehat{\ell_\infty}$. In the same manner one can construct an isometry from $C(X)$ into $\widehat{\ell_\infty}$ for every separable compact space $X$.
(2) Now for $1\le p<\infty$ we consider the norm $\|x\|_p=\bigl(\sum_{k=1}^\infty|x_k|^p\bigr)^{1/p}$ on $\ell$, denoting this space by $\ell_p$.
(a) Similarly to the case studied above we embed the classical space $l^p(\mathbb N)$ into $\widehat{\ell_p}$ by $U:l^p(\mathbb N)\to\widehat{E_N}\subset\widehat{\ell_p}$, $f\mapsto\widehat{{}^*f\cdot u}$. $U$ is again an isometry.
(b) Let $\lambda$ be the Lebesgue measure on the unit interval $[0,1]$. For $f\in C([0,1])$ set
$$f_n(k)=\begin{cases}\dfrac{1}{n^{1/p}}\,f\bigl(\frac kn\bigr), & k\le n,\\ 0, & k>n.\end{cases}$$
Again we obtain a linear isometric embedding of $L^p([0,1])$ into $\widehat{\ell_p}$ by $U:C([0,1])\to\widehat{\ell_p}$, $f\mapsto\widehat{f_N}$, and continuous extension to $L^p([0,1])$.
Problem 4.2.18 Prove that $U$ in the last part of the example is indeed an isometry.
It is typical that a given standard Banach space like the examples above can be embedded into the nonstandard hull of a hyperfinite dimensional space. In order to establish this phenomenon in full generality we need a lemma on the domination of internal upwards directed families. Notice that we always assume that our extension $V({}^*X)$ is polysaturated.
Lemma 4.2.19 Let $\mathcal A\subset D\in V_n({}^*X)$ be a partially ordered upwards directed family of internal objects and assume that the cardinality $\operatorname{card}(\mathcal A)$ is strictly less than the cardinality of our standard superstructure $V(X)$. Then there exists an element $A\in D$ satisfying $B\le A$ for all $B\in\mathcal A$.
Proof The binary relation $P=\{(A,B):A\le B,\ A,B\in D\}$ is concurrent on $\mathcal A$ by assumption. Because $\operatorname{card}(\mathcal A)<\operatorname{card}(V(X))$ the assertion follows from our assumption that $V({}^*X)$ is polysaturated. (See the beginning of this section as well as Sect. 2.9 and the remarks that follow.)
The following result enables us to consider standard normed linear spaces as (external) subspaces of hyperfinite dimensional internal Banach spaces.
Theorem 4.2.20 Let $E$ be an arbitrary (standard) vector space of our superstructure. Then there exists an internal hyperfinite dimensional subspace $F$ of ${}^*E$ such that $E$ is externally contained in $F$ in the sense that ${}^*x\in F$ for all $x\in E$.
Proof Let $\mathcal A$ be the set of all finite dimensional linear subspaces of $E$. Then the family $\{{}^*G:G\in\mathcal A\}$ is upwards directed by inclusion and contained in ${}^*\mathcal A$. Thus the assertion follows from Lemma 4.2.19.
Remark 4.2.21 The corresponding result in the theory of ultraproducts says that every ultrapower Eˆ of a normed space E contains an ultraproduct of finite dimensional subspaces of E that in turn contains an isomorphic copy of E. The interested reader may wish to construct a proof of this assertion.
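Returning to Example 4.2.17, the isometry claims in (1)(b) and (2)(b) (the latter being Problem 4.2.18) rest on the fact that the discrete norms of $f_n$ approach the corresponding norms of $f$; for infinitely large $N$ this reads ${}^\circ\|f_N\|=\|f\|$. The Python sketch below illustrates that convergence for finite $n$ with test functions of our own choosing; it is not a proof, and the helper names are hypothetical. For $p=2$ the vector $f_n$ is the finite analogue of the vector $f$ of Example 4.2.32, and `largest_coordinate` bounds its inner products with the basis vectors.

```python
def sup_discretization_norm(f, n):
    """||f_n||_inf for f_n(k) = f(k/n), 0 <= k <= n, as in Example 4.2.17 (1)(b)."""
    return max(abs(f(k / n)) for k in range(n + 1))


def lp_discretization_norm(f, n, p):
    """||f_n||_p for f_n(k) = n**(-1/p) * f(k/n), 0 <= k <= n, as in (2)(b).

    The p-th power is a Riemann sum for the integral of |f|**p over [0, 1].
    """
    return (sum(abs(f(k / n)) ** p for k in range(n + 1)) / n) ** (1 / p)


def largest_coordinate(f, n, p):
    """max_k |f_n(k)|; for p = 2 this is the largest inner product (f_n | e_k)."""
    return max(abs(f(k / n)) for k in range(n + 1)) / n ** (1 / p)


def bump(t):
    """Assumed test function with sup norm 1/4, attained at t = 1/2."""
    return t * (1 - t)


def g(t):
    """Assumed test function with L^2 norm 1 on [0, 1]."""
    return 3 ** 0.5 * t
```

For example, `lp_discretization_norm(g, 10_000, 2)` is close to 1 while `largest_coordinate(g, 10_000, 2)` is about $\sqrt3/100$: a unit vector all of whose coordinates are small.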
4.2.2 Standard Continuous and Internal S-continuous Linear Operators
The characterization of the continuity of a linear operator from one normed space $E$ to another one $F$ is well known. The transfer of this characterization is of less use than the following notion: Let $E,F$ be internal normed linear spaces. An internal map $f:D\subset E\to F$ is called S-continuous at $x\in D$ if for all $y\in D$ with $y\approx x$ we have $f(y)\approx f(x)$. The map $f$ is called uniformly S-continuous if $u\approx v$ implies $f(u)\approx f(v)$ for all $u,v\in D$. Internal linear maps that are S-continuous at 0 are very nice, as the following proposition shows.
Proposition 4.2.22 Let $E,F$ be internal normed linear spaces and let $T$ be an internal linear map from $E$ to $F$. The following assertions are equivalent:
(i) $T$ is (uniformly) S-continuous.
(ii) $T$ is S-continuous at 0, which means that $x\approx0$ implies $Tx\approx0$.
(iii) $T(\mathrm{Fin}(E))\subset\mathrm{Fin}(F)$.
(iv) There exists a standard real number $M$ such that $\|Tx\|\le M\|x\|$ for all $x\in E$.
(v) $\|T\|:=\sup\{\|Tx\|:\|x\|=1\}\in\mathrm{Fin}({}^*\mathbb R)$.
Notice that the supremum of an internal, internally bounded subset of ${}^*\mathbb R$ always exists by the Transfer Principle.
Proof (v) and (iv) are obviously equivalent.
(v) ⇒ (i): Let $x\approx y$ but $x\ne y$. Then $\|Tx-Ty\|=\|x-y\|\cdot\bigl\|T\frac{x-y}{\|x-y\|}\bigr\|\le\|x-y\|\cdot\|T\|\approx0$.
(ii) ⇒ (iv): Let $B_\varepsilon$ denote the ball with radius $\varepsilon$. The set $A:=\{n\in{}^*\mathbb N:T(B_{1/n})\subset B_1\}$ contains all $N\approx\infty$ and is internal. Hence it contains a standard $n$ by the Spillover Principle (Theorem 2.8.12). It follows that $T(B_1)\subset B_n$.
(iv) ⇒ (iii): This is obvious.
(iii) ⇒ (iv): $T(B_1)\subset\mathrm{Fin}(F)=\bigcup_{n\in\mathbb N}B_n$ and $T(B_1)$ being internal imply that $T(B_1)\subset B_n$ for some standard $n$ by Proposition 2.9.7. The rest is obvious.
The number $\|T\|$ is called the operator norm.
Corollary 4.2.23 Let $T:E\to F$ be an internal S-continuous linear map. Then by $\widehat T(\hat x):=\widehat{Tx}$ there is uniquely defined a bounded linear operator $\widehat T$ from $\widehat E$ to $\widehat F$, called the nonstandard hull of $T$. Its norm is given by $\|\widehat T\|={}^\circ\|T\|$.
Problem 4.2.24 Work out the proof.
Remark 4.2.25 Notice that the definition of $\widehat T$ is consistent with the definition of the nonstandard hull of an internal normed space. That is, the graph of $\widehat T$ is nothing but the nonstandard hull of the graph of $T$.
Now let $E,F$ be standard Banach spaces (in $V(X)$), and denote by $L(E,F)$ the Banach space of all bounded linear operators from $E$ to $F$, equipped with the operator norm $\|T\|=\sup\{\|Tx\|:\|x\|=1\}$.
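Proposition 4.2.22 can be visualized with a large finite $N$ standing in for an infinitely large one. In the Python sketch below (our own example on $\mathbb C^N$ with the sup norm; the helper names are hypothetical), $T=\operatorname{diag}(1,2,\dots,N)$ has $\|T\|=N$, and the vector $x=e_N/\sqrt N$ has norm $N^{-1/2}$ while $\|Tx\|=N^{1/2}$. For infinite $N$ this gives $x\approx0$ with $Tx$ infinite, so (ii) and (v) fail together; the operator $S=\operatorname{diag}(1,\frac12,\dots,\frac1N)$, with $\|S\|=1$, satisfies (iv) with $M=1$.

```python
import math


def sup_norm(v):
    return max(abs(c) for c in v)


def diag_apply(d, x):
    """Apply the diagonal operator diag(d) to the vector x."""
    return [dk * xk for dk, xk in zip(d, x)]


def diag_norm(d):
    """Operator norm of diag(d) on (C^N, sup norm); it equals max_k |d_k|."""
    return max(abs(dk) for dk in d)


N = 10_000                                   # finite stand-in for an infinite N
T = [float(k) for k in range(1, N + 1)]      # ||T|| = N: not S-bounded for infinite N
S = [1.0 / k for k in range(1, N + 1)]       # ||S|| = 1: S-bounded
x = [0.0] * N
x[-1] = 1 / math.sqrt(N)                     # x = e_N / sqrt(N), so ||x|| = N**-0.5
```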
Corollary 4.2.26 Let $T$ be a bounded linear operator from $E$ to $F$. Then ${}^*T$ is an S-continuous linear operator from ${}^*E$ to ${}^*F$, and so we can construct $\widehat{{}^*T}$ as above. The map $L(E,F)\ni T\mapsto{}^*T\mapsto\widehat{{}^*T}\ (=:\widehat T)$ is a linear isometry into $\widehat{L(E,F)}\subset L(\widehat E,\widehat F)$. This embedding also satisfies $\widehat{ST}=\widehat S\,\widehat T$ for $T\in L(E,F)$, $S\in L(F,G)$.
Proof Obvious.
Remark 4.2.27 (i) If in Corollary 4.2.23 we set $F={}^*\mathbb C$, and if moreover we replace the internal space $E$ considered there by the space ${}^*E$, where $E$ denotes a standard Banach space, then we get an isometric embedding of the nonstandard hull $\widehat{E'}$ of the dual space $E'$ of $E$ into the dual space $(\widehat E)'$ of the nonstandard hull $\widehat E$ of $E$. In general, however, $(\widehat E)'\ne\widehat{E'}$ (see Sect. 4.3 below). Notice that the norms on $\widehat E$ and $\widehat{E'}$, respectively, are the operator norms, because $E$ can be identified with a subspace of $L(E',\mathbb K)$, and $E'=L(E,\mathbb K)$ by definition.
(ii) In this context we recall the definition of an adjoint operator: Let $T:E\to F$ be a continuous linear operator. Then $T':F'\to E'$, defined by $T'\xi=\xi\circ T$, is called the adjoint operator. It is continuous with equal norm. Since $(\widehat E)'\ne\widehat{E'}$ in general, we also have $(\widehat T)'\ne\widehat{T'}$ in such a case.
Next we characterize the invertibility of internal operators:
Proposition 4.2.28 Let $S$ be an internal, S-bounded operator from the internal Banach space $E$ into the internal Banach space $F$. The following assertions are equivalent:
(i) $\widehat S$ is bijective.
(ii) $S$ is bijective and $\|S^{-1}\|$ is finite.
If one of these conditions is satisfied then $\|\widehat S^{-1}\|={}^\circ\|S^{-1}\|$.
Proof (i) ⇒ (ii): Since (i) holds, $\|\widehat S\hat x\|\ge\delta>0$ for all $\hat x$ of norm 1, where $\delta=\|\widehat S^{-1}\|^{-1}$. Obviously, $\|Sx\|>\delta/2$ for all $x$ of norm 1. Suppose $S$ is not onto. By the open mapping theorem (and the Transfer Principle) $S(E)$ is (internally) closed. If $S(E)\ne F$ then by Proposition 4.2.6 there exists $y$ such that $\|y\|=1$ and $d(y,S(E))\approx1$. But then $\hat y\notin\widehat{S(E)}=\widehat S(\widehat E)$ by the definition of $\widehat S$ (cf. Corollary 4.2.23). This contradiction implies that $S$ is bijective and $\|S^{-1}\|\le2/\delta$.
(ii) ⇒ (i) as well as the additional assertion are obvious.
Finally we give a characterization of standard operators with closed range. To this end we restrict ourselves to Banach spaces. Let $E,F$ be (standard) Banach spaces and let $T:E\to F$ be a continuous linear map. Set $\gamma(T)=\inf\{\|T(x)\|:d(x,\ker(T))=1\}$. The open mapping theorem tells us that $\gamma(T)>0$ holds if and only if the range of $T$ is closed (consider the induced mapping from $E/\ker(T)$ onto the range of $T$). We point out that we always have $\widehat{{}^*\ker(T)}\subset\ker(\widehat T)$.
Problem 4.2.29 Prove this assertion.
Proposition 4.2.30 Let $E$ and $F$ be Banach spaces and let $T:E\to F$ be a continuous linear mapping. The following assertions are equivalent:
(i) The range $T(E)$ is not closed.
(ii) $\widehat{{}^*\ker(T)}\ne\ker(\widehat T)$.
(iii) There exists a remote point $x$ of norm 1 with $0\ne{}^*Tx\approx0$.
Proof (i) ⇒ (ii): If the range is not closed then $\gamma(T)=0$ by the preceding paragraph. By the Transfer Principle there exists $x\in{}^*E$ such that ${}^*d(x,{}^*\ker(T))=1$ and ${}^*T(x)\approx0$, hence $\hat x\notin\widehat{{}^*\ker(T)}$ but $\widehat T(\hat x)=0$.
(ii) ⇒ (iii): By Proposition 4.2.6 there exists $\hat x\in\ker(\widehat T)\setminus\widehat{{}^*\ker(T)}$ satisfying $d(\hat x,\widehat{{}^*\ker(T)})=1$. For $x\in\hat x$ we have $\|x\|\ge1$ as well as $0\ne{}^*Tx\approx0$. If $x$ were nearstandard there would exist a standard element $y$ satisfying $Ty=0$, a contradiction to $d(x,{}^*\ker(T))\approx1$.
(iii) ⇒ (i): If $T(E)$ were closed then $\gamma(T)>0$, a contradiction to $\|\hat x\|=1$ and $0\ne{}^*Tx\approx0$.
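A standard instance of Proposition 4.2.30 is the compact diagonal operator $Te_n=e_n/n$ on $\ell^2(\mathbb N)$, whose range is dense but not closed. The Python sketch below is our own finite-section illustration (the helper names are hypothetical): it uses $T_N=\operatorname{diag}(1,\frac12,\dots,\frac1N)$ on $\mathbb C^N$ with the Euclidean norm. Each $T_N$ is injective, so $d(x,\ker T_N)=\|x\|$ and $\gamma(T_N)=\min_k|d_k|=1/N$, which tends to 0. For an infinitely large $N$ the unit vector $e_N$ is remote by Lemma 4.2.16 (the $e_n$ are $\sqrt2$-separated), and ${}^*Te_N=e_N/N$ is nonzero but infinitesimal, as in (iii).

```python
import math


def l2_norm(v):
    return math.sqrt(sum(abs(c) ** 2 for c in v))


def section(N):
    """Diagonal of the finite section T_N = diag(1, 1/2, ..., 1/N)."""
    return [1.0 / k for k in range(1, N + 1)]


def gamma_injective_diag(d):
    """gamma(T) for an injective diagonal T = diag(d) on (C^N, l2 norm).

    ker T = {0}, so d(x, ker T) = ||x|| and gamma(T) = min_k |d_k|.
    """
    if any(dk == 0 for dk in d):
        raise ValueError("T must be injective")
    return min(abs(dk) for dk in d)


def basis_vector(N, n):
    """The unit vector e_n in C^N (1-based index n)."""
    e = [0.0] * N
    e[n - 1] = 1.0
    return e


N = 1000
T = section(N)
e_N = basis_vector(N, N)
T_e_N = [dk * ek for dk, ek in zip(T, e_N)]   # equals e_N / N: small but nonzero
```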
4.2.3 Special Banach Spaces and Their Nonstandard Hulls
Our first assertion says in particular that the nonstandard hull of every standard Hilbert space is a Hilbert space.
Proposition 4.2.31 If $H$ is an internal Hilbert space then $\widehat H$ is a Hilbert space.
Proof If $H$ is an internal Hilbert space, then the parallelogram law $\|x+y\|^2+\|x-y\|^2=2(\|x\|^2+\|y\|^2)$ holds for $H$. But this law then also holds in $\widehat H$, proving the assertion. The corresponding scalar product is given by $(\hat x|\hat y)={}^\circ(x|y)$.
Example 4.2.32 Let $H=\ell^2(\mathbb N)$ and consider the orthonormal basis given by $e_n=(\delta_{k,n})_k$, where $\delta_{k,n}$ denotes the Kronecker symbol. Then the set ${}^*\{e_n:n\in\mathbb N\}$ is an internal orthonormal basis in ${}^*\ell^2(\mathbb N)$, but $\{\hat e_n:n\in{}^*\mathbb N\}$ is not an orthonormal basis in $\widehat H$, although it is obviously an orthonormal system. To show this take an arbitrary (standard) continuous function $g$ on the unit interval $[0,1]$ with $L^2$-norm 1 and consider $f=\sum_{k=1}^N g(k/N)\,e_k/\sqrt N$, where $N$ denotes an infinitely large integer. Then $\|\hat f\|^2={}^\circ\bigl(\frac1N\sum_{k=1}^N|g(k/N)|^2\bigr)=\int_0^1|g(t)|^2\,dt=1$, while $|(f|e_n)|\le\sup|g|/\sqrt N\approx0$ for every $n\in{}^*\mathbb N$; so $\hat f$ is a unit vector orthogonal to all $\hat e_n$. (Cf. also Example 4.2.36 (1).)
We recall the following characterization of complex Banach lattices from [41]: Let $E$ be a Banach space over $\mathbb C$ and assume that there is an idempotent function $|\cdot|:E\to E$ satisfying the following equations:
1. $|\alpha x|=|\alpha|\,|x|$ for $\alpha\in\mathbb C$, where $|\alpha|$ is the usual absolute value in $\mathbb C$,
2. $\bigl|\,|x|+|y|-|x+y|\,\bigr|=|x|+|y|-|x+y|$,
3. $\bigl|\,|x|-|y|\,\bigr|=|x|-|y|\Rightarrow\|y\|\le\|x\|$,
4. $E$ is the linear hull of $\{|x|:x\in E\}$.
Then $(E,|\cdot|)$ is the unique complexification of the real Banach lattice $E_{\mathbb R}=E_+-E_+$, where $E_+=\{x\in E:|x|=x\}$, and the function $|\cdot|$ is the complex extension of the real absolute value $|x|:=\sup(x,-x)$, where $x$ is in $E_{\mathbb R}$ and sup denotes the supremum in $E_{\mathbb R}$. In particular $|x+iy|=\sup\{x\cos\theta+y\sin\theta:\theta\in[0,2\pi]\}$ and $\bigl\||x+iy|\bigr\|=\|x+iy\|$ hold (for Banach lattices see [40, 57]). For $x\in E_{\mathbb R}$ we set $x^+=(|x|+x)/2$ as well as $x^-=(|x|-x)/2$, and we obtain $x^+=\sup(x,0)$, $x^-=-\inf(x,0)$ as in the case of concrete Banach lattices of functions on a set $Y$.
The way we have introduced Banach lattices enables us to show easily the following result by use of the Transfer Principle:
Proposition 4.2.33 If $E$ is an internal Banach lattice with absolute value $|\cdot|$ then $\widehat E$ is a Banach lattice with absolute value $|\hat x|:=\widehat{|x|}$.
Corollary 4.2.34 Let $E$ be a standard Banach lattice. Then the embedding $J$ of $E$ into $\widehat E$ satisfies $|J(x)|=J(|x|)$, so $E$ can be viewed as a Banach sublattice of $\widehat E$.
Let us recall some further basic notions from the theory of Banach lattices. The infimum of two elements $x$ and $y$ of the real Banach lattice $E$ is denoted by $\inf(x,y)$. A linear operator $T$ from one Banach lattice $E$ to another one $F$ is called positive if $|Tx|\le T|x|$ holds for all $x\in E$. The operator $T$ is called a lattice homomorphism if $|Tx|=T|x|$ holds for all $x$.
Our next concrete examples are based on two well-known theorems, due to S. Kakutani and H. Bohnenblust, respectively, for special cases and generalized further by others; see [34], p. 135.
Theorem 4.2.35 (standard) Let $(E,|\cdot|)$ be a complex Banach lattice.
(i) Assume that there exists $1\le p<\infty$ such that $\|f+g\|^p=\|f\|^p+\|g\|^p$ for all $f,g\in E$ with $\inf(|f|,|g|)=0$. Then there exists an appropriate measure space $(X,\Sigma,\mu)$ and an isometric lattice isomorphism from $E$ onto $L^p(X,\Sigma,\mu)$.
(ii) Assume that (a) $\|\sup(|f|,|g|)\|=\sup(\|f\|,\|g\|)$ and (b) $|f|\le\|f\|\cdot u$ for some $u>0$. Then there exists a compact space $K$ and a lattice isomorphism from $E$ onto the space $C(K)$ of all complex valued continuous functions on $K$.
Example 4.2.36 (1) Let $(X,\Sigma,\mu)$ be an arbitrary measure space and $1\le p<\infty$. Then $E=L^p(X,\Sigma,\mu)$ is a Banach lattice and $\widehat E=L^p(\tilde X,\tilde\Sigma,\tilde\mu)$, where $(\tilde X,\tilde\Sigma,\tilde\mu)$ is an appropriate measure space.
Proof $F:=\widehat E$ is a Banach lattice by Proposition 4.2.33. Let $\hat x$ and $\hat y$ be arbitrary with $\inf(|\hat x|,|\hat y|)=0$. Then the fact that $z:=\inf(|x|,|y|)\approx0$ follows from Proposition 4.2.33. Now set $u:=|x|-z$ and $v:=|y|-z$. Then $\inf(u,v)=0$, hence $\|u\|^p+\|v\|^p=\|u+v\|^p$ holds by transfer and by the fact that such a formula holds in $E$. Apply now the quotient mapping from $\mathrm{Fin}({}^*E)$ onto $\widehat E$ to this formula and use (i) of the foregoing theorem.
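The hypothesis of Theorem 4.2.35 (i), which the proof above transfers to ${}^*E$, can be seen concretely in finite-dimensional $l^p$ spaces: for coordinatewise disjoint vectors, $\|f+g\|_p^p=\|f\|_p^p+\|g\|_p^p$, while for overlapping vectors the identity generally fails. The short Python check below uses vectors of our own choosing and hypothetical helper names.

```python
def lp_norm(v, p):
    """The l^p norm of a finite complex vector."""
    return sum(abs(c) ** p for c in v) ** (1.0 / p)


def disjoint(f, g):
    """inf(|f|, |g|) = 0, computed coordinatewise."""
    return all(min(abs(a), abs(b)) == 0 for a, b in zip(f, g))


def p_additivity_gap(f, g, p):
    """||f + g||_p^p - (||f||_p^p + ||g||_p^p); zero for disjoint f and g."""
    s = [a + b for a, b in zip(f, g)]
    return lp_norm(s, p) ** p - (lp_norm(f, p) ** p + lp_norm(g, p) ** p)


f, g = [1, -2, 0, 0], [0, 0, 3j, 0.5]   # disjoint supports
u, v = [1, 1], [1, 0]                   # overlapping supports
```

With $p=2$ the overlapping pair gives $\|u+v\|_2^2=5$ but $\|u\|_2^2+\|v\|_2^2=3$.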
In particular, for $X=\mathbb N$ and $\mu$ counting measure we obtain $\widehat{l^p(\mathbb N)}=L^p(\tilde X,\tilde\Sigma,\tilde\mu)$. This space contains a sublattice isomorphic to $L^p([0,1])$. For a proof cf. Example 4.2.17.
(2) Let $K$ be compact and let $E=C(K)$ be the Banach lattice of all continuous complex valued functions on $K$. Then, equipped with the usual absolute value, $E$ is a complex Banach lattice with the following two additional properties: (a) $\|\sup(|f|,|g|)\|=\sup(\|f\|,\|g\|)$ and (b) $|f|\le\|f\|\cdot u$ for some $u>0$. Property (b) says that $u$ is a strong order unit. Now there exists another compact space $\tilde K$ such that $\widehat E=C(\tilde K)$. The proof is based on the fact that ${}^*E$, and hence $\widehat E$, satisfy (a) and (b), and also on Theorem 4.2.35 (ii). In [22] $\tilde K$ is identified with a compactification of ${}^*K$.
A Banach algebra $A$ is a Banach space over $\mathbb K$ equipped with an associative and distributive multiplication such that $\alpha(xy)=(\alpha x)y=x(\alpha y)$ holds for all $x,y\in A$ and $\alpha\in\mathbb K$, and moreover $\|xy\|\le\|x\|\,\|y\|$ for all $x,y\in A$. If there exists a unit then $A$ is called unital. If there exists an antilinear involution $x\mapsto x^*$ of norm $\le1$, then $A$ is called an involutive algebra. If the involution satisfies $\|x\|^2=\|x^*x\|$ then $A$ is called a $C^*$-algebra.
Now let $A$ be an internal Banach algebra in $V({}^*X)$. Then the mapping $(x,y)\mapsto xy$ is obviously S-continuous from $\mathrm{Fin}(A)\times\mathrm{Fin}(A)$ into $\mathrm{Fin}(A)$. But then the following result is not hard to show:
Proposition 4.2.37 The nonstandard hull $\widehat{A}$ of an internal Banach algebra $A$ is a Banach algebra. Moreover, $\widehat{A}$ is involutive, resp. a $C^*$-algebra, if $A$ is involutive, resp. a $C^*$-algebra. Finally, $\widehat{A}$ is unital iff $A$ is unital, and $\widehat{A}$ is commutative if $A$ is.

Applying Proposition 4.2.28 to an internal Banach algebra $A$ with unit $e$ we obtain the following result:

Proposition 4.2.38 ([5], Theorem 1) Let $x \in \mathrm{Fin}(A)$ be arbitrary. Then $\hat x$ is invertible if and only if $x$ is invertible and $x^{-1} \in \mathrm{Fin}(A)$.

Proof Consider the linear map $L_x : a \mapsto xa$.

Let $A$ be a (standard) Banach algebra with unit $e$ and let $\mathrm{Inv}(A)$ be the set of invertible elements. We consider the problem of characterizing those algebras for which $\mathrm{Inv}(A)$ is dense in $A$. The algebra of all analytic functions on the disc $\{z \in \mathbb{C} : |z| < 1\}$ which have a continuous extension to the unit circle fails to have this property. We call $\hat x \in \widehat{A}$ generalized invertible if there exists $y \approx x$ such that $y$ is invertible in ${}^*A$. Let $\mathrm{GInv}(\widehat{A})$ be the set of all generalized invertible elements.

Proposition 4.2.39 ([5], Proposition 5) Let $A$ be a (standard) Banach algebra with unit $e$. Then $\widehat{A} = \mathrm{GInv}(\widehat{A})$ if and only if $A = \overline{\mathrm{Inv}(A)}$.

Proof Let $A = \overline{\mathrm{Inv}(A)}$ and let $a \in A$ be arbitrary. Then there exists a sequence $(a_n)_{n\in\mathbb{N}}$ of invertible elements $a_n$ converging to $a$. As a convergent sequence it is bounded. But then $\hat a_N = \hat a$ as well as $\hat a_N \in \mathrm{GInv}(\widehat{A})$ for every $N \approx \infty$. (For an arbitrary $x \in \mathrm{Fin}({}^*A)$ one argues in the same way: by transfer, $\mathrm{Inv}({}^*A)$ is dense in ${}^*A$, so for an infinitesimal $\varepsilon > 0$ there is an invertible $y \in {}^*A$ with $\|x - y\| < \varepsilon$, whence $\hat x = \hat y \in \mathrm{GInv}(\widehat{A})$.)
4 Banach Spaces and Linear Operators
Conversely, let $\widehat{A} = \mathrm{GInv}(\widehat{A})$ be true. Let $a \in A$ be arbitrary. Then to ${}^*a$ there exists $y \in \mathrm{Inv}({}^*A)$ satisfying $y \approx {}^*a$. Thus for every standard $n$ the following assertion is true in $V({}^*X)$, hence in $V(X)$: $\exists y \in \mathrm{Inv}(A)\,[\|a - y\| < 1/n]$; thus $a \in \overline{\mathrm{Inv}(A)}$.
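As a finite-dimensional illustration of the approximating sequence $(a_n)$ in the proof above, consider the unital algebra of $2 \times 2$ rational matrices with the max-row-sum operator norm (an assumption made only for this sketch; the text does not discuss matrix algebras, and the helper names are hypothetical). There $a_n = a + \tfrac{1}{n} e$ is invertible for all but at most two indices $n$, because $\det(a + tI) = t^2 + \operatorname{tr}(a)\,t + \det(a)$ has at most two roots, and $\|a_n - a\| = 1/n$.

```python
from fractions import Fraction

def det2(m):
    """Determinant of a 2x2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def shift(m, t):
    """Return m + t*I."""
    return [[m[0][0] + t, m[0][1]], [m[1][0], m[1][1] + t]]

def row_sum_norm(m):
    """Operator norm induced by the max-norm (max absolute row sum); submultiplicative."""
    return max(sum(abs(v) for v in row) for row in m)

def invertible_approximants(a, count):
    """First `count` pairs (n, a_n) with a_n = a + (1/n) I invertible.

    det(a + tI) is a monic quadratic in t, so at most two indices are skipped.
    """
    found = []
    n = 1
    while len(found) < count:
        a_n = shift(a, Fraction(1, n))
        if det2(a_n) != 0:
            found.append((n, a_n))
        n += 1
    return found

a = [[-1, 0], [0, 0]]  # singular: det(a) = 0, and a + I is singular too
for n, a_n in invertible_approximants(a, 3):
    diff = [[a_n[i][j] - a[i][j] for j in range(2)] for i in range(2)]
    print(n, det2(a_n), row_sum_norm(diff))  # distance to a is exactly 1/n
```

Here $n = 1$ is skipped because $a + I$ is singular; every later $a_n$ is invertible, so the singular matrix $a$ lies in $\overline{\mathrm{Inv}(A)}$.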
4.2.4 Notes
Section 4.1 contains results that are needed in every advanced course on nonstandard functional analysis (see e.g. [11, 36, 61]). More recent introductions are to be found in e.g. [28]. Proposition 4.2.6 is a reformulation of Riesz' result and its standard proof (see e.g. [81], III.2) within the framework of nonstandard analysis. Proposition 4.2.10 also holds for topological vector spaces. An extensive study of examples like the ones presented in 4.2.17 is to be found in [23]. Theorem 4.2.20 is nothing but a special application of saturation, and Proposition 4.2.22 is oriented to standard facts. The consequences serve as basic facts in all advanced applications. The results of Example 4.2.36 are also known within the framework of ultraproducts of Banach spaces and trace back to results of Dacunha-Castelle and Krivine in the late 1960s (see e.g. [10]). For a comprehensive presentation of these and other results see [19]. Corresponding results within the framework of nonstandard analysis may be found in [23]. Ultraproducts of Banach algebras seem to have been considered already in the late 1950s (see [53]) and came up again in 1970 at a conference on nonstandard analysis. In fact Janssen seems to have been the first to apply such a construction to $C^*$-algebras (see [30]); almost identical results were proved, apparently independently of [30], by Hinokuma and Ozawa [27]. Only a little later Connes (see e.g. [8]) and others used ultrapower techniques to obtain, for example, the classification of $W^*$-algebras of type III.
4.3 Advanced Theory of Banach Spaces
4.3.1 A Brief Excursion to Locally Convex Vector Spaces
Introduction As is generally known, a seminorm $p$ on the vector space $E$ over $\mathbb{K}$ is a nonnegative real-valued function on $E$ satisfying (i) $p(x + y) \le p(x) + p(y)$ and (ii) $p(\alpha x) = |\alpha|\, p(x)$ for all $x, y \in E$ and $\alpha \in \mathbb{K}$. Note that $p(x) \ne 0$ for $x \ne 0$ is not required in general.
A set $P$ of seminorms on $E$ is called separating if to every $x \ne 0$ in $E$ there exists $p \in P$ with $p(x) \ne 0$. A separating set $P$ of seminorms defines a uniformity $\mathcal{U}$ (see Sect. 3.8), where $\mathcal{U}$ is generated by finite intersections of sets of the form $U(p, \varepsilon) = \{(x, y) \in E \times E : p(x - y) < \varepsilon\}$. The monad of the uniformity $\mathcal{U}$ is $\mu_{\mathcal{U}} = \{(x, y) \in {}^*E \times {}^*E : {}^*p(x - y) \approx 0 \text{ for all standard } p \in P\}$. The monad of the neighborhood filter $\mathcal{U}(0)$ of $0$ is given by $\mu_{\mathcal{U}}(0) = \{x \in {}^*E : {}^*p(x) \approx 0 \text{ for all standard } p \in P\}$. This monad is an external vector space over $\mathbb{K}$. The monad of the neighborhood filter of an element $x \in E$ is given by $\mu_{\mathcal{U}}(x) = \{y : {}^*p(y - x) \approx 0 \text{ for all standard } p \in P\} = {}^*x + \mu_{\mathcal{U}}(0)$. Since $P$ is separating, the topology $\mathcal{T}_P$ induced by the uniformity $\mathcal{U}$ is Hausdorff. It is called a locally convex topology, and the space $E$ equipped with this topology is called a locally convex vector space, because a subbase of the neighborhood filter of $0$ is given by the convex sets $B(p, \varepsilon) = \{x : p(x) < \varepsilon\}$. Obviously normed spaces are special locally convex spaces.

Let $P$ be a separating set of seminorms. For each finite set $A \subset P$ set $p_A(x) = \sup_{p \in A} p(x)$. Then the set $\{p_A : A \subset P \text{ finite}\}$ is directed upwards and defines the same uniformity, and a fortiori the same topology, as the set $P$. Sometimes it is very useful to use this set of seminorms in place of the original one. For example, the sets $B(p_A, \varepsilon)$ form not only a subbase but even a base of the neighborhood filter of $0$. The next proposition is basic and easy to prove with the aid of monads:

Proposition 4.3.1 Let $\mathcal{T}$ be the topology generated by a set of seminorms as above. Then the mappings $+ : E \times E \to E$, $(x, y) \mapsto x + y$, and $\cdot : \mathbb{K} \times E \to E$, $(\alpha, x) \mapsto \alpha x$, are continuous.

Problem 4.3.2 Prove these two assertions.

Example 4.3.3 (i) Let $(E, \|\cdot\|)$ be a normed space. It is obviously itself a locally convex space (see above). Now for each $\xi \in E'$ ($E'$ the dual of $E$, i.e. the space of continuous linear functionals on $E$) consider the seminorm $p_\xi(x) = |\langle x, \xi\rangle|$.
The topology on $E$ induced by this set $\{p_\xi : \xi \in E'\}$ of seminorms is called the weak topology $\sigma(E, E')$ on $E$. Its monad of $0$ is given by $\mu_{\sigma(E,E')}(0) = \{x \in {}^*E : \langle x, \xi\rangle \approx 0 \text{ for all standard } \xi \in E'\}$.
(ii) Likewise, for each $x \in E$ one may consider the seminorm $p_x$ on $E'$ given by $p_x(\xi) = |\langle x, \xi\rangle|$. The topology on $E'$ induced by this set $\{p_x : x \in E\}$ of seminorms is called the weak* topology $\sigma(E', E)$ on $E'$. Its monad of $0$ is given by $\mu_{\sigma(E',E)}(0) = \{\xi \in {}^*E' : \langle x, \xi\rangle \approx 0 \text{ for all standard } x \in E\}$. (iii) Replacing in (i) $E$ by the dual $(E')' = E''$ of $E'$ (i.e. the second dual of $E$) and taking the same set $\{p_\xi : \xi \in E'\}$ of seminorms, each $p_\xi$ now applied to elements of $E''$, we obtain the weak* topology $\sigma(E'', E')$ on $E''$. Identifying $E$ with its isometric image in $E''$, $\sigma(E, E')$ turns out to be the subspace topology induced by $\sigma(E'', E')$. (iv) Let $E$, $F$ be normed spaces. For $x \in E$ define the seminorm $q_x$ on $L(E, F)$ by $q_x(T) = \|T(x)\|$. The set $\{q_x : x \in E\}$ defines the strong operator topology, also called the topology of pointwise convergence. The monad of $0$ is given by $\mu_{\mathrm{stop}}(0) = \{T \in {}^*L(E, F) : \|Tx\| \approx 0 \text{ for all standard } x \in E\}$. For $F = \mathbb{K}$ one obtains the weak* topology $\sigma(E', E)$ on $E'$. (v) For $x \in E$ and $\xi \in F'$ consider the seminorm on $L(E, F)$ defined by $p_{x,\xi}(T) = |\langle T(x), \xi\rangle|$. The set of these seminorms defines the weak operator topology. The monad of $0$ is given by $\mu_{\mathrm{wop}}(0) = \{T \in {}^*L(E, F) : \langle Tx, \xi\rangle \approx 0 \text{ for all standard } x \in E,\ \xi \in F'\}$. (vi) Let $E$, $F$ be vector spaces over $\mathbb{K}$, and let $b : E \times F \to \mathbb{K}$ be a bilinear form such that for each $x \in E$, $x \ne 0$, there exists $y \in F$ with $b(x, y) \ne 0$, and similarly for each $0 \ne y \in F$ there exists $x \in E$ such that $b(x, y) \ne 0$. We call such a bilinear form separating. Then for each $y \in F$, $p_y(x) = |b(x, y)|$ defines a seminorm on $E$, and the set $\{p_y : y \in F\}$ is separating and defines a topology $\mathcal{T}_b$ on $E$. Interchanging the roles of $E$ and $F$ we obtain a corresponding topology $\mathcal{T}_b$ on $F$ by the same bilinear form. Examples (i), (ii), and (iii) are special cases of these topologies. The bilinearity of $b$ implies the following equations for every $x_0$ in $E$ and $y_0$ in $F$, respectively:
$$\mu_b(x_0) = \{x \in {}^*E : b(x_0 - x, y) \approx 0 \text{ for all standard } y \in F\}, \tag{4.1}$$
$$\mu_b(y_0) = \{y \in {}^*F : b(x, y_0 - y) \approx 0 \text{ for all standard } x \in E\}. \tag{4.2}$$
Problem 4.3.4 Prove these equations.

Example 4.3.5 In analogy to the construction of the nonstandard hull of normed spaces we first define the set $\mathrm{Fin}(E)$ of finite elements of ${}^*E$ by $\mathrm{Fin}(E) = \{x \in {}^*E : p(x) \in \mathrm{Fin}(\mathbb{R}_+) \text{ for all (standard) } p \in P\}$. Whenever the set $P$ should be considered explicitly we write $\mathrm{Fin}_P(E)$. The set $\mathrm{Fin}(E)$ characterizes the topology in a simple manner, as we shall see below.
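To see concretely why the family $\{p_A : A \subset P \text{ finite}\}$ introduced above is directed upwards, and why $B(p_A, \varepsilon) = \bigcap_{p \in A} B(p, \varepsilon)$ turns a subbase into a base (a fact used in the proof of Lemma 4.3.6 (c)), here is a small sketch on $\mathbb{R}^3$ with the coordinate seminorms $p_i(x) = |x_i|$. These seminorms and the helper names are illustrative assumptions, not taken from the text.

```python
def coordinate_seminorm(i):
    """p_i(x) = |x_i|: a seminorm on R^n that vanishes on some nonzero vectors when n > 1."""
    return lambda x: abs(x[i])

def sup_seminorm(family):
    """p_A(x) = max_{p in A} p(x) for a finite nonempty family A of seminorms."""
    family = list(family)
    return lambda x: max(p(x) for p in family)

def in_ball(p, x, eps):
    """Membership in B(p, eps) = {x : p(x) < eps}."""
    return p(x) < eps

P = [coordinate_seminorm(i) for i in range(3)]
A, B = P[:2], P[1:]
p_A, p_B, p_AB = sup_seminorm(A), sup_seminorm(B), sup_seminorm(A + B)

samples = [(0.0, 0.0, 5.0), (0.1, -0.2, 0.0), (1.0, 2.0, -3.0), (-0.05, 0.01, 0.02)]
for x in samples:
    # Directed upwards: p_{A u B} dominates both p_A and p_B.
    assert p_AB(x) >= max(p_A(x), p_B(x))
    # B(p_A, eps) is the intersection of the balls B(p, eps), p in A.
    assert in_ball(p_A, x, 0.5) == all(in_ball(p, x, 0.5) for p in A)

pairs = [((1.0, 2.0, -3.0), (-0.5, 0.25, 4.0)), ((0.1, -0.2, 0.0), (0.3, 0.3, 0.3))]
for x, y in pairs:
    s = tuple(a + b for a, b in zip(x, y))
    assert p_A(s) <= p_A(x) + p_A(y) + 1e-12                               # subadditive
    assert abs(p_A(tuple(-2.0 * t for t in x)) - 2.0 * p_A(x)) < 1e-12     # homogeneous

print(p_A((0.0, 0.0, 5.0)))  # 0.0: p_A is only a seminorm
```

The sampled checks only illustrate the two seminorm axioms; they hold in general because a finite maximum of seminorms is again a seminorm.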
Lemma 4.3.6 (a) $x \in \mu(0)$ and $\lambda \in \mathrm{Fin}(\mathbb{K})$ imply $\lambda x \in \mu(0)$. (b) $x \in \mathrm{Fin}(E)$ and $\lambda \approx 0$ imply $\lambda x \in \mu(0)$. (c) For every $x \in \mu(0)$ there exists an infinitely large $N \in {}^*\mathbb{N}$ with $Nx \in \mathrm{Fin}(E)$.

Proof The first two assertions are simple consequences of calculating with seminorms and infinitesimals. (c) We use the set $\{p_A : A \subset P \text{ finite}\}$ (see above). Let $x \in \mu(0)$ be arbitrary. For each nonvoid finite subset $A \subset P$ we define the internal set $C(A) := \{n \in {}^*\mathbb{N} : n \ge |A|,\ p_A(x) \le \tfrac{1}{n}\}$. It contains every standard $n \ge |A|$, since $p_A(x) \approx 0$ by hypothesis. The family $\{C(A) : A \subset P \text{ finite}\}$ is directed downwards. Since our model $V({}^*X)$ is polysaturated, this implies that $\bigcap_A C(A) \ne \emptyset$. Let $N$ be in that intersection. Then $N \approx \infty$ and $p(x) \le \tfrac{1}{N}$ for every standard $p$. Thus $p(Nx) = N p(x) \le 1$ for all $p \in P$.

Now let $P$ and $Q$ be two sets of seminorms generating the topologies $\mathcal{T}_P$, $\mathcal{T}_Q$, respectively. In the next theorem we show that $\mathrm{Fin}(E)$ is uniquely determined by the topology:

Theorem 4.3.7 (C.W. Henson, L.R. Moore [21]) Let $P$, $Q$ be two sets of seminorms on $E$. Then the following assertions are equivalent: (i) $\mathcal{T}_P$ is coarser than $\mathcal{T}_Q$, more formally: $\mathcal{T}_P \subset \mathcal{T}_Q$. (ii) $\mu_Q(0) \subset \mu_P(0)$. (iii) $\mathrm{Fin}_Q(E) \subset \mathrm{Fin}_P(E)$.

Proof Obviously (i) $\Leftrightarrow$ (ii). (ii) $\Rightarrow$ (iii): Assume that (iii) does not hold. Then there exist $x \in \mathrm{Fin}_Q(E)$ and some standard $p \in P$ such that $p(x) \approx \infty$. By Lemma 4.3.6 (b), $x/p(x)$ is in $\mu_Q(0)$, hence in $\mu_P(0)$ by (ii), but $p\big(x/p(x)\big) = 1$, a contradiction. (iii) $\Rightarrow$ (ii): Let $x \in \mu_Q(0)$ be arbitrary. Then by Lemma 4.3.6 (c) there exists $N \approx \infty$ such that $Nx \in \mathrm{Fin}_Q(E) \subset \mathrm{Fin}_P(E)$, but then $x = \tfrac{1}{N} \cdot Nx \in \mu_P(0)$ by Lemma 4.3.6 (b).

Corollary 4.3.8 $\mathcal{T}_P = \mathcal{T}_Q$ if and only if $\mathrm{Fin}_P(E) = \mathrm{Fin}_Q(E)$.

Corollary 4.3.9 Let $(E, \mathcal{T}_P)$ and $(F, \mathcal{T}_Q)$ be two locally convex spaces and $T : E \to F$ a linear mapping. The following assertions are equivalent: (i) $T$ is continuous for the corresponding topologies. (ii) ${}^*T(\mu_P(0)) \subset \mu_Q(0)$. (iii) ${}^*T(\mathrm{Fin}(E)) \subset \mathrm{Fin}(F)$.

Problem 4.3.10 Prove the corollary. Hint: Adapt the proof of the theorem and use Sect. 3.2.
Corollary 4.3.11 Let $E$ be a normed linear space. Then $\mathrm{Fin}_{\|\cdot\|}(E) = \mathrm{Fin}_{\sigma(E,E')}(E)$ holds if and only if $\dim(E) < \infty$.
Proof If $\dim(E) \in \mathbb{N}$, then the asserted equality is obvious. Now assume that $\mathrm{Fin}_{\|\cdot\|}(E) = \mathrm{Fin}_{\sigma(E,E')}(E)$ holds. Then $\mu_{\sigma(E,E')}(0) = \mu_{\|\cdot\|}(0)$ by the theorem. Let $B$ be the unit ball with respect to the norm. Then $\mu_{\sigma(E,E')}(0) \subset {}^*B$. For a finite set $A \subset E'$ and $\varepsilon > 0$ put $B_{A,\varepsilon} := \{x \in E : p_A(x) < \varepsilon\}$. The monad $\mu_{\sigma(E,E')}(0)$ is the intersection of the sets ${}^*B_{A,\varepsilon}$ with $A$ standard finite and $\varepsilon > 0$ standard; since it is contained in the internal set ${}^*B$, saturation yields a standard finite set $A = \{\xi_1, \dots, \xi_m\}$ and a standard $\varepsilon > 0$ such that ${}^*B_{A,\varepsilon} \subset {}^*B$. This implies $B_{A,\varepsilon} \subset B$. If $\dim(E) = \infty$, there exists a standard $x \ne 0$ in $\{y \in E : \langle y, \xi_j\rangle = 0,\ j = 1, \dots, m\}$, hence $p_A(x) = 0 = p_A(Nx)$ for every $N \approx \infty$. Thus $Nx \in {}^*B_{A,\varepsilon}$, which in turn implies $\|Nx\| \le 1$, which is impossible.

Corollary 4.3.12 Let $E$, $F$ be normed spaces. Every norm-continuous linear map $T : E \to F$ is weakly continuous, that is, continuous for the topologies $\sigma(E, E')$ and $\sigma(F, F')$.

Problem 4.3.13 Prove this corollary. Hint: See Sect. 3.2 and use the adjoint mapping.

As before we consider a separating set $P$ of seminorms and the topology defined by $P$. $\mu(0)$ is a subspace of the $\mathbb{K}$-vector space $\mathrm{Fin}(E)$, and the associated congruence relation is nothing else than the equivalence relation defined by the uniformity generated by $P$, namely $x \equiv_{\mu(0)} y$ iff $x - y \in \mu(0)$ iff $p(x - y) \approx 0$ for all standard $p \in P$. The quotient space $\widehat{E} = \mathrm{Fin}(E)/\mu(0)$ is called the nonstandard hull of $E$. Every seminorm $p$ gives rise to a seminorm $\hat p$ on $\widehat{E}$ by $\hat p(\hat x) = {}^\circ p(x)$ (the standard part of $p(x)$, which is well defined by what is discussed above). The set of seminorms gained in this way is obviously again separating. The mapping $E \to \widehat{E}$, $x \mapsto \widehat{{}^*x}$, is a topological embedding. In fact $\hat p(\widehat{{}^*x}) = p(x)$ holds for all $p \in P$. So we can consider $E$ as a subspace of $\widehat{E}$. Since we use only polysaturated models in our setting, $\widehat{E}$ is complete (see [21]).

Let $(E, F)$ be a pair of vector spaces equipped with a separating bilinear form $b$ (see Example 4.3.3 (vi)). Since $b$ is separating, we can identify $E$ with a subspace of the algebraic dual $F^*$ of $F$: for $x \in E$ let $\varphi_x : F \to \mathbb{K}$ be given by $\varphi_x(\xi) = b(x, \xi)$.
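The mapping $x \mapsto \varphi_x$ is the embedding. Now we can write $\langle x, \xi\rangle$ in place of $b(x, \xi)$, and the uniformity defined by $b$ will be denoted by $\sigma(E, F)$, in accordance with the special case where $E$ is a normed space and $F = E'$ is its dual (see Example 4.3.3 (i) and (ii)). Let $H$ be a finite-dimensional subspace of $F$. Then the mapping $E \ni x \mapsto \varphi_x|_H : H \to \mathbb{K}$, $\varphi_x(\xi) = \langle x, \xi\rangle$, is surjective onto the algebraic dual $H^*$ of $H$ (otherwise its image would be a proper subspace of the finite-dimensional space $H^*$, and some $\xi \ne 0$ in $H$ would satisfy $\langle x, \xi\rangle = 0$ for all $x \in E$, contradicting the separation property). This simple fact is the key to the proof of the following theorem:

Theorem 4.3.14 There exists a topological isomorphism $T$ of $\widehat{E}$ onto the algebraic dual $F^*$ of $F$, equipped with the topology $\sigma(F^*, F)$, whose restriction $T|_E$ to $E$ is the canonical embedding of $E$ into $F^*$.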
Proof Let $S : \mathrm{Fin}(E) \to F^*$ be given by $S(x)(\xi) = {}^\circ\langle x, \xi\rangle$ (the standard part of $\langle x, \xi\rangle$). $S$ is $\mathbb{K}$-linear with kernel $\mu_{\sigma(E,F)}(0)$. The induced mapping on $\mathrm{Fin}(E)/\mu_{\sigma(E,F)}(0) = \widehat{E}$ is our desired $T$. We only have to prove that it is surjective. By Theorem 4.2.20 there exists a hyperfinite-dimensional subspace $H$ of ${}^*F$ containing $F$ externally. Let $\varphi \in F^*$ be arbitrary (standard). Then the Transfer Principle applied to the considerations in the preceding paragraph yields some $x \in {}^*E$ such that ${}^*\varphi|_H = \varphi_x|_H$. Since $F \subset H$, it follows that $x \in \mathrm{Fin}(E)$ as well as $Sx = \varphi = T(\hat x)$. The remainder is obvious.
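Two finite-dimensional facts drive the last arguments: the surjectivity of $x \mapsto \varphi_x|_H$ onto $H^*$ used for Theorem 4.3.14, and the existence of a nonzero common zero of finitely many functionals in a space of larger dimension, used in the proof of Corollary 4.3.11. Both reduce to linear algebra. The following is a minimal sketch over the rationals, assuming $E = F = \mathbb{Q}^n$ paired by the dot product (an illustrative choice); all function names are hypothetical.

```python
from fractions import Fraction

def dot(x, y):
    """The pairing <x, xi> on Q^n."""
    return sum(a * b for a, b in zip(x, y))

def rref(rows):
    """Reduced row echelon form over Q; returns (rows, pivot_columns)."""
    m = [[Fraction(v) for v in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        pr = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pr is None:
            continue
        m[r], m[pr] = m[pr], m[r]
        lead = m[r][c]
        m[r] = [v / lead for v in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

def realize_functional(xis, values):
    """Some x with <x, xi_j> = values[j]; the xi_j must be linearly independent."""
    m, pivots = rref([list(xi) + [v] for xi, v in zip(xis, values)])
    x = [Fraction(0)] * len(xis[0])
    for row, c in zip(m, pivots):
        x[c] = row[-1]  # free variables are set to 0
    return x

def common_kernel_vector(xis):
    """Nonzero x with <x, xi_j> = 0 for all j; needs fewer functionals than the dimension."""
    n = len(xis[0])
    m, pivots = rref(xis)
    free = next(c for c in range(n) if c not in pivots)
    x = [Fraction(0)] * n
    x[free] = Fraction(1)
    for row, c in zip(m, pivots):
        x[c] = -row[free]
    return x

xis = [[1, 2, 0], [0, 1, 1]]            # basis of a 2-dimensional H inside F = Q^3
x = realize_functional(xis, [5, -1])
print([dot(x, xi) for xi in xis])       # the prescribed values 5 and -1
k = common_kernel_vector(xis)
print(k, [dot(k, xi) for xi in xis])    # k != 0, yet p_A(N k) = 0 for every N
```

Exact rational arithmetic is used so that the equalities can be checked without rounding tolerances.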
Theorem of Banach-Alaoglu
Let $E$ denote a normed space, and $E'$ its dual space of all continuous linear functionals on $E$, equipped with the dual norm (cf. p. 121). In the following we denote $\mathrm{Fin}_{\|\cdot\|}(E)$ simply by $\mathrm{Fin}(E)$. The proof of the next theorem of Banach and Alaoglu is a typical example of applications of nonstandard analysis to functional analysis:

Theorem 4.3.15 (Banach-Alaoglu) Let $E$ be a (standard) normed space. Then the unit ball of the dual space $E'$ is compact for the weak* topology $\sigma(E', E)$.

Proof Let $B$ be the unit ball of $E'$, and let $\xi \in {}^*B$ be arbitrary. $\|\xi\| \le 1$ implies that $\langle x, \xi\rangle$ is nearstandard for all $x \in \mathrm{Fin}(E)$, and in particular for all standard $x$. Then $\eta$, defined by $\langle x, \eta\rangle = {}^\circ\langle x, \xi\rangle$ for all standard $x$, is a standard linear form, bounded by $1$ on the unit ball of $E$, i.e. $\eta \in B$. But by definition, if $x \in E$ is standard then $\langle x, \eta\rangle \approx \langle x, \xi\rangle$, so $\xi \approx \eta$ with respect to the weak* topology (see Example 4.3.3 (ii) and the second of the equations, (4.2)), and the theorem follows from Theorem 3.5.1.
4.3.2 General Banach Spaces
Due to lack of space we can only give very few results. We refer the interested reader to [23], which in our opinion is the best reference on nonstandard analysis and Banach spaces (operator theory is not treated there).
On the Structure of the Nonstandard Hull
Let $E$ be a standard Banach space over $\mathbb{K}$. In order to prove the main theorem on the structure of $\widehat{E}$ we recall the fundamental standard theorem of local reflexivity due to J. Lindenstrauss and H.P. Rosenthal [35]. A relatively simple proof may be found in [38].

Theorem 4.3.16 (Principle of Local Reflexivity) Let $E$ be a Banach space. For each finite-dimensional subspace $F$ of $E''$, each finite-dimensional subspace $G$ of $E'$, and each $\varepsilon > 0$ there exists an injective linear map $T : F \to E$ such that
(i) $\|T\| \le 1 + \varepsilon$ and $\|T^{-1}\| \le 1 + \varepsilon$. (ii) $Tx = x$ if $x \in F \cap E$. (iii) $\langle Tx, \xi\rangle = \langle x, \xi\rangle$ for all $x \in F$ and $\xi \in G$.

We derive the following basic proposition from this principle:

Proposition 4.3.17 Let $\widehat{E}$ be the nonstandard hull of the Banach space $E$. Then there exists an isometric embedding $R$ from $E''$ into $\widehat{E}$ satisfying (i) $Rx = x$ for all $x \in E$, where $E$ is identified with its canonical image in $\widehat{E}$ (see p. xxx). (ii) $\langle Rx, \xi\rangle = \langle x, \xi\rangle$ for all $x \in E''$ and $\xi \in E'$, where $E'$ is identified with its canonical image in $\widehat{E'}$.

Proof By Theorem 4.2.20 there exist hyperfinite-dimensional subspaces $F \subset {}^*E''$ and $G \subset {}^*E'$ containing (externally) the spaces $E''$, $E'$, respectively. Let $\varepsilon > 0$ be infinitesimal. Applying the Transfer Principle to the Principle of Local Reflexivity we obtain an internal linear mapping $T : F \to {}^*E$ satisfying (i)-(iii) of this principle. In particular we obtain $Tx = x$ for all $x \in E$; moreover $\langle Tx, \xi\rangle = \langle x, \xi\rangle$ for all $x \in E''$ and $\xi \in E'$. Finally $\|T\|, \|T^{-1}\| \approx 1$. Thus the mapping $\widehat{T} : \widehat{F} \to \widehat{E}$ is an isometry, and $R := \widehat{T}|_{E''}$ has the desired properties.

Remark 4.3.18 Note that the mapping $R$ is not unique unless $E$ is reflexive.

The weak topology $\sigma(E, E')$ of a Banach space $E$ is coarser than the norm topology. By Theorem 4.3.7 we have $\mu_{\|\cdot\|}(0) \subset \mu_{\sigma(E,E')}(0)$ and $\mathrm{Fin}_{\|\cdot\|}(E) \subset \mathrm{Fin}_{\sigma(E,E')}(E)$. We set $\mu_{\sigma,b}(0) = \mu_{\sigma(E,E')}(0) \cap \mathrm{Fin}_{\|\cdot\|}(E)$ and $\check{E} = \widehat{\mu_{\sigma,b}(0)} = \mu_{\sigma,b}(0)/\mu_{\|\cdot\|}(0)$. Note that if $x \in \mathrm{Fin}(E)$ and $\xi \in \mathrm{Fin}(E')$, then $\langle x, \xi\rangle \in \mathrm{Fin}(\mathbb{K})$. This fact allows us to prove the following structure theorem on $\widehat{E}$. As before we identify $E$, $E'$ with their canonical images in $\widehat{E}$, $\widehat{E'}$, respectively. Moreover we use the embedding $R$ from $E''$ into $\widehat{E}$ of Proposition 4.3.17.

Theorem 4.3.19 $\widehat{E}$ is the topological direct sum of $R(E'')$ and $\check{E}$, i.e.
$$\widehat{E} = R(E'') \oplus \check{E}. \tag{4.3}$$
Proof Let $S : \mathrm{Fin}(E) \to E''$ be given by $S(x)(\xi) = {}^\circ\langle x, \xi\rangle$ (${}^\circ$ denotes the standard part) for every $\xi \in E'$. Obviously $S(x) = S(y)$ holds whenever $\|x - y\| \approx 0$. Let $Q : \widehat{E} \to E''$ be the induced mapping. It is linear and contractive. Since $Q \circ R$ is the identity of $E''$, $Q$ is surjective. Since $\ker(S) = \mu_{\sigma,b}(0)$, the kernel of $Q$ is nothing else than $\check{E}$. So $RQ$ is a projection onto $R(E'')$, and $I - RQ$ maps $\widehat{E}$ onto $\ker(RQ) = \check{E}$.

As a first application we obtain a decomposition of the nonstandard hull $\widehat{T}$ of a standard operator $T$. For different Banach spaces $E$ and $F$ we denote the linear maps of the theorem above by $R_E$, $Q_E$, and so on.
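Proposition 4.3.20 Let $T$ be a bounded linear mapping from the Banach space $E$ into another one $F$. Then $\widehat{T}$ has the decomposition $\widehat{T} = R_F T'' Q_E \oplus \check{T}$, where $T''$ denotes the biadjoint of $T$, mapping $E''$ into $F''$, and $\check{T}$ maps $\check{E}$ into $\check{F}$; in matrix form:
$$\widehat{T} = \begin{pmatrix} R_F T'' Q_E & 0 \\ 0 & \check{T} \end{pmatrix}.$$
If in particular $E$ is reflexive, then
$$\widehat{T} = \begin{pmatrix} T & 0 \\ 0 & \check{T} \end{pmatrix}.$$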
Proof $T$ is weakly continuous by Corollary 4.3.12, hence ${}^*T(\mu_{\sigma(E,E')}(0)) \subset \mu_{\sigma(F,F')}(0)$. Moreover ${}^*T(\mathrm{Fin}(E)) \subset \mathrm{Fin}(F)$, since $T$ is norm-continuous. Both facts together imply $\widehat{T}(\check{E}) \subset \check{F}$. Now let $\hat x = R_E(x'') \in R_E(E'')$ and $\eta \in F'$ be arbitrary. It is easy to prove that $\langle \widehat{T}\hat x, \eta\rangle = \langle \hat x, T'\eta\rangle$ holds since $\eta$ is standard. So we obtain
$$\langle \widehat{T}(R_E(x'')), \eta\rangle = \langle R_E(x''), T'(\eta)\rangle = \langle T''(x''), \eta\rangle = \langle R_F(T''(x'')), \eta\rangle.$$
This implies $R_F T'' = \widehat{T} R_E$, in particular $\widehat{T} R_E(E'') \subset R_F(F'')$, hence $\widehat{T} R_E Q_E(\widehat{E}) \subset R_F Q_F(\widehat{F})$. If $E$ is reflexive, then $R_E$, $R_F$, respectively, are nothing else than the canonical embeddings into the corresponding nonstandard hulls.
Basic Geometry of Banach Spaces
Let us start by recalling some notions of the structure theory of Banach spaces: The Banach space $F$ is $\lambda$-embeddable into the Banach space $E$ if there exists an isomorphism $T$ of $F$ into $E$ such that $\|x\| \le \|Tx\| \le \lambda\|x\|$ for all $x \in F$. The space $F$ is finitely $\lambda$-representable in $E$ if for every $\varepsilon > 0$ and each finite-dimensional subspace $G$ of $F$ there exists an isomorphism $T : G \to E$ such that $(1 - \varepsilon)\|x\| \le \|Tx\| \le (\lambda + \varepsilon)\|x\|$ for all $x \in G$. If $\lambda = 1$, we also say that $F$ is representable (finitely representable, resp.) in $E$ in place of 1-embeddable (finitely 1-representable, resp.). Our first theorem illuminates this latter notion.

Theorem 4.3.21 ([23], Theorem 3.2) Let $E$ be an internal Banach space and let $F$ be a standard Banach space in $V(X)$. Then $F$ is finitely $\lambda$-representable in $\widehat{E}$ iff $F$ is $\lambda$-embeddable in $\widehat{E}$.
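For a concrete instance of the definition of $\lambda$-embeddability, the identity map from $l^\infty_n$ into $l^2_n$ (real $n$-dimensional spaces; this example is an illustrative assumption, not taken from the text) satisfies $\|x\|_\infty \le \|x\|_2 \le \sqrt{n}\,\|x\|_\infty$, so $l^\infty_n$ is $\sqrt{n}$-embeddable into $l^2_n$. The sketch below samples random vectors to check the two inequalities and shows that the constant $\sqrt{n}$ is attained at $(1, \dots, 1)$; random sampling only illustrates the elementary inequality, it does not prove it.

```python
import math
import random

def sup_norm(x):
    return max(abs(t) for t in x)

def euclidean_norm(x):
    return math.sqrt(sum(t * t for t in x))

def worst_ratio(n, trials=2000, seed=1):
    """Largest observed ||x||_2 / ||x||_inf over random nonzero x in R^n.

    For the identity T : l^inf_n -> l^2_n one has
    ||x||_inf <= ||Tx||_2 <= sqrt(n) ||x||_inf, i.e. T is a sqrt(n)-embedding.
    """
    rng = random.Random(seed)
    worst = 1.0
    for _ in range(trials):
        x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        s = sup_norm(x)
        if s == 0.0:
            continue
        ratio = euclidean_norm(x) / s
        assert 1.0 - 1e-12 <= ratio <= math.sqrt(n) + 1e-12
        worst = max(worst, ratio)
    return worst

n = 4
print(worst_ratio(n), math.sqrt(n))
print(euclidean_norm([1.0] * n) / sup_norm([1.0] * n))  # the constant sqrt(n) is attained
```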
Proof (I) Assume that $F$ is finitely $\lambda$-representable in $\widehat{E}$. Let $\mathcal{G}$ be the collection of finite-dimensional subspaces $G$ of $F$ and set $\varepsilon_G = 1/\dim(G)$. To each $G \in \mathcal{G}$ there exists a linear map $V_G : G \to \widehat{E}$ satisfying $(1 - \varepsilon_G/2)\|x\| \le \|V_G(x)\| \le (\lambda + \varepsilon_G/2)\|x\|$. Let $\{y_1, \dots, y_n\}$ be a basis of $G$ consisting of vectors of norm 1, and pick $x_j \in E$ such that $\hat x_j = V_G(y_j)$ as well as $(1 - \varepsilon_G) \le \|x_j\| \le (\lambda + \varepsilon_G)$. By $W_G(y_j) = x_j$ and internal linear extension there is defined an internal linear mapping $W_G$ from ${}^*G$ into $E$ satisfying $\widehat{W_G} = V_G$. Hence
$$(1 - \varepsilon_G) \le \|W_G(x)\| \le (\lambda + \varepsilon_G) \quad \text{for all } x \text{ of norm } 1. \tag{4.4}$$
(II) Consider the binary relation on ${}^*\mathcal{G}$ consisting of all pairs $(G, H)$ such that $G \subset H$ and there exists an internal linear mapping $W : H \to E$ satisfying inequality (4.4) (with $\varepsilon_H$ in place of $\varepsilon_G$). This relation is concurrent on $\mathcal{G}$, and ${}^*V(X)$ is polysaturated. Hence there exist $H \in {}^*\mathcal{G}$ and $W : H \to E$ satisfying (4.4) (see Sect. 2.9) such that every finite-dimensional standard $G$ is contained in $H$; in particular $F$ is an external subspace of $H$. Now $T = \widehat{W}|_F$ is the desired $\lambda$-embedding. (Notice that $\dim(H) \in {}^*\mathbb{N} \setminus \mathbb{N}$, hence $\varepsilon_H \approx 0$.)

Our next theorem shows a close connection between the standard Banach space $E$ and its nonstandard hull $\widehat{E}$.

Theorem 4.3.22 Let $E$ be a (standard) Banach space of the full superstructure $V(X)$. Then $\widehat{E}$ is finitely 1-representable in $E$.

Remark 4.3.23 Notice that in general $\widehat{E} \notin V(X)$.

Proof Let $F$ be a subspace of $\widehat{E}$ of dimension $n$. Then by a standard result there exists an isometry $J$ from $F$ onto $\mathbb{C}^n =: V$, equipped with an appropriate norm. Notice that $V$ is in $V(X)$, as follows immediately from our hypothesis that $\mathbb{C}$ is contained in $V(X)$. Let $\{e_1, \dots, e_n\}$ be a basis of normalized vectors of $V$ and set $y_k = J^{-1}(e_k)$. Then there exist $x_1, \dots, x_n$ in ${}^*E$ with $y_k = \hat x_k$ for each $k$. By Lemma 4.2.5, $B := \{x_1, \dots, x_n\}$ is internally linearly independent. Denote by $H$ the ${}^*$-linear hull of $\{x_1, \dots, x_n\}$ in ${}^*E$. Then by the Transfer Principle the linear map $W$, defined by $W\big(\sum_{k=1}^n \alpha_k e_k\big) = \sum_{k=1}^n \alpha_k x_k$, is internal. Moreover it satisfies $\|W(x)\| \approx \|x\|$ for all finite $x \in {}^*V$, since $\widehat{W} = J^{-1}$ is an isometry. Let $\varepsilon > 0$ be an arbitrary standard real number. Then, taking $S = W$, we see that $\exists S \in {}^*L(V, E)\ \forall x \in {}^*V\, [\|x\| = 1 \Rightarrow (1 - \varepsilon) \le \|S(x)\| \le (1 + \varepsilon)]$ holds in ${}^*V(X)$ and hence, by the Downward Transfer Principle, also in $V(X)$. For such an $S$ the map $T := S \circ J$ is the desired one.

This theorem enables us to give an easy characterization of finite $\lambda$-representability.
Proposition 4.3.24 Let $E$, $F$ be standard Banach spaces of our full superstructure $V(X)$. The following assertions are equivalent: (i) $F$ is finitely $\lambda$-representable in $E$. (ii) $F$ is finitely $\lambda$-representable in $\widehat{E}$. (iii) $F$ is $\lambda$-embeddable into $\widehat{E}$.

Proof (i) $\Rightarrow$ (ii): Obvious by Remark 4.2.2. (ii) $\Rightarrow$ (iii) follows from Theorem 4.3.21 applied to the particular internal space ${}^*E$. (iii) $\Rightarrow$ (i) follows from Theorem 4.3.22 above.

Recall that a Banach space $E$ is called superreflexive if every Banach space $F$ that is finitely representable in $E$ is reflexive. In order to apply our results within this context we have to recall results of R.C. James on reflexive Banach spaces. The equivalent assertion (iv) below goes back to Dunford and Schwartz [13].

Theorem 4.3.25 (R.C. James [29]) For a Banach space $E$ the following assertions are equivalent:
(i) $E$ is reflexive.
(ii) Every separable subspace of $E$ is reflexive.
(iii) For every linear functional $y$ in $E'$ there exists $x$ in $E$ such that $\|x\| = 1$ and $\langle x, y\rangle = \|y\|$.
(iv) There exists a closed subspace $F$ such that $F$ as well as $E/F$ are reflexive.
$\widehat{E}$ is in general not an element of our superstructure $V(X)$. However, in order to apply the preceding theorem to $\widehat{E}$ we have to consider separable subspaces of $\widehat{E}$. We therefore first prove the following technical but helpful lemma.

Lemma 4.3.26 Let $V(X)$ be the full superstructure over an appropriate infinite set $X$ containing $\mathbb{C}$. Then for every separable Banach space $E$ there exists an isometric copy $H$, say, in $V(X)$.

Notice that with a little more effort this assertion can be generalized to Banach spaces of density character $\kappa$ strictly less than the cardinality of $V(X)$.

Proof Because $E$ is separable, there exists a countable subset $\{x_n : n \in \mathbb{N}\}$ that is dense in the unit ball of $E$. By the Hahn-Banach Theorem this set separates the points of $E'$. Because the dual unit ball $B$ of $E'$ is weak* compact by Theorem 4.3.15, it follows that $\Phi : B \ni \eta \mapsto \Phi(\eta) = (\langle x_n, \eta\rangle)_{n\in\mathbb{N}} \in \mathbb{D}^{\mathbb{N}}$, where $\mathbb{D} = \{z \in \mathbb{K} : |z| \le 1\}$, is a homeomorphism onto a closed subset $Y$ of $\mathbb{D}^{\mathbb{N}}$. This latter space is in $V(X)$ because $\mathbb{C} \in V(X)$. Hence $Y \in V(X)$. But then the Banach space $C(Y)$ of all
continuous functions on $Y$, as well as all subspaces of it, are elements of $V(X)$. The map $T : E \ni x \mapsto Tx = \langle x, \Phi^{-1}(\cdot)\rangle$ is a linear isometric embedding of $E$ onto a subspace $H$ of $C(Y)$.

Now we are able to characterize superreflexivity.

Theorem 4.3.27 For a Banach space $E$ the following assertions are equivalent:
(i) $E$ is superreflexive.
(ii) $\widehat{E}$ is reflexive.
(iii) $\widehat{E}$ is superreflexive.
(iv) $\widehat{(E')} = (\widehat{E})'$.
Proof (i) $\Rightarrow$ (iii): Let $F$ be finitely representable in $\widehat{E}$. By Theorem 4.3.25 we may assume without loss of generality that $F$ is separable, hence in our superstructure $V(X)$ by Lemma 4.3.26. Then $F$ is finitely representable in $E$, as follows from Theorem 4.3.22, so $F$ is reflexive. (iii) $\Rightarrow$ (ii): Obvious. (ii) $\Rightarrow$ (i): If $F$ is finitely representable in $E$ and separable, then $F$ is embeddable into $\widehat{E}$ by Proposition 4.3.24, since we can assume that $F$ is in $V(X)$ by Lemma 4.3.26. But then $F$ is reflexive and (i) follows.

(ii) $\Rightarrow$ (iv): Suppose that $\widehat{E'}$ is a proper closed subspace of $(\widehat{E})'$. Since $\widehat{E}$ is reflexive, there exists $\hat x \in \widehat{E}$ with $\|\hat x\| = 1$ and $\langle \hat x, \hat y\rangle = 0$ for all $\hat y \in \widehat{E'}$. But by the Hahn-Banach Theorem and the Transfer Principle, applied to some representative $x$ of $\hat x$, there exists $z \in \mathrm{Fin}(E')$ satisfying $\langle x, z\rangle = 1$ and $\|z\| \approx 1$, hence $1 = \langle \hat x, \hat z\rangle$, a contradiction.

(iv) $\Rightarrow$ (ii): Let $x' \in (\widehat{E})' = \widehat{E'}$ be arbitrary. Then $x' = \hat y$ for some $y \in \mathrm{Fin}({}^*(E'))$. But $\|y\| = \sup\{|\langle x, y\rangle| : \|x\| = 1,\ x \in {}^*E\}$. Hence, by the Transfer Principle applied to some $0 < \varepsilon \approx 0$, there exists an $x \in {}^*E$ with $\|x\| = 1$ such that $\|y\| - \varepsilon < |\langle x, y\rangle| \le \|y\|$. This gives $\|x'\| = {}^\circ\|y\| = {}^\circ|\langle x, y\rangle| = |\langle \hat x, x'\rangle| \le \|x'\|$, which means that Theorem 4.3.25 (iii) is satisfied for $\widehat{E}$, and the assertion follows.

Using assertion (iv) of this result we obtain Rakov's easy proof (see [19]) of the following theorem of Enflo, Lindenstrauss, and Pisier.

Corollary 4.3.28 Let $E$ be a Banach space and let $F$ be a closed subspace of $E$. If $F$ and the quotient space $E/F$ are superreflexive, then $E$ is superreflexive.

Proof Since the quotient mapping $Q : E \to E/F$ is open, we easily get $\widehat{E/F} \cong \widehat{E}/\widehat{F}$. So by the theorem, $\widehat{F}$ as well as $\widehat{E}/\widehat{F}$ are reflexive. But then by Theorem 4.3.25 (iv) $\widehat{E}$ is reflexive, and the theorem gives the desired result.
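The embedding $x \mapsto \langle x, \Phi^{-1}(\cdot)\rangle$ from the proof of Lemma 4.3.26 has a transparent finite-dimensional analogue. For the real space $l^1_n$ (an illustrative assumption), evaluating $x$ against the finitely many sign vectors $\varepsilon \in \{-1, 1\}^n$, the extreme points of the dual unit ball, already yields an isometric linear embedding into $C(Y)$ with $Y = \{-1, 1\}^n$, because a linear functional attains its supremum over a cube at a vertex. Restricting to extreme points is a simplification of the lemma, which uses the whole dual ball; the function names are hypothetical.

```python
from itertools import product

def l1_norm(x):
    return sum(abs(t) for t in x)

def embed(x):
    """Tx as a function on the finite set Y = {-1, 1}^n: (Tx)(eps) = <x, eps>.

    Y is the set of extreme points of the dual unit ball of l^1_n (real scalars).
    """
    return {eps: sum(e * t for e, t in zip(eps, x)) for eps in product((-1, 1), repeat=len(x))}

def sup_norm_on_Y(f):
    return max(abs(v) for v in f.values())

x = [3, -1, 2]
y = [0, 5, -4]
print(l1_norm(x), sup_norm_on_Y(embed(x)))  # equal: T is isometric

# T is linear: T(2x - y) = 2 Tx - Ty pointwise on Y.
tx, ty = embed(x), embed(y)
lhs = embed([2 * a - b for a, b in zip(x, y)])
rhs = {eps: 2 * tx[eps] - ty[eps] for eps in tx}
print(lhs == rhs)
```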
4.3.3 Banach Lattices
There is a theory of finitely $\lambda$-representable Banach lattices similar to the theory for Banach spaces sketched so far. Let us denote by $c_0$ the space of sequences $x = (x_n)_{n\in\mathbb{N}} \subset \mathbb{C}$ with $\lim x_n = 0$, $\|x\| = \sup_n |x_n|$ (see Example 4.2.17 (i)). With $|x| = (|x_n|)_{n\in\mathbb{N}}$ it becomes a Banach lattice. Its dual space can be identified with $l^1(\mathbb{N}) = l^1 = \{x \in \mathbb{C}^{\mathbb{N}} : \sum_k |x_k| =: \|x\|_1 < \infty\}$, which is also a Banach lattice under the canonical order. Obviously the nonstandard extension ${}^*T$ as well as $\widehat{T}$ are lattice homomorphisms whenever $T$ has this property. In full correspondence with Sect. 4.3.2, we define the Banach lattice $F$ to be finitely $\lambda$-lattice representable in the Banach lattice $E$ if for each finite-dimensional vector sublattice $G \subset F$ and each $\varepsilon > 0$ there exists a lattice isomorphism $T$ from $G$ into $E$ with $(1 - \varepsilon)\|x\| \le \|Tx\| \le (\lambda + \varepsilon)\|x\|$ for all $x \in G$. We give one example of the usefulness of this concept.

Theorem 4.3.29 Let $E$ be a Banach lattice. Then the following assertions are equivalent: (i) $E$ is superreflexive. (ii) Neither $c_0$ nor $l^1$ is finitely 1-lattice representable in $E$. (iii) Neither $c_0$ nor $l^1$ is 1-lattice embeddable in $\widehat{E}$.

Proof First of all one may prove (ii) $\Leftrightarrow$ (iii) by generalizing Proposition 4.3.24 to the case of finitely lattice representable spaces. Then a famous result of Meyer-Nieberg (and others, see [57], Theorem II.5.11) says that the Banach lattice $F$ is reflexive iff neither $c_0$ nor $l^1$ is lattice embeddable in $F$. Theorem 4.3.27 now yields the result.

Examples are the spaces $L^p$ ($1 < p < \infty$) and many other Banach function spaces.

In order to analyze the structure of the nonstandard hull of an internal Banach lattice we need one more notion: The norm on the Banach lattice $E$ is order continuous if $\lim \|x_\alpha\| = 0$ holds for every downwards directed net $(x_\alpha)$ of positive elements with $\inf x_\alpha = 0$. The following theorem due to Henson and Moore (see [23], Theorem 4.9) characterizes those nonstandard hulls having order continuous norm.
In order to prove it we need the following technical lemma:

Lemma 4.3.30 Let $E$ be a Banach lattice and let $(\hat x_n)_{n\in\mathbb{N}}$ be a normalized sequence of positive elements in $\widehat{E}$ satisfying $\inf(\hat x_m, \hat x_n) = 0$ for $m \ne n$. Then there exists an internal hyperfinite sequence $(y_n)_{n \le N}$ of normalized positive vectors $y_n$ satisfying $\inf(y_m, y_n) = 0$ for $m \ne n$ as well as $\hat y_n = \hat x_n$ for all (standard) $n$.

Proof By induction on $n$ we select a sequence $(y_n) \subset {}^*E$ such that $0 < y_n$, $\|y_n\| = 1$, $\inf(y_n, y_m) = 0$ and $\hat y_n = \hat x_n$: For $n = 1$ choose $x_1 \in \hat x_1$. Then $\hat x_1 = |\hat x_1|$, since $\hat x_1 > 0$; hence $x_1 \approx |x_1|$ and $\|x_1\| \approx 1$. We set $y_1 = |x_1|/\|x_1\|$.
Suppose $\{y_1, \dots, y_n\}$ are already selected as desired. Let $x_{n+1} \in \hat x_{n+1}$ be arbitrary. Set $z = |x_{n+1}|/\|x_{n+1}\|$. Then $\hat z = \hat x_{n+1}$, since $\hat x_{n+1} \ge 0$ (cf. the argument for $y_1$), and $\|z\| = 1$. Let $u = z - \inf\big(z, \sum_{k=1}^n y_k\big)$. Then
$$\hat u = \hat z - \inf\Big(\hat z, \sum_{k=1}^n \hat x_k\Big) = \hat x_{n+1} - \inf\Big(\hat x_{n+1}, \sum_{k=1}^n \hat x_k\Big) = \hat x_{n+1},$$
whence $\inf\big(z, \sum_{k=1}^n y_k\big) \approx 0$. Obviously $\inf(u, y_k) = 0$, so $y_{n+1} = u/\|u\|$ is the desired element. Since $V(X)$ is polysaturated it is comprehensive; hence the sequence $(y_n)_{n\in\mathbb{N}}$ possesses an internal extension $(y_n)_{n\in{}^*\mathbb{N}}$ where $\|y_n\| = 1$ and $0 \le y_n$ (see Theorem 2.9.5). The internal set $A := \{n : \inf(y_k, y_\ell) = 0 \text{ for all } k \ne \ell \le n\}$ contains $\mathbb{N}$ and hence an infinite integer $N$.

Corollary 4.3.31 Let $E$ be a Banach lattice. Then $c_0$ is finitely 1-lattice representable in $E$ if and only if $l^\infty(\mathbb{N})$ is 1-lattice representable in the nonstandard hull $\widehat{E}$.

Proof In analogy to Proposition 4.3.24 we obtain that $c_0$ is finitely 1-lattice representable in $E$ iff $c_0$ is 1-lattice representable in $\widehat{E}$. Let $e_n = (\delta_{k,n})_{k\in\mathbb{N}}$ be the $n$th basis vector in $c_0$ ($\delta_{k,n}$ denotes the Kronecker symbol) and let $T$ denote the isometric lattice isomorphism into $\widehat{E}$. Then we apply the lemma to the sequence $(Te_n)_{n\in\mathbb{N}}$ and obtain the internal sequence $(y_n)_{n \le N}$. Let $f \in l^\infty(\mathbb{N})$ be arbitrary. Then set $\tilde f = \sum_{k=1}^N {}^*f(k)\, y_k$. The mapping $f \mapsto \widehat{\tilde f}$ is the desired embedding, cf. Example 4.2.17 (1).

The main and surprising assertion in the following theorem is that whenever the nonstandard hull is Dedekind complete, its norm is order continuous, a conclusion which is far from being true in general.

Theorem 4.3.32 (C.W. Henson, L.R. Moore [23]) Let $\widehat{E}$ be the nonstandard hull of an internal Banach lattice $E$. The following assertions are equivalent: (i) $\widehat{E}$ has order-continuous norm. (ii) $\widehat{E}$ is Dedekind complete, i.e., every order-bounded subset has a least upper bound. (iii) $c_0$ is not 1-lattice representable in $\widehat{E}$. (iv) $c_0$ is not finitely 1-lattice representable in $E$. (v) $l^\infty(\mathbb{N})$ is not 1-lattice representable in $\widehat{E}$.

Proof By the preceding corollary and the analog of Proposition 4.3.24, (iii), (iv), and (v) are equivalent. Moreover (v) is equivalent to (i) by [40], Corollary 2.4.3. The implication (i) $\Rightarrow$ (ii) is also well known (see e.g. [40], Theorem 2.4.2). So it remains to prove (ii) $\Rightarrow$ (iii):
(ii) $\Rightarrow$ (iii): Assume that $c_0$ is 1-lattice representable in $\widehat{E}$ and let $T : c_0 \to \widehat{E}$ be such an embedding. Let $e_n = (\delta_{k,n})$ be as in the proof of the preceding corollary. It was shown there that there exists a hyperfinite sequence $(y_n)_{n \le N}$ as described in that lemma; in particular $\hat y_n = T(e_n)$ holds. Then $p = \sum_{k=1}^N y_k$ has norm 1. The set $B = \{T(e_n) : n \in \mathbb{N}\}$ is bounded above by $\hat p$. Assume that $\sup(B)$ exists. Then there is an $s > 0$ in ${}^*E$ with $\hat s = \sup(B)$. Since $\hat s \ge \sum_{k=1}^n T(e_k)$, we have $\big\|\big(s - \sum_{k=1}^n y_k\big)^-\big\| \approx 0$, hence $n\,\big\|\big(s - \sum_{k=1}^n y_k\big)^-\big\| \le 1$ for all standard $n$. Therefore, by the Spillover Principle, Theorem 2.8.11 (i), there exists an infinitely large $N_2 \le N$ such that $N_2\,\big\|\big(s - \sum_{k=1}^{N_2} y_k\big)^-\big\| \le 1$, hence $\big(s - \sum_{k=1}^{N_2} y_k\big)^- \approx 0$. This implies $\hat s \ge \widehat{\sum_{k=1}^{N_2} y_k} \ge \sum_{k=1}^n T(e_k)$ for all standard $n$. But then $\widehat{s - y_{N_2}} \ge \sum_{k=1}^n T(e_k)$ for all standard $n$, and $\widehat{s - y_{N_2}} = \hat s - \hat y_{N_2} < \hat s$, a contradiction.

One consequence of this theorem is that $\widehat{l^\infty(\mathbb{N})}$ is no longer a dual Banach lattice, since such Banach lattices are Dedekind complete, and because, as an AM-space with unit, $\widehat{l^\infty(\mathbb{N})}$ is isometrically lattice isomorphic to some space $C(K)$ of all continuous functions on a compact space $K$ (see Theorem 4.2.35), its norm is not order continuous.

The space $l^\infty(\mathbb{N})$ is a von Neumann algebra, which means it is isomorphic to a weakly closed subalgebra of the algebra $L(H)$ of all bounded operators on an appropriate Hilbert space $H$. In fact $l^\infty(\mathbb{N})$ can be identified with the algebra of multiplication operators by bounded functions on $l^2(\mathbb{N})$, and as such $l^\infty(\mathbb{N})$ is weakly closed. A von Neumann algebra $A$ is Dedekind complete with respect to the order $S \le T$ if $S$ and $T$ are selfadjoint and $T - S = U^*U$ for an appropriate $U \in A$. A reasoning similar to the one in the proof of (ii) $\Rightarrow$ (iii) in the above theorem shows that the nonstandard hull of a von Neumann algebra is never a von Neumann algebra. A $C^*$-algebra $A$ is a von Neumann algebra iff it is a dual space.
By a famous theorem of Sakai ([56], Corollary 1.13.3), the predual is unique up to isometric isomorphism. A surprising and very important result of Groh [18] says that the nonstandard hull of the predual of a von Neumann algebra is again the predual of a von Neumann algebra. This result is still relevant, see [3]. Given some knowledge of the theory of von Neumann algebras, the proof is surprisingly easy.

Theorem 4.3.33 Let M be a von Neumann algebra with predual $M_*$. Then the nonstandard hull $\widehat{M_*}$ is again the predual of a von Neumann algebra.

Proof The predual $M_*$ is an M-module under the following left and right actions of M on $M_*$: for all $x \in M$ and all $\varphi \in M_*$, the linear forms $y \mapsto \varphi(xy)$ and $y \mapsto \varphi(yx)$ are in $M_*$. By the Transfer Principle, for all $x \in {}^*M$ and all $\varphi \in {}^*M_*$, the linear forms $y \mapsto \varphi(xy)$ and $y \mapsto \varphi(yx)$ are in ${}^*M_*$. This in turn implies that for all $\widehat x \in \widehat M$ and all $\widehat\varphi \in \widehat{M_*}$, the linear forms $\widehat y \mapsto \widehat\varphi(\widehat x\,\widehat y)$ and $\widehat y \mapsto \widehat\varphi(\widehat y\,\widehat x)$ are in $\widehat{M_*}$. Moreover, $\widehat{M_*}$ can obviously be identified with a closed subspace of $(\widehat M)'$, the dual space of $\widehat M$. Since $\widehat M$ is $\sigma\big((\widehat{M_*})', \widehat{M_*}\big)$-dense in $(\widehat{M_*})'$, the proposition follows by [64], Theorem III.2.7.
4 Banach Spaces and Linear Operators
4.3.4 Notes
All results of Sect. 3.1 are taken from [21], and almost all results of Sect. 3.2 are taken from [23]. These two papers are fundamental and pioneering contributions to nonstandard functional analysis. The proof of the theorem of Banach–Alaoglu is due to W.A.J. Luxemburg [36]. Corresponding results within the framework of ultraproducts may be found in [19], which also gives further references to the history of the results. C.W. Henson (see [23]) has developed a special logical language that allows one to express problems concerning relations which can be approximately satisfied. Together with nonstandard analysis, this gives new and deep insight into various properties of special Banach spaces and their nonstandard hulls. We recommend in particular the papers [20, 23]. Recent developments are reported in [24]. Another aspect, the combination of Loeb measure theory with functional analysis, also gives interesting new results. For example, a Banach space in which a ball is contained in the range of some countably additive measure is superreflexive (see [62]). A third aspect is the use of nonstandard analysis for infinite constructions, e.g., infinite tensor products of $C^*$-algebras. This interesting field has applications in quantum physics, see [27, 30, 72, 75].
4.4 Elementary Theory of Linear Operators
4.4.1 Compact Operators
Let E be a standard Banach space (always in V(X)). By Theorem 3.5.1, a subset K of E is relatively compact iff ${}^*K \subset E + \mu(0) = \mathrm{ns}({}^*E)$, where $\mathrm{ns}({}^*E)$ denotes the set of all nearstandard points in ${}^*E$. We have already considered internal S-continuous operators in Proposition 4.2.22 (operator always means linear mapping). The simplest ones are those of standard finite rank. Let T be such an operator. Then $\dim(T(E)) = n$ is standard, and its nonstandard hull $\widehat T$ (see 4.2.23) has the same rank as T (notice that T is of the form $T = \sum_{k=1}^{n} \varphi_k \otimes x_k$). Another class of simple operators is the class of compact operators. Let E, F be standard Banach spaces. The linear operator T from E to F is compact iff T maps bounded sets into norm relatively compact sets. We denote the unit ball of a Banach space by B(0, 1).

Proposition 4.4.1 Let E, F be (standard) Banach spaces, and let T : E → F be a linear operator. The following assertions are equivalent:
(i) T is compact.
(ii) ${}^*T({}^*B(0,1)) \subset \mathrm{ns}({}^*F)$ (the set of nearstandard points).
(iii) $\widehat T$ maps $\widehat E$ into F (identified with a subset of $\widehat F$).
(iv) $\widehat T$ is compact.

Proof (i) ⇔ (ii) ⇔ (iii) as well as (iv) ⇒ (i) are obvious (use Theorem 3.5.1). (i) ⇒ (iv): Let ε > 0 be given. Then there exists a finite set M with $d(T(B(0,1)), M) < \varepsilon$. By transfer (remember ${}^*M = M$ since M is finite) we obtain ${}^*d({}^*T({}^*B(0,1)), M) < \varepsilon$, hence $d(\widehat T(\widehat B(0,1)), M) \le \varepsilon$. Since ε > 0 was arbitrary, $\widehat T(\widehat B(0,1))$ is precompact.

Concerning uniform convergence we have the following useful lemma:

Lemma 4.4.2 Let E, F be standard Banach spaces, and let $(T_\alpha)_{\alpha\in A}$ be a net of bounded linear operators from E to F. Moreover, let T also be a bounded operator from E to F. The following assertions are equivalent:
(i) $T_\alpha \to T$ uniformly.
(ii) For all $x \in \mathrm{Fin}({}^*E)$ and all infinitely large $\alpha \in {}^*A\setminus A$ we have $T_\alpha x \approx {}^*T x$.
(iii) $\widehat{T_\alpha} \to \widehat T$ uniformly.

Proof (i) ⇒ (iii) follows directly from Corollary 4.2.26, and (iii) ⇒ (i) holds because $\|\widehat S\| = \|S\|$ for every standard bounded operator S. (i) ⇒ (ii): By transfer, $\|T_\alpha x - {}^*T x\| < \varepsilon$ for all normalized x, all infinitely large α and each standard ε > 0. So (ii) follows. (ii) ⇒ (i): Let a standard ε > 0 be fixed, and choose $\alpha_0$ infinitely large. Then this $\alpha_0$ satisfies the formula $\exists \alpha\,\forall \beta\,\forall x\,[\|x\| = 1 \text{ and } \beta \ge \alpha \text{ implies } \|T_\beta x - T x\| < \varepsilon]$. The Downward Transfer Principle yields the assertion.

Recall that the Banach space E has the bounded approximation property if the identity I is the limit in the strong operator topology of a norm bounded net of operators of finite rank. Almost all classical Banach spaces have this property, but the algebra L(H) of all bounded operators on a separable infinite dimensional Hilbert space H fails to have it (a result due to Szankowski).

Proposition 4.4.3 Let E, F be arbitrary Banach spaces.
(i) The uniform limit T of a sequence $(T_n)$ of compact operators is compact.
(ii) If F possesses the bounded approximation property, then every compact operator T is the uniform limit of operators of finite rank.

Proof (i) All $\widehat{T_n}$ map $\widehat E$ into F, and $(\widehat{T_n})$ converges uniformly to $\widehat T$ by Lemma 4.4.2. So $\widehat T(\widehat E) \subset F$, and T is compact by Proposition 4.4.1.
(ii) Let $(P_\alpha)_{\alpha\in A}$ be a norm bounded net of operators of finite rank converging strongly to the identity I on F. Then $T_\alpha = P_\alpha T$ is of finite rank for all $\alpha \in A$. If $\alpha \in {}^*A\setminus A$ is infinitely large and $x \in \mathrm{Fin}({}^*E)$, then ${}^*T x$ is nearstandard, and
if $y = {}^\circ({}^*T x)$ is its standard part, then $P_\alpha y \approx y$. But since $P_\alpha$ is S-bounded by hypothesis, we obtain $P_\alpha\,{}^*T x \approx P_\alpha y \approx y \approx {}^*T x$. Since $x \in \mathrm{Fin}({}^*E)$ and the infinitely large α are arbitrary, the assertion follows by Lemma 4.4.2.

Now we give an easy proof of the theorem of Schauder, which asserts that the adjoint (dual) operator of a compact operator is also compact.

Theorem 4.4.4 (Schauder) The operator T from E to F is compact iff its adjoint T' is compact.

Proof Let T be compact, and let $\varphi \in {}^*(F')$ satisfy $\|\varphi\| \le 1$. Then by Theorem 4.3.15 its standard part ψ, say, with respect to the weak*-topology exists, is given by $\langle x, \psi\rangle \approx \langle x, \varphi\rangle$ for all standard x, and satisfies $\|\psi\| \le 1$. We show that ${}^*(T')\varphi$ is norm-nearstandard to ${}^*(T'\psi)$. Let $y \in \mathrm{Fin}({}^*E)$ be arbitrary. Since ${}^*T y$ is nearstandard,
$$\langle y, {}^*(T'\psi)\rangle = \langle {}^*T y, {}^*\psi\rangle \approx \langle {}^*T y, \varphi\rangle = \langle y, {}^*(T')\varphi\rangle.$$
Proposition 4.2.3 (iv) yields the assertion. Conversely, if T' is compact, so is T'' by the first part of our proof. But $T = T''|_E$, where E is identified with its canonical image in E''.
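Proposition 4.4.3 (ii) rests on the contrast between strong and uniform convergence. The finite rank operators $P_\alpha$ converge to I only strongly, yet $P_\alpha T$ converges to T uniformly once T is compact. The following Python sketch shows this in a finite-dimensional model. The diagonal operator $Te_k = e_k/k$, the truncation dimension and all function names are illustrative assumptions, not part of the text.

```python
# Finite-dimensional model (an assumption, not taken from the text):
# T is the diagonal operator T e_k = e_k / k on the first `dim`
# coordinates of l^2, and P_n projects onto span{e_1, ..., e_n}.
# For diagonal operators the operator norm is the largest |entry|.


def diag_norm(diagonal):
    """Operator norm of a diagonal operator: the largest absolute entry."""
    return max((abs(x) for x in diagonal), default=0.0)


def compact_diagonal(dim):
    """Diagonal entries 1, 1/2, ..., 1/dim of the model operator T."""
    return [1.0 / k for k in range(1, dim + 1)]


def project(diagonal, n):
    """Diagonal of P_n D: keep the first n entries and zero the rest."""
    return [x if i < n else 0.0 for i, x in enumerate(diagonal)]


def uniform_error(dim, n):
    """||T - P_n T||, which equals 1/(n+1) whenever n < dim."""
    t = compact_diagonal(dim)
    return diag_norm([a - b for a, b in zip(t, project(t, n))])


def identity_error(dim, n):
    """||I - P_n||, which stays equal to 1 for every n < dim."""
    ones = [1.0] * dim
    return diag_norm([a - b for a, b in zip(ones, project(ones, n))])


if __name__ == "__main__":
    for n in (10, 100, 999):
        print(n, uniform_error(1000, n), identity_error(1000, n))
```

In the model, $\|T - P_nT\| = 1/(n+1)$ tends to 0, while $\|I - P_n\| = 1$ for every n below the dimension. This is the finite-dimensional shadow of strong but not uniform convergence.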
4.4.2 Fredholm Operators
We adhere to the notions introduced in the preceding section. Let E, F be standard Banach spaces, and denote the space of compact operators from E to F by $\mathcal K(E,F)$. For $T \in L(E,F)$ consider the operator $Q_F \circ \widehat T$, where $Q_F$ denotes the quotient mapping from $\widehat F$ onto $\tilde F = \widehat F/F$. Its kernel $\ker(Q_F \circ \widehat T)$ contains E. Hence there is a unique operator $\tilde T : \tilde E \to \tilde F$, where $\tilde E = \widehat E/E$, with $\tilde T \circ Q_E = Q_F \circ \widehat T$. The mapping $T \mapsto \tilde T$ is easily seen to be linear and continuous. Moreover, $\tilde T = 0$ iff $T \in \mathcal K(E,F)$. Therefore, we obtain the following proposition.

Proposition 4.4.5 The map $T \mapsto \tilde T$ induces an isometric linear representation of the Calkin space $L(E,F)/\mathcal K(E,F)$ into $L(\tilde E, \tilde F)$. If in addition E = F, then this representation is multiplicative from the Calkin algebra $L(E)/\mathcal K(E)$ into $L(\tilde E)$.

An operator $T \in L(E,F)$ is called a Fredholm operator if $\dim(\ker T) + \dim(F/T(E)) < \infty$. A well-known standard argument shows that a Fredholm operator T has closed range. More precisely, $E = \ker(T) \oplus E_1$ and $F = T(E) \oplus F_2$ are topological decompositions, and $T_1 = T|_{E_1}$ is bijective from $E_1$ onto $T(E)$. Set
$S : F \to E$ by $S(y) = T_1^{-1}(y)$ for $y \in T(E)$ and $S(z) = 0$ for $z \in F_2$. Then S satisfies $TST = T$. This in turn gives the following version of Atkinson's Theorem:

Theorem 4.4.6 Let T be an element of L(E, F). The following assertions are equivalent:
(i) T is Fredholm.
(ii) $\tilde T$ is bijective.
(iii) $T + \mathcal K(E,F)$ is invertible in $L(E,F)/\mathcal K(E,F)$.
Proof We use the notations of the paragraph preceding the theorem, and we write $N = \ker(T)$.
(i) ⇒ (ii): N and $F_2$ are finite dimensional, hence $\widehat N = N$ and $\widehat{F_2} = F_2$ (see Corollary 4.2.10). This gives the decompositions $\widehat E = N \oplus \widehat{E_1}$ and $\widehat F = \widehat{T(E)} \oplus F_2$. So $\tilde T = \tilde{T_1}$ is invertible with inverse $\tilde S$, where S is as in the paragraph preceding the theorem.
(ii) ⇒ (i): If $\dim(\ker(T)) = \infty$, then $\ker(\widehat T)\setminus E \neq \emptyset$, hence $\tilde T$ is not injective. Moreover, T(E) is closed by Proposition 4.2.30. Finally, assume that $\dim(F/T(E)) = \infty$. By Theorem 4.2.20 there exists a hyperfinite dimensional subspace H of ${}^*F$ containing F as an external subspace. Set G = T(E). By transfer, the *-linear hull L of H and ${}^*G$ is internally closed and $L/{}^*G$ is hyperfinite dimensional; in particular, $L \neq {}^*F$. Thus by Proposition 4.2.6 there exists $y \in {}^*F\setminus L$ of norm 1 satisfying $d(y, L) \ge 1/2$. Since $\tilde T$ is bijective, $\widehat F/F = \widehat G/F$. So there exist a finite $g \in {}^*G$ and $z \in F$ such that $\widehat y = \widehat g + z$, hence $y \approx g + z \in L$. This contradicts $d(y, L) \ge 1/2$.
(i) ⇒ (iii): Let S be as in the paragraph preceding the theorem. Since (i) implies (ii), $\tilde T$ is invertible in $L(\tilde E, \tilde F)$. But then $\tilde S = \tilde T^{-1}$ follows, hence (iii) holds by Proposition 4.4.5.
(iii) ⇒ (ii) follows from Proposition 4.4.5.

The index of a Fredholm operator T is defined to be $\mathrm{ind}(T) = \dim(\ker(T)) - \dim(F/T(E)) = \dim(\ker(T)) - \dim(\ker(T'))$. Obviously $\mathrm{ind}(I) = \mathrm{ind}(I + A) = 0$ for all A of finite rank. Moreover, $\mathrm{ind}(T) = 0$ for all bijective T. The famous product formula $\mathrm{ind}(ST) = \mathrm{ind}(S) + \mathrm{ind}(T)$ can be proved completely by methods from pure linear algebra (see e.g. [42], Theorem 1.4.8). The following theorem is an immediate consequence of Theorem 4.4.6.

Theorem 4.4.7 The set $\mathcal F(E,F)$ of all Fredholm operators from E to F is open in L(E, F), and the index is continuous on $\mathcal F(E,F)$.

Proof By Theorem 4.4.6, $\mathcal F(E,F)$ is the inverse image (with respect to the quotient mapping) of the set of invertible elements in $L(E,F)/\mathcal K(E,F)$, which is known to be open. Let now T have index 0.
Then $E = \ker(T) \oplus E_1$ and $F = T(E) \oplus F_2$ as previously introduced. Also, ind(T) = 0 implies $\dim(\ker(T)) = \dim(F_2)$, hence there exists a (necessarily continuous) linear bijection $\bar V : \ker(T) \to F_2$. Set $V(x) = \bar V(Px)$, where P is the projection onto ker(T) with kernel $E_1$. Then T + V is invertible. Now
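let $R \in {}^*\mathcal F(E,F)$ satisfy $R \approx {}^*T$. Thus $R + {}^*V \approx {}^*(T + V)$, whence $R + {}^*V$ is invertible. Now $R = (R + {}^*V)\big(I - (R + {}^*V)^{-1}\,{}^*V\big)$, hence
$${}^*\mathrm{ind}(R) = {}^*\mathrm{ind}\Big((R + {}^*V)\big(I - (R + {}^*V)^{-1}\,{}^*V\big)\Big) = 0 + {}^*\mathrm{ind}\big(I - (R + {}^*V)^{-1}\,{}^*V\big) = 0,$$
since $(R + {}^*V)^{-1}\,{}^*V$ is of finite rank. Therefore, $\mathrm{ind}^{-1}(\{0\})$ is open in $\mathcal F(E,F)$. Since ind is a homomorphism into $\mathbb Z$, the assertion follows.

Corollary 4.4.8 If T is a Fredholm operator, K is a compact operator and $z \neq 0$ is a scalar, then $zT - K$ is Fredholm and $\mathrm{ind}(zT - K) = \mathrm{ind}(T)$ holds.

Proof The index is constant on the path $t \mapsto zT - tK$ $(0 \le t \le 1)$, and $\mathrm{ind}(zT) = \mathrm{ind}(T)$.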
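In finite dimensions every linear map $T : \mathbb K^n \to \mathbb K^m$ is Fredholm, and rank-nullity gives $\mathrm{ind}(T) = (n - \mathrm{rank}\,T) - (m - \mathrm{rank}\,T) = n - m$. This is the linear-algebra core behind $\mathrm{ind}(I + A) = 0$ and the product formula for the index. The Python sketch below checks these identities with exact rational arithmetic. The matrices and the helper names are arbitrary examples chosen for illustration.

```python
from fractions import Fraction


def rank(matrix):
    """Rank of a matrix (list of rows) via exact Gaussian elimination."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_rows, n_cols = len(rows), len(rows[0])
    r = 0  # number of pivots found so far
    for c in range(n_cols):
        pivot = next((i for i in range(r, n_rows) if rows[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(n_rows):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == n_rows:
            break
    return r


def index(matrix):
    """ind(T) = dim ker T - dim(F / T(E)) for T: K^n -> K^m given as an m x n matrix."""
    m, n = len(matrix), len(matrix[0])
    rk = rank(matrix)
    return (n - rk) - (m - rk)


def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


T = [[1, 2, 3], [2, 4, 6]]            # K^3 -> K^2, rank 1
S = [[1, 0], [0, 0], [1, 1], [0, 1]]  # K^2 -> K^4, rank 2
print(index(T), index(S), index(matmul(S, T)))  # 1 -2 -1
```

The computed indices depend only on the dimensions, which is why the finite-dimensional product formula holds trivially. The interesting content of Theorem 4.4.7 is that this stability survives in infinite dimensions.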
4.4.3 Notes
The nonstandard analysis of compact operators was initiated by A. Robinson and A.R. Bernstein [6], who solved the invariant subspace problem for polynomially compact operators. The easy proof of Theorem 4.4.4 is taken from [54], cf. also [37]. As was pointed out above, the first treatment of the theory of Fredholm operators by means of Fréchet products was given by Sadovskii [55]. Section 4.4.2 is to some extent within the spirit of that paper. These ideas came up again, apparently independently of [55], in [7, 9]. For the nonstandard analysis of semi-Fredholm operators see [63, 76].
4.5 Spectral Theory of Operators
4.5.1 Basic Definitions and Facts
Let E be a Banach space, and let T be a bounded linear operator on E. The resolvent set is $\rho(T) = \{z \in \mathbb C : (z - T) \text{ is bijective}\}$, and on ρ(T) the resolvent R(z, T) is defined by $R(z,T) = (z - T)^{-1}$. Notice that R(z, T) is continuous by the closed graph theorem. Moreover, ρ(T) is open, and R(·, T) is holomorphic and satisfies the famous resolvent equation
$$R(z,T) - R(y,T) = (y - z)\,R(z,T)\,R(y,T). \tag{4.5}$$
The complement of ρ(T) is called the spectrum σ(T). It is closed because ρ(T) is open, and it is bounded, hence compact, since for $|z| > \|T\|$ the Neumann series $\sum_{n=0}^{\infty} T^n z^{-(n+1)}$ converges to R(z, T). This also implies $\lim_{|z|\to\infty} \|R(z,T)\| = 0$. Moreover, $r(T) = \sup\{|z| : z \in \sigma(T)\}$ is called the spectral radius of T. As a consequence of Liouville's theorem, the spectrum is never empty (for E ≠ {0}).
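The Neumann series and the resolvent equation (4.5) can be checked numerically for a matrix. The sketch below is a finite-dimensional illustration only. The matrix T, the points z and y, and the use of the max-row-sum norm (the operator norm induced by the sup norm on $\mathbb C^2$) are assumptions made for the example.

```python
# 2x2 illustration; T, z, y and the row-sum norm are illustrative choices.


def mat_mul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def resolvent(z, t):
    """R(z, T) = (z - T)^(-1) via the closed-form 2x2 inverse."""
    a, b = z - t[0][0], -t[0][1]
    c, d = -t[1][0], z - t[1][1]
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def neumann_partial_sum(z, t, terms):
    """Sum of T^n / z^(n+1) for n = 0, ..., terms - 1."""
    total = [[0j, 0j], [0j, 0j]]
    power = [[1.0, 0.0], [0.0, 1.0]]  # T^0
    for n in range(terms):
        total = [[total[i][j] + power[i][j] / z ** (n + 1) for j in range(2)]
                 for i in range(2)]
        power = mat_mul(power, t)
    return total


def row_sum_norm(a):
    """Operator norm induced by the sup norm: the largest absolute row sum."""
    return max(sum(abs(x) for x in row) for row in a)


def max_entry_gap(a, b):
    """Largest entrywise distance between two 2x2 matrices."""
    return max(abs(a[i][j] - b[i][j]) for i in range(2) for j in range(2))


T = [[0.5, 1.0], [0.0, -0.25]]  # ||T|| = 1.5 in the row-sum norm
z, y = 3.0, 2j                  # z lies outside the disc |z| <= ||T||

# Neumann series versus the exact resolvent.
print(max_entry_gap(neumann_partial_sum(z, T, 60), resolvent(z, T)))

# Resolvent equation (4.5): R(z) - R(y) = (y - z) R(z) R(y).
rz, ry = resolvent(z, T), resolvent(y, T)
lhs = [[rz[i][j] - ry[i][j] for j in range(2)] for i in range(2)]
prod = mat_mul(rz, ry)
rhs = [[(y - z) * prod[i][j] for j in range(2)] for i in range(2)]
print(max_entry_gap(lhs, rhs))

# Bound ||R(z, T)|| <= 1 / (|z| - ||T||) from the Neumann series.
print(row_sum_norm(rz), 1 / (abs(z) - row_sum_norm(T)))
```

Both gaps are at the level of floating-point rounding. The last line compares the resolvent norm at z = 3 with the bound derived from the Neumann series.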
Now z is called an eigenvalue if $\ker(z - T) \neq \{0\}$. The space ker(z − T) is the eigenspace corresponding to z. The set $\sigma_p(T)$ of all eigenvalues of T is called the point spectrum of T. A value z is an approximate eigenvalue if $\inf\{\|(z - T)x\| : \|x\| = 1\} = 0$. The set of all approximate eigenvalues forms the approximate point spectrum $\sigma_a(T)$ of T; it is a closed subset of σ(T). A point $z \in \sigma(T)$ is called a Riesz point of T if it is a pole of R(·, T) for which the residue $Q = \frac{1}{2\pi i}\oint_{|v - z| = \delta} R(v,T)\,dv$ is of finite rank. A Riesz point z is always isolated in the spectrum of T. Moreover, it is an eigenvalue, and the range of Q is the corresponding generalized eigenspace, whose dimension equals the rank of the residue.

A very useful notion was introduced by L. Trefethen [65]. For ε > 0 we define $\rho_\varepsilon(T) = \{z \in \rho(T) : \|R(z,T)\| < 1/\varepsilon\}$. The set $\sigma_\varepsilon(T) = \mathbb C \setminus \rho_\varepsilon(T)$ is the so-called ε-pseudospectrum. It has started to play an important role in modern numerical analysis. For $|z| > \|T\|$, the Neumann series (see above) yields the inequality $\|R(z,T)\| \le \frac{1}{|z| - \|T\|}$. It follows that for every ε > 0 the set $\{z : |z| > \|T\| + \varepsilon\}$ is contained in $\rho_\varepsilon(T)$. Notice that if T is a normal operator on a Hilbert space H, the equality $\|R(z,T)\| = 1/d(z, \sigma(T))$ holds for all $z \in \rho(T)$. Hence $\sigma_\varepsilon(T) = \{z : d(z, \sigma(T)) \le \varepsilon\}$ is easily determined.

The Laurent series of R(λ, T) at infinity is $\sum_{n=0}^{\infty} T^n/\lambda^{n+1}$. It converges for $|\lambda| > \limsup_n \|T^n\|^{1/n}$ and diverges for smaller $|\lambda|$. Since $R(\cdot,T)$ is holomorphic on $\{|\lambda| > r(T)\}$, it follows that $r(T) = \limsup_n \|T^n\|^{1/n}$, where r(T) is the spectral radius defined above. The following formula is well known:

Proposition 4.5.1 $r(T) = \lim_n \|T^n\|^{1/n}$.
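Proof We shall use the inequality $\|ST\| \le \|S\|\,\|T\|$ at various places. Let $s := \inf_n \|T^n\|^{1/n}$. Then for ε > 0 there exists $k \in \mathbb N$ such that $s \le \|T^k\|^{1/k} < s + \varepsilon/2$. Let $N \approx \infty$ be arbitrary. Then there exist a unique $q \le k - 1$ and $P \approx \infty$ such that $N = Pk + q$. We obtain
$$\|T^N\| \le \|T^k\|^P\,\|T^q\| = \big(\|T^k\|^{1/k}\big)^{kP}\,\|T^q\| < (s + \varepsilon/2)^{kP}\,\|T^q\|.$$
Taking the Nth root, we obtain
$$\|T^N\|^{1/N} \le (s + \varepsilon/2)^{\frac{1}{1 + q/(kP)}}\,\|T^q\|^{1/N} < s + \varepsilon,$$
because $q/N \approx 0$ and $q/(kP) \approx 0$. Since $N \approx \infty$ was arbitrary, $\limsup_n \|T^n\|^{1/n} \le s + \varepsilon$. As ε > 0 was arbitrary and $s \le \|T^n\|^{1/n}$ for all n, the limit $\lim_n \|T^n\|^{1/n}$ exists and equals s. The assertion now follows from $r(T) = \limsup_n \|T^n\|^{1/n}$.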
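Proposition 4.5.1 can be observed numerically. Take the (hypothetical) matrix $T = \begin{pmatrix} 1/2 & 1 \\ 0 & 1/2\end{pmatrix}$. Then $r(T) = 1/2$, while $\|T\| = 3/2$ in the max-row-sum norm. The sketch computes $\|T^n\|^{1/n}$, which here equals $\tfrac12(1 + 2n)^{1/n}$ and so tends to 1/2 only slowly. The choice of norm is an assumption; in finite dimensions all norms are equivalent and give the same limit.

```python
# Illustrative check of r(T) = lim ||T^n||^(1/n) for one 2x2 matrix.


def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def row_sum_norm(a):
    """Operator norm induced by the sup norm: the largest absolute row sum."""
    return max(sum(abs(x) for x in row) for row in a)


def gelfand_sequence(t, n_max):
    """Return [||T^n||^(1/n) for n = 1, ..., n_max] in the row-sum norm."""
    values, power = [], t
    for n in range(1, n_max + 1):
        values.append(row_sum_norm(power) ** (1.0 / n))
        power = mat_mul(power, t)
    return values


T = [[0.5, 1.0], [0.0, 0.5]]  # spectral radius 0.5, norm 1.5
vals = gelfand_sequence(T, 200)
print(vals[0], vals[9], vals[-1])
```

Every term stays above r(T) = 0.5, and the infimum of the computed terms plays the role of s in the proof.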
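4.5.2 The Spectrum of an S-bounded Internal Operator
Let E denote an internal Banach space, and let T be an internally bounded operator on E, i.e. $T \in L(E)$. By transfer we may define all the notions above also for T. We want to investigate the connection between σ(T) and $\sigma(\widehat T)$ in the case that T is S-bounded (see Sect. 4.4.2). To this end we introduce the external sets $\rho_b(T) = \{z \in \rho(T) : \|R(z,T)\| \text{ finite}\}$ and $\rho_\infty(T) = \{z \in \rho(T) : \|R(z,T)\| \text{ infinitely large}\}$. By assumption, the operator norm of T is finite. In the following we denote the standard part map on $\mathrm{Fin}({}^*\mathbb C)$ by $z \mapsto {}^\circ z$.

Theorem 4.5.2 Let $\widehat T$ denote the nonstandard hull on $\widehat E$ of the S-bounded internal operator T on E. Then the following assertions hold:
(i) $\sigma_a(\widehat T) = \{{}^\circ z : \inf\{\|(z - T)x\| : \|x\| = 1\} \approx 0\}$, and $\sigma_a(\widehat T)$ consists only of eigenvalues.
(ii) $\rho(\widehat T) = \{{}^\circ z : z \in \rho_b(T)\}$.
(iii) Fix $0 < \varepsilon' < \varepsilon$ with both numbers standard. Then $\sigma_{\varepsilon'}(\widehat T) \subset \{{}^\circ z : z \in \sigma_\varepsilon(T)\} \subset \sigma_\varepsilon(\widehat T)$.

Proof (i) (I) Assume that $\alpha := \inf\{\|(z - T)x\| : \|x\| = 1\} \approx 0$. Then by transfer, for $0 < \eta \approx 0$ there exists an x of norm 1 satisfying $\|(z - T)x\| \le \alpha + \eta$. But then $\widehat T\,\widehat x = {}^\circ z\,\widehat x$.
(II) Assume now that $\alpha := \inf\{\|(z - T)x\| : \|x\| = 1\} \not\approx 0$. Then $\|({}^\circ z - \widehat T)\widehat x\| > {}^\circ\alpha/2$ for all $\widehat x$ with $\|\widehat x\| = 1$, which by definition implies that ${}^\circ z \notin \sigma_a(\widehat T)$.
(ii) This follows from Lemma 4.2.28.
(iii) (I) Let $z \in \sigma_\varepsilon(T)$ be arbitrary. If $z \in \sigma(T)$, then ${}^\circ z \in \sigma(\widehat T)$ by Lemma 4.2.28, hence ${}^\circ z \in \sigma_\varepsilon(\widehat T)$. Therefore, assume $z \in \rho(T)$. Then $\|(z - T)^{-1}\| \ge 1/\varepsilon$, and since ε is standard, Lemma 4.2.28 gives $\|({}^\circ z - \widehat T)^{-1}\| \ge 1/\varepsilon$. So $\{{}^\circ z : z \in \sigma_\varepsilon(T)\} \subset \sigma_\varepsilon(\widehat T)$.
(II) Assume that z is standard and $z \notin \{{}^\circ v : v \in \sigma_\varepsilon(T)\}$. Then $z \notin \sigma_\varepsilon(T)$, hence $\|(z - T)^{-1}\| < 1/\varepsilon$. Since ε is standard, Lemma 4.2.28 yields $\|(z - \widehat T)^{-1}\| \le 1/\varepsilon < 1/\varepsilon'$, i.e. $z \notin \sigma_{\varepsilon'}(\widehat T)$.

Remark 4.5.3 Obviously $\{{}^\circ z : z \in \sigma_a(T)\} \subset \sigma_a(\widehat T)$, but equality does not hold in general. For example, let $N \in {}^*\mathbb N\setminus\mathbb N$ be arbitrary, and consider the internal operator T given on $\mathbb C^N$ by $T(e_k) = e_{k-1}$ for $k \ge 2$ and $T(e_1) = 0$, where $(e_n)$ denotes the canonical basis. If one takes the usual scalar product norm on $\mathbb C^N$, then T is a partial isometry, in particular bounded by 1. On the other hand, $\sigma(T) = \{0\}$, since T is internally nilpotent. But $\sigma_a(\widehat T) = \{z \in \mathbb C : |z| \le 1\}$, since $\widehat T$ induces the backward shift on the invariant subspace of the Hilbert space $\widehat{\mathbb C^N}$ spanned by the vectors $\{\widehat{e_k} : k \in \mathbb N\}$. Thus every z with |z| < 1 is an eigenvalue of $\widehat T$, and $\sigma_a(\widehat T)$ is closed and contained in the closed unit disc because $\|\widehat T\| \le 1$.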
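The operator of Remark 4.5.3 can be studied for a standard finite size n in place of N. The sketch below uses Python with n = 20, sample points 1/2 and 2, and the max-row-sum norm; all of these are assumptions, and the remark itself uses the Euclidean norm. The two norms differ by at most a factor $\sqrt n$, so the exponential growth also holds for the Euclidean norm. The sketch shows that T is nilpotent, so σ(T) = {0}, while $\|R(1/2, T)\| = 2^{n+1} - 2$. For infinitely large N this means $1/2 \in \rho_\infty(T)$, which is consistent with $1/2 \in \sigma(\widehat T)$ by Theorem 4.5.2 (ii).

```python
# Finite model of Remark 4.5.3 (size, norm and sample points are assumptions).


def shift_matrix(n):
    """Matrix of T e_k = e_(k-1) (k >= 2), T e_1 = 0: ones on the superdiagonal."""
    return [[1.0 if j == i + 1 else 0.0 for j in range(n)] for i in range(n)]


def inverse(a):
    """Inverse of a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(aug[i][c]))
        if aug[p][c] == 0:
            raise ValueError("matrix is singular")
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [x / pivot for x in aug[c]]
        for i in range(n):
            if i != c and aug[i][c] != 0:
                f = aug[i][c]
                aug[i] = [x - f * y for x, y in zip(aug[i], aug[c])]
    return [row[n:] for row in aug]


def resolvent_norm(z, t):
    """||(z - T)^(-1)|| in the max-row-sum norm."""
    n = len(t)
    shifted = [[(z if i == j else 0.0) - t[i][j] for j in range(n)] for i in range(n)]
    return max(sum(abs(x) for x in row) for row in inverse(shifted))


def mat_pow_is_zero(t, k):
    """Check whether T^k = 0 (nilpotency in the finite model)."""
    n = len(t)
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(k):
        power = [[sum(power[i][m] * t[m][j] for m in range(n)) for j in range(n)]
                 for i in range(n)]
    return all(x == 0 for row in power for x in row)


T = shift_matrix(20)
print(mat_pow_is_zero(T, 20))   # True: sigma(T) = {0}
print(resolvent_norm(0.5, T))   # 2**21 - 2, although 0.5 is not in sigma(T)
print(resolvent_norm(2.0, T))   # below 1 = 1 / (|z| - ||T||)
```

Inside the unit disc the resolvent norm grows geometrically in the size of the matrix. Outside the disc it stays bounded independently of n, as the Neumann series estimate predicts.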
Corollary 4.5.4 (i) If $z \in \sigma(T)$ and $|z| = r(T)$, then ${}^\circ z$ is an eigenvalue of $\widehat T$. In particular, $r(\widehat T) \ge {}^\circ r(T)$.
(ii) Let T be a standard bounded operator on the standard Banach space E. Then
(a) $\sigma(T) = \sigma(\widehat{{}^*T})$;
(b) $\sigma_a(T) = \sigma_p(\widehat{{}^*T})$;
(c) $\sigma_\varepsilon(T) = \sigma_\varepsilon(\widehat{{}^*T})$.

Proof (i) If $z \in \sigma(T)$ and $|z| = r(T)$, then $\|R(v,T)\|$ is unbounded near z (i.e. for $|v| > r(T)$ and $v \approx z$), since otherwise z would not be a singularity of R(·, T). So there exists $v \approx z$ with $v \in \rho_\infty(T)$, and the assertion follows.

Problem 4.5.5 Prove assertion (ii). Hint: Use the foregoing theorem as well as Corollary 4.2.26.

The operator constructed in Remark 4.5.3 shows that $r(\widehat T) > {}^\circ r(T)$ may happen.

Let T be a bounded linear internal operator on the internal Banach space E. A point $z \in {}^*\mathbb C$ is called an S-Riesz point if it is a Riesz point with residue of standard finite rank. As before, B(z, r) denotes the set $B(z, r) = \{v \in \mathbb C : |v - z| < r\}$. We have the following theorem:

Theorem 4.5.6 Let T be an S-bounded operator on the internal Banach space E. Moreover, let $z \in \mathbb C$ be a Riesz point of $\widehat T$ with residue of rank r. Then there exists a standard δ such that the set $\sigma(T) \cap {}^*B(z,\delta)$ is not empty and consists of at most r S-Riesz points $z_1, \dots, z_k$ with $z_j \approx z$ and $\sum_{j=1}^{k} \mathrm{rank}(z_j) = r$, where $\mathrm{rank}(z_j)$ denotes the rank of the residue of $z_j$.

Proof (I) Let $\delta = \inf\{|z - v| : v \in \sigma(\widehat T)\setminus\{z\}\}/2$, and set $\eta = \delta/2$. Consider the annulus $K = \{v \in \mathbb C : \eta \le |v - z| \le \delta\}$. If $a = \sup\{\|(v - \widehat T)^{-1}\| : v \in K\}$ and $M = 2a$, then ${}^*K \subset \rho_{1/M}(T)$. To see this, assume the contrary. Then there exists a $v \in {}^*K \cap \sigma_{1/M}(T)$. Hence ${}^\circ v \in \sigma_{1/M}(\widehat T)$ by Theorem 4.5.2 (iii), a contradiction to the choice of M.
(II) By the Transfer Principle, the spectral projection $Q = \frac{1}{2\pi i}\oint_{|v - z| = \delta} R(v,T)\,dv$ exists, and $\|Q\| \le \delta M$ is finite.
Claim: $\widehat Q = \frac{1}{2\pi i}\oint_{|v - z| = \delta} R(v, \widehat T)\,dv$, the residue of $R(\cdot, \widehat T)$ at z.
Proof of the claim: By Lemma 4.2.28, $R(v, \widehat T) = \widehat{R(v,T)}$ for all standard $v \in K$. Moreover, since ${}^*K \subset \rho_{1/M}(T)$, the resolvent equation (4.5) yields $\|R(v,T) - R(w,T)\| \le |v - w|\,M^2$; in particular, R(·, T) is uniformly S-continuous on ${}^*K$. Hence the Riemann sums
$$R_m = \frac{1}{2\pi i}\sum_{k=0}^{m-1} R\big(z + \delta e^{2\pi i k/m}, T\big)\,\frac{2\pi i\,\delta}{m}\,e^{2\pi i k/m}$$
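satisfy $R_m \approx Q$ for all infinitely large m. On the other hand, $\widehat{R_m} \to \frac{1}{2\pi i}\oint_{|v - z| = \delta} R(v, \widehat T)\,dv$ as m tends to infinity through the standard integers. This proves the claim.
(III) Now $\dim \widehat Q(\widehat E) = r < \infty$. But Proposition 4.2.10 then implies $\dim Q(E) = r$, and the assertion follows by the Transfer Principle applied to $T|_{Q(E)}$.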
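The Riemann sums $R_m$ in the proof of Theorem 4.5.6 are the trapezoidal rule on the circle $|v - z| = \delta$. The Python sketch below applies them to the (hypothetical) matrix $T = \begin{pmatrix}1&1\\0&3\end{pmatrix}$ with z = 1 and δ = 1, so that only the eigenvalue 1 lies inside the contour. The exact residue is the Riesz projection $\begin{pmatrix}1&-1/2\\0&0\end{pmatrix}$. For integrands that are analytic near the circle, the trapezoidal rule converges geometrically, so m = 64 should already give agreement up to rounding error.

```python
import cmath


def resolvent(v, t):
    """(v - T)^(-1) for a 2x2 matrix T, via the closed-form inverse."""
    a, b = v - t[0][0], -t[0][1]
    c, d = -t[1][0], v - t[1][1]
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def riesz_riemann_sum(t, z, delta, m):
    """R_m = (1/(2 pi i)) * sum_k R(v_k, T) * (2 pi i delta / m) * w_k,
    with w_k = exp(2 pi i k / m) and v_k = z + delta * w_k."""
    q = [[0j, 0j], [0j, 0j]]
    for k in range(m):
        w = cmath.exp(2j * cmath.pi * k / m)
        r = resolvent(z + delta * w, t)
        for i in range(2):
            for j in range(2):
                # (1/(2 pi i)) * (2 pi i delta / m) simplifies to delta / m
                q[i][j] += r[i][j] * delta * w / m
    return q


T = [[1.0, 1.0], [0.0, 3.0]]  # eigenvalues 1 and 3
Q = riesz_riemann_sum(T, z=1.0, delta=1.0, m=64)
expected = [[1.0, -0.5], [0.0, 0.0]]  # residue of R(., T) at z = 1
print(max(abs(Q[i][j] - expected[i][j]) for i in range(2) for j in range(2)))
```

The computed Q is also idempotent up to rounding, as a spectral projection must be.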
4.5.3 The Spectrum of Compact Operators and the Essential Spectrum
The spectral theory of compact operators is completely described by the following classical theorem.

Theorem 4.5.7 Let T be a standard compact linear operator on the Banach space E. Then its spectrum σ(T) is at most countable, and each $z \neq 0$ in σ(T) is isolated and is an eigenvalue with finite dimensional eigenspace.

Proof By Corollary 4.4.8 (applied to I), for every $\lambda \neq 0$ the operator $\lambda I - T$ is Fredholm and $\mathrm{ind}(\lambda I - T) = \mathrm{ind}(I) = 0$. Thus if $0 \neq \lambda \in \sigma(T)$, then $\lambda I - T$ is not injective, because an injective operator of index 0 is surjective. So λ is an eigenvalue, and the corresponding eigenspace $E_\lambda$ is finite dimensional.

Now let $\lambda \neq 0$ be an eigenvalue of T. Assume that λ is not isolated, and let $(\lambda_n)_n$ be a sequence in σ(T) converging to λ with $\lambda_n \neq \lambda_m$ for $n \neq m$. Without loss of generality assume $\inf_n |\lambda_n| =: \eta > 0$. Each $\lambda_n$ is an eigenvalue. Let $x_n$ be a normalized eigenvector corresponding to $\lambda_n$. Then $\{x_n : n \in \mathbb N\}$ is linearly independent. Let $M_n$ be the linear hull of $\{x_1, \dots, x_n\}$. Then obviously $T(M_n) \subset M_n$. Moreover, $Ty - \lambda_{n+1}y \in M_n$ for every $y \in M_{n+1}$: writing $y = \beta x_{n+1} + u$ with $u \in M_n$, we get $Ty - \lambda_{n+1}y = Tu - \lambda_{n+1}u \in M_n$. Set $y_1 = x_1$. By Corollary 4.2.9, for every $n \ge 2$ there exists $y_n \in M_n$ of norm 1 satisfying $d(y_n, M_{n-1}) > 1/2$. Let $n, p \in \mathbb N$ be arbitrary. Then
$$Ty_{n+p} - Ty_n = (Ty_{n+p} - \lambda_{n+p}y_{n+p}) + \lambda_{n+p}y_{n+p} - Ty_n = \lambda_{n+p}y_{n+p} - v$$
with $v \in M_{n+p-1}$. Hence $\|Ty_{n+p} - Ty_n\| \ge |\lambda_{n+p}|\,d(y_{n+p}, M_{n+p-1}) \ge \eta/2$. By Lemma 4.2.16, ${}^*T y_N$ is a remote point for every $N \approx \infty$. This contradicts Proposition 4.4.1, since T is compact.
Definition 4.5.8 Let T be a bounded linear operator on the Banach space E. Then the essential spectrum σess (T ) is the set of λ ∈ C such that λI − T is not a Fredholm operator.
In order to avoid trivial cases, let E be infinite dimensional. If T is compact, then $\sigma_{ess}(T) = \{0\}$ by Theorem 4.5.7. Moreover, $\lambda \in \sigma_{ess}(T)$ iff $\lambda \in \sigma(\tilde T)$, where $\tilde T$ is the equivalence class of T in the Calkin algebra. Theorem 4.4.6 now gives the following result:

Proposition 4.5.9 Let T be a bounded linear operator on the infinite dimensional Banach space E. Then $\sigma_{ess}(T) \neq \emptyset$ and $r_{ess}(T) := \sup\{|\lambda| : \lambda \in \sigma_{ess}(T)\} \le r(T)$.

Problem 4.5.10 (a) Prove the proposition. Hint: Consider the operator $\tilde T$ on $\tilde E$, and use Theorem 4.4.6 as well as $\|T\| \ge \|\tilde T\|$ and the formula for the spectral radius (see Proposition 4.5.1).
(b) Let T be a bounded linear operator on E such that $T^n$ is compact for some $n \in \mathbb N$. Show that Theorem 4.5.7 also holds in this case. Hint: Use Proposition 4.5.1 for $\tilde T$.
4.5.4 Closed Operators and Pseudoresolvents
The nonstandard hull of a closed operator
Let (A, D(A)) be a closed densely defined operator on the Banach space E. More precisely, D(A) is a dense subspace, and the graph $G(A) = \{(x, Ax) : x \in D(A)\}$ is closed in E × E. Its resolvent set is $\rho(A) = \{\lambda \in \mathbb C : (\lambda - A) \text{ is bijective from } D(A) \text{ onto } E\}$. If $\lambda \in \rho(A)$, then $(\lambda - A)^{-1} =: R(\lambda, A)$ is continuous by the closed graph theorem. Now, ρ(A) is open (but it might be empty). As in the case of a bounded operator, the resolvent $R(\cdot, A) : \rho(A) \ni \lambda \mapsto R(\lambda, A)$ is holomorphic and satisfies the resolvent equation (4.5). The complement of ρ(A) is the spectrum σ(A). The point spectrum $\sigma_p(A)$, the approximate point spectrum $\sigma_a(A)$ and the ε-pseudospectrum $\sigma_\varepsilon(A)$ are defined as previously (see p. xxx).

In general it is quite difficult to define the nonstandard hull of A in $\widehat E$, because $\widehat{G(A)}$ is no longer the graph of a mapping in $\widehat E$. To solve the problem we recall the notion of a pseudoresolvent introduced by E. Hille (see [81], VIII.4). Let $D \subset \mathbb C$ be nonempty, and let $R : D \to L(E)$ be a function satisfying the resolvent equation $R(u) - R(v) = (v - u)R(u)R(v)$. Then all operators R(u) have a common null space, denoted by N(R), and a common range, denoted by R(E). Moreover, $R(u)R(v) = R(v)R(u)$ holds for all $u, v \in D$.

Theorem 4.5.11 (standard, see [81], p. 21)
(i) Let $R : D \to L(E)$ be a pseudoresolvent, and assume that there exists a sequence $(\lambda_n) \subset D$ with $\lim |\lambda_n| = \infty$ such that $(\lambda_n R(\lambda_n))$ is bounded. Then $\overline{R(E)} = \{x \in E : \lim \lambda_n R(\lambda_n)x = x\}$ and $N(R) \cap \overline{R(E)} = \{0\}$.
(ii) A pseudoresolvent is the resolvent of a closed densely defined linear operator A iff N(R) = {0}. Then R(E) is the domain of definition of A, and $A = uI - R(u)^{-1}$.

In the following, let us assume that (A, D(A)) is closed and densely defined and that $\rho(A) \neq \emptyset$. Moreover, let us assume that there is a sequence $(\lambda_n) \subset \rho(A)$ such that $\lim |\lambda_n| = \infty$ and $(\lambda_n R(\lambda_n, A))$ is bounded. These hypotheses are satisfied for A selfadjoint in a Hilbert space, or for A the generator of a strongly continuous semigroup, since in the latter case $uR(u, A)$ converges strongly to the identity as $u \to \infty$ (for details see the next section). Now by the Transfer Principle, $\rho(A) \ni \lambda \mapsto \widehat{R(\lambda, A)} =: \widehat R(\lambda)$ defines a pseudoresolvent on $\widehat E$ for which $(\lambda_n \widehat R(\lambda_n))_n$ is bounded. Therefore, we define $\widehat E_R = \overline{\widehat R(\widehat E)}$. The space $\widehat E_R$ is invariant under $\widehat R(u)$ for all u, and $\widehat R(u)|_{\widehat E_R}$ is injective with dense range $D(\widehat A) := \widehat R(u)(\widehat E_R)$. We now set $\widehat A = uI - (\widehat R(u)|_{\widehat E_R})^{-1}$ and call this the nonstandard hull of the closed operator (A, D(A)).

Remark 4.5.12 If A has compact resolvent, then by Proposition 4.4.1, $\widehat E_R = E$ and $\widehat A = A$.
General spectral theory of pseudoresolvents
In order to avoid the above special assumptions on the resolvent of A and still retain the power of nonstandard hulls, the singular set of a pseudoresolvent R was introduced in [52], following an idea of E. Hille [26]. This singular set coincides with the spectrum of the closed operator A if $R(\lambda) = (\lambda - A)^{-1}$. It enables one to avoid the construction of the nonstandard hull of a closed operator. Let $R : D \to L(E)$ be a pseudoresolvent, and fix $\lambda_0 \in D$. Then set $D_{max} = \{\lambda \in \mathbb C : \lambda = \lambda_0 \text{ or } (\lambda_0 - \lambda)^{-1} \in \rho(R(\lambda_0))\}$, and define $R_{max}$ on $D_{max}$ by $R_{max}(\lambda) = R(\lambda_0)\big(I - (\lambda_0 - \lambda)R(\lambda_0)\big)^{-1}$. The singular set sing(R) is defined to be the complement of $D_{max}$ in $\mathbb C$. The connection with the spectrum of bounded operators is given by the two formulas
$$\sigma(R(\lambda))\setminus\{0\} = \{(\lambda - \mu)^{-1} : \mu \in \mathrm{sing}(R)\}, \tag{4.6}$$
$$\mathrm{sing}(R) = \{\lambda - 1/\mu : \mu \in \sigma(R(\lambda))\setminus\{0\}\}. \tag{4.7}$$
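For a matrix A, the resolvent $R(\lambda) = (\lambda - A)^{-1}$ is a pseudoresolvent with sing(R) = σ(A), so (4.6) and (4.7) reduce to the spectral mapping $\mu \mapsto (\lambda - \mu)^{-1}$. The sketch below checks both formulas for a (hypothetical) symmetric 2×2 matrix with eigenvalues 1 and 3, using λ = 5. It is a finite-dimensional illustration, not the general construction.

```python
import cmath


def eig2(m):
    """Eigenvalues of a 2x2 matrix from its characteristic polynomial."""
    tr = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    disc = cmath.sqrt(tr * tr - 4 * det)
    return [(tr + disc) / 2, (tr - disc) / 2]


def resolvent(lam, a):
    """R(lam) = (lam - A)^(-1) for a 2x2 matrix A (closed-form inverse)."""
    p, q = lam - a[0][0], -a[0][1]
    r, s = -a[1][0], lam - a[1][1]
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]


A = [[2.0, 1.0], [1.0, 2.0]]  # sigma(A) = {1, 3}, so sing(R) = {1, 3}
lam = 5.0                     # a point of D_max, chosen for illustration

mus = eig2(resolvent(lam, A))                                  # sigma(R(lam)) \ {0}
sing_from_47 = sorted((lam - 1 / mu).real for mu in mus)       # formula (4.7)
sigma_from_46 = sorted((1 / (lam - s)).real for s in eig2(A))  # formula (4.6)
print(sing_from_47, sigma_from_46, sorted(mu.real for mu in mus))
```

Formula (4.7) recovers the eigenvalues 1 and 3 of A from the spectrum of the single bounded operator R(5). Formula (4.6) maps them back to σ(R(5)) = {1/4, 1/2}.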
In complete analogy to the spectral theory of bounded operators, we introduce the point spectrum and the approximate point spectrum of the pseudoresolvent as follows:
$$\mathrm{sing}_p(R) = \{\alpha \in \mathbb C : \exists x\,[\|x\| = 1 \text{ and } (\lambda - \alpha)R(\lambda)x = x]\}, \tag{4.8}$$
$$\mathrm{sing}_a(R) = \{\alpha \in \mathbb C : \inf\{\|((\lambda - \alpha)R(\lambda) - I)x\| : \|x\| = 1\} = 0\}. \tag{4.9}$$
The resolvent equation (4.5) shows that this definition is independent of the particular λ. Now fix some $\lambda \in D_{max}$. An easy calculation shows that
$$\mathrm{sing}_p(R) = \{\lambda - 1/\mu : \mu \in \sigma_p(R(\lambda))\setminus\{0\}\}, \qquad \mathrm{sing}_a(R) = \{\lambda - 1/\mu : \mu \in \sigma_a(R(\lambda))\setminus\{0\}\}. \tag{4.10}$$
Moreover, if $R(\lambda) = (\lambda - A)^{-1}$ for some closed operator A, then $\mathrm{sing}_p(R) = \sigma_p(A)$ and $\mathrm{sing}_a(R) = \sigma_a(A)$. The nonstandard hull of a (standard) pseudoresolvent is obviously a pseudoresolvent. More precisely, the following proposition holds:

Proposition 4.5.13 ([52], Proposition 2.1) Let $R : D \to L(E)$ be a pseudoresolvent. The following assertions hold:
(i) $\widehat R : D \to L(\widehat E)$, $\lambda \mapsto \widehat R(\lambda) := \widehat{R(\lambda)}$, is a pseudoresolvent.
(ii) $D_{max}(R) = D_{max}(\widehat R)$ and $\mathrm{sing}(R) = \mathrm{sing}(\widehat R)$.
(iii) $\lambda_0 \in \mathbb C$ is a pole of R iff it is a pole of $\widehat R$, and then the orders of the poles are equal. In particular, $\mathrm{sing}(R) \cap \partial D_{max} \subset \mathrm{sing}_p(\widehat R)$.
(iv) $\mathrm{sing}_a(R) = \mathrm{sing}_p(\widehat R)$.
Proof (i) follows by a careful application of the Transfer Principle. (ii) follows from Eq. 4.3 by an application of the Transfer Principle to the fixed operator R(λ). For (iii) and (iv), use Eq. (4.10) and apply Corollary 4.5.4 (ii) to the fixed operator R(λ).

As a corollary we obtain:

Corollary 4.5.14 Let A be a closed densely defined operator on E, and let there exist an unbounded sequence $(\lambda_n)$ in ρ(A) such that $(\lambda_n R(\lambda_n, A))_n$ is bounded. Then the following assertions hold:
(i) $\sigma(A) = \sigma(\widehat A)$.
(ii) $\sigma_a(A) = \sigma_p(\widehat A)$.
4.5.5 Notes
The spectral theory of internal S-bounded operators as presented here is due to the author (cf. also [51, 77]). Corollary 4.5.4 is partly new. Part (ii) (a) and (b) of it
however, are very well known and trace back (within the framework of Fréchet products) to Quigley (see [53]). These facts were rediscovered by Berberian [4], Lotz (cf. [57] V.1) and others, and have been used extensively since then. Theorem 4.5.6 is new. The corresponding result within the framework of ultraproducts may be found in [51]. Section 4.5.3 is well known. The proofs seem to be new, though they are based on the standard ones (cf. [81]; for a nonstandard treatment see also [54]). Section 4.5.4 is new. The corresponding treatment of closed operators within the theory of ultraproducts is due to Krupa [33]. The use of pseudoresolvents in the spectral theory of closed operators goes back to Hille (see [26]). Its present form is taken from [52], where it is developed within the theory of ultraproducts.
4.6 Selected Applications
4.6.1 Strongly Continuous Semigroups
A typical example of the preceding notions is the generator (A, D(A)) of a bounded strongly continuous semigroup $\mathcal T = (T_t)_{t\ge0}$ of operators $T_t$ on E. Let us recall this notion in a little more detail. $\mathcal T$ is called a strongly continuous semigroup if $T_{s+t} = T_sT_t$ for all $s, t \ge 0$ and $\lim_{t\to0} T_t x = x$ holds for all $x \in E$. In this case its generator (A, D(A)) is defined by: $x \in D(A)$ iff $Ax := \lim_{t\to0} \frac{1}{t}(T_t x - x)$ exists. The semigroup property implies that $t \mapsto T_t x$ is norm continuous for every $x \in E$. Moreover, $x \in D(A)$ implies $T_t x \in D(A)$ and $\frac{d}{dt}T_t x = AT_t x = T_t Ax$. (For details on strongly continuous semigroups see e.g. [48].) In what follows, the hypothesis that $\mathcal T$ be bounded is not severely restrictive, since in the general case there exists an M > 0 such that $t \mapsto \exp(-Mt)T_t$ is strongly continuous and bounded as well.
Since $\mathcal T$ is bounded, $R(z)x := \int_0^\infty e^{-tz}\,T_t x\,dt$ defines a bounded linear operator for $z \in \mathbb C$ with Re(z) > 0. This turns out to be a pseudoresolvent with $\lim_{u\to\infty} uR(u)x = x$ (as is easily seen, cf. part (II) of the proof of the next theorem). Therefore, we can apply Theorem 4.5.11. Since the kernel N(R) = {0} by the formula above, R(·) is the resolvent of the closed densely defined operator $B = u - (R(u))^{-1}$, which is independent of u > 0. The easily proved formula
$$T_tR(u)x = e^{ut}R(u)x - e^{ut}\int_0^t e^{-us}\,T_s x\,ds \tag{4.11}$$
shows, first, that the map $t \mapsto T_tR(u)$ is continuous with respect to the operator norm and, second, that $\frac{d}{dt}\big(T_tR(u)x\big)\big|_{t=0} = uR(u)x - x = AR(u)x$ holds, whence B = A. It follows that (A, D(A)) satisfies the assumptions that were made in the previous section in order to construct $\widehat A$.
If (A, D(A)) is the generator of a bounded strongly continuous semigroup, $\widehat A$ can be characterized in a manner different from that given in the previous section. We adhere to the notions and notations given there.

Theorem 4.6.1 (cf. [73]) Let (A, D(A)) be the generator of a bounded strongly continuous semigroup $\mathcal T = (T_t)_{t\ge0}$. Then the following assertions hold:
(i) $\widehat E_R = \{\widehat x : t \mapsto {}^*T_t x \text{ is S-continuous}\}$. Moreover, the restriction $t \mapsto \widehat{T_t}|_{\widehat E_R}$ is a strongly continuous semigroup with generator $\widehat A$.
(ii) $D(\widehat A) = \{\widehat x \in \widehat E_R : \exists x \in \widehat x\ [x \in {}^*D(A),\ {}^*Ax \text{ is finite, and } t \mapsto {}^*T_t\,{}^*Ax \text{ is S-continuous}]\}$.

Proof (i) Let $H = \{x \in \mathrm{Fin}({}^*E) : t \mapsto {}^*T_t x \text{ is S-continuous}\}$, and set $G = \widehat H$. Denote the bound of $(T_t)$ by M. Let u > 0 be given.
(I) Since $t \mapsto T_tR(u, A)$ is continuous with respect to the operator norm, so is $t \mapsto {}^*T_t\,{}^*R(u, A)$. So $t \mapsto {}^*T_t\,{}^*R(u, A)y$ is S-continuous (cf. Lemma 4.4.2) for every $y \in \mathrm{Fin}({}^*E)$, hence $\widehat R(u, A)(\widehat E) \subset G$, which in turn implies that $\widehat E_R \subset G$.
(II) Conversely, let $t \mapsto {}^*T_t x$ be S-continuous. For each standard ε > 0 there exists a standard δ > 0 such that $\|{}^*T_t x - x\| < \varepsilon$ for all $t \in {}^*\mathbb R$ with $0 \le t < \delta$. Because $\int_0^\infty u\exp(-ut)\,dt = 1$, we have by the Transfer Principle
$$\|u\,{}^*R(u, A)x - x\| = \Big\|u\int_0^\infty \exp(-ut)\,({}^*T_t x - x)\,dt\Big\| \le u\int_0^\delta \exp(-ut)\,\|{}^*T_t x - x\|\,dt + (M + 1)\|x\|\int_\delta^\infty u\exp(-ut)\,dt \le \varepsilon + (M + 1)\|x\|\exp(-u\delta).$$
This estimate shows that $\lim_{u\to\infty} u\widehat R(u, A)\widehat x = \widehat x$, whence $\widehat x \in \widehat E_R$. Thus $G \subset \widehat E_R$ holds. That $t \mapsto \widehat{T_t}|_{\widehat E_R}$ is a strongly continuous semigroup is obvious. The rest of (i) follows from formula (4.11) above, applied to this semigroup.
(ii) By definition, $D(\widehat A) = \widehat R(u)(\widehat E_R)$ for some u > 0. By (i), $\widehat x \in D(\widehat A)$ implies $x = {}^*R(u, A)y$ for some $y \in H$. But then ${}^*Ax = ux - y$ is finite and in H.
Conversely, assume that $x \in {}^*D(A)$ and that $x$ as well as $y := {}^*Ax$ are in $H$. Then a short calculation shows that $x = {}^*R(u,A)(ux - y)$. Since $ux - y \in H$, we obtain $\hat x = R(u,\hat A)(u\hat x - \hat y)$, where $u\hat x - \hat y \in \hat H = \hat E_R$ by (i).

The space on which $(\hat T_t)$ is strongly continuous might be strictly larger than $\hat E_R$, as the following example shows.

Example 4.6.2 (cf. [73]) Let $E$ be the space $\ell^2(\mathbb Z)$, and consider the group action $T_t(f)(k) = \exp(itk)f(k)$ for $t \in \mathbb R$. It is easily seen that $R(z,A)$ is compact, whence $\hat E_R = E$. On the other hand, for every $\varepsilon > 0$ and every finite set $\{t_1,\dots,t_n\} \subset \mathbb R$ there exists $k \in \mathbb Z\setminus\{0\}$ such that $\sup\{|\exp(it_jk) - 1| : j \le n\} < \varepsilon$ (see [25], Theorem 26.14). Take a hyperfinite set $M \subset {}^*\mathbb R$ containing all of $\mathbb R$, and choose $\varepsilon \approx 0$ and $k \in {}^*\mathbb Z\setminus\{0\}$ such that $\sup\{|\exp(itk) - 1| : t \in M\} < \varepsilon$. Set $f = e_k = (\delta_{k,l})_{l\in{}^*\mathbb Z}$. Then $\hat T_t(\hat f) = \hat f$ holds for every standard $t$, but $t \mapsto {}^*T_t(f)$ is not S-continuous, since ${}^*T_t(f) = -f$ for $t = \pi/k \approx 0$.

Corollary 4.6.3 Let $T = (T_t)_{t\ge0}$ be a bounded strongly continuous semigroup on the (standard) Banach space $E$. The following assertions are equivalent:
(i) $T$ is uniformly continuous, i.e. the map $t \mapsto T_t$ is continuous from $\mathbb R_+$ into $L(E)$, equipped with the operator norm.
(ii) For all $x \in \mathrm{Fin}({}^*E)$ the map ${}^*\mathbb R_+ \ni t \mapsto {}^*T_t(x)$ is S-continuous.
(iii) The generator $A$ is bounded.
Proof (i) ⇒ (ii): Obvious.
(ii) ⇒ (iii): By Theorem 4.6.1, $\hat E_R = \hat E$, so $R(\lambda,\hat A)$ is a topological isomorphism. But then $\hat A = \lambda I - R(\lambda,\hat A)^{-1}$ is bounded, and so is its restriction $A$.
(iii) ⇒ (i): Since $A$ is bounded and closed, it is everywhere defined. This implies that $R(\lambda,A)$ is bijective, and since $t \mapsto T_tR(\lambda,A)$ is continuous with respect to the operator norm, the assertion follows.
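The approximation fact used in Example 4.6.2 can be explored numerically. The following Python sketch is only an illustration under stated assumptions. The sample times `ts`, the tolerance `eps` and the helper names `dirichlet_k` and `displacement` are hypothetical choices made here and are not part of the text. The sketch finds by brute force the smallest $k \ge 1$ with $|\exp(it_jk) - 1| < \varepsilon$ at the given times. It then checks that the same unit vector $e_k$ is mapped to $-e_k$ at $t = \pi/k$.

```python
import cmath
import math


def dirichlet_k(ts, eps, k_max=10**6):
    """Hypothetical helper: the smallest integer k >= 1 with
    max_j |exp(i t_j k) - 1| < eps, or None if no such k <= k_max exists.
    Dirichlet's simultaneous approximation theorem guarantees that some k exists."""
    for k in range(1, k_max + 1):
        if max(abs(cmath.exp(1j * t * k) - 1) for t in ts) < eps:
            return k
    return None


def displacement(t, k):
    """||T_t e_k - e_k|| in l^2(Z); here T_t e_k = exp(i t k) e_k."""
    return abs(cmath.exp(1j * t * k) - 1)


ts = [1.0, math.sqrt(2.0)]   # sample times t_1, t_2 (an arbitrary choice)
eps = 0.05
k = dirichlet_k(ts, eps)

print("k =", k)
print("largest displacement at the sample times:", max(displacement(t, k) for t in ts))
print("displacement at t = pi/k:", displacement(math.pi / k, k))  # approximately 2, since T_t e_k = -e_k
```

Making `eps` smaller cannot decrease the smallest admissible $k$. This is a finite shadow of the fact that the nonstandard argument needs an infinite $k \in {}^*\mathbb Z$.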
4.6.2 Approximation of Operators and of Their Spectra

General Theory

Another field in which nonstandard functional analysis works well is approximation theory. We give some examples concerning the approximation of spectra (for another application in this field see [51]). To this end we recall the definition of the distance from the bounded set $A$ to the set $B$ in $\mathbb C$, given by $d(A,B) = \sup_{a\in A}\inf\{|b - a| : b \in B\}$. In the following, assume that a sequence $(S_n)_{n\in\mathbb N}$ approximates, in a vague sense, the not necessarily continuous but at least densely defined and closed operator $A$. We shall look for the possible convergence of parts of the spectrum of $S_n$ to some part of the spectrum of $A$. If each $S_n$ is defined on the Banach space $E$ and $T$ is a bounded operator such that $\lim\|T - S_n\| = 0$, then it is not hard to prove that $\lim d(\sigma(S_n),\sigma(T)) = 0$
M.P.H. Wolff
holds. But we point out that $\limsup d(\sigma(T),\sigma(S_n)) > 0$ may happen (see [32], p. 210). A slightly weaker form of the assertion turns out to hold in a much more general setting, as we now show.

First of all, let us recall the notion of discrete convergence from approximation theory (cf. [47]). Let $E$ and $F_n$ be Banach spaces. Let $E_1$ be a dense subspace of $E$, and for each $n$ let $P_n : E_1 \to F_n$ be an arbitrary linear operator. The quadruple $(E, E_1, (F_n), (P_n))$ is called an approximation scheme if $\lim_n\|P_nu\|_n = \|u\|$ holds for all $u \in E_1$. Notice that, in contrast to [47], we do not require the $P_n$ to be continuous. The mappings $P_n$ are, so to speak, asymptotically isometric, and this is enough for applications. A sequence $(u_n)_n$ with $u_n \in F_n$ converges discretely to $u \in E_1$ if $\lim_n\|u_n - P_nu\|_n = 0$. In that case we write $u = d\text{-}\lim_n u_n$. Let $E_0 \subset E_1$ be another dense subspace of $E$ and let $A : E_0 \to E_1$ be a linear operator. A sequence $(S_n)_n$ of linear operators $S_n$ on $F_n$ approximates $A$ discretely on $E_0$ if, loosely speaking, "$d\text{-}\lim_n S_nP_nu = Au$" holds for all $u \in E_0$. More precisely, this means
$$\lim_{n\to\infty}\|S_nP_nu - P_nAu\|_n = 0 \quad\text{for all } u \in E_0. \tag{4.12}$$
We denote this approximation by $A = d\text{-}\lim S_n$. Strong convergence of a sequence of bounded operators is a special case of this notion: set $E = E_1 = F_n$ and $P_n = I$ for all $n$. In order to clarify these notions of discrete approximation, we prove the following lemma.

Lemma 4.6.4 Let $(E, E_1, (F_n), (P_n))$ be a fixed approximation scheme. Then the following assertions hold:
(i) For $N \in {}^*\mathbb N\setminus\mathbb N$ the operator $\tilde P_N : E_1 \to \hat F_N$, $u \mapsto \widehat{P_Nu}$, is well-defined and embeds $E_1$ isometrically into $\hat F_N$. Its unique extension to an isometry from $E$ into $\hat F_N$ will also be denoted by $\tilde P_N$.
(ii) Let $(A, D(A))$ be a densely defined operator in $E$ such that $D(A) \subset E_1$ and $A(D(A)) \subset E_1$. Assume that $(S_n)_n$ approximates $A|_{D(A)}$ discretely and, moreover, that $(S_nP_nx)_n$ is bounded for every $x \in D(A)$. Let $N$ be as above. Then $\tilde P_NA|_{D(A)} = \hat S_N\tilde P_N|_{D(A)}$.
Proof (i) If $u$ is in $E_1$, then $\|P_Nu\|_N \approx \|u\|$ by hypothesis. Hence (i) follows.
(ii) Since $(S_n)_n$ is bounded, $S_N$ is S-continuous. The remainder follows from Eq. 4.12.

Example 4.6.5 (1) Let $F_n = \mathbb C^n$ with the scalar product norm $\|(x_1,\dots,x_n)\|_n^2 = \sum_{j=1}^n |x_j|^2/n$. Let $E = L^2([0,1])$, $E_1 = \{f \in E : f \text{ continuous},\ f(0) = f(1)\}$, and $E_0 = \{f \in E_1 : f' \in E_1\}$. Set $Af = f'$ and $P_n : E_1 \to F_n$, $f \mapsto \big(f(k/n)\big)_{0\le k\le n-1}$.
Define $S_n = n(T_n - I_n)$, where $I_n$ is the identity on $F_n$ and $T_n$ is the shift given by
$$T_n(x)_k = \begin{cases} x_{k+1}, & 1 \le k \le n-1,\\ x_1, & k = n.\end{cases}$$
Then $A = d\text{-}\lim S_n$. Notice that $(S_n)_n$ is not bounded, as is easily seen.
(2) Let $E = \ell^2(\mathbb N)$ be the usual Hilbert space and let $T$ be the shift given by $(Tf)(k) = f(k+1)$. Then $\sigma(T) = \sigma_a(T) = \{z \in \mathbb C : |z| \le 1\}$. Set $F_n = E$, $P_n = I$ and
$$(S_nf)(k) = \begin{cases} f(k+1), & k \le n-1,\\ 0, & \text{otherwise.}\end{cases}$$
Then $(S_n)$ converges strongly to $T$. Notice that $\sigma(S_n) = \{0\}$, so the spectra do not converge in any reasonable sense.
(3) Let $S$ be the adjoint $T^*$ of $T$. Then $\sigma_a(S) = \mathbb T = \{z \in \mathbb C : |z| = 1\}$. Set $F_n = \mathbb C^n$, equipped with the usual scalar product norm, take $P_nf = (f(1),\dots,f(n))$ and $S_n(x_1,\dots,x_n) = (x_n, x_1,\dots,x_{n-1})$. Obviously $(S_n)$ converges discretely to $S$. In this case not even the $\varepsilon$-pseudospectrum of $S_n$ behaves well. In fact, we have $\limsup_n d(\sigma(S),\sigma_\varepsilon(S_n)) = 1 - \varepsilon$.

In the following, let $(A, D(A))$ be a fixed closed densely defined operator on the Banach space $E$. A subspace $E_0 \subset D(A)$ is a core of $A$ if the closure of $A|_{E_0}$ equals $A$. If $E_0$ is a core of $A$ and $\lambda \in \sigma_a(A)$, then there exists a sequence $(x_n)_n$ of normalized elements in $E_0$ such that $\lim_n\|(\lambda - A)x_n\| = 0$ holds.

Theorem 4.6.6 ([78]) Let $(E, E_1, (F_n), (P_n))$ be a fixed approximation scheme. Let $(S_n)$ be a sequence of bounded operators on $F_n$ which discretely approximates the operator $A$ on the core $E_0 \subset E_1$ of $A$. Then for every $\varepsilon > 0$ we have
$$\sigma_a(A) \subset \bigcup_{n\in\mathbb N}\bigcap_{k\ge n}\sigma_\varepsilon(S_k).$$
Proof Let a standard $\varepsilon > 0$ be given and let $\lambda \in \sigma_a(A)$ be arbitrary. Because $E_0$ is a core of $A$, there exists $x \in E_0$ of norm 1 such that $\|(\lambda - A)x\| < \varepsilon/2$. By hypothesis, for all $N \approx \infty$ and all $y \in E_1$ we have $\|P_Ny\|_N \approx \|y\|$. In particular, $\|P_Nx\|_N \approx 1$ as well as $\|P_N(\lambda - A)x\|_N \approx \|(\lambda - A)x\| < \varepsilon/2$. Moreover, $P_N(\lambda - A)x \approx (\lambda - S_N)P_Nx$ by hypothesis, whence $\|(\lambda - S_N)P_Nx\|_N < \varepsilon/2$, which in turn implies that $\lambda \in \sigma_\varepsilon(S_N)$. Let $\mathcal A = \{n \in \mathbb N : \forall k \ge n\,[\|(\lambda - S_k)P_kx\|_k < \varepsilon/2]\}$. By what has been proved so far, ${}^*\mathcal A$ contains ${}^*\mathbb N\setminus\mathbb N$. The assertion now follows from the Spillover Principle (Theorem 2.8.12 (ii)) and the Downward Transfer Principle.

The next corollary holds in particular if all $F_n$ are Hilbert spaces and all operators $S_n$ are normal.

Corollary 4.6.7 Assume that, in addition to the hypotheses of the theorem, we have $\|R(\lambda,S_n)\| \le 1/d(\lambda,\sigma(S_n))$ for each $n \in \mathbb N$ and all $\lambda \in \rho(S_n)$. Then for every compact subset $K \subset \mathbb C$
$$\lim_{n\to\infty} d(\sigma_a(A)\cap K,\sigma(S_n)) = 0.$$
Proof Let $\lambda \in {}^*(\sigma_a(A)\cap K)$ be arbitrary and let $\varepsilon > 0$ be a standard real number. Because $K$ is compact, the standard part ${}^\circ\lambda =: z$ exists, and it is in $\sigma_a(A)$ since this set is closed. By the theorem there exists $n(\varepsilon,z)$ such that $\|R(z,S_n)\| \ge 2/\varepsilon$ holds for all $n \ge n(\varepsilon,z)$. By hypothesis this implies that $d(z,\sigma(S_n)) \le \varepsilon/2$ for all these $n$. The Transfer Principle implies that $d(\lambda,\sigma(S_N)) \approx d(z,\sigma(S_N)) < \varepsilon$ for all $N \approx \infty$, and the assertion follows.

The proofs of the next two corollaries are obvious.

Corollary 4.6.8 If all $F_n$ are Hilbert spaces and all $S_n$ are unitary operators, then $\sigma_a(A) \subset \{z \in \mathbb C : |z| = 1\}$.

Corollary 4.6.9 If $E$ and $F_n$ are Hilbert spaces and $A$ as well as all $S_n$ are normal, then $\lim_{n\to\infty} d(\sigma(A)\cap K,\sigma(S_n)) = 0$ for every compact subset $K$ of $\mathbb C$.

Consider the first example in 4.6.5: $S_n$ is normal and $\sigma(S_n) = \big\{\frac{\exp(2i\pi k/n) - 1}{1/n} : k = 1,\dots,n\big\}$. It follows that $\{2i\pi k : k \in \mathbb Z\} = \sigma(A)$ (a fact which can be proved much more easily directly).
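For the first example in 4.6.5, both the discrete convergence $A = d\text{-}\lim S_n$ and the eigenvalue formula for $S_n$ can be checked numerically. The Python sketch below is a hedged illustration. It assumes the test function $f(t) = \sin(2\pi t)$, which lies in $E_0$, and the function names `P`, `norm_n`, `S` and `eigenvalue` are hypothetical. The sketch measures $\|S_nP_nf - P_nf'\|_n$ for a few $n$. It also verifies that the Fourier vector $v_j = \exp(2\pi ijk/n)$ is an eigenvector of $S_n$ with eigenvalue $n(\exp(2\pi ik/n) - 1)$, which lies close to $2\pi ik$ for large $n$.

```python
import cmath
import math


def P(f, n):
    """P_n f = (f(k/n))_{0 <= k <= n-1}: sample f on the grid k/n."""
    return [f(k / n) for k in range(n)]


def norm_n(x):
    """The scaled norm on C^n: ||x||_n^2 = (1/n) * sum_j |x_j|^2."""
    return math.sqrt(sum(abs(v) ** 2 for v in x) / len(x))


def S(x):
    """S_n = n (T_n - I_n), with the cyclic shift (T_n x)_k = x_{k+1} (indices mod n)."""
    n = len(x)
    return [n * (x[(k + 1) % n] - x[k]) for k in range(n)]


def f(t):
    return math.sin(2 * math.pi * t)                 # smooth and 1-periodic, so f is in E_0


def df(t):
    return 2 * math.pi * math.cos(2 * math.pi * t)   # A f = f'


errors = {}
for n in (10, 100, 1000):
    diff = [a - b for a, b in zip(S(P(f, n)), P(df, n))]
    errors[n] = norm_n(diff)                   # ||S_n P_n f - P_n A f||_n
    print(n, norm_n(P(f, n)), errors[n])       # ||P_n f||_n agrees with ||f||_2 = 1/sqrt(2) here


def eigenvalue(n, k):
    """n (exp(2 pi i k / n) - 1), the eigenvalue of S_n belonging to the k-th Fourier vector."""
    return n * (cmath.exp(2j * math.pi * k / n) - 1)


n, k = 1000, 1
v = [cmath.exp(2j * math.pi * j * k / n) for j in range(n)]
mu = eigenvalue(n, k)
residual = norm_n([a - mu * b for a, b in zip(S(v), v)])   # ||S_n v - mu v||_n, zero up to rounding
print(mu, 2j * math.pi * k, residual)
```

For this particular $f$ the error decays roughly like $1/n$, as expected for a forward difference quotient. This is an observation about the chosen example, not a general rate of discrete convergence.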
Compact Operators

In the case of approximation of compact operators, a much better estimate for the spectrum is possible. In this section we follow [79]. Let $(E, E_1, (F_n), (P_n))$ be a fixed discrete approximation scheme.

Definition 4.6.10 A sequence $(x_n) \in \prod_n F_n$ is called discretely compact (d-compact for short) if for every $\varepsilon > 0$ there exists a finite set $Y(\varepsilon) \subset E_1$ depending on $\varepsilon$ such that $\limsup_n d(x_n, P_n(Y(\varepsilon))) < \varepsilon$.
A subset $X \subset \prod_n F_n$ is called uniformly d-compact if for every $\varepsilon > 0$ there exists a finite set $Y(\varepsilon) \subset E_1$ such that $\limsup_n d(x_n, P_n(Y(\varepsilon))) < \varepsilon$ holds for all $(x_n) \in X$.

Remark 4.6.11 There are various similar notions in the literature, see e.g. [2, 16], which turn out to be less general than ours. For a discussion see [79], pp. 227–228.

The characterization of d-compactness by nonstandard analysis reads as follows:
Proposition 4.6.12 A sequence $(x_n)_n \in \prod_n F_n$ is discretely compact if and only if for every $N \approx \infty$ there exists $x \in E$ such that $\tilde P_Nx = \hat x_N$.
Proof Let $(x_n)_n$ be discretely compact and let $N \approx \infty$. For every $r \in \mathbb N$ there exists a finite set $Y_r \subset E_1$ with $\limsup_n d(x_n, P_n(Y_r)) < 2^{-r}$. In particular, $d(x_N, P_N({}^*Y_r)) < 2^{-r}$. Choose $y_r \in Y_r$ satisfying $d(x_N, P_Ny_r) < 2^{-r}$. Since $P_N$ is almost an isometry, the sequence $(y_r)_r$ is a Cauchy sequence in $E_1$, hence convergent to some $x \in E$. It follows that $\hat x_N = \tilde P_Nx$.
Conversely, assume that $(x_n)_n$ is not discretely compact. Then there exists $\varepsilon > 0$ such that for every finite $Y \subset E_1$ and every $m \in \mathbb N$ there exists $n \ge m$ with $d(x_n, P_n(Y)) > \varepsilon$. Now take a hyperfinite $Y \subset {}^*E_1$ with $E_1 \subset Y$ (externally). Let $M \approx \infty$. Then there exists $N \ge M$ with $d(x_N, P_N(Y)) > \varepsilon$. In particular, $\|x_N - P_Ny\|_N > \varepsilon$ for every $y \in E_1$. Since $E_1$ is dense in $E$, there is no standard $x$ with $\tilde P_Nx = \hat x_N$.
The characterization of uniform compactness is a little more cumbersome and will not be needed in the sequel.

Let us consider again Example 1 in 4.6.5, but now choose
$$A_n = \frac1n\begin{pmatrix}1 & 0 & 0 & \cdots & 0\\ 1 & 1 & 0 & \cdots & 0\\ \vdots & & & \ddots & \vdots\\ 1 & 1 & 1 & \cdots & 1\end{pmatrix}.$$
The sequence $(A_nP_nf)_n$ converges discretely to the Volterra operator $f \mapsto \big(x \mapsto \int_0^x f(u)\,du\big)$. The next proposition shows that the set $\{(A_nP_nf)_n : \|f\| = 1\}$ is uniformly d-compact.

Proposition 4.6.13 Let the linear operator $A : E_1 \to E_1$ be approximated by the sequence $(A_n)$ of operators $A_n : F_n \to F_n$. Then $A$ is compact if and only if the set $X = \{(A_nP_nx)_n : \|x\| = 1\}$ is uniformly d-compact.
Proof Let $A$ be compact, and let $\varepsilon > 0$ be given. Then there exists a finite set $Y \subset E_1$ with $d(A(B(0,1)), Y) < \varepsilon/2$, where $B(0,1)$ denotes the unit ball in $E_1$. Let $N \approx \infty$ be arbitrary. Then for all (standard) $x$ of norm 1 we have $P_NAx \approx A_NP_Nx$ as well as $d(P_NAx, P_N(Y)) < \varepsilon/2$. Hence $d(A_NP_Nx, P_N(Y)) < \varepsilon$, which implies that $X$ is uniformly d-compact.
Conversely, let $X$ be uniformly d-compact. Then for $\varepsilon > 0$ there exists a finite set $Y \subset E_1$ with $\limsup_n d(A_nP_nx, P_n(Y)) < \varepsilon/2$ for all $x \in E_1$ of norm 1. Let $N \approx \infty$ be arbitrary. Then for each standard $x$ of norm 1 we have $d(A_NP_Nx, P_N(Y)) < \varepsilon/2$ as well as $P_NAx \approx A_NP_Nx$. Since $P_N$ is almost an isometry, we obtain $d(Ax, Y) < \varepsilon$ for all $x$ of norm 1, and the assertion follows.

Now we describe the approximation of the spectrum of $A$ whenever $A$ is compact. Let the compact operator $A$ be discretely approximated by $(A_n)$. We define
$$D = \{z \in \mathbb C : \exists (x_n)_n,\ \|x_n\|_n = 1 \text{ for all } n,\ (x_n)_n \text{ d-compact and } \liminf_n\|(z - A_n)x_n\|_n = 0\}.$$
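The discrete convergence of $(A_nP_nf)_n$ to the Volterra operator can be observed in the same way. The following Python sketch is an illustration only, and its helper names are hypothetical. It assumes $f(t) = \cos(2\pi t)$, which satisfies $f(0) = f(1)$, so that $Vf(x) = \sin(2\pi x)/(2\pi)$ again lies in $E_1$. Instead of forming the matrix explicitly, $A_n$ is applied as a scaled running sum.

```python
import math
from itertools import accumulate


def P(f, n):
    """P_n f = (f(k/n))_{0 <= k <= n-1}."""
    return [f(k / n) for k in range(n)]


def norm_n(x):
    """||x||_n^2 = (1/n) * sum_j |x_j|^2 on C^n."""
    return math.sqrt(sum(abs(v) ** 2 for v in x) / len(x))


def A(x):
    """(A_n x)_j = (1/n) * sum_{i <= j} x_i, i.e. the lower triangular matrix of ones times 1/n."""
    n = len(x)
    return [s / n for s in accumulate(x)]


def f(t):
    return math.cos(2 * math.pi * t)                    # f(0) = f(1), so f is in E_1


def Vf(t):
    return math.sin(2 * math.pi * t) / (2 * math.pi)    # (V f)(t) = integral of f over [0, t]


errors = {}
for n in (10, 100, 1000):
    errors[n] = norm_n([a - b for a, b in zip(A(P(f, n)), P(Vf, n))])
    print(n, errors[n])                                 # ||A_n P_n f - P_n V f||_n
```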
Theorem 4.6.14 Let $(A_n)_n$ be a uniformly bounded sequence of operators $A_n$ on $F_n$ that discretely approximates the compact operator $A$. Then $D \subset \sigma(A) \subset D \cup \{0\}$.
Proof For $z \in D$ there exists a normalized d-compact sequence $(x_n)_n$ satisfying $\liminf_n\|(z - A_n)x_n\|_n = 0$. Therefore there exists $N \approx \infty$ with $A_Nx_N \approx zx_N$. Since $(x_n)_n$ is d-compact, there exists a (standard) $y \in E$ with $\hat x_N = \tilde P_Ny$. So we obtain
$$\tilde P_NAy = \hat A_N\tilde P_Ny = \widehat{A_Nx_N} = z\hat x_N = z\tilde P_Ny.$$
It follows that $z \in \sigma(A)$, since $\tilde P_N$ is an isometry.
Conversely, let $0 \ne z \in \sigma(A)$ be arbitrary and let $y$ be a normalized eigenvector. Then the sequence $(x_n)_n$ given by $x_n = P_ny = \frac1zP_nAy$ is d-compact by Proposition 4.6.13. Moreover, for $N \approx \infty$ we get $A_Nx_N = A_NP_Ny \approx P_NAy = zP_Ny = zx_N$, which even implies $\lim_n\|(z - A_n)x_n\|_n = 0$. More results are obtained by refining the notion of discrete convergence, see [78].
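A small computation is consistent with Theorem 4.6.14 for the Volterra example. The matrix $A_n$ above is lower triangular with diagonal entries $1/n$. Hence $N_n = A_n - \frac1nI$ is strictly lower triangular and nilpotent, and $\sigma(A_n) = \{1/n\}$ shrinks to $\{0\}$, the spectrum of the (quasinilpotent) Volterra operator. The Python sketch below uses hypothetical helper names and only checks these finite-dimensional facts. It does not compute the set $D$.

```python
from itertools import accumulate


def A(x):
    """(A_n x)_j = (1/n) * sum_{i <= j} x_i."""
    n = len(x)
    return [s / n for s in accumulate(x)]


def N(x):
    """Strictly lower triangular part N_n = A_n - (1/n) I: (N_n x)_j = (1/n) * sum_{i < j} x_i."""
    n = len(x)
    partial = [0.0] + list(accumulate(x))[:-1]
    return [s / n for s in partial]


def power(op, x, m):
    """Apply the operator op to the vector x m times."""
    for _ in range(m):
        x = op(x)
    return x


n = 8
basis = [[1.0 if i == j else 0.0 for i in range(n)] for j in range(n)]

# N_n^n = 0: every basis vector is annihilated after n applications ...
print(all(v == 0.0 for e in basis for v in power(N, e, n)))
# ... while N_n^(n-1) applied to the first basis vector is nonzero, so the index is exactly n.
print(power(N, basis[0], n - 1)[-1])
# A_n = N_n + I/n (up to rounding), hence (A_n - I/n)^n = 0 and sigma(A_n) = {1/n}.
x = [float(j + 1) for j in range(n)]
print(max(abs(a - (b + c / n)) for a, b, c in zip(A(x), N(x), x)))
```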
4.6.3 Super Properties

Roughly speaking, a property concerning subsets of a given Banach space or concerning operators is called a super property if the corresponding object in the nonstandard hull has this property. We already know the notion of superreflexivity. Moreover, compactness of an operator is also a super property (see Proposition 4.4.1). We now turn to two other notions which are super properties. Recall that an operator $T \in L(E)$ is called uniformly stable if $\{T^n : n \in \mathbb N\}$ is relatively compact with respect to the operator norm. It is called stable if the orbits $o(T,x) := \{T^nx : n \in \mathbb N\}$ are relatively norm compact for every $x \in E$.

Definition 4.6.15 Let $E$ be a Banach space and let $T$ be an operator whose powers are uniformly bounded ($\sup_n\|T^n\| < \infty$).
(i) $T$ is called superstable if $\hat T$ is stable on $\hat E$.
(ii) $T$ is called superergodic if $\hat T$ is mean ergodic.

Recall first of all that in Banach spaces precompactness and relative compactness are equivalent notions. Furthermore, a sequence $(u_n)_n$ in a Banach space $F$, say, is precompact if and only if
$$\forall\varepsilon > 0\ \exists L \in \mathbb N\ \forall n \in \mathbb N\ \big[d(u_n,\{u_1,\dots,u_L\}) < \varepsilon\big]. \tag{4.13}$$
This formula is better known than the following equivalent one:
$$\forall\varepsilon > 0\ \forall\varphi \in \mathbb N^{\mathbb N}\ \exists L \in \mathbb N\ \big[d(u_{\varphi(L)},\{u_1,\dots,u_L\}) < \varepsilon\big]. \tag{4.14}$$
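Conditions (4.13) and (4.14) can be tested on finite samples of a sequence. The Python sketch below is an illustration under explicit assumptions, all of which are hypothetical choices made here:
- the two sequences (an orbit of an irrational rotation of the unit circle, which is precompact, and $u_n = n$, which is not);
- the tolerance and the sample sizes;
- the choice $\varphi(L) = 2L$;
- the helper names `smallest_L` and `witness_L`.

A finite sample can only suggest precompactness, never prove it.

```python
import cmath
import math


def smallest_L(u, eps):
    """Smallest L such that every term of the finite sample u lies within eps of
    {u[0], ..., u[L-1]}: a finite-sample version of condition (4.13)."""
    for L in range(1, len(u) + 1):
        head = u[:L]
        if all(min(abs(v - w) for w in head) < eps for v in u):
            return L
    return len(u)   # not reached: L = len(u) always works


def witness_L(u_of, phi, eps, L_max):
    """Search for L <= L_max with d(u_phi(L), {u_1, ..., u_L}) < eps as in (4.14);
    return None if no such L is found."""
    for L in range(1, L_max + 1):
        target = u_of(phi(L))
        if min(abs(target - u_of(j)) for j in range(1, L + 1)) < eps:
            return L
    return None


theta = 2 * math.pi * math.sqrt(2)


def rotation(n):
    return cmath.exp(1j * n * theta)   # orbit of an irrational rotation: a precompact sequence


def escape(n):
    return float(n)                    # u_n = n: not precompact


def phi(L):
    return 2 * L                       # one particular choice of phi in N^N


eps = 0.2
rot_sample = [rotation(n) for n in range(1, 301)]
esc_sample = [escape(n) for n in range(1, 101)]

print(smallest_L(rot_sample, eps), len(rot_sample))   # far fewer centres than sample points
print(smallest_L(esc_sample, eps), len(esc_sample))   # every point is needed
print(witness_L(rotation, phi, eps, 200))             # some L is found
print(witness_L(escape, phi, eps, 200))               # None, since |2L - j| >= L for j <= L
```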
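Let $B_E$ be the closed unit ball of the Banach space $E$. Let $\varepsilon > 0$ be arbitrary and let $\varphi \in \mathbb N^{\mathbb N}$ be a given sequence. For $m, k \in \mathbb N$ with $k \le m$ we define $A_{\varepsilon,\varphi}(m,k) = \{x \in B_E : \|T^{\varphi(m)}x - T^kx\| < \varepsilon\}$. Set
$$A_{\varepsilon,\varphi}(m) = \bigcup_{k=1}^m A_{\varepsilon,\varphi}(m,k) = \{x \in B_E : d(T^{\varphi(m)}x,\{Tx, T^2x,\dots,T^mx\}) < \varepsilon\}.$$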
With the aid of the two equivalent formulas 4.13 and 4.14 it is not too hard to prove the following two statements:
(i) $T$ is uniformly stable iff for all $\varepsilon > 0$ and all sequences $\varphi$ there exist $m \in \mathbb N$ and $k \le m$ such that $B_E = A_{\varepsilon,\varphi}(m,k)$.
(ii) $T$ is stable iff $B_E = \bigcup_{m\in\mathbb N} A_{\varepsilon,\varphi}(m)$ for all $\varepsilon > 0$ and all sequences $\varphi$.
With this in mind we now give a characterization of superstable operators which is independent of the nonstandard hull.

Theorem 4.6.16 The following assertions are equivalent:
(i) $T$ is superstable.
(ii) $\hat T$ is superstable.
(iii) For all $\varepsilon > 0$ and all $\varphi \in \mathbb N^{\mathbb N}$ there exists an $L \in \mathbb N$ such that
$$B_E = \bigcup_{m=1}^L A_{\varepsilon,\varphi}(m).$$
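Remark 4.6.17 Condition (iii) and the discussion above show the position of the notion of superstability among the other notions of stability.

Proof (i) ⇒ (iii): By definition $\hat T$ is stable. Using formula 4.14 we obtain
$$\forall\varepsilon > 0\ \forall\varphi \in \mathbb N^{\mathbb N}\ \forall\hat x \in B_{\hat E}\ \exists m \in \mathbb N\ \big[d(\hat T^{\varphi(m)}\hat x,\{\hat T\hat x,\dots,\hat T^m\hat x\}) < \varepsilon\big].$$
We fix $\varepsilon$ and $\varphi$. Then we can "lift" the remainder of the formula and get
$$\forall x \in {}^*B_E\ \exists^{\mathrm{st}} m \in \mathbb N\ \big[d({}^*T^{\varphi(m)}x,\{{}^*Tx,\dots,{}^*T^mx\}) < \varepsilon\big],$$
where $\exists^{\mathrm{st}}$ means "there exists a standard". But this latter formula is equivalent to
$$\exists^{\mathrm{st\,fin}} M \subset \mathbb N\ \forall x \in {}^*B_E\ \exists m \in M\ \big[d({}^*T^{\varphi(m)}x,\{{}^*Tx,\dots,{}^*T^mx\}) < \varepsilon\big]$$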
(see Nelson's algorithm [44]), where $\exists^{\mathrm{st\,fin}} M$ means "there exists a standard finite set $M$". Now by the Transfer Principle this latter formula holds in our standard model (obviously without the superscript "standard"). Now set $L = \max(M)$. Then the following formula holds:
$$\exists L \in \mathbb N\ \forall x \in B_E\ \exists m \le L\ \big[d(T^{\varphi(m)}x,\{Tx,\dots,T^mx\}) < \varepsilon\big],$$
which is equivalent to (iii).
(iii) ⇒ (i): Let $\varepsilon$ and $\varphi$ be given. Then (iii) also holds for $\eta = \varepsilon/2$. Moreover, $\widehat{{}^*A_{\eta,\varphi}(m)} \subset A^{\hat E}_{\varepsilon,\varphi}(m)$, where the upper index means the set formed in $\hat E$ for $\hat T$. The Transfer Principle yields
$$B_{\hat E} = \bigcup_{m=1}^{L(\eta)}\widehat{{}^*A_{\eta,\varphi}(m)} \subset \bigcup_{m=1}^{L(\eta)} A^{\hat E}_{\varepsilon,\varphi}(m),$$
which implies that $\hat T$ is stable by the paragraph preceding the theorem. So $T$ is superstable.
(i) ⇒ (ii): (i) implies (iii), which gives $B_{\hat E} = \bigcup_{m=1}^L A^{\hat E}_{\varepsilon,\varphi}(m)$ (see the proof of (iii) ⇒ (i)). But this in turn yields (ii) because of (iii) ⇒ (i). The remainder is obvious.

The importance of superstability is underlined by the following theorem.

Theorem 4.6.18 ([49]) (i) If $T$ is superstable, then $\sigma_1(T) := \{z \in \sigma(T) : |z| = 1\}$ is at most countable.
(ii) If $\sigma_1(T)$ is at most countable and $E$ is superreflexive, then $T$ is superstable.

For the proof we refer to [49]. In a similar manner we can characterize superergodicity (see [80]). First of all, every power bounded operator $T$ on a reflexive Banach space $E$ is necessarily mean ergodic, because the unit ball is weakly compact. Hence every power bounded operator $T$ on a superreflexive Banach space $E$ is superergodic, because the nonstandard hull $\hat E$ of $E$ is reflexive by Theorem 4.3.27. So the notion makes sense only in non-superreflexive spaces.

Let $T$ be a power bounded operator on the Banach space $E$. We set $M_n(T) = \frac1n\sum_{k=0}^{n-1}T^k$. Moreover, for every $\varepsilon > 0$ and every sequence $\varphi \in \mathbb N^{\mathbb N}$ set
$$B_{\varepsilon,\varphi}(m) = \{x \in B_E : d(M_{\varphi(m)}x,\{M_1x,\dots,M_mx\}) < \varepsilon\}.$$
Similarly to our proof of Theorem 4.6.16 we obtain the following proposition:
Proposition 4.6.19 Let $T$ be a power bounded operator on the Banach space $E$. The following assertions are equivalent:
(i) $T$ is superergodic.
(ii) $\hat T$ is superergodic on $\hat E$.
(iii) For every $\varepsilon > 0$ and $\varphi \in \mathbb N^{\mathbb N}$ there exists $L \in \mathbb N$ such that $B_E = \bigcup_{m=1}^L B_{\varepsilon,\varphi}(m)$.
M. Yahdi [80] has shown by examples that the notion of superergodicity lies strictly between those of ergodicity and uniform ergodicity. Finally, let us point out that analogous results can also be obtained for strongly continuous semigroups, see [49].
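To make the Cesàro means $M_n(T)$ concrete, here is a finite-dimensional Python sketch. It assumes that $T$ is the cyclic permutation of the coordinates of $\mathbb C^3$, a hypothetical toy example. Every power bounded operator on a finite-dimensional space is mean ergodic, so the sketch illustrates only the definition of $M_n(T)$ and the convergence of $M_n(T)x$, not superergodicity. The powers $T^nx$ cycle forever, while $M_n(T)x$ approaches the projection of $x$ onto the fixed vectors of $T$.

```python
def T(x):
    """Cyclic permutation of the coordinates of C^3; every power of T has norm 1."""
    return x[1:] + x[:1]


def cesaro_mean(x, n):
    """M_n(T) x = (1/n) * sum_{k=0}^{n-1} T^k x."""
    total = [0.0] * len(x)
    y = x
    for _ in range(n):
        total = [s + v for s, v in zip(total, y)]
        y = T(y)
    return [s / n for s in total]


x = [1.0, 0.0, 0.0]
projection = [sum(x) / len(x)] * len(x)   # the limit: projection of x onto the fixed vectors of T

for n in (10, 100, 1000):
    m = cesaro_mean(x, n)
    print(n, max(abs(a - b) for a, b in zip(m, projection)))   # at most about 1/n
```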
4.6.4 The Fixed Point Property

Due to lack of space we can only give a brief glimpse into this field. Almost from the beginning of its development, ultraproduct methods have been used heavily (see [31, 39, 60]). To our knowledge the only paper where nonstandard analysis is used is that by A. Wiśnicki [71]. We give a short report on the paper [69], which gives an interesting insight into the theory. In the following, let $T$ be a self-mapping of the bounded closed convex subset $C$ of the Banach space $E$. It is called nonexpansive if $\|T(x) - T(y)\| \le \|x - y\|$ for all $x, y \in C$. It is called a contraction, as usual, if it is Lipschitz continuous with Lipschitz constant $L(T) < 1$. The subset $C$ is said to have the fixed point property if every nonexpansive map on it has a fixed point. A Banach space $E$ has the fixed point property, or FFP for short, if every closed bounded convex subset has this property. In [39] B. Maurey used the ultraproduct technique to prove the fixed point property for every reflexive subspace of $L^1([0,1])$. We start with the following lemma.

Lemma 4.6.20 Let $C$ be a bounded closed convex subset of the Banach space $E$, and let $T : C \to C$ be nonexpansive. Then there exists a sequence $(x_n)_n$ in $C$ with $\lim_n\|T(x_n) - x_n\| = 0$. Such a sequence is called an approximate fixed point sequence.
Proof Fix a point $x_0 \in C$ and for every $n \in \mathbb N$ consider the contraction $T_n(x) = \frac1nx_0 + (1 - \frac1n)T(x)$ on $C$. Let $x_n$ be its unique fixed point. Then $x_n - T(x_n) = \frac1n(x_0 - T(x_n))$, so $\|T(x_n) - x_n\| \le \operatorname{diam}(C)/n$, and $(x_n)_n$ is the desired sequence.

We need yet another notion from the geometry of Banach spaces. A closed subset $A$ is called metrically convex if for every two distinct points $x, y \in A$ there exists $z \in A$, different from $x$ and $y$, such that $\|x - y\| = \|x - z\| + \|z - y\|$.
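The construction in the proof of Lemma 4.6.20 can be run directly in the plane. Purely for illustration, the Python sketch below assumes that $C = [0,1]^2$ with the Euclidean norm and that $T$ is the rotation by 90° about $(1/2, 1/2)$, an isometry of $C$ onto itself. The helper names are hypothetical, and `math.dist` requires Python 3.8 or later. Each $x_n$ is computed by Banach iteration for the contraction $T_n$, and the displacement $\|T(x_n) - x_n\|$ is compared with the bound $\operatorname{diam}(C)/n$.

```python
import math


def T(p):
    """Rotation by 90 degrees about (1/2, 1/2); it maps C = [0,1]^2 isometrically onto itself."""
    x, y = p
    return (1.0 - y, x)


def fixed_point_of_T_n(n, x0, tol=1e-13, max_iter=10**6):
    """Fixed point of the contraction T_n(x) = x0/n + (1 - 1/n) T(x), found by Banach iteration."""
    p = x0
    for _ in range(max_iter):
        q = tuple(a / n + (1 - 1 / n) * b for a, b in zip(x0, T(p)))
        if math.dist(p, q) < tol:
            return q
        p = q
    raise RuntimeError("Banach iteration did not converge")


x0 = (0.0, 0.0)            # the fixed starting point x_0 in C
diam_C = math.sqrt(2.0)    # diameter of [0,1]^2

for n in (2, 10, 100):
    xn = fixed_point_of_T_n(n, x0)
    print(n, xn, math.dist(T(xn), xn), diam_C / n)   # displacement and the bound diam(C)/n
```

In this toy case $T$ has a fixed point anyway (the centre of the square). The example only shows how the contractions $T_n$ produce an approximate fixed point sequence.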
In the following we consider the nonstandard hull $\hat C \subset \hat E$ as well as the nonstandard hull $\hat T$ of $T$. Obviously $\hat C$ is bounded, closed, and convex if $C$ possesses these properties. Moreover, $\hat T$ is nonexpansive if $T$ is. Note that, on account of the lemma above, $\hat T$ always has fixed points. The next theorem is due to B. Maurey [39], and it is essential in this field.

Theorem 4.6.21 (B. Maurey) The set $\mathrm{Fix}(\hat T)$ of fixed points of $\hat T$ is metrically convex.

Proof Let $\hat a, \hat b$ be two different fixed points of $\hat T$. Choose $a \in \hat a$, $b \in \hat b$ arbitrarily and set $\lambda = \|a - b\|$, as well as $\xi_1 = \|a - T(a)\|$, $\xi_2 = \|b - T(b)\|$. Then $\xi_3 := \xi_1 + \xi_2$ is infinitesimal. Hence, by Robinson's Sequential Lemma (Theorem 2.8.13) applied to the sequence $(s_n) = (n^2\xi_3)_n$, there exists $N \approx \infty$ such that $N^2\xi_3 < 1$. We set $\eta = 1/N$ and
$$D = \{z \in {}^*C : \|a - z\| \le \lambda/2 + \eta \text{ and } \|b - z\| \le \lambda/2 + \eta\}.$$
$D$ is internal by Keisler's Internal Definition Principle (Theorem 2.8.4), and it is internally closed.
Claim: $x_0 := (a + b)/2 \in D$, so $D \ne \emptyset$. Proof: $\|a - x_0\| = \frac12\|a - b\| = \lambda/2 = \|b - x_0\|$.
Claim: The map $S : D \to {}^*C$ given by $S(z) = \frac\eta2(a + b) + (1 - \eta)T(z)$ maps $D$ into itself. Proof: We have
$$a - S(z) = (1 - \eta)(a - T(a)) + (1 - \eta)(T(a) - T(z)) + \frac\eta2(a - b).$$
Using $\|T(a) - T(z)\| \le \|a - z\| \le \lambda/2 + \eta$, since $z \in D$, we obtain
$$\|a - S(z)\| \le (1 - \eta)\xi_1 + (1 - \eta)\Big(\frac\lambda2 + \eta\Big) + \frac\eta2\lambda = (1 - \eta)\xi_1 + (1 - \eta)\eta + \frac\lambda2.$$
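But $\xi_1 \le \xi_3 < 1/N^2 = \eta^2$, hence $(1 - \eta)(\xi_1 + \eta) < (1 - \eta)(\eta^2 + \eta) = \eta(1 - \eta^2) \le \eta$, which implies $\|a - S(z)\| \le \lambda/2 + \eta$. Similarly one proves the other inequality, and the claim follows. Applying the Transfer Principle to the contraction $S$ (with constant $1 - \eta$) on the internally closed set $D$, we get a unique fixed point $x$ of $S$. Then $x \in D$, and $\hat x$ is a fixed point of $\hat T$, since $x - {}^*T(x) = \eta\big(\frac{a+b}2 - {}^*T(x)\big) \approx 0$. Moreover, $\hat x$ satisfies
$${}^\circ\lambda = \|\hat a - \hat b\| \le \|\hat a - \hat x\| + \|\hat x - \hat b\| \le {}^\circ\lambda/2 + {}^\circ\lambda/2 = {}^\circ\lambda.$$
Hence $\|\hat a - \hat x\| = \|\hat x - \hat b\| = {}^\circ\lambda/2$. In particular, $\hat x$ is different from $\hat a$ and $\hat b$.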
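The two claims in the proof of Theorem 4.6.21 can be imitated in a standard toy situation. Here $a$ and $b$ are exact fixed points, so that $\xi_1 = \xi_2 = 0$ and $\eta$ is an ordinary small number. For illustration only, the Python sketch below assumes $C = [0,1]^2$ with the Euclidean norm and $T(x,y) = (x, y/2)$, which is nonexpansive with fixed points on the segment $y = 0$. The helper names are hypothetical. The fixed point of $S$ is found by Banach iteration and satisfies $\|a - x\| + \|x - b\| = \|a - b\|$.

```python
import math


def T(p):
    """Nonexpansive map of C = [0,1]^2 (Euclidean norm); its fixed points form the segment y = 0."""
    x, y = p
    return (x, y / 2)


def S(z, a, b, eta):
    """S(z) = (eta/2)(a + b) + (1 - eta) T(z), a contraction with constant 1 - eta."""
    return tuple(eta / 2 * (ai + bi) + (1 - eta) * ti for ai, bi, ti in zip(a, b, T(z)))


def fixed_point_of_S(a, b, eta, z0, tol=1e-14, max_iter=10**6):
    """Unique fixed point of S, found by Banach iteration starting at z0."""
    z = z0
    for _ in range(max_iter):
        w = S(z, a, b, eta)
        if math.dist(z, w) < tol:
            return w
        z = w
    raise RuntimeError("Banach iteration did not converge")


a, b = (0.0, 0.0), (1.0, 0.0)        # two fixed points of T
lam, eta = math.dist(a, b), 0.01


def in_D(z):
    """Membership in D = {z : ||a - z|| <= lam/2 + eta and ||b - z|| <= lam/2 + eta}."""
    return math.dist(a, z) <= lam / 2 + eta and math.dist(b, z) <= lam / 2 + eta


z0 = (0.5, 0.1)                      # a point of D
x = fixed_point_of_S(a, b, eta, z0)
print(in_D(z0), in_D(S(z0, a, b, eta)))               # S maps this point of D into D
print(x, math.dist(a, x) + math.dist(x, b), lam)      # the two distances add up to lam
print(math.dist(x, T(x)))                             # x is (numerically) a fixed point of T
```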
Definition 4.6.22 A Banach space $E$ has property $(S_m)$ if for every metrically convex subset $A$ of the unit sphere $S(E)$ with $\operatorname{diam}(A) \le 1$ there exists $\xi \in E'$ such that $\xi$ is strictly positive on $A$.
In [69] it is proved that a Banach space which is uniformly noncreasy (UNC for short) possesses property $(S_m)$. This class of Banach spaces contains the uniformly convex as well as the uniformly smooth spaces. A UNC Banach space is superreflexive and its nonstandard hull is also UNC, so in particular the hull possesses property $(S_m)$. Therefore the following result is important.

Theorem 4.6.23 (A. Wiśnicki [69]) Let $E$ be a superreflexive Banach space and assume that its nonstandard hull possesses $(S_m)$. Then $E$ has FFP.

In order to prove the theorem we have to recall some facts from general fixed point theory. Suppose $E$ does not have FFP. Then there exist a closed convex set $K$ with more than one point and a nonexpansive mapping $T$ on it without fixed points, with the following properties:
(i) $\overline{\mathrm{conv}}(T(K)) = K$.
(ii) $K$ is diametral, i.e. $\operatorname{diam}(K) = \sup_{y\in K}\|x - y\|$ for all $x \in K$.
(iii) Every approximate fixed point sequence $(x_n)_n$ satisfies $\lim_n\|x_n - x\| = \operatorname{diam}(K)$ for all $x \in K$.
By rescaling and shifting one can assume that $0 \in K$ and $\operatorname{diam}(K) = 1$. One can finally assume, whenever $K$ is weakly compact, that there exists an approximate fixed point sequence converging weakly to 0. Now let $\hat E$ be the nonstandard hull in a polysaturated extension (see Sect. 2.9) of the standard world. Then, using (iii), we can prove the following facts:
(a) $\operatorname{diam}(\mathrm{Fix}(\hat T)) = \operatorname{diam}(\hat K) = \operatorname{diam}(K)$.
(b) $\hat K$ and $\mathrm{Fix}(\hat T)$ are diametral, and $\hat x \in \mathrm{Fix}(\hat T)$ implies $\|\hat x\| = \operatorname{diam}(K)$.
We can now use all that we have discussed above to prove Wiśnicki's theorem.
Proof Assume that $E$ does not possess the FFP. Then there exist a closed convex subset $K$, say, and a nonexpansive mapping $T$ on it without fixed points and with the properties listed above. Let $(x_n)_n$ be an approximate fixed point sequence of $T$. Without loss of generality we may assume that $(x_n)_n$ converges weakly to 0; notice that all bounded closed convex subsets are weakly compact, since $E$ is reflexive. Moreover, we can also assume $\operatorname{diam}(K) = 1$.
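Hence $\mathrm{Fix}(\hat T)$ is a subset of the unit sphere which is metrically convex by Theorem 4.6.21 and has diameter 1 by (a). Since $\hat E$ possesses $(S_m)$, there exists an element $\eta \in (\hat E)'$ which is strictly positive on $\mathrm{Fix}(\hat T)$. By Theorem 4.3.27, $\eta = \hat\xi$ for an element $\xi \in \mathrm{Fin}({}^*E')$. Because of Theorem 4.3.15, $\xi$ is weakly nearstandard to a standard $\varphi \in E'$. Then $\langle x_N, \varphi\rangle \approx 0$ for all $N \approx \infty$. Consider the sequence $(s_n)_n$ given by $s_n := \langle x_n, \xi - \varphi\rangle$, and notice that $s_n \approx 0$ for all standard $n$. Applying Robinson's Sequential Lemma (Theorem 2.8.13) to $(s_n)_n$, we obtain $N \approx \infty$ such that $\langle x_N, \xi\rangle - \langle x_N, \varphi\rangle = \langle x_N, \xi - \varphi\rangle \approx 0$. Hence $\langle x_N, \xi\rangle \approx 0$, because $\langle x_N, \varphi\rangle \approx 0$. But $\hat x_N \in \mathrm{Fix}(\hat T)$ and $\langle\hat x_N, \eta\rangle = {}^\circ\langle x_N, \xi\rangle = 0$, a contradiction.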
Remark 4.6.24 As discussed above, this theorem implies that all UNC Banach spaces possess FFP, a result due to S. Prus [46].
4.6.5 References to Further Applications of Nonstandard Analysis to Operator Theory

(1) There is an easy generalization of the notion of compact operators based on Proposition 4.4.1. Let $F$ be a closed subspace of $\hat E$ containing $E$, and consider the class of all continuous operators $T$ such that $\hat T(\hat E) \subset F$. The problem is how to characterize these operators in standard terms. A fruitful application was given in [58], where $E$ is considered to be a Banach lattice and $F$ is the closed ideal of $\hat E$ generated by $E$. This application considerably generalized earlier results on operators on atomic Banach lattices. This research has been continued by V. G. Troitsky in a very interesting manner, see [66].
(2) Let $E$ be an ordered Banach space, and let $(T_t)$ be a semigroup of positive operators. Assume that $(T_t)$ asymptotically dominates another semigroup $(S_t)$. More precisely, this means that $\lim_{t\to\infty} d(T_tx - S_tx, E_+) = 0$ holds for all $x$ in the positive cone $E_+$ of $E$. Which properties of $(T_t)$ are shared by $(S_t)$? Examples are stability, asymptotic almost periodicity and spectral properties (at least if additional assumptions are made concerning the underlying space $E$). For some important results see e.g. [14, 50–52].
(3) There are also some sophisticated applications to semigroups of operators on $C^*$-algebras and to the mathematical theory of many-particle systems, see e.g. [68, 72, 75].
4.6.6 Notes

The first to use Fréchet products in the context of the spectral theory of strongly continuous semigroups seems to have been R. Derndinger [12]. Since then ultrapower techniques have been used quite frequently (see e.g. [43]). For more recent results see the papers mentioned in the previous subsection. The research on the connection between discrete convergence of operators and the convergence of spectra in this generality is due to the author [77, 78]. More concrete results have been obtained by Gordon et al. [2] in the context of the approximation of pseudo-differential operators. For another application of nonstandard analysis to approximation theory see [74]. There is a far-reaching generalization of discrete convergence due to B. Silbermann and his school, see e.g. [59]. Nonstandard analysis, however, has not yet been applied there, though it seems very promising. An extensive nonstandard treatment of concrete closed operators, e.g. of differential operators, is to be found in [1].
Acknowledgments I would like to thank Prof. C. W. Henson, University of Illinois at Champaign-Urbana, who gave me the important reference to the work of D. Dacunha-Castelle and J. L. Krivine, and who very carefully read an earlier version of Sects. 4.1 and 4.2. I also thank Prof. E. Gordon, University of Nishni Novgorod and now Eastern Illinois University, with whom I discussed his own approach to the theory of discrete approximation. My thanks go as well to Dr. H. Ploss, University of Vienna, for many helpful discussions on the theory of strongly continuous semigroups, some of which saved me from unsightly errors. Finally, I thank Prof. Dr. Eduard Emel'yanov of the Middle East Technical University in Ankara, who carefully read the final version of the first edition and eliminated some more misprints and errors.
References
1. S. Albeverio, J.E. Fenstad, R. Høegh-Krohn, T. Lindstrøm, Nonstandard Methods in Stochastic Analysis and Mathematical Physics (Academic Press, Orlando, 1986)
2. S. Albeverio, E. Gordon, A. Khrennikov, Finite dimensional approximations of operators in the spaces of functions on locally compact abelian groups. Acta Appl. Math. 64, 33–73 (2000)
3. H. Ando, U. Haagerup, Ultraproducts of von Neumann algebras. J. Funct. Anal. 266, 6842–6913 (2014)
4. S.K. Berberian, Approximate proper vectors. Proc. Am. Math. Soc. 13, 111–114 (1962)
5. S. Baratella, S.A. Ng, Some properties of nonstandard hulls of Banach algebras. Bull. Belg. Math. Soc. Simon Stevin 18, 31–38 (2011)
6. A.R. Bernstein, A. Robinson, Solution of an invariant subspace problem of K.T. Smith and P.R. Halmos. Pac. J. Math. 16, 421–431 (1966)
7. S. Buoni, R. Harte, A.W. Wickstead, Upper and lower Fredholm spectra I. Proc. Am. Math. Soc. 66, 309–314 (1977)
8. A. Connes, Noncommutative Geometry (Academic Press, New York, 1994)
9. J.J.M. Chadwick, A.W. Wickstead, A quotient of ultrapowers of Banach spaces and semi-Fredholm operators. Bull. Lond. Math. Soc. 9, 321–325 (1977)
10. D. Dacunha-Castelle, J.L. Krivine, Application des ultraproducts a l'etude des espaces et des algebres de Banach. Studia Math. 41, 315–334 (1995)
11. M. Davis, Applied Nonstandard Analysis (Wiley, New York, 1977)
12. R. Derndinger, Über das Spektrum positiver Generatoren. Math. Z. 172, 281–293 (1980)
13. N. Dunford, J. Schwartz, Linear Operators Part I (Interscience Publishers, New York, 1958)
14. E.Y. Emel'yanov, U. Kohler, F. Räbiger, M.P.H. Wolff, Stability and almost periodicity of asymptotically dominated semigroups of positive operators. Proc. Am. Math. Soc. 129, 2633–2642 (2001)
15. P. Enflo, J. Lindenstrauss, G. Pisier, On the "three space problem". Math. Scand. 36, 199–210 (1975)
16. E.I. Gordon, A.G. Kusraev, S.S. Kutateladze, Infinitesimal Analysis (Kluwer Academic Publishers, Dordrecht, 2002)
17. G. Greiner, Zur Perron-Frobenius Theorie stark stetiger Halbgruppen. Math. Z. 177, 401–423 (1981)
18. U. Groh, Uniformly ergodic theorems for identity preserving Schwarz maps on W*-algebras. J. Oper. Theory 11, 395–402 (1984)
19. S. Heinrich, Ultraproducts in Banach space theory. J. Reine Angew. Math. 313, 72–104 (1980)
20. S. Heinrich, C.W. Henson, L.C. Moore, A note on elementary equivalence of C(K) spaces. J. Symb. Log. 52, 368–373 (1987)
21. C.W. Henson, L.C. Moore, The nonstandard theory of topological vector spaces. Trans. Am. Math. Soc. 172, 405–435 (1972)
22. C.W. Henson, Nonstandard hulls of Banach spaces. Isr. J. Math. 25, 108–144 (1976)
23. C.W. Henson, L.C. Moore, Nonstandard Analysis and the theory of Banach spaces, in Nonstandard Analysis – Recent Developments, ed. by A.E. Hurd (Springer, Berlin, 1983), pp. 27–112
24. C.W. Henson, J. Iovino, Ultraproducts in analysis, in Analysis and Logic, ed. by C. Finet et al. Report on three minicourses given at the international conference "Analyse et logique", Mons, Belgium, 25–29 August. London Mathematical Society Lecture Notes Series, vol. 262 (Cambridge University Press, 2002), pp. 1–110
25. E. Hewitt, K.H. Ross, Abstract Harmonic Analysis I (Springer, Berlin, 1963)
26. E. Hille, R.S. Phillips, Functional Analysis and Semi-groups, vol. 31 (American Mathematical Society Colloquium Publications, Providence, 1957)
27. T. Hinokuma, M. Ozawa, Conversion from nonstandard matrix algebras to standard factors of type II$_1$. Ill. Math. J. 37, 1–13 (1993)
28. A.E. Hurd, P.A. Loeb, An Introduction to Nonstandard Real Analysis (Academic Press, Orlando, 1985)
29. R.C. James, Characterizations of reflexivity. Studia Math. 23, 205–216 (1963/64)
30. G. Janssen, Restricted ultraproducts of finite von Neumann algebras, in Contributions to Non-Standard Analysis. Studies in Logic and the Foundations of Mathematics, vol. 69, ed. by W.A.J. Luxemburg, A. Robinson (North Holland, Amsterdam, 1972), pp. 101–114
31. M.A. Khamsi, B. Sims, Ultra-methods in metric fixed point theory, in Handbook of Metric Fixed Point Theory, ed. by W. Kirk, B. Sims (Kluwer Academic Publishers, Dordrecht, 2001), pp. 177–199
32. T. Kato, Perturbation Theory for Linear Operators (Springer, Berlin, 1976)
33. A. Krupa, On various generalizations of the notion of an F-power to the case of unbounded operators. Bull. Pol. Acad. Sci. Math. 38, 159–166 (1990)
34. H.E. Lacey, The Isometric Theory of Classical Banach Spaces (Springer, New York, 1974)
35. J. Lindenstrauss, H. Rosenthal, The $\mathcal L_p$ spaces. Isr. J. Math. 7, 325–349 (1969)
36. W.A.J. Luxemburg, A general theory of monads, in Applications of Model Theory to Algebra, Analysis and Probability, ed. by W.A.J. Luxemburg (Holt, Rinehart and Winston, New York, 1969), pp. 18–86
37. W.A.J. Luxemburg, Near-standard compact internal linear operators, in Developments in Nonstandard Mathematics, ed. by N.J. Cutland et al. Pitman Res. Notes Math. Ser., vol. 336 (London, 1995), pp. 91–98
38. A. Martínez-Abejón, An elementary proof of the principle of local reflexivity. Proc. Am. Math. Soc. 127, 1397–1398 (1999)
39. B. Maurey, Points fixes des contractions de certains faiblement compacts de L¹. Séminaire d'Analyse Fonctionnelle 1980–81 (École Polytechnique, Palaiseau, 1981)
40. P. Meyer-Nieberg, Banach Lattices (Springer, New York, 1991)
41. G. Mittelmeyer, M.P.H. Wolff, Über den Absolutbetrag auf komplexen Vektorverbänden. Math. Z. 137, 87–92 (1974)
42. G.J. Murphy, C*-Algebras and Operator Theory (Academic Press, Boston, 1990)
43. R. Nagel (ed.), One-Parameter Semigroups of Positive Operators. Lecture Notes in Mathematics, vol. 1184 (Springer, Berlin, 1986)
44. E. Nelson, Internal set theory: a new approach to nonstandard analysis. Bull. Am. Math. Soc. 83, 1165–1198 (1977)
45. Siu-Ah Ng, Nonstandard Methods in Functional Analysis. Lectures and Notes (World Scientific, Singapore, 2010)
46. S. Prus, Banach spaces which are uniformly noncreasy, in Proceedings of 2nd World Congress of Nonlinear Analysis (Athens 1996). Nonlinear Anal. 30, 2317–2324 (1997)
47. H.J. Reinhardt, Analysis of Approximation Methods for Differential and Integral Equations (Springer, Berlin, 1985)
48. A. Pazy, Semigroups of Linear Operators and Applications to Partial Differential Equations (Springer, Berlin, 1983)
49. F. Räbiger, M.P.H. Wolff, Superstable semigroups of operators. Indag. Math., N.S. 6, 481–494 (1995)
50. F. Räbiger, M.P.H. Wolff, Spectral and asymptotic properties of dominated operators. J. Aust. Math. Soc. (Series A) 63, 16–31 (1997)
51. F. Räbiger, M.P.H. Wolff, On the approximation of positive operators and the behaviour of the spectra of the approximants. Integral Equ. Oper. Theory 28, 72–86 (1997)
52. F. Räbiger, M.P.H. Wolff, Spectral and asymptotic properties of resolvent-dominated operators. J. Aust. Math. Soc. 68, 181–201 (2000)
53. C.E. Rickart, General Theory of Banach Algebras (Van Nostrand, Princeton, 1960)
54. A. Robert, Functional analysis and NSA, in Developments in Nonstandard Mathematics, ed. by N.J. Cutland et al. Pitman Research Notes in Mathematics Series, vol. 336 (Springer, London, 1995), pp. 73–90
55. B.N. Sadovskii, Limit-compact and condensing operators. Uspehi Mat. Nauk 27, 81–146 (1972)
56. S. Sakai, C*-Algebras (Springer, Berlin, 1971)
57. H.H. Schaefer, Banach Lattices and Positive Operators (Springer, Berlin, 1974)
58. A. Schep, M.P.H. Wolff, Semicompact operators. Indag. Math., N.S. 1, 115–125 (1990)
59. M. Seidel, B. Silbermann, Banach algebras of operator sequences. Oper. Matrices 6, 385–432 (2012)
60. B. Sims, "Ultra"-techniques in Banach Space Theory. Queen's Papers in Pure and Applied Mathematics, vol. 60 (Queen's University, Kingston, 1982)
61. K.D. Stroyan, W.A.J. Luxemburg, Introduction to the Theory of Infinitesimals (Academic Press, New York, 1976)
62. Yeneng Sun, A Banach space in which a ball is contained in the range of some countably additive measure is superreflexive. Canad. Math. Bull. 33, 45–49 (1990)
63. D.G. Tacon, Generalized semi-Fredholm transformations. J. Aust. Math. Soc. Ser. A 34, 60–70 (1983)
64. M. Takesaki, Theory of Operator Algebras (Springer, Berlin, 1978)
65. L.N. Trefethen, Pseudospectra of matrices, in Numerical Analysis, Proceedings of the 14th Dundee Conference 1991, ed. by D.F. Griffiths (Pitman, London, 1993), pp. 234–264
66. V.G. Troitsky, Measures of noncompactness of operators on Banach lattices. Positivity 8, 165–178 (2004)
67. G. Vainikko, Funktionalanalysis der Diskretisierungsmethoden (Teubner, Leipzig, 1976)
68. R. Werner, M.P.H. Wolff, Classical mechanics as quantum mechanics with infinitesimal ℏ. Phys. Lett. Ser. A 202, 155–159 (1995)
69. A. Wiśnicki, Towards the fixed point property for superreflexive spaces. Bull. Aust. Math. Soc. 64, 435–444 (2001)
70. A. Wiśnicki, On the structure of fixed point sets of nonexpansive mappings, in Proceedings of the 3rd Polish Symposium on Nonlinear Analysis. Lecture Notes in Nonlinear Analysis, vol. 3 (2002), pp. 169–174
71. A. Wiśnicki, The super fixed point property for asymptotically nonexpansive mappings. Fundam. Math. 217, 265–277 (2012)
72. M.P.H. Wolff, R. Honegger, On the algebra of fluctuation operators of a quantum meanfield system. Quantum Probab. Relat. Top. IX, 401–410 (1994)
73. M.P.H. Wolff, Spectral theory of group representations and their nonstandard hull. Isr. J. Math. 48, 205–224 (1984)
74. M.P.H. Wolff, An application of spectral calculus to the problem of saturation in approximation theory. Note di Matematica XII, 291–300 (1992)
75. M.P.H. Wolff, A nonstandard analysis approach to the theory of quantum meanfield systems, in Advances in Analysis, Probability and Mathematical Physics: Contributions of Nonstandard Analysis, ed. by S. Albeverio, W.A.J. Luxemburg, M.P.H. Wolff (Kluwer Academic Publishers, Dordrecht, 1995), pp. 228–246
76. M.P.H. Wolff, An introduction to nonstandard functional analysis, in Proceedings of Nonstandard Analysis: Theory and Applications, ed. by L.O. Arkeryd, N.J. Cutland, C.W. Henson (Kluwer Academic Publishers, Dordrecht, 1997), pp. 121–151
77. M.P.H. Wolff, On the approximation of operators and the convergence of the spectra of the approximants. Op. Theory: Adv. Appl. 13, 279–283 (1998)
78. M.P.H. Wolff, Discrete approximation of unbounded operators and the convergence of the spectra of the approximants. J. Approx. Theory 113, 229–244 (2001)
79. M.P.H. Wolff, Discrete approximation of compact operators and approximation of their spectra, in Nonstandard Methods and Applications in Mathematics, ed. by N.J. Cutland, M. Di Nasso, D.A. Ross. Lecture Notes in Logic, vol. 25 (A.K. Peters Ltd., Wellesley, 2006), pp. 224–231
80. M. Yahdi, Super-ergodic operators. Proc. Am. Math. Soc. 134, 2613–2620 (2006)
81. K. Yosida, Functional Analysis, 2nd edn. (Springer, Berlin, 1968)
Part III
Compactifications
Chapter 5
General and End Compactifications
Matt Insall, Peter A. Loeb and Małgorzata Aneta Marciniak
M. Insall (✉)
Department of Mathematics and Statistics, Missouri University of Science and Technology, 400 W. 12th St., Rolla, MO 65409-0020, USA
e-mail: [email protected]
P.A. Loeb
Department of Mathematics, University of Illinois, 1409 West Green Street, Urbana, IL 61801, USA
e-mail: [email protected]
M.A. Marciniak
Department of Mathematics, Engineering and Computer Science, LaGuardia Community College, CUNY, 31-10 Thomson Avenue, Long Island City, NY 11101, USA
e-mail: [email protected]
© Springer Science+Business Media Dordrecht 2015
P.A. Loeb and M.P.H. Wolff (eds.), Nonstandard Analysis for the Working Mathematician, DOI 10.1007/978-94-017-7327-0_5

5.1 Introduction
Here we follow the work in [7], where nonstandard analysis [9, 12, 15] is used to extend and simplify previous work in the literature on compactifications. Recall that the monad of a standard point x is the intersection of the nonstandard extensions of all standard open neighborhoods of x. Points in the monad of some standard point are called nearstandard; a remote point is any point in the nonstandard extension of the space that is not nearstandard. A space has at least one remote point if and only if it is not compact. A central theme of this chapter is the following: a compactification of a regular space is produced by any equivalence relation on the set of remote points. The new points of the resulting compactification are the equivalence classes of remote points. This yields a compact space containing the original point set as a dense subset; in some cases, however, the relative topology on that dense subset is weaker than the original topology. Compactifications constructed in the literature often employ a continuous map from the original space into a compact space. Forming a compactification by attaching appropriate points is incisive; it allows a better understanding of the relationship between the original space and the set of compactifying points. For example, given a family of bounded real-valued functions on the original space, as considered by the second author in [10], one can call two remote points equivalent if the nonstandard extension of each function in the family has infinitesimal variation on the two-point set. This leads to compactifications such as the Stone-Čech compactification, while requiring only regularity, not complete regularity, of the original space.
A continuum, i.e., a compact, connected, metrizable space, arises as a compactification in many natural situations. One approach to such compactifications is the theory of topological ends (see [3]). Using nonstandard analysis, we illuminate that theory and extend it. Simple examples of end compactifications are the two-point compactification of the real line and the one-point compactification of the complex plane. The notion of topological ends was introduced by Freudenthal in [3] to formalize the intuitive notion of a “hole” in a noncompact space. Freudenthal used sequences of connected open sets having nonempty compact boundaries, where the sets in the sequence have empty intersection. That approach was recently extended by the third author in [11], and then by the first and third authors in [8], using nested nets of open sets with the same properties. The work in [7], on which this chapter is based, shows that the special assumptions of these previous papers can be eliminated, so that our results apply to all regular, connected and locally connected topological spaces.
Moreover, the definition presented here extends the notion of ends in those works in the sense that some spaces with intuitive “holes” fail to have ends (in the senses of the older works) at the locations of the “holes”, but with the newer definition given here (from [7]) they have ends there and only there.1 Among other results from [7] is the fact that a product space with two or more noncompact factors has only one end. Again, a simple example is the complex plane. Literature [5, 6] shows varied uses of ends and various methods of producing end compactifications. For example, Halin (see [5]) introduced graph-theoretical ends using equivalence classes of rays at infinity. Those ends are, in general, distinct from Freudenthal’s ends. Other literature, described in [7], includes work of Diestel [1] and Goldbring [4] as well as work that foreshadows [7] such as that of Salbany and Todorov [15, 16], and of course Robinson [13].
1 One might consider applying nonstandard methods to the notion of ends presented in [8], which would lead one to use enlargements of directed sets and nets. The first author looked at this briefly with Mr. Tom Cuchta, but preliminary investigations suggest that the resulting theory is essentially equivalent to the work in [7], at a greater cost in complexity.
5.2 General Compactifications
Let (Z, T) be a topological space. We assume that Z is regular, by which we mean that for each p ∈ Z the singleton set {p} is closed, and for each open neighborhood U of p there is an open neighborhood V ⊆ U of p whose closure $\overline{V}$ is contained in U. We also assume that Z is noncompact. If there is a compact subset K_0 of our original space such that the interior of K_0 is not regular, then we assume that Z is the complement of the interior of K_0. We now fix a κ-saturated nonstandard extension of (Z, T), where κ is greater than the cardinality of the topology T. Recall that compact subsets of Z are closed in Z, since a regular space in the above sense is Hausdorff.
Since we are not requiring that Z be a metric space, all references to monads mean topological monads rather than metric monads. That is, for a point p ∈ Z, the monad of p consists of those points in ∗Z that are in the nonstandard extension of every standard open neighborhood of p. When the topology is determined by a metric, the metric monads and topological monads of standard points coincide; even so, the notion of “remote point” requires some care, because one can also define monads of nonstandard points, and these monads may differ between the metric case and the topological case.
Definition 5.2.1 Given x ∈ ∗Z, x is remote if x is not in the topological monad of any standard point of Z. Equivalently, x is remote if and only if it is not nearstandard.
Note that remote points need not be “far away” from standard points. For example, if a single standard point p is removed from the complex plane, the remaining points of the monad of p are remote points in the nonstandard extension of the resulting space.
Let there be given an equivalence relation, ∼, on the set of remote points of ∗Z. We often refer to the relationship x ∼ y between two remote points x and y by saying that x and y are equivalent. In general, equivalence classes are external.
In the following definition, the point set Z is the set in the standard model.
Definition 5.2.2 A point of Z will be called an s-point. An equivalence class of remote points under the given equivalence relation ∼ will be called an r-point; it will be denoted by [x]∼, where x is some representative of the equivalence class. Let Y be the point set consisting of all s-points and r-points, topologized as follows:
1. The nonstandard extension ∗Z of Z is supplied with the S-topology.
2. The mapping ϕ : ∗Z → Y is given by
$$\varphi(x) = \begin{cases} \operatorname{st} x & \text{if } x \text{ is nearstandard},\\ [x]_\sim & \text{if } x \text{ is remote}. \end{cases}$$
3. The neighborhood filter base B(y) at a point y of Y is given by
$$\mathcal{B}(y) = \begin{cases} \{\varphi({}^*U) : U \in \mathcal{T},\ [x]_\sim \subseteq {}^*U\} & \text{if } y = [x]_\sim \text{ is an r-point},\\ \{\varphi({}^*U) : U \in \mathcal{T},\ z \in U\} & \text{if } y = z \text{ is an s-point}. \end{cases}$$
In [7] it is shown that the above is a valid topologization of the set Y. That is, a set O ⊆ Y is declared “open” if it contains a member of the filter base of each of its points, or equivalently, provided that
$$O = \bigcup_{p \in O} \bigcup \left\{ \varphi({}^*U) : \varphi({}^*U) \in \mathcal{B}(p),\ \varphi({}^*U) \subseteq O \right\}.$$
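On a finite point set, the rule just stated (O is open when it contains a member of B(p) for each p ∈ O) can be executed directly. The Python sketch below is only a toy under stated assumptions: the three-point set Y = {z0, z1, e} and its neighborhood bases are hypothetical, hand-chosen data rather than anything computed from an actual ∗Z. The helper names `open_sets` and `is_topology` are likewise illustrative; the code lists every subset satisfying the rule and checks the topology axioms.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Hashable, List, Set

Base = Dict[Hashable, List[FrozenSet[Hashable]]]


def open_sets(points: Set[Hashable], bases: Base) -> Set[FrozenSet[Hashable]]:
    """Return every O ⊆ points that contains some member of bases[p] for each p in O."""
    pts = sorted(points, key=str)
    opens = set()
    for r in range(len(pts) + 1):
        for combo in combinations(pts, r):
            O = frozenset(combo)
            if all(any(B <= O for B in bases[p]) for p in O):
                opens.add(O)
    return opens


def is_topology(points: Set[Hashable], opens: Set[FrozenSet[Hashable]]) -> bool:
    """On a finite set, closure under pairwise unions and intersections suffices."""
    if frozenset() not in opens or frozenset(points) not in opens:
        return False
    return all((U | V) in opens and (U & V) in opens for U in opens for V in opens)


# Hypothetical toy data: the "r-point" e shares every basic neighbourhood with z0.
Y = {"z0", "z1", "e"}
B = {
    "z0": [frozenset({"z0", "e"})],
    "z1": [frozenset({"z1"})],
    "e": [frozenset({"z0", "e"})],
}
T_Y = open_sets(Y, B)
print(sorted(sorted(O) for O in T_Y))  # [[], ['e', 'z0'], ['e', 'z0', 'z1'], ['z1']]
print(is_topology(Y, T_Y))  # True
```

In this toy, every open set containing z0 also contains e, so the two points cannot be separated by disjoint open sets; non-Hausdorff behaviour of this kind does occur in end compactifications, as the examples of Sect. 5.3 show.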
Here is the relevant proposition, with the proof left to the reader (or see [7]).
Proposition 5.2.3 The collection B(p) at a point p ∈ Y is in fact a filter base.
We let T_Y denote the topology on Y, i.e., the collection of open sets, generated by the neighborhood filter bases. Note that the members of the neighborhood filter bases are not in general open sets. To make use of the topology T_Y, we need some observations about the topological properties of the space (Y, T_Y).
Proposition 5.2.4 Let A ⊆ Z be nonempty. Then any s-point that is the standard part of a point in ∗A is a point of the T-closure of A. Any s-point in the T-closure of A is also a point of the T_Y-closure of A. In fact, ϕ(∗A) is a subset of the T_Y-closure of A.
Proof That standard parts of points of ∗A are points of the T-closure of A is clear. Assume that x ∈ Z is a point in the T-closure of A. Any T_Y-open set W that contains x also contains a set ϕ(∗U) for which U ∈ T includes the point x. Consequently, some point z is in U ∩ A. Since z ∈ ϕ(∗U) ∩ A ⊆ W ∩ A, x is in the T_Y-closure of A. Thus any s-point in ϕ(∗A) is in the T_Y-closure of A. If p is an r-point in ϕ(∗A), let W ∈ T_Y contain p. There is a V ∈ T with the equivalence class corresponding to p contained in ∗V and ϕ(∗V) ⊆ W. Because p ∈ ϕ(∗A), ∗A includes some remote point y in the equivalence class p = [y]∼, and hence ϕ(y) = p. By the choice of V, y ∈ ∗V. Since ∗V ∩ ∗A ≠ ∅, downward transfer yields that V ∩ A includes some point x, whence x is in A and also in ϕ(∗V) ⊆ W. Thus every point in ϕ(∗A) is a point of the T_Y-closure of A.
There is a simple example in [7] of a space Z with an equivalence relation on the remote points and an open set U for which the T_Y-closure of U includes an r-point not in ϕ(∗U). The next result, taken from [7], is essentially the fact that the space Y we have constructed is a compactification of the original space Z.
The proof relies upon a result of Salbany and Todorov [15], noted in Chap. 3 of this book, that ∗Z, with the S-topology, is a compact space.
Theorem 5.2.5 The map ϕ is a continuous surjection from ∗Z onto Y; hence Y is compact. Moreover, the point set Z is dense in Y supplied with the T_Y-topology. In general, the T-topology on Z is stronger than the relative T_Y-topology on Z: every relatively T_Y-open subset of Z is T-open, but the converse can fail.
Definition 5.2.6 Let A ⊆ ∗Z. Then A is not equivalence class splitting if for each remote point y ∈ A, we have [y]∼ ⊆ A.
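The map ϕ of Definition 5.2.2 and the property in Definition 5.2.6 can be mimicked on finite data, with the caveat that a genuine ∗Z is an infinite internal set whose standard-part map cannot be tabulated. In the hypothetical Python sketch below, the tables `standard_part` and `remote_class` are illustrative stand-ins for st and for the classes [x]∼; `phi` follows the two cases of the definition of ϕ, and `not_class_splitting` checks Definition 5.2.6.

```python
from typing import Dict, FrozenSet, Hashable, Set

# Hypothetical finite stand-ins for *Z: "a" is standard, "a+dx" lies in its monad,
# and r1, r2, r3 are remote points with r1 ~ r2 and r3 alone in its class.
standard_part: Dict[Hashable, Hashable] = {"a": "a", "a+dx": "a"}
end_1 = frozenset({"r1", "r2"})
end_2 = frozenset({"r3"})
remote_class: Dict[Hashable, FrozenSet[Hashable]] = {"r1": end_1, "r2": end_1, "r3": end_2}


def phi(x: Hashable):
    """phi(x) = st x for nearstandard x, and [x]~ for remote x (Definition 5.2.2)."""
    if x in standard_part:
        return standard_part[x]  # an s-point
    if x in remote_class:
        return remote_class[x]  # an r-point, represented by its whole class
    raise KeyError(f"{x!r} is not a point of this toy model")


def not_class_splitting(A: Set[Hashable]) -> bool:
    """Definition 5.2.6: each remote y in A brings its whole class [y]~ into A."""
    return all(remote_class[y] <= A for y in A if y in remote_class)


star_Z = set(standard_part) | set(remote_class)
Y = {phi(x) for x in star_Z}
print(len(Y))  # 3: the s-point "a" and the two r-points
print(not_class_splitting({"a", "r1", "r2"}))  # True
print(not_class_splitting({"a", "r1"}))  # False: r2 is missing
```

Representing an r-point by the frozen set of its members keeps ϕ(r1) and ϕ(r2) literally equal, which is the point of passing to equivalence classes.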
As indicated in [7], the following central result follows immediately from the regularity of (Z, T) and the definition of the members of T_Y.
Theorem 5.2.7 Let O ⊆ Z be open, and let A_O ⊆ ∗O be a set of remote points that is not equivalence class splitting. Assume that:
1. To each x ∈ O corresponds a V_x ∈ T with x ∈ V_x such that the T-closure of V_x is contained in O and the remote points in ∗V_x are contained in A_O.
2. For every a ∈ A_O, there is a T-open set V_a such that the T-closure of V_a is contained in O, every remote point in ∗V_a is in A_O, and the equivalence class of a, [a]∼, is contained in ∗V_a.
Let
$$W = \bigcup_{x \in O} \varphi({}^*V_x) \;\cup\; \bigcup_{a \in A_O} \varphi({}^*V_a).$$
Then
1. W ∈ T_Y,
2. W ∩ Z = O,
3. x ∈ O ⇒ x ∈ W ⊆ ϕ(∗O) ∈ B(x), and
4. p = ϕ(a), a ∈ A_O ⇒ p ∈ W ⊆ ϕ(∗O) ∈ B(p).
If the above holds for every open subset of Z, then the T-topology on Z is actually the relative T_Y-topology on Z.
The following two corollaries are taken from [7]. The easy proofs are left to the reader.
Corollary 5.2.8 Assume that O ∈ T has the property that for each x ∈ O there is a V_x ∈ T with x ∈ V_x such that the T-closure of V_x is contained in O and ∗V_x is not equivalence class splitting. Set W = ⋃_{x∈O} ϕ(∗V_x). Then W ∈ T_Y and W ∩ Z = O. If each O ∈ T satisfies the above assumption, then the T-topology on Z equals the relative T_Y-topology on Z.
Corollary 5.2.9 If T is a locally compact topology on Z, then the T-topology on Z equals the relative T_Y-topology on Z.
Local connectivity of the space (Y, T_Y) is at times an important consideration in what follows. In particular, we have the following result, using the above notation.
Corollary 5.2.10 If O ⊆ Z is an open and connected set, and if each of the sets V_x, x ∈ O, is connected, then W = ⋃_{x∈O} ϕ(∗V_x) is connected.
Proof Any set containing a connected set S and contained in the closure of S is connected. By Proposition 5.2.4, each ϕ(∗V_x) lies between V_x and its T_Y-closure, and V_x (like O) remains connected in the relative T_Y-topology, which is weaker than T; hence each ϕ(∗V_x) is connected. Since each of these sets meets the connected set O ⊆ W, their union W is connected.
As noted in [7], the last corollary could also be proved using the following connectivity result. Proposition 5.2.11 The nonstandard extension of any connected open subset of Z is also connected, in the S-topology.
In [10], the second author of this chapter described methods of imbedding a Hausdorff space into a compact space so that each function in a given family of continuous functions on the original space has a continuous extension to the compactification, and the family of extensions separates the points of the “remainder”, i.e., the set of new points in the compact space. The following theorem from [7] describes how our approach here, using nonstandard methods and equivalence relations on the remote points, relates to that earlier work. The proof of this theorem, as presented in [7], is straightforward in our setting. Moreover, applying this theorem to the class of all bounded, continuous, real-valued functions on a regular space Z extends the Stone-Čech compactification construction to regular spaces that are not completely regular, and it is not necessary to imbed Z in a product space to obtain the compactification. As remarked in [7], applying this construction to R. Arens' example of a regular but not completely regular space (see [2], p. 154) produces an example where the T-topology is strictly stronger than the relative T_Y-topology.
Theorem 5.2.12 Let Q be a collection of bounded, continuous, real-valued functions on Z, and call two remote points x and y of the enlargement ∗Z equivalent provided that for each f ∈ Q, ∗f(x) − ∗f(y) is infinitesimal. Let Y be the compactification for this equivalence relation. For each r-point p in Y and each f ∈ Q, set $\bar f(p)$ equal to the standard part of ∗f(x) for any x in the equivalence class corresponding to p, and set $\bar f(z) = f(z)$ for each s-point z. Then $\bar f : Y \to \mathbb{R}$ is a continuous extension of f. Moreover, the set of extensions of members of Q separates the r-points of Y.
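Theorem 5.2.12 cannot be run literally on a computer, since it needs remote points and infinitesimals. The Python sketch below is only a finite heuristic under loudly stated assumptions: sample points of large absolute value on the real line stand in for remote points, a fixed tolerance `eps` stands in for “infinitesimal”, and the hypothetical helper `group_far_points` merges points with a union-find structure. With Q = {arctan}, the far points split into two groups, matching the two r-points of the two-point compactification of ℝ; with Q empty, all far points are merged, matching a single r-point.

```python
import math
from typing import Callable, Dict, List, Sequence

RealFn = Callable[[float], float]


def group_far_points(samples: Sequence[float], Q: Sequence[RealFn], eps: float) -> List[List[float]]:
    """Merge x and y whenever |f(x) - f(y)| < eps for every f in Q.

    The union-find structure takes the transitive closure of this tolerance
    relation, a finite stand-in for 'infinitesimal difference'.
    """
    parent = list(range(len(samples)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(samples)):
        for j in range(i + 1, len(samples)):
            if all(abs(f(samples[i]) - f(samples[j])) < eps for f in Q):
                parent[find(i)] = find(j)

    groups: Dict[int, List[float]] = {}
    for i, x in enumerate(samples):
        groups.setdefault(find(i), []).append(x)
    return list(groups.values())


# "Far" sample points of the real line stand in for remote points.
far = [10.0 ** k for k in range(3, 7)] + [-(10.0 ** k) for k in range(3, 7)]

print(len(group_far_points(far, [math.atan], eps=1e-2)))  # 2
print(len(group_far_points(far, [], eps=1e-2)))  # 1
```

The tolerance relation is not transitive, which is why the sketch closes it up with union-find; in the nonstandard setting no such repair is needed, since “infinitesimal difference” is already an equivalence relation.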
5.3 End Compactifications
The title of this section is not a political slogan. We work with a noncompact topological space (Z, T) that is regular, connected and locally connected. If there is a compact subset K_0 of our original space such that regularity or local connectivity fails on the interior of K_0, then we assume that Z is the complement of the interior of K_0. Recall that any component W of the complement of a compact set is an open set: by local connectedness, any x ∈ W has a connected open neighborhood contained in the (open) complement of the compact set, and hence contained in W. Let κ be a fixed cardinal number that is greater than the cardinality of the topology T. Throughout this and the next section, we work in a κ-saturated nonstandard extension of (Z, T). Let us say that a set A ⊆ ∗Z is remote provided that each of its points is a remote point in ∗Z.
Definition 5.3.1 We say that two remote points x, y in ∗Z are equivalent, and we write x ∼ y, if there is a remote, internally connected set A with x ∈ A and y ∈ A.
The following is an easy observation, following from the fact that the union of two connected sets containing a common point is connected.
Proposition 5.3.2 The relation ∼ is an equivalence relation on the set of remote points.
In general, the equivalence classes under this equivalence relation are external. The following definition is our incisive revision of the notions of ends previously considered by Freudenthal in [3], and by the first and third authors of the present chapter in [8]. It is this notion, easily available and highly intuitive only in the nonstandard setting, that provides such a natural expansion of the results of those previous works on end compactifications.
Definition 5.3.3 We call the equivalence class containing a remote point x ∈ ∗Z the end of Z represented by x.
We apply here the results of the previous section on general compactifications. In particular, we employ the notion of a not equivalence class splitting subset of ∗Z.
Definition 5.3.4 We call the compactification corresponding to the equivalence relation ∼ of this section the end compactification of Z. We call an open set O ∈ T non end-splitting, or just NES, if ∗O is not equivalence class splitting with respect to the equivalence classes forming ends.
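A finite graph can only hint at Definition 5.3.1, but the hint is instructive. In the hypothetical Python sketch below, vertices of the lattice box [−n, n]^dim lying outside a smaller box [−r, r]^dim play the role of remote points, and a path through such vertices plays the role of a remote, internally connected set; the “ends” are then the connected components of the remote vertices. This is an analogy resting on those assumptions, not the nonstandard construction itself. For dim = 1 it reports two components, like the two ends of the real line; for dim = 2, a product of two noncompact factors, it reports one, like the single end of the plane.

```python
from collections import deque
from itertools import product
from typing import Dict, List, Set, Tuple

Vertex = Tuple[int, ...]


def grid(n: int, dim: int) -> Dict[Vertex, List[Vertex]]:
    """Integer points of [-n, n]^dim with nearest-neighbour edges (a finite product of paths)."""
    points = set(product(range(-n, n + 1), repeat=dim))
    adj: Dict[Vertex, List[Vertex]] = {}
    for p in points:
        adj[p] = []
        for axis in range(dim):
            for step in (-1, 1):
                q = p[:axis] + (p[axis] + step,) + p[axis + 1:]
                if q in points:
                    adj[p].append(q)
    return adj


def remote_components(adj: Dict[Vertex, List[Vertex]], remote: Set[Vertex]) -> List[Set[Vertex]]:
    """Connected components of the subgraph induced on the 'remote' vertices (breadth-first search)."""
    seen: Set[Vertex] = set()
    components: List[Set[Vertex]] = []
    for start in remote:
        if start in seen:
            continue
        seen.add(start)
        component = {start}
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w in remote and w not in seen:
                    seen.add(w)
                    component.add(w)
                    queue.append(w)
        components.append(component)
    return components


def count_ends(n: int, r: int, dim: int) -> int:
    """Vertices outside the box [-r, r]^dim stand in for remote points."""
    adj = grid(n, dim)
    remote = {p for p in adj if max(abs(c) for c in p) > r}
    return len(remote_components(adj, remote))


print(count_ends(50, 10, dim=1))  # 2, like the two ends of the real line
print(count_ends(20, 5, dim=2))  # 1, like the single end of the plane
```

Breadth-first search is used only for simplicity; a union-find structure, which mirrors the way Proposition 5.3.2 merges overlapping connected sets, would produce the same components.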
Example 5.3.5 Let Z = (([0, 1/3] ∪ [2/3, 1]) × [0, 1]) ∪ ([0, 1] × (0, 1]) in the real plane, i.e., the closed unit square with the open segment (1/3, 2/3) × {0} removed, and let T be the subspace topology on Z. As explained in [7], it follows that
1. Z has a unique end, e.
2. e is the set of all points in the nonstandard extension of the square with positive but infinitesimal second coordinate and first coordinate between 1/3 and 2/3, but not in the monad of either (1/3, 0) or (2/3, 0).
3. The standard points (1/3, 0) and (2/3, 0) in Z cannot be separated from each other by disjoint open neighborhoods in Y.
4. The standard points (1/3, 0) and (2/3, 0) in Z cannot be separated from e by disjoint open neighborhoods in Y.
5. The topology T is strictly stronger than the relative T_Y-topology on Z.
6. The net method of [8] does not work to define the end in Z.
Recall that a regular space is locally compact if and only if each of its compact subsets is contained in an open set whose closure is compact. This standard fact is employed in the essentially standard proof of the following result (see [7]).
Theorem 5.3.6 Let K be a compact subset of Z contained in an open set W ⊆ Z with compact closure $\overline{W}$. Then all but a finite number of components of Z \ K are contained in the compact set $\overline{W}$. It follows that if Z is locally compact, then the complement of any compact subset of Z has only a finite number of components whose nonstandard extensions contain remote points.
The following example involves a connected and locally connected space that is not locally compact.
Example 5.3.7 Let
$$Z = \{0\} \cup \bigcup_{n \in \mathbb{N}} I_n,$$
where for each natural number n, I_n = (0, 1]_n is a homeomorphic copy of the half-open interval (0, 1]. We take as a base for the neighborhood system of each p ∈ I_n the usual open base inherited from the real line. A typical element of the base for the neighborhood system of 0 is given by a standard positive ε < 1; it is the set
$$O_\varepsilon := \{0\} \cup \bigcup_{n \in \mathbb{N}} \{x \in I_n : x < \varepsilon\}.$$
For each H ∈ ∗ℕ \ ℕ, the non-infinitesimal points in I_H form an end. These are the only ends. The monad of 0 is given by
$$\mu(0) = \{0\} \cup \bigcup_{n \in {}^*\mathbb{N}} \{x \in I_n : x \approx 0\}.$$
The nonstandard extension of any standard open neighborhood U of 0 in Z has nonempty intersection with every end, so to form a member of B(0), ϕ(∗U) must contain the nonstandard interval I_H for every unlimited H ∈ ∗ℕ, and thus I_n for all n ≥ m_U for some m_U ∈ ℕ. Therefore, the end compactification of Z is not Hausdorff, but a Hausdorff quotient is obtainable by mapping every end to 0. Note that the number of ends in the compactification depends on the cardinality of ∗ℕ in the selected nonstandard extension.
We give the statement and proof of the next proposition much as they are given in [7] for open sets. We formulate it more generally, but of course it still holds for open sets. The result demonstrates that Robinson's nonstandard criterion for compactness can be ostensibly weakened for boundaries.
Proposition 5.3.8 Suppose A ⊆ Z and that p ∈ ∗∂A is in the monad of some standard point x ∈ Z. Then in fact x ∈ ∂A. It follows that the boundary ∂A of a nonempty subset A of Z is compact if and only if every point α ∈ ∗∂A is in the monad of some standard point of Z.
Proof Under the conditions of the proposition, let U be any standard open neighborhood of x; then p ∈ ∗U. Since p ∈ ∗∂A, the open set ∗U contains points both inside and outside ∗A, so by downward transfer, U contains points both inside and outside A. It follows that x ∈ ∂A, whence p is nearstandard in ∗∂A. Therefore, if all points of ∗∂A are in monads of standard points of Z, then they are in monads of standard points of ∂A, so ∂A is compact. The converse clearly holds.
Connectedness and, in particular, components play a significant role in investigations relating ends to compactifications. We state the following result
about non end-splitting components, leaving the proof to the reader (or see [7]).
Proposition 5.3.9 Any connected component of the complement of a compact set is non end-splitting.
The assumption of local compactness simplifies the study of ends, so the following proposition is useful.
Proposition 5.3.10 If Z is locally compact, and if x and y are non-equivalent remote points in ∗Z, then there is a compact set K ⊆ Z such that x and y are in the nonstandard extensions of different components of Z \ K.
Proof Assume the contrary, i.e., that for every compact set K, x and y are in the nonstandard extension of the same component of Z \ K. Using local compactness and saturation, it follows that there is a nonstandard compact set C containing all nearstandard points such that x and y are in the same component of ∗Z \ C. That component is internally connected and remote, so y ∼ x, a contradiction.
Since Z is connected, the only nonempty subset of Z with empty boundary is Z, and clearly Z is not end-splitting. In general, we have the following result for nontrivial open subsets of Z; we refer the reader to [7] for the proof.
Theorem 5.3.11 A nonempty open set has a compact boundary if and only if it is non end-splitting.
In [8], the first and third authors defined ends of topological spaces in terms of nets of open sets ordered by reverse inclusion (i.e., V ≥ U iff V ⊆ U). The sets forming the net each have nonempty boundary, and the intersection of their closures is empty. It is assumed in [8] that each member U of such a net contains an open subset with compact boundary; that is, each such net is refined by some net of nonempty open sets, each having compact boundary.
Corollary 5.3.12 Let {U_α}_{α∈Λ} be a net of nonempty open subsets of Z with compact boundary, directed by downward inclusion and having empty intersection of the closures. For any remote point x in ⋂_{α∈Λ} ∗U_α, the end represented by x is contained in ⋂_{α∈Λ} ∗U_α.
Proposition 5.3.13 Under the assumptions of the above corollary, let x and y be remote points in ⋂_{α∈Λ} ∗U_α. Then y is in the end represented by x. Therefore this end is the same as the end determined by the method of [8] using the net {U_α}_{α∈Λ}.
Proof Because of saturation, x and y are in some common internally connected open set V ⊂ ⋂_{α∈Λ} ∗U_α. If z ∈ Z (i.e., if z is a standard point of ∗Z), then for some index α, z ∈ Z \ $\overline{U_\alpha}$; this standard open neighborhood of z has nonstandard extension disjoint from ∗U_α ⊇ V. Consequently, no point of V is in the monad of z. That is, no point of V is nearstandard. Thus, x and y are equivalent and determine the same end.
The above corollary and proposition, both from [7], illuminate the relationship between the approach in [8] and the one here using nonstandard methods. In particular, if each member U of any such net contains an open subset with compact boundary (as is assumed in [8]), then the ends obtained in [8] and our ends defined using nonstandard methods are in a natural one-to-one correspondence. Example 5.3.5 demonstrates a case for which the assumption of [8] is not satisfied but nonstandard ends can still be used. Here is an example with an infinite number of “ends that are near”.
Example 5.3.14 Let ℂ denote the complex plane with the usual topology, let ℚ denote the set of rational numbers on the real axis, and let I be the open real interval (0, 1). Set Z = ℂ \ (ℚ ∩ I). The space Z is not locally compact, and the nonstandard rational numbers strictly between 0 and 1 are not included in ∗Z. Nonstandard methods sharpen the analysis in [8] of the space Z. As is true for the complex plane itself, nonstandard points outside the extension of every standard bounded set are remote and form a single end. This can be proved, for example, using the results of the next section, since ℂ is topologically the product of two copies of the real line. Now if γ is an irrational number between 0 and 1, then the points of ∗ℂ in the monad of γ that have not been removed are mapped by ϕ onto γ. The points of ∗Z in the monad of a removed rational number q ∈ (0, 1) are remote and form a single end. Open discs in Z that are symmetric about the real axis and meet that axis in an interval with irrational endpoints are NES. The resulting end compactification is homeomorphic to the extended complex plane, i.e., the Riemann sphere.
5.4 Product Spaces Direct products of sets play a significant role in much of mathematics. For example, they are involved in defining the notion of an algebra, and hence in the study of topological algebras, including but not limited to, the classical structures of analysis, such as topological groups, topological rings, topological vector spaces, and such. Here we describe some connections of our previous sections to products of topological spaces and the compactifications of such product spaces. For each α in some indexing set I, let X α be a given topological space, and consider the corresponding space X α supplied with the product topology. Recall α∈I
that the projection mapping defined from such a product space onto a factor of that space is an open mapping, meaning that it takes any open subset of the product onto an open subset of that factor. X α , let pα denote the projection of p onto X α . That is, For any point p in α∈I
p = ( pα )α∈I . Let st (I) denote the collection of all standard indices in ∗ I. For each α ∈ st (I), let μα ( pα ) be the monad of pα in X a . The monad of p in the product space is the product
5 General and End Compactifications
$$\prod_{\alpha\in\mathrm{st}(I)}\mu_\alpha(p_\alpha)\times\prod_{\alpha\in{}^*I\setminus\mathrm{st}(I)}{}^*X_\alpha.$$
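It is not difficult to see, then, that a point p in the nonstandard extension of a product is remote if and only if there exists some $\alpha\in\mathrm{st}(I)$ such that $p_\alpha$ is remote. To apply our results on ends, we now assume that each $X_\alpha$ is connected and locally connected.
Proposition 5.4.1 Fix $\beta\in I$ and $A\subseteq X_\beta$. Then $A\times\prod_{\alpha\neq\beta}X_\alpha$ is connected if and only if A is a connected subset of $X_\beta$.
Proof Let U and V be a pair of nonempty open sets in the product space that form a disconnection of $A\times\prod_{\alpha\neq\beta}X_\alpha$. Note that the projections of U and of V onto $X_\beta$ and onto each $X_\alpha$ must be nonempty, and $U\cup V$ covers $A\times\prod_{\alpha\neq\beta}X_\alpha$. For each $\alpha\neq\beta$, the projections of U and of V onto $X_\alpha$ have a nonempty intersection since $X_\alpha$ is connected. Moreover, since each slice $\{a\}\times\prod_{\alpha\neq\beta}X_\alpha$ with $a\in A$ is connected and therefore lies entirely in U or entirely in V, the projections on $X_\beta$ form a disconnection of A. Conversely, A is the image of $A\times\prod_{\alpha\neq\beta}X_\alpha$ under the continuous projection onto $X_\beta$, so A is connected whenever that product is. The desired result follows.
Theorem 5.4.2 Assume that at least two spaces in the family $X_\alpha$, $\alpha\in I$, are noncompact. Then the product space $\prod_{\alpha\in I}X_\alpha$ has exactly one end.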
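Proof We reproduce the proof given in [7]. Given a remote point $p_\beta$ in ${}^*X_\beta$ where $\beta\in\mathrm{st}(I)$, the product $\{p_\beta\}\times\prod_{\alpha\neq\beta}{}^*X_\alpha$ is internally connected and contains no nearstandard point. Similarly, given a remote point $p_\gamma$ in ${}^*X_\gamma$ where $\gamma\in\mathrm{st}(I)$ and $\gamma\neq\beta$, the product $\{p_\gamma\}\times\prod_{\alpha\neq\gamma}{}^*X_\alpha$ is internally connected and contains no nearstandard point. These two connected sets have points in common, namely those with projection $p_\beta$ in ${}^*X_\beta$ and projection $p_\gamma$ in ${}^*X_\gamma$. Therefore, in terms of the equivalence relation for ends, all points of $\{p_\beta\}\times\prod_{\alpha\neq\beta}{}^*X_\alpha$ are equivalent to all points of $\{p_\gamma\}\times\prod_{\alpha\neq\gamma}{}^*X_\alpha$. Moreover, as noted above, a point is remote in ${}^*\big(\prod_{\alpha\in I}X_\alpha\big)$ if and only if the projection $p_\beta$ is remote for at least one $\beta\in\mathrm{st}(I)$. Since at least two factors are noncompact, any two remote points can be linked in this way (through a remote coordinate in a second noncompact factor if their remote coordinates have the same index), so all remote points are equivalent and the product has exactly one end.
Now let p be any point in the nonstandard extension of the product, and assume that exactly one index, β, corresponds to a noncompact factor of the product space in question. Then p is remote if and only if $p_\beta$ is remote. Also, it is plain to see that an internal set in the nonstandard extension of the product is (internally) connected and contains no nearstandard points if and only if its projection onto ${}^*X_\beta$ shares that property. Consequently, we have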
M. Insall et al.
Theorem 5.4.3 Suppose that, in the above setting, $X_\alpha$ is compact for each $\alpha\neq\beta$ and $X_\beta$ is not compact. Then the number of ends of $\prod_{\alpha\in I}X_\alpha$ equals the number of ends of $X_\beta$.
Remark 5.4.4 Work that remains in our research on compactifications includes applications to pre-topologies (see [14]), to box products and proximity spaces, and to compactifications of topological algebras.
References
1. R. Diestel, D. Kühn, Graph-theoretical versus topological ends of graphs. J. Comb. Theory Ser. B 87, 197–206 (2003)
2. J. Dugundji, Topology (Allyn and Bacon Inc., Boston, 1966)
3. H. Freudenthal, Über die Enden topologischer Räume und Gruppen. Math. Z. 33, 692–713 (1931)
4. I. Goldbring, Ends of groups: a nonstandard perspective. J. Log. Anal. 3(7), 1–28 (2011)
5. R. Halin, Über unendliche Wege in Graphen. Math. Ann. 157, 125–137 (1964)
6. H. Hopf, Enden offener Räume und unendliche diskontinuierliche Gruppen. Comment. Math. Helv. 16, 81–100 (1943/44)
7. M. Insall, P.A. Loeb, M.A. Marciniak, End compactifications and general compactifications. J. Log. Anal. 6(7), 1–16 (2014). doi:10.4115/jla.2014.6.7
8. M. Insall, M.A. Marciniak, Nets defining ends of topological spaces. Top. Proc. 40, 1–11 (2012)
9. H.J. Keisler, An infinitesimal approach to stochastic analysis. Mem. Am. Math. Soc. 48, 297 (1984)
10. P.A. Loeb, Compactifications of Hausdorff spaces. Proc. Am. Math. Soc. 22, 627–634 (1969)
11. M.A. Marciniak, Holomorphic extensions in smooth toric surfaces. J. Geom. Anal. 22, 911–933 (2012)
12. A. Robinson, Non-standard Analysis (North-Holland, Amsterdam, 1966)
13. A. Robinson, Compactification of groups and rings and nonstandard analysis. J. Symb. Log. 34, 576–588 (1969)
14. J. Šlapal, A digital pretopology and one of its quotients. Top. Proc. 39, 13–15 (2012)
15. S. Salbany, T. Todorov, Nonstandard analysis in topology: nonstandard and standard compactifications. J. Symb. Log. 65, 1836–1840 (2000)
16. S. Salbany, T. Todorov, Lecture Notes: Nonstandard Analysis in Topology, arXiv:1107.3323
Part IV
Measure and Probability Theory
Chapter 6
Measure Theory and Integration
Horst Osswald
6.1 Introduction
Loeb measures have been applied in various fields of real analysis. In his fundamental paper [15], Peter Loeb gave the first applications to probability theory. Also developed at that time (and published later in [16]) was an application constructing representing measures in potential theory. (See Sect. 3.12.2.) Fix an unlimited positive integer H and put $T:=\{\frac1H,\frac2H,\ldots,H\}$. This set T is infinite, but *finite, and can be interpreted as a “time line”, which is closely related to the continuous time line [0, ∞[, because each real number in [0, ∞[ is infinitely close to some t ∈ T. The sample space ${}^*\mathbb R^T$ of our probability spaces is also fixed, where ${}^*\mathbb R^T$ is the internal set of all internal $H^2$-tuples of elements of ${}^*\mathbb R$. We also fix an internal stochastic process $B:{}^*\mathbb R^T\times T\to{}^*\mathbb R$, defined by
$$B(X,t):=\sum_{s\in T,\ s\le t}X_s.$$
In the case of infinite-dimensional Brownian motion, however, ${}^*\mathbb R$ is replaced by a hyperfinite-dimensional Euclidean space. The value B(X, t) can be understood as the profit (or loss) at time t during the game $X\in{}^*\mathbb R^T$. This profit or loss depends on the particular probability measure we choose on the sample space ${}^*\mathbb R^T$.
Let us start with Peter Loeb's construction of Poisson processes from a hyperfinite model of tossing an extremely biased coin. Fix a standard β > 0. Let $l^1$ be the internal Borel probability measure on ${}^*\mathbb R$ concentrated on {0, 1}, where $l^1(\{0\})=1-\frac{\beta}{H}$ and $l^1(\{1\})=\frac{\beta}{H}$. Let l be the internal $H^2$-fold product of $l^1$ on ${}^*\mathbb R^T$. Note that for each $X\in\{0,1\}^T$
$$l(\{X\})=\Big(1-\frac{\beta}{H}\Big)^{m}\cdot\Big(\frac{\beta}{H}\Big)^{H^2-m},$$
H. Osswald (B) Mathematisches Institut der Universität München, Theresienstr. 39, 80333 Munich, Germany e-mail: [email protected] © Springer Science+Business Media Dordrecht 2015 P.A. Loeb and M.P.H. Wolff (eds.), Nonstandard Analysis for the Working Mathematician, DOI 10.1007/978-94-017-7327-0_6
where m is the number of zeros occurring in X. Using that $(X,t)\mapsto\sum_{s\le t}\big(X_s-\mathbb E_{l^1}x\big)$, where $\mathbb E_{l^1}x=\frac{\beta}{H}$ is the mean of $l^1$, is an l-square integrable martingale, we will see that for $l_L$-almost all $X\in{}^*\mathbb R^T$ and all $r\in[0,\infty[$
$$B_l(X,r):={}^{\circ}\!\lim_{s\downarrow r}{}^{\circ}B(X,s)\tag{+}$$
is well defined. Here ${}^{\circ}\!\lim_{s\downarrow r}$ denotes the right-hand limit at r. Moreover, Loeb has shown that $B_l$ is a Poisson process of rate β. We will see that $B_l$ is a càdlàg process, which means that it is right continuous and has left-hand limits $l_L$-almost surely.
The next convincing example of the usefulness of Loeb measures is Bob Anderson's [2] construction of Brownian motion from a hyperfinite model of tossing an unbiased coin. Let $a^1$ be the internal Borel probability measure on ${}^*\mathbb R$ concentrated on $\{-\frac{1}{\sqrt H},\frac{1}{\sqrt H}\}$, setting
$$a^1\Big(\Big\{\frac{1}{\sqrt H}\Big\}\Big)=\frac12=a^1\Big(\Big\{-\frac{1}{\sqrt H}\Big\}\Big).$$
Let a be the $H^2$-fold product of $a^1$ on ${}^*\mathbb R^T$. Then for all $X\in\{\frac{1}{\sqrt H},-\frac{1}{\sqrt H}\}^T$
$$a(\{X\})=\Big(\frac12\Big)^{H^2}.$$
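To get a feel for the two internal coin models, one can replace the unlimited H by an ordinary large integer and compute exactly. The sketch below is only a finite, standard analogue and proves nothing about the Loeb-measure statements. Under l, the value B(·, t) counts the ones among the Ht coordinates $X_s$ with s ≤ t, so it is Binomial(Ht, β/H). Its total-variation distance to Poisson(βt) is at most tβ²/H by Le Cam's inequality. Under a, B(·, t) has variance exactly t and fourth moment 3t² − 2t/H, which approaches the Gaussian value 3t². The function names are illustrative.

```python
import math
from fractions import Fraction


def binomial_vs_poisson_tv(H, beta, t, kmax=80):
    """Total variation between Binomial(H*t, beta/H) and Poisson(beta*t).

    With H an ordinary integer, B(., t) under the product measure l counts the
    ones among the H*t coordinates X_s with s <= t, so it is Binomial(H*t, beta/H).
    Probabilities beyond kmax are ignored (negligible when beta*t is small).
    """
    n = round(H * t)
    p = beta / H
    lam = beta * t
    b = (1 - p) ** n       # Binomial pmf at k = 0
    q = math.exp(-lam)     # Poisson pmf at k = 0
    tv = 0.0
    for k in range(kmax + 1):
        tv += abs(b - q)
        # pmf recurrences avoid huge binomial coefficients and factorials
        b = b * (n - k) / (k + 1) * p / (1 - p) if k < n else 0.0
        q = q * lam / (k + 1)
    return tv / 2


def walk_moment(H, t, power):
    """Exact E[B(., t)**power] under a: a sum of n = H*t steps of size +-1/sqrt(H).

    Only even powers are supported, so that the result is rational.
    """
    if power % 2:
        raise ValueError("odd moments of the symmetric walk vanish; use an even power")
    n = H * t                      # t must be a positive integer here
    total = Fraction(0)
    for k in range(n + 1):         # k = number of up-steps, B = (2k - n)/sqrt(H)
        total += math.comb(n, k) * Fraction((2 * k - n) ** power, H ** (power // 2))
    return total / 2 ** n
```

For example, `walk_moment(50, 2, 2)` equals 2 = t, and `walk_moment(50, 2, 4)` equals 3·2² − 2·2/50.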
Anderson has shown that for $a_L$-almost all $X\in{}^*\mathbb R^T$ and all limited $t\in T$,
$$B_a:{}^*\mathbb R^T\times[0,\infty[\to\mathbb R,\quad(X,{}^{\circ}t)\mapsto{}^{\circ}B(X,t)\tag{++}$$
is well defined and continuous. Moreover, he has proved that $B_a$ is one-dimensional Brownian motion. In view of this result, the process B is not only a model for tossing an unbiased coin, but it also describes a random walk, that is, the motion of a particle moving along the real axis in steps of size $\frac{1}{\sqrt H}$, each step being to the left or to the right with probability $\frac12$. Of course, we could have defined $B_a$ using the equation under (+); then $B_a$, defined by (++), would be a continuous version of $B_a$, defined by (+). The internal process B is a Brownian motion for the internal measure a up to an infinitesimal error; see Exercise 11.5 in [21]. Cutland [6] has used a measure C on ${}^*\mathbb R^T$ for which B is an exact internal Brownian motion (see Proposition 11.2.1 in [21]): let C be the internal $H^2$-fold product of the internal centered Gaußian measure $C^1$ on ${}^*\mathbb R$ of variance $\frac1H$, i.e.,
$$C^1(A):=\sqrt{\frac{H}{2\pi}}\int_A e^{-\frac H2x^2}\,dx.$$
Cutland has shown that the process $B_C:=B_a$, that is, the same map $(X,{}^{\circ}t)\mapsto{}^{\circ}B(X,t)$, is $C_L$-a.s. well defined, continuous, and a one-dimensional Brownian motion. The difference between $B_C$ and $B_a$ is that $B_C$ is well defined $C_L$-a.s., while $B_a$ is well defined $a_L$-a.s.; both processes are Brownian motion. Similar results hold for arbitrary one-dimensional Lévy processes. Consider two possibly very different Lévy processes, L and a second one. It is shown in Chap. 15 of [21] that there exist internal Borel probability measures $m^1$, $\mu^1$ on ${}^*\mathbb R$ such that L can be identified with $B_m$ and the second process with $B_\mu$. Here m and μ are again the $H^2$-fold internal products of $m^1$ and $\mu^1$, respectively, and $B_m$ and $B_\mu$ are given by equation (+). Then $B_m$ is well defined $m_L$-almost surely and $B_\mu$ is well defined $\mu_L$-almost surely, where $m_L$ and $\mu_L$ are the Loeb probability measures over m and μ. Note that $B_m$ and $B_\mu$ are almost surely standard parts of the same internal process B, almost surely with respect to possibly very different measures. The processes $B_\mu$ and the second process can be identified, as can $B_m$ and L, because each pair has the same Lévy triplet. Lévy triplets characterize Lévy processes via the Lévy–Khintchine formula, which describes their Fourier transforms (characteristic functions). Lévy triplets for one-dimensional Lévy processes are of the form (a, C, ρ), where a, C ∈ ℝ with C ≥ 0 and ρ is a Borel measure on ℝ, the so-called Lévy measure. It is in general infinite and can be obtained from a suitable Loeb measure on ${}^*\mathbb R$. Whereas Lebesgue measure is infinite in every neighborhood of infinity, a Lévy measure, if it is infinite, is infinite in every neighborhood of zero. In the case of Brownian motion the Lévy triplet is $(a,C,\rho)=(0,1,A\mapsto0)$; in the case of the Poisson process of rate β it is $(a,C,\rho)=(\beta,0,A\mapsto\beta\cdot\mathbf 1_A(1))$. Each Lévy triplet is the triplet of a process defined as in (+), with l replaced by the internal product on ${}^*\mathbb R^T$ of a suitable internal probability measure on ${}^*\mathbb R$.
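For orientation, here is one common normalization of the Lévy–Khintchine formula. The chapter does not fix the truncation function, so the choice $\mathbf 1_{\{|x|\le1\}}$ below is an assumption. With it, a Lévy process $(L_t)$ with triplet (a, C, ρ) satisfies $\mathbb E\,e^{iuL_t}=e^{t\psi(u)}$, where
$$\psi(u)=iau-\tfrac12Cu^2+\int_{\mathbb R}\big(e^{iux}-1-iux\,\mathbf 1_{\{|x|\le1\}}\big)\,\rho(dx).$$
Under this convention the two triplets quoted above yield the familiar exponents. For (0, 1, A ↦ 0) one gets $\psi(u)=-\frac12u^2$, the exponent of Brownian motion. For $(\beta,0,A\mapsto\beta\cdot\mathbf 1_A(1))$, i.e. $\rho=\beta\delta_1$, the jump at x = 1 lies inside the truncation region, so
$$\psi(u)=i\beta u+\beta\big(e^{iu}-1-iu\big)=\beta\big(e^{iu}-1\big),$$
which is the exponent of a Poisson process of rate β. With the strict truncation $\mathbf 1_{\{|x|<1\}}$, the drift in the Poisson triplet would be 0 instead of β.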
For details see the work of Lindstrøm [14] or [21], Chap. 15. In Sect. 6.2.6 we will see that Lebesgue measure on [0, ∞[ is equivalent to a Loeb counting measure on T. All the preceding expositions are striking examples of the usefulness of Loeb measures. In Chap. 7, Malliavin calculus will be studied for infinite-dimensional Brownian motion, extending Cutland's [6] and Cutland and Ng's [7] results for the one-dimensional case, following [21]. We also study Malliavin calculus for symmetric Poisson processes, which are, from a certain point of view, more subtle than non-symmetric Poisson processes. In the following, we fix a superstructure V of cardinality κ such that at least the real numbers are individuals, and a monomorphism ∗ from V into a κ⁺-saturated nonstandard model W. Given a measure μ, we will say that a property holds μ-a.e. (almost everywhere) if it holds outside a set of μ-measure 0. If μ is a probability measure, we shall write μ-a.s. (almost surely) instead of μ-a.e.
6.2 Loeb Measures
Loeb spaces $(\Omega,L_\mu(\mathcal C),\mu_L)$ are, in general, finite σ-additive complete measure spaces in the usual standard sense. They enjoy the following properties. The σ-algebra $L_\mu(\mathcal C)$ is generated by an internal algebra $\mathcal C$. By saturation, this algebra is rich enough to guarantee that each element of $L_\mu(\mathcal C)$ is equivalent, up to a $\mu_L$-nullset, to an element of $\mathcal C$ (see (+) below). Moreover, the σ-additive measure $\mu_L$ on $L_\mu(\mathcal C)$ is infinitely close to an internal measure μ defined on the generating set $\mathcal C$ (see (++) below). In particular, if $\mathcal C$ is *finite, then $\mu_L$ may be infinitely close to a counting measure μ. In Sect. 6.2.6 we will see that, for instance, Lebesgue measure on $[0,\infty[^n$ is σ-isomorphic to an infinite Loeb measure on a *finite set. Therefore, Lebesgue measure can, in a certain sense, be treated as though it were a counting measure.
6.2.1 Loeb Measure Spaces
Let Ω be an internal nonempty set in W and let $\mathcal C$ be an internal algebra on Ω. Recall that W is a nonstandard model, introduced at the beginning of Sect. 2.10. By internal induction, one can show that for each $k\in{}^*\mathbb N$ and each internal k-tuple $(A_1,\ldots,A_k)$ in $\mathcal C$, $A_1\cup\cdots\cup A_k\in\mathcal C$ and $A_1\cap\cdots\cap A_k\in\mathcal C$. Assume that μ is an internal finitely additive measure defined on $\mathcal C$ with values in ${}^*[0,S]$ for some $S\in\mathbb R$. Again by internal induction, $\mu(A_1\cup\cdots\cup A_k)=\mu(A_1)+\cdots+\mu(A_k)$ for each $k\in{}^*\mathbb N$ and each internal k-tuple $(A_1,\ldots,A_k)$ in $\mathcal C$ such that $A_i\cap A_j=\emptyset$ for $i\neq j$. Moreover, the set function
$${}^{\circ}\mu:\mathcal C\ni A\mapsto{}^{\circ}(\mu(A))$$
is a finitely additive measure on the algebra $\mathcal C$ with values in [0, S]. From Proposition 2.9.7 (1) it follows that ${}^{\circ}\mu$ is even σ-additive on the algebra $\mathcal C$. By Carathéodory's extension theorem, ${}^{\circ}\mu$ can be extended to a measure on the σ-algebra $\sigma(\mathcal C)$ generated by $\mathcal C$, that is, $\sigma(\mathcal C)$ is the intersection of all σ-algebras on Ω containing the elements of $\mathcal C$. The completion of this measure is called the Loeb measure associated with μ. We shall now present a more informative construction of Loeb measures, combining both methods in the articles of Loeb [15, 17]: an arbitrary, possibly external, subset $N\subseteq\Omega$ is called a $\mu_L$-nullset if
$$\mu^{\mathrm{outer}}(N):=\inf\{{}^{\circ}\mu(A)\mid A\in\mathcal C,\ N\subseteq A\}=0.$$
Set
$$\mathcal N_{\mu_L}:=\{N\subseteq\Omega\mid N\text{ is a }\mu_L\text{-nullset}\}.$$
Lemma 6.2.1 (a) Each subset of a $\mu_L$-nullset is a $\mu_L$-nullset. (b) The set of $\mu_L$-nullsets is closed under countable unions.
Proof (a) is obvious. To prove (b), assume that $N_1,\ldots,N_k,\ldots\in\mathcal N_{\mu_L}$. In order to show that $\bigcup_{k\in\mathbb N}N_k\in\mathcal N_{\mu_L}$, fix an $\varepsilon\in\mathbb R^+$. For each $k\in\mathbb N$ there exists an $A_k\in\mathcal C$ with $N_k\subseteq A_k$ and ${}^{\circ}\mu(A_k)<\frac{\varepsilon}{2^k}$, thus $\mu(A_k)<\frac{\varepsilon}{2^k}$. Set $B_k:=A_1\cup\cdots\cup A_k$. Then $\mu(B_k)<\varepsilon$. Now we apply Theorem 2.10.18: let $(B_k)_{k\in{}^*\mathbb N}$ be an internal extension of $(B_k)_{k\in\mathbb N}$. By the Spillover Principle, there exists an unlimited $K\in{}^*\mathbb N$ with $B_k\subseteq B_{k+1}$ and $\mu(B_k)<\varepsilon$ for all $k\le K$. Since $\bigcup_{k\in\mathbb N}N_k\subseteq\bigcup_{k\in\mathbb N}A_k\subseteq B_K$ and $\mu(B_K)<\varepsilon$, we conclude that $\bigcup_{k\in\mathbb N}N_k$ is a $\mu_L$-nullset.
In the following we will use the techniques in the proof of Lemma 6.2.1 over and over again; we will then often simply write "by saturation". If $B\subseteq\Omega$ and $A\in\mathcal C$, then A is called a $\mu_L$-approximation of B if the symmetric difference $A\,\Delta\,B:=(A\setminus B)\cup(B\setminus A)$ of A and B is a $\mu_L$-nullset. We define
$$L_\mu(\mathcal C):=\{B\subseteq\Omega\mid B\text{ has a }\mu_L\text{-approximation }A\in\mathcal C\},\tag{+}$$
$$\mu_L(B):={}^{\circ}\mu(A)\quad\text{if }A\text{ is a }\mu_L\text{-approximation of }B.\tag{++}$$
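Theorem 6.2.2 (Loeb [15])
(1) $\mu_L$ is well defined, that is, $\mu_L$ does not depend on the chosen $\mu_L$-approximation.
(2) $L_\mu(\mathcal C)$ is a σ-algebra with $\mathcal C\subseteq L_\mu(\mathcal C)$.
(3) $\mu_L:L_\mu(\mathcal C)\to[0,S]$ is σ-additive.
(4) $L_\mu(\mathcal C)$ is complete.
(5) A subset $B\subseteq\Omega$ belongs to $L_\mu(\mathcal C)$ iff for each $\varepsilon\in\mathbb R^+$ there exist $A,A'\in\mathcal C$ such that $A\subseteq B\subseteq A'$ and $\mu(A'\setminus A)<\varepsilon$.
Proof (1) Assume that A and A′ are $\mu_L$-approximations of $B\subseteq\Omega$. Then $A\,\Delta\,A'\in\mathcal N_{\mu_L}$, because $A\,\Delta\,A'\subseteq(A\,\Delta\,B)\cup(B\,\Delta\,A')\in\mathcal N_{\mu_L}$. Since $A\,\Delta\,A'\in\mathcal C$, we have $\mu(A\,\Delta\,A')\approx0$. Since A is the disjoint union of $A\setminus A'$ and $A'\setminus(A'\setminus A)$, we have
$$\mu(A)=\mu(A\setminus A')+\mu(A')-\mu(A'\setminus A)\approx\mu(A').$$
Therefore, ${}^{\circ}\mu(A)={}^{\circ}\mu(A')$.
(2) Since each $A\in\mathcal C$ is a $\mu_L$-approximation of A, we obtain $\mathcal C\subseteq L_\mu(\mathcal C)$. We will now show that $L_\mu(\mathcal C)$ is a σ-algebra. Since $\Omega\in\mathcal C$, $\Omega\in L_\mu(\mathcal C)$. Fix $B,B'\in L_\mu(\mathcal C)$ with $\mu_L$-approximations $A,A'\in\mathcal C$ of B, B′, respectively. Then $A\setminus A'$ is a $\mu_L$-approximation of $B\setminus B'$ and $A\cap A'$ is a $\mu_L$-approximation of $B\cap B'$, because of the lemma and because
$$\big((A\setminus A')\,\Delta\,(B\setminus B')\big)\cup\big((A\cap A')\,\Delta\,(B\cap B')\big)\subseteq(A\,\Delta\,B)\cup(A'\,\Delta\,B')\in\mathcal N_{\mu_L}.$$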
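Fix a sequence $(B_k)_{k\in\mathbb N}$ in $L_\mu(\mathcal C)$ such that $B_i\cap B_j=\emptyset$ for $i\neq j$, and for each $B_k$ fix a $\mu_L$-approximation $A_k$. We may assume that $A_i\cap A_j=\emptyset$ for $i\neq j$, because if $C_k$ is a $\mu_L$-approximation of $B_k$, $k\in\mathbb N$, then $C_k\setminus(C_1\cup\cdots\cup C_{k-1})$ is a $\mu_L$-approximation of $B_k$. The latter is the case because (recall that the $B_i$ are pairwise disjoint)
$$B_k\,\Delta\,\big(C_k\setminus(C_1\cup\cdots\cup C_{k-1})\big)\subseteq\bigcup_{i=1}^{k}(B_i\,\Delta\,C_i)\in\mathcal N_{\mu_L}.$$
Since
$$\sum_{i=1}^{k}{}^{\circ}\mu(A_i)={}^{\circ}\mu\Big(\bigcup_{i=1}^{k}A_i\Big)\le{}^{\circ}\mu(\Omega)\le S\quad\text{for all }k\in\mathbb N,$$
we have
$$s:=\sum_{k=1}^{\infty}{}^{\circ}\mu(A_k)\in[0,S].$$
By saturation (see the proof of Lemma 6.2.1), there is an $A\in\mathcal C$ with $\bigcup_{k\in\mathbb N}A_k\subseteq A$ and $\mu(A)<s+\frac1k$ for each $k\in\mathbb N$. It follows that $\mu(A)\approx s$. In order to show that A is a $\mu_L$-approximation of $\bigcup_{i\in\mathbb N}A_i$, fix $\varepsilon\in\mathbb R^+$. We may choose $k\in\mathbb N$ such that $s-{}^{\circ}\mu(A^k)<\varepsilon$, where $A^k:=A_1\cup\cdots\cup A_k$. It follows that
$$A\,\Delta\,\bigcup_{i\in\mathbb N}A_i\subseteq A\setminus A^k\in\mathcal C\quad\text{and}\quad\mu(A\setminus A^k)\approx s-{}^{\circ}\mu(A^k)<\varepsilon.$$
Moreover, A is a $\mu_L$-approximation of $\bigcup_{k\in\mathbb N}B_k$, because, by Lemma 6.2.1,
$$A\,\Delta\,\bigcup_{k\in\mathbb N}B_k\subseteq\bigcup_{k\in\mathbb N}(A_k\,\Delta\,B_k)\cup\Big(A\,\Delta\,\bigcup_{k\in\mathbb N}A_k\Big)\in\mathcal N_{\mu_L}.$$
This proves that $\bigcup_{k\in\mathbb N}B_k\in L_\mu(\mathcal C)$.
(3) We shall now show that $\mu_L$ is σ-additive. Choose $B_k$ and $A_k$, $k\in\mathbb N$, and A as in the proof of (2). Then
$$\mu_L\Big(\bigcup_{k\in\mathbb N}B_k\Big)={}^{\circ}\mu(A)=s=\sum_{k=1}^{\infty}{}^{\circ}\mu(A_k)=\sum_{k=1}^{\infty}\mu_L(B_k).$$
(4) Assume that $\mu_L(B)=0$ and $N\subseteq B$. Fix a $\mu_L$-approximation A of B. Then A is also a $\mu_L$-approximation of N, because $A,\ A\,\Delta\,B\in\mathcal N_{\mu_L}$ and $N\,\Delta\,A\subseteq A\cup(A\,\Delta\,B)$. It follows that $N\in L_\mu(\mathcal C)$.
(5) “⇒” Fix $B\in L_\mu(\mathcal C)$, a $\mu_L$-approximation $C\in\mathcal C$ of B, and an $\varepsilon\in\mathbb R^+$. Then there exists a $D\in\mathcal C$ with $C\,\Delta\,B\subseteq D$ and $\mu(D)<\varepsilon$. We now have $C\setminus D\subseteq B\subseteq C\cup D$ and $\mu\big((C\cup D)\setminus(C\setminus D)\big)\le\mu(D)<\varepsilon$.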
“⇐” Suppose that for each $n\in\mathbb N$ there are $A_n,A'_n\in\mathcal C$ with $A_n\subseteq B\subseteq A'_n$ and $\mu(A'_n\setminus A_n)<\frac1n$. By saturation, there is an $A\in\mathcal C$ with $A_n\subseteq A\subseteq A'_n$ for each $n\in\mathbb N$. Fix $\varepsilon\in\mathbb R^+$ and $n\in\mathbb N$ with $\frac1n<\varepsilon$. Then $A\,\Delta\,B\subseteq A'_n\setminus A_n\in\mathcal C$ and
$$\mu(A'_n\setminus A_n)<\frac1n<\varepsilon.$$
This proves that A is a $\mu_L$-approximation of B, thus $B\in L_\mu(\mathcal C)$.
Definition 6.2.3 The measure space $(\Omega,L_\mu(\mathcal C),\mu_L)$ is called the Loeb space over $(\Omega,\mathcal C,\mu)$. The measure $\mu_L$ is called the Loeb measure over μ.
The interested reader can find information about vector-valued Loeb measures and their applications in, for example, [18, 20].
6.2.2 Loeb Measures over Gaußian Measures
In this section we present a first example of a Loeb measure and give a useful and intuitive characterization of measurable norms, which form the basis for infinite-dimensional Gaußian measures and the Malliavin calculus on abstract Wiener spaces. Fix a separable real Hilbert space $(\mathbb H,\|\cdot\|)$ inside the standard model V. Let $\mathcal E$ denote the set of all finite-dimensional subspaces of $\mathbb H$. Fix $E\in\mathcal E$ and an orthonormal basis $(e_1,\ldots,e_d)$ of E. The centered Gaußian measure $\gamma^E$ of variance 1 is defined on E by setting, for all Borel sets $B\subseteq E$,
$$\gamma^E(B):=\Big(\frac{1}{\sqrt{2\pi}}\Big)^d\int_{\{(x_1,\ldots,x_d)\in\mathbb R^d\,\mid\,\sum_{i=1}^d x_ie_i\in B\}}e^{-\frac12\sum_{i=1}^d x_i^2}\,d(x_1,\ldots,x_d).$$
Note that $\gamma^E$ does not depend on the choice of orthonormal basis of E (see Lemma 4.2.1 in [21]). Thus γ assigns to each $E\in\mathcal E$ the Gaußian measure $\gamma^E$ on E of variance 1. By Transfer, ${}^*\gamma$ assigns to each $E\in{}^*\mathcal E$ the internal Gaußian measure $({}^*\gamma)^E$ on E of variance 1. Orthogonality on $\mathbb H$ is denoted by ⊥. A norm |·| on $\mathbb H$ is called Gross measurable (see [9]) if for all $n\in\mathbb N$ there exists an $E_n\in\mathcal E$ such that for all $E\in\mathcal E$ with $E\perp E_n$
$$\gamma^E\Big(\Big\{x\in E\;\Big|\;|x|\ge\frac1n\Big\}\Big)<\frac1n.$$
Proposition 6.2.4 Fix a norm |·| on $\mathbb H$. The following statements are equivalent:
(a) |·| is Gross measurable.
(b) For all $E\in{}^*\mathcal E$ with $E\perp{}^*[\mathbb H]$ (which means that $E\perp{}^*a$ for all $a\in\mathbb H$),
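$$\big(({}^*\gamma)^E\big)_L\big(\{x\in E\mid{}^*|x|\not\approx0\}\big)=0.$$
This means that the Gaußian Loeb measure on *finite-dimensional subspaces of ${}^*\mathbb H$ orthogonal to ${}^*[\mathbb H]$ is concentrated on the infinitesimals.
Proof “(a) ⇒ (b)” Let |·| be Gross measurable and let $E\in{}^*\mathcal E$ with $E\perp{}^*[\mathbb H]$. It suffices to show that for each $n\in\mathbb N$, $({}^*\gamma)^E\big(\{x\in E\mid{}^*|x|\ge\frac1n\}\big)<\frac1n$. There exists an $E_n\in\mathcal E$ such that
$$\forall F\in\mathcal E,\ F\perp E_n:\quad\gamma^F\Big(\Big\{x\in F\;\Big|\;|x|\ge\frac1n\Big\}\Big)<\frac1n.$$
By Transfer, and since $E\perp{}^*E_n$, we obtain the desired result.
“(b) ⇒ (a)” Assume (a) is not true. Then there exists an $n\in\mathbb N$ such that for all $E\in\mathcal E$ there exists an $F\in{}^*\mathcal E$, $F\perp{}^*E$, with $({}^*\gamma)^F\big(\{x\in F\mid{}^*|x|\ge\frac1n\}\big)\ge\frac1n$. By saturation, there exists an $F\in{}^*\mathcal E$ with $F\perp{}^*E$ for all $E\in\mathcal E$ and $({}^*\gamma)^F\big(\{x\in F\mid{}^*|x|\ge\frac1n\}\big)\ge\frac1n$. But then $F\perp{}^*[\mathbb H]$. It follows that
$$\big(({}^*\gamma)^F\big)_L\big(\{x\in F\mid{}^*|x|\not\approx0\}\big)\ge{}^{\circ}\Big(({}^*\gamma)^F\Big(\Big\{x\in F\;\Big|\;{}^*|x|\ge\frac1n\Big\}\Big)\Big)\ge\frac1n.$$
Thus (b) is not true.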
6.2.3 Loeb Measurable Functions
In this section we will show that $L_\mu(\mathcal C)$-measurable functions are infinitely close to internal $\mathcal C$-measurable functions. Fix a separable metric space (M, d) and assume that M is a set in the standard model V. Then the metric d is also a set in V. Recall that a function $f:\Omega\to M$ is $L_\mu(\mathcal C)$-measurable if $f^{-1}[G]\in L_\mu(\mathcal C)$ for each open set $G\subseteq M$. By transfer, an internal function $F:\Omega\to{}^*M$ is called $\mathcal C$-measurable if $F^{-1}[G]\in\mathcal C$ for each *open set $G\subseteq{}^*M$. If F is $\mathcal C$-measurable and the range of F is *finite, then F is called $\mathcal C$-simple. A $\mathcal C$-measurable function $F:\Omega\to{}^*M$ is called a lifting of $f:\Omega\to M$ if $f(X)\approx F(X)$ for $\mu_L$-almost all $X\in\Omega$, that is, if F and f are equal up to a nullset and an infinitesimal error.
Theorem 6.2.5 (Loeb [15], Anderson [2]) A function $f:\Omega\to M$ is $L_\mu(\mathcal C)$-measurable iff f has a lifting. We may even assume that the lifting has a *finite range.
Proof “⇒” Assume that f is $L_\mu(\mathcal C)$-measurable. Fix a countable open base $(G_n)_{n\in\mathbb N}$ of M. Since $B_n:=f^{-1}[G_n]\in L_\mu(\mathcal C)$, $B_n$ has a $\mu_L$-approximation $A_n\in\mathcal C$. It suffices to find a $\mathcal C$-measurable function $F:\Omega\to{}^*M$ (with *finite range) such that for all $x\in\Omega\setminus\bigcup_{n\in\mathbb N}(A_n\,\Delta\,B_n)$ the following implication holds: if $f(x)\in G_n$, then
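$F(x)\in{}^*G_n$, that is, $x\in f^{-1}[G_n]=B_n\Rightarrow F(x)\in{}^*G_n$. Now, if $x\notin A_n\,\Delta\,B_n$, then $x\in B_n$ iff $x\in A_n$. Therefore, we try to find an F such that $x\in A_n\Rightarrow F(x)\in{}^*G_n$ for each $n\in\mathbb N$. However, since there may exist $i_1,\ldots,i_k\in\mathbb N$ with $A_{i_1}\cap\cdots\cap A_{i_k}\neq\emptyset$ and ${}^*G_{i_1}\cap\cdots\cap{}^*G_{i_k}=\emptyset$, such a function F need not exist. We are therefore led to consider coarser $\mu_L$-approximations $A'_n$ of $B_n$. By recursion, we define a sequence $(A'_n)_{n\in\mathbb N}$ in $\mathcal C$ as follows. Set
$$A'_1:=\begin{cases}A_1&\text{if }G_1\neq\emptyset\ (\text{which is equivalent to }{}^*G_1\neq\emptyset),\\ \emptyset&\text{if }G_1=\emptyset.\end{cases}$$
Then $A'_1$ is a $\mu_L$-approximation of $B_1$, and $A'_1\neq\emptyset$ implies ${}^*G_1\neq\emptyset$. Assume that $k>1$ and that $A'_1,\ldots,A'_{k-1}$ are already defined so that $A'_i$ is a $\mu_L$-approximation of $B_i$ and $A'_{i_1}\cap\cdots\cap A'_{i_m}\neq\emptyset$ implies ${}^*G_{i_1}\cap\cdots\cap{}^*G_{i_m}\neq\emptyset$ for all $i_1,\ldots,i_m\in\{1,\ldots,k-1\}$. Define
$$A'_k:=A_k\setminus\bigcup_{i_1,\ldots,i_m}\big(A'_{i_1}\cap\cdots\cap A'_{i_m}\big),$$
where the union ranges over all $i_1,\ldots,i_m\in\{1,\ldots,k-1\}$ with ${}^*G_{i_1}\cap\cdots\cap{}^*G_{i_m}\cap{}^*G_k=\emptyset$ (for m = 0 the empty intersection is Ω, so that $A'_k=\emptyset$ if $G_k=\emptyset$). Notice that $A'_k\in\mathcal C$ and that for all $n\in\mathbb N$ and $i_1,\ldots,i_n\in\{1,\ldots,k\}$
$${}^*G_{i_1}\cap\cdots\cap{}^*G_{i_n}=\emptyset\ \Rightarrow\ A'_{i_1}\cap\cdots\cap A'_{i_n}=\emptyset.$$
Moreover, since $A'_k\,\Delta\,B_k\subseteq(A_k\,\Delta\,B_k)\cup\bigcup_{i=1}^{k-1}(A'_i\,\Delta\,B_i)\in\mathcal N_{\mu_L}$, we conclude that $A'_k$ is a $\mu_L$-approximation of $B_k$.
In order to apply saturation, we will show that for each $n\in\mathbb N$ there exists a $\mathcal C$-simple function $F:\Omega\to{}^*M$ such that $F[A'_i]\subseteq{}^*G_i$ for all $i=1,\ldots,n$. Let $(A_\sigma)_{\sigma\in J}$ be the partition of Ω given by $(A'_1,\ldots,A'_n)$; here J is the set of all strictly increasing finite tuples $\sigma=(i_1,\ldots,i_k)$ in $\{1,\ldots,n\}$ and
$$A_\sigma:=A'_{i_1}\cap\cdots\cap A'_{i_k}\cap\bigcap_{j\in\{1,\ldots,n\}\setminus\{i_1,\ldots,i_k\}}\big(\Omega\setminus A'_j\big).$$
If σ = ∅, then $A_\sigma=\Omega\setminus(A'_1\cup\cdots\cup A'_n)$. For each $\sigma=(i_1,\ldots,i_k)\neq\emptyset$ in J such that ${}^*G_{i_1}\cap\cdots\cap{}^*G_{i_k}\neq\emptyset$, choose a $b_\sigma\in{}^*G_{i_1}\cap\cdots\cap{}^*G_{i_k}$ and define $F(x):=b_\sigma$ for each $x\in A_\sigma$. Choose $b\in M$ and define $F(x):=b$ if $x\in\Omega\setminus(A'_1\cup\cdots\cup A'_n)$. It is clear that F is defined for all $x\notin A'_1\cup\cdots\cup A'_n$. If $x\in A'_1\cup\cdots\cup A'_n$, then there exists a $\sigma=(i_1,\ldots,i_k)\neq\emptyset$ in J with $x\in A_\sigma$. Since $A'_{i_1}\cap\cdots\cap A'_{i_k}\neq\emptyset$, we have ${}^*G_{i_1}\cap\cdots\cap{}^*G_{i_k}\neq\emptyset$; thus $F:\Omega\to{}^*M$ is well defined and $F[A'_i]\subseteq{}^*G_i$ for all $i=1,\ldots,n$. By saturation, there exists a $\mathcal C$-simple function $F:\Omega\to{}^*M$ with $F[A'_i]\subseteq{}^*G_i$ for all $i\in\mathbb N$.
We have thus obtained the following: let $x\notin\bigcup_{n\in\mathbb N}(A'_n\,\Delta\,B_n)$ and let G be an open set with $f(x)\in G$. Then there exists an $n\in\mathbb N$ such that $f(x)\in G_n\subseteq G$; thus $x\in f^{-1}[G_n]=B_n$, which implies that $x\in A'_n$. It follows that $F(x)\in{}^*G_n\subseteq{}^*G$. This proves that $F(x)\approx f(x)$.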
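The heart of the “⇒” direction is combinatorial. Finitely many sets cut Ω into atoms $A_\sigma$, and F is chosen constant on each atom, with a value in the corresponding intersection of the $G_i$. The sketch below is a finite, standard analogue under an assumption made only for illustration: M = ℝ and each $G_i$ is an open interval, so that a point $b_\sigma$ of an intersection can be taken as its midpoint. The function names are hypothetical.

```python
def atoms(omega, sets):
    """Nonempty atoms A_sigma of the partition of omega generated by sets = (A'_1, ..., A'_n).

    sigma(x) lists the (1-based) indices i with x in A'_i; sigma = () is the
    complement of A'_1 u ... u A'_n.
    """
    result = {}
    for x in omega:
        sigma = tuple(i for i, s in enumerate(sets, start=1) if x in s)
        result.setdefault(sigma, set()).add(x)
    return result


def simple_function(omega, sets, intervals, default):
    """F constant on each atom, with a value in the intersection of the G_i, i in sigma.

    Illustrative assumption: M = R and G_i = intervals[i] = (lo, hi) is an open interval.
    """
    F = {}
    for sigma, atom in atoms(omega, sets).items():
        if sigma:
            lo = max(intervals[i][0] for i in sigma)
            hi = min(intervals[i][1] for i in sigma)
            if lo >= hi:
                # the recursive choice of the A'_k is designed to exclude this case
                raise ValueError(f"atom {sigma} is nonempty but its intervals do not meet")
            value = (lo + hi) / 2   # a point b_sigma of G_{i_1} ∩ ... ∩ G_{i_k}
        else:
            value = default
        for x in atom:
            F[x] = value
    return F
```

With Ω = {0, …, 9}, $A'_1$ = {0, …, 4}, $A'_2$ = {3, …, 7}, $G_1$ = (0, 2) and $G_2$ = (1, 3), the atoms are σ = (1), (1, 2), (2) and (), and F maps each $A'_i$ into $G_i$. If $G_1$ and $G_2$ were disjoint, the nonempty atom (1, 2) would make the construction fail. This is exactly the situation the coarser approximations $A'_k$ rule out.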
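“⇐” Suppose now that F is a lifting of f and assume that G is an open set in M. Since M is a separable metric space, there exists a sequence $(G_n)_{n\in\mathbb N}$ of open sets $G_n$ such that $G=\bigcup_{n\in\mathbb N}G_n=\bigcup_{n\in\mathbb N}\overline{G_n}$, where $\overline{G_n}$ is the closure of $G_n$. If $f(x)\approx F(x)$, then $f(x)\in G_n$ implies $F(x)\in{}^*G_n$ because $G_n$ is open, and $F(x)\in{}^*G_n$ implies $f(x)\in\overline{G_n}\subseteq G$. It follows that
$$f^{-1}[G]\;\Delta\;\bigcup_{n\in\mathbb N}\{x\in\Omega\mid F(x)\in{}^*G_n\}\subseteq\{x\in\Omega\mid f(x)\not\approx F(x)\}\in\mathcal N_{\mu_L}.$$
Since $\bigcup_{n\in\mathbb N}\{x\mid F(x)\in{}^*G_n\}\in L_\mu(\mathcal C)$ and $L_\mu(\mathcal C)$ is complete, $f^{-1}[G]\in L_\mu(\mathcal C)$.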
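6.2.4 Loeb Spaces over the Product of Internal Spaces
Fix two internal measure spaces $(\Omega,\mathcal C,\mu)$ and $(\Omega',\mathcal C',\mu')$ such that $\mu(\Omega)$ and $\mu'(\Omega')$ are limited. A set of the form $A\times A'$ with $A\in\mathcal C$ and $A'\in\mathcal C'$ is called a measurable rectangle, and $\mathcal C\otimes\mathcal C'$ denotes the *σ-algebra generated by the internal algebra of all *finite disjoint unions of measurable rectangles. The internal product measure of μ and μ′ on $\mathcal C\otimes\mathcal C'$ is denoted by $\mu\otimes\mu'$. Let $\overline{\mathcal C\otimes\mathcal C'}$ be the completion of $\mathcal C\otimes\mathcal C'$ with respect to $\mu\otimes\mu'$; the corresponding extension of the measure $\mu\otimes\mu'$ is also denoted by $\mu\otimes\mu'$. Analogous notation shall be used for the standard product. The next result shows that the usual product $L_\mu(\mathcal C)\otimes L_{\mu'}(\mathcal C')$ of the Loeb σ-algebras $L_\mu(\mathcal C)$ and $L_{\mu'}(\mathcal C')$ is contained in the Loeb product $L_{\mu\otimes\mu'}(\mathcal C\otimes\mathcal C')$. In general, this inclusion is strict (examples have been given by Hoover and Norman). In [23] Yeneng Sun characterizes the Loeb spaces for which the inclusion is strict (see Proposition 8.4.5). In [4] the reader can find concrete examples, together with a demonstration of how extremely large the Loeb product space is compared to the product of Loeb spaces.
Proposition 6.2.6 (Anderson [2]) If $B\in\overline{L_\mu(\mathcal C)\otimes L_{\mu'}(\mathcal C')}$, then $B\in L_{\mu\otimes\mu'}(\mathcal C\otimes\mathcal C')$ and $(\mu\otimes\mu')_L(B)=(\mu_L\otimes\mu'_L)(B)$.
Proof It is easy to see that $\Omega\times N'\in\mathcal N_{(\mu\otimes\mu')_L}$ and $N\times\Omega'\in\mathcal N_{(\mu\otimes\mu')_L}$ whenever $N'\in\mathcal N_{\mu'_L}$ and $N\in\mathcal N_{\mu_L}$, respectively. Suppose that $X\in L_\mu(\mathcal C)$ and $Y\in L_{\mu'}(\mathcal C')$. By Sect. 6.2.1 (+), there exist $U\in\mathcal C$ and $V\in\mathcal C'$ such that $U\,\Delta\,X\in\mathcal N_{\mu_L}$ and $V\,\Delta\,Y\in\mathcal N_{\mu'_L}$. Since
$$(U\times V)\,\Delta\,(X\times Y)\subseteq\big((U\,\Delta\,X)\times\Omega'\big)\cup\big(\Omega\times(V\,\Delta\,Y)\big)\in\mathcal N_{(\mu\otimes\mu')_L},$$
we obtain $X\times Y\in L_{\mu\otimes\mu'}(\mathcal C\otimes\mathcal C')$ and
$$(\mu\otimes\mu')_L(X\times Y)\approx(\mu\otimes\mu')(U\times V)=\mu(U)\cdot\mu'(V)\approx\mu_L(X)\cdot\mu'_L(Y)=(\mu_L\otimes\mu'_L)(X\times Y).$$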
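It follows that $L_\mu(\mathcal C)\otimes L_{\mu'}(\mathcal C')\subseteq L_{\mu\otimes\mu'}(\mathcal C\otimes\mathcal C')$ and that $(\mu\otimes\mu')_L$ coincides with $\mu_L\otimes\mu'_L$ on $L_\mu(\mathcal C)\otimes L_{\mu'}(\mathcal C')$ (both are finite measures that agree on the measurable rectangles, which are closed under finite intersections). Finally, let $N\subseteq M\in L_\mu(\mathcal C)\otimes L_{\mu'}(\mathcal C')$ with $(\mu_L\otimes\mu'_L)(M)=0$. We have seen that $M\in\mathcal N_{(\mu\otimes\mu')_L}$. Since Loeb spaces are complete, $N\in L_{\mu\otimes\mu'}(\mathcal C\otimes\mathcal C')$ and $(\mu\otimes\mu')_L(N)=0$.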
6.2.5 The Hyperfinite Time Line T
Important examples of Loeb spaces are the hyperfinite ones. Fix a hyperfinite time line $T:=\{\frac1H,\frac2H,\ldots,H\}$, where $H\in{}^*\mathbb N$ is unlimited. For technical reasons we choose $H:=G!$, where G is unlimited. It follows that each standard positive integer divides H. For each $t\in T$ define $T_t:=\{\frac1H,\frac2H,\ldots,t\}$. Notice that $[0,1]=\{{}^{\circ}t\mid t\in T_1\}$. Fix a standard $d\in\mathbb N$ and define the internal counting measure $\nu_t^d$ on ${}^*\mathcal P(T_t^d)$ by setting
$$\nu_t^d(B):=|B|\cdot\frac{1}{H^d}$$
for each internal subset $B\subseteq T_t^d$, where |B| denotes the *finite number of elements of B. If d = 1, we shall write $\nu_t$ for $\nu_t^1$, and if t = H, we shall write $\nu^d$ for $\nu_H^d$. Since so far we have only studied finite Loeb spaces, fix a limited $t\in T$. The Loeb space over $(T_t^d,{}^*\mathcal P(T_t^d),\nu_t^d)$ is denoted by $(T_t^d,L_{\nu_t^d},(\nu_t^d)_L)$. By Sect. 6.2.4, $L_{\nu_t}^{\otimes d}\subseteq L_{\nu_t^d}$, where $L_{\nu_t}^{\otimes d}$ denotes the d-fold product of the Loeb σ-algebra $L_{\nu_t}$ in the usual sense. Since $T^d$ is a *finite set, each internal function $F:T^d\to{}^*\mathbb R$ is, by Theorem 2.8.4, ${}^*\mathcal P(T^d)$-measurable and $\nu^d$-integrable, and
$$\int_{T^d}F\,d\nu^d=\sum_{(t_1,\ldots,t_d)\in T^d}F(t_1,\ldots,t_d)\,\frac{1}{H^d}.$$
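With an ordinary integer in place of the unlimited H, the integral against $\nu_t^d$ is just a Riemann-type sum over the grid $T_t^d$. The sketch below evaluates it in exact rational arithmetic. It is only a standard analogue: the Loeb-measure statements concern standard parts for unlimited H, which code cannot represent. The function name is illustrative.

```python
from fractions import Fraction
from itertools import product


def hyperfinite_integral(F, H, t, d=1):
    """Finite analogue of the integral of F over T_t^d with respect to nu_t^d.

    T_t = {1/H, 2/H, ..., t}; each point of T_t^d carries mass 1/H**d.
    Assumes t*H is an integer; exact rational arithmetic throughout.
    """
    n = int(Fraction(t) * H)
    grid = [Fraction(k, H) for k in range(1, n + 1)]
    return sum(F(*point) for point in product(grid, repeat=d)) / Fraction(H) ** d
```

For F(s) = s² on $T_1$, the result is (H + 1)(2H + 1)/(6H²). This tends to the Lebesgue value 1/3 as H grows, and the difference is infinitesimal when H is unlimited.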
Since for $d,m\in\mathbb N$, ${}^*\mathcal P(T^d)\otimes{}^*\mathcal P(T^m)={}^*\mathcal P(T^{d+m})$ and $\nu^d\otimes\nu^m=\nu^{d+m}$ (here we identify $((x_1,\ldots,x_d),(x_{d+1},\ldots,x_{d+m}))$ with $(x_1,\ldots,x_{d+m})$), we obtain, using Sect. 6.2.4, the following result.
Proposition 6.2.7 Fix standard $m,d\in\mathbb N$ and a limited $t\in T$. Then $L_{\nu_t^d}\otimes L_{\nu_t^m}$ is a subset of $L_{\nu_t^{d+m}}$, and for each $B\in L_{\nu_t^d}\otimes L_{\nu_t^m}$,
$$\big((\nu_t^d)_L\otimes(\nu_t^m)_L\big)(B)=(\nu_t^d\otimes\nu_t^m)_L(B)=(\nu_t^{d+m})_L(B).$$
6.2.6 Lebesgue Measure as a Counting Measure
In a further application, we shall construct a Loeb counting measure $\nu^d_L$ on the hyperfinite set $T^d$, $d\in\mathbb N$, which strictly extends Lebesgue measure $\lambda^d$ on $[0,\infty[^d$. The standard part map $\mathrm{st}:\bigcup_{n\in\mathbb N}T_n^d\to[0,\infty[^d$ is defined by setting $\mathrm{st}(t_1,\ldots,t_d):=({}^{\circ}t_1,\ldots,{}^{\circ}t_d)$. The set $\bigcup_{n\in\mathbb N}T_n^d$ is the limited part of $T^d$. For all limited $s\in T$, define $\mathrm{st}_s:=\mathrm{st}\upharpoonright T_s^d$. Each mapping f defined on $[0,\infty[^d$ can be converted to a mapping $f_T$ on the limited part of $T^d$ by setting
$$f_T(t_1,\ldots,t_d):=f({}^{\circ}t_1,\ldots,{}^{\circ}t_d).$$
Occasionally we shall use the notion “lifting” in a slightly modified way. Fix a separable metric space (M, δ) inside our superstructure V. Let us call an internal function $F:T^d\to{}^*M$ a lifting of $f:[0,\infty[^d\to M$ if $F\upharpoonright T_t^d$ is a lifting of $f_T\upharpoonright T_t^d$ for all limited $t\in T$, that is, if for $(\nu_t^d)_L$-almost all $(t_1,\ldots,t_d)\in T_t^d$
$$F(t_1,\ldots,t_d)\approx f({}^{\circ}t_1,\ldots,{}^{\circ}t_d).$$
The following result shows the well-known close relationship between $(\nu_n^d)_L$ and Lebesgue measure $\lambda^d$ on $[0,n]^d$ (see Albeverio et al. [1], Theorem 2.3.4 and Proposition 2.3.5): the image measure of $(\nu_n^d)_L$ under the standard part map $\mathrm{st}_n$ on $T_n^d$ is Lebesgue measure $\lambda^d$ on $[0,n]^d$. Denote by $\mathrm{Leb}^d$ the set of Lebesgue measurable sets in $\mathbb R^d$.
Theorem 6.2.8 Fix standard $d,n\in\mathbb N$.
(a) A subset $B\subseteq[0,n]^d$ is Lebesgue measurable iff $\mathrm{st}_n^{-1}[B]\in L_{\nu_n^d}$, in which case $\lambda^d(B)=(\nu_n^d)_L(\mathrm{st}_n^{-1}[B])$.
(b) A function $f:[0,n]^d\to M$ is Lebesgue measurable iff f has a lifting $F:T_n^d\to{}^*M$.
Proof (a) “⇒” By standard measure theory concerning product measures, and since, by Sect. 6.2.5, $(\nu_n^d)_L=((\nu_n)_L)^d$ on $L_{\nu_n}^{\otimes d}$, it suffices to show that for each interval $I\subseteq[0,n]$, $\mathrm{st}_n^{-1}[I]\in L_{\nu_n}$ and $(\nu_n)_L(\mathrm{st}_n^{-1}[I])=\lambda(I)$. Let a ≤ b be the end-points of I. Set
$$]a,b[_{T_n}:=\{t\in T_n\mid a<t<b\}.$$
We will show that $\nu_n(]a,b[_{T_n})\approx b-a$.
For a = b this is obvious. Let a < b. Choose $l,m\in{}^*\mathbb N$ such that $\frac{l-1}{H}\le a<\frac lH$ and $\frac{l+m-1}{H}<b\le\frac{l+m}{H}$. Then
$$\nu_n(]a,b[_{T_n})=\frac mH=\frac{l+m}{H}-\frac lH\approx b-a.$$
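The counting step is elementary enough to check mechanically for an ordinary integer H. The sketch below takes the indices from the proof, l = ⌊aH⌋ + 1 and l + m = ⌈bH⌉, and uses exact rational arithmetic. It then confirms by brute force that m counts the grid points in ]a, b[, and that m/H differs from b − a by less than 1/H. This difference is infinitesimal when H is unlimited. The function name is illustrative.

```python
import math
from fractions import Fraction


def count_open_interval(a, b, H):
    """m = number of grid points k/H (k >= 1) with a < k/H < b, for 0 <= a <= b.

    Uses the indices from the proof: l = floor(aH) + 1 and l + m = ceil(bH).
    """
    a, b = Fraction(a), Fraction(b)
    l = math.floor(a * H) + 1          # (l-1)/H <= a < l/H
    l_plus_m = math.ceil(b * H)        # (l+m-1)/H < b <= (l+m)/H
    return max(l_plus_m - l, 0)
```

For a = 1/3, b = 5/7 and H = 1000 this gives m = 381. Then m/H = 0.381, while b − a ≈ 0.38095.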
In particular, $\{t\in T_n\mid t\approx c\}$ is a $(\nu_n)_L$-nullset for all $c\in[0,n]$. It follows that
$$\mathrm{st}_n^{-1}[I]\;\Delta\;]a,b[_{T_n}\subseteq\{t\in T_n\mid t\approx a\}\cup\{t\in T_n\mid t\approx b\},$$
a $(\nu_n)_L$-nullset. Since $L_{\nu_n}$ is complete, $\mathrm{st}_n^{-1}[I]\in L_{\nu_n}$ and $(\nu_n)_L(\mathrm{st}_n^{-1}[I])={}^{\circ}\nu_n(]a,b[_{T_n})=b-a=\lambda(I)$.
“⇐” Fix $B\subseteq[0,n]^d$ with $\mathrm{st}_n^{-1}[B]\in L_{\nu_n^d}$. Then, setting $I:=[0,n]$,
$$\mathrm{st}_n^{-1}[I^d\setminus B]=T_n^d\setminus\mathrm{st}_n^{-1}[B]\in L_{\nu_n^d}.$$
In order to show that $B\in\mathrm{Leb}(I^d)$, fix a standard ε > 0. By Theorem 6.2.2 (5), there exist internal $A',C'\subseteq T_n^d$ such that
$$A'\subseteq\mathrm{st}_n^{-1}[B]\ \text{and}\ (\nu_n^d)_L(\mathrm{st}_n^{-1}[B]\setminus A')<\varepsilon,\qquad C'\subseteq\mathrm{st}_n^{-1}[I^d\setminus B]\ \text{and}\ (\nu_n^d)_L(\mathrm{st}_n^{-1}[I^d\setminus B]\setminus C')<\varepsilon.$$
Now $A:=\mathrm{st}_n[A']$ and $C:=\mathrm{st}_n[C']$ are compact subsets of $I^d$, thus Borel sets in $I^d$. Moreover, $A\subseteq B\subseteq I^d\setminus C$. By “⇒”, we obtain
$$\lambda^d\big((I^d\setminus C)\setminus A\big)=(\nu_n^d)_L\big(\mathrm{st}_n^{-1}[(I^d\setminus C)\setminus A]\big)\le(\nu_n^d)_L\big(\mathrm{st}_n^{-1}[I^d\setminus B]\setminus C'\big)+(\nu_n^d)_L\big(\mathrm{st}_n^{-1}[B]\setminus A'\big)<2\varepsilon.$$
Therefore, for each δ > 0 there exist an open set $G:=I^d\setminus C$ in $I^d$ and a closed set A in $I^d$ such that $A\subseteq B\subseteq G$ and $\lambda^d(G\setminus A)<\delta$, which proves that $B\in\mathrm{Leb}([0,n]^d)$.
(b) Fix $f:[0,n]^d\to M$. By Part (a), the Lebesgue measurability of f is equivalent to the $L_{\nu_n^d}$-measurability of $f\circ\mathrm{st}_n$. By Theorem 6.2.5, $f\circ\mathrm{st}_n$ is $L_{\nu_n^d}$-measurable iff $f\circ\mathrm{st}_n$ has a lifting, which means that f has a lifting.
Let us now extend the preceding result to the whole time line $T^d$ and $[0,\infty[^d$ by converting $\nu^d$ to a measure $\nu^d_L$ on $T^d$ as follows. Denote by $L_{\nu^d}$ the set of all Loeb measurable $B\subseteq T^d$, which means that $B\cap T_n^d\in L_{\nu_n^d}$ for all $n\in\mathbb N$. Define for $B\in L_{\nu^d}$
$$\nu^d_L(B):=\lim_{n\to\infty}(\nu_n^d)_L(B\cap T_n^d)\in[0,\infty].$$
Then $(T^d, L_{\nu^d}, \nu^d_L)$ is an infinite measure space and $\nu^d_L\big(T^d \setminus \bigcup_{n\in\mathbb{N}} T_n^d\big) = 0$. However, this measure space is not too infinite: it is $\sigma$-finite, which means that $\nu^d_L$ is concentrated on a countable union of sets of finite $\nu^d_L$-measure.
The Loeb space $(T^d, L_{\nu^d}, \nu^d_L)$ is a strict extension of the Lebesgue space $([0,\infty[^d, \mathrm{Leb}^d, \lambda^d)$ in the following sense: Let $\mathcal{L}^d$ denote the set of all $B \subseteq T^d$ such that there exists a $C_B \in \mathrm{Leb}^d$ with $B \cap T_n^d = \mathrm{st}_n^{-1}[C_B \cap [0,n]^d]$ for all $n \in \mathbb{N}$, augmented by all $\nu^d_L$-nullsets. Obviously, $C_B$ is uniquely determined by $B$. Vice versa, if $C_B = C_{B'}$, then $\nu^d_L(B \,\triangle\, B') = 0$, because $B \cap T_n^d = B' \cap T_n^d$ for all $n \in \mathbb{N}$. Here is an example of a subset $B \subseteq T_1$ in $L_{\nu_1} \setminus \mathcal{L}^1$. Set
$$B := \Big\{\frac{i}{H} \in T_1 \;\Big|\; i \text{ is even}\Big\}.$$
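Then $B$ is internal, thus $B \in L_{\nu_1}$ and $(\nu_1)_L(B) = {}^{\circ}\nu_1(B) = \frac{1}{2}$, because $H$ is even. Assume that there exists a $C_B \subseteq [0,1]$ with $B = \{t \in T_1 \mid {}^{\circ}t \in C_B\}$. Notice that $C_B \supseteq\, ]0,1]$, because for each $r \in\, ]0,1]$ there exists an even $n \in {}^{*}\mathbb{N}$ with $\frac{n}{H} \approx r$, so that $\frac{n}{H} \in B$ and ${}^{\circ}\frac{n}{H} = r$. Hence $B \supseteq \mathrm{st}_1^{-1}[\,]0,1]\,]$, a set of $(\nu_1)_L$-measure $1$, contradicting $(\nu_1)_L(B) = \frac{1}{2}$.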
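The example can be mimicked with an ordinary even integer $H$ standing in for the unlimited one. The following Python sketch is only a finite stand-in. The function names, the choice of $H$ and the sampling grid of step $1/(4H)$ are illustrative assumptions. It computes $\nu_1(B) = |B|/H$ and the largest distance from sampled points $r \in\, ]0,1]$ to $B$. The measure stays at $1/2$, while the distance shrinks like $2/H$, which is infinitesimal when $H$ is unlimited.

```python
from fractions import Fraction

def nu1_of_B(H: int) -> Fraction:
    """nu_1(B) for B = {i/H : 1 <= i <= H, i even}; each point of T_1 has mass 1/H."""
    if H % 2:
        raise ValueError("H must be even")
    B = range(2, H + 1, 2)  # the even indices i
    return Fraction(len(B), H)

def max_gap_to_B(H: int) -> Fraction:
    """Largest distance from a sample r in ]0,1] (grid of step 1/(4H)) to the nearest point of B."""
    B = [Fraction(i, H) for i in range(2, H + 1, 2)]
    samples = (Fraction(j, 4 * H) for j in range(1, 4 * H + 1))
    return max(min(abs(r - b) for b in B) for r in samples)

for H in (10, 50, 100):
    print(H, nu1_of_B(H), max_gap_to_B(H))
```

In the nonstandard model the gap bound $2/H$ becomes infinitesimal. Hence every $r \in\, ]0,1]$ is the standard part of a point of $B$, although $B$ carries only half of the mass.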
In summary, we obtain from Theorem 6.2.8:
Theorem 6.2.9 The mapping $B \mapsto C_B$ is a measure preserving bijection from $\mathcal{L}^d$ onto $\mathrm{Leb}^d$, provided that we identify $B$ and $B'$ in case $B \cap T_n^d = B' \cap T_n^d$ for all $n \in \mathbb{N}$.
Because of Theorem 6.2.9, and since the standard part ${}^{\circ}t$ of $t \in T^d$ is $\nu^d_L$-a.e. well defined, we obtain:
Corollary 6.2.10 The $L^p$-spaces $L^p([0,\infty[^d, \mathrm{Leb}^d, \lambda^d)$ and $L^p(T^d, \mathcal{L}^d, \nu^d_L)$ with $p \in [1,\infty[$ can be identified, because the mapping
$$\iota : L^p([0,\infty[^d, \mathrm{Leb}^d, \lambda^d) \to L^p(T^d, \mathcal{L}^d, \nu^d_L), \qquad \iota(\varphi)(t) := \varphi({}^{\circ}t),$$
is a canonical (basis independent) isometric isomorphism between these spaces. We call $\varphi$ and $\iota(\varphi)$ equivalent and identify both functions. In the preceding result we have defined ${}^{\circ}(t_1,\ldots,t_d) := ({}^{\circ}t_1,\ldots,{}^{\circ}t_d)$, which is well defined $\nu^d_L$-a.e. It is often easier to work with $L^p(T^d, \mathcal{L}^d, \nu^d_L)$ rather than to
6 Measure Theory and Integration
193
work with the corresponding Lebesgue space, because $T^d$ is a finite set in the sense of the nonstandard model and $\nu^d_L$ is close to a counting measure.
Let $(\Omega, L_\mu(\mathcal{C}), \mu_L)$ be again a finite Loeb space. We define here
$$\mathrm{st} : \Omega \times T^d \to \Omega \times [0,\infty[^d, \quad (X,t) \mapsto (X, {}^{\circ}t).$$
Note that $\mathrm{st}$ is $(\mu \otimes \nu^d)_L$-a.e. well defined. The following extension of Theorem 6.2.2 to product spaces is straightforward. See Theorem 6.3.10 below.
Proposition 6.2.11 Fix a separable metric space $M$ within the superstructure $V$.
(a) A subset $B \subseteq \Omega \times [0,\infty[^d$ is $L_\mu(\mathcal{C}) \otimes \mathrm{Leb}^d$-measurable iff $\mathrm{st}^{-1}[B] \in L_\mu(\mathcal{C}) \otimes L_{\nu^d}$, in which case
$$(\mu_L \otimes \lambda^d)(B) = (\mu \otimes \nu^d)_L(\mathrm{st}^{-1}[B]). \qquad (+)$$
(b) A function $f : \Omega \times [0,\infty[^d \to M$ is $L_\mu(\mathcal{C}) \otimes \mathrm{Leb}^d$-measurable iff $f$ has a $\mathcal{C} \otimes {}^{*}\mathcal{P}(T^d)$-measurable lifting $F : \Omega \times T^d \to {}^{*}M$, i.e., $F(X,t) \approx f(X, {}^{\circ}t)$ for $(\mu \otimes \nu^d)_L$-almost all $(X,t)$.
Often it is more convenient to work with the domain
$$T^d_{\neq} := \{(t_1,\ldots,t_d) \in T^d \mid t_i \neq t_j \text{ if } i \neq j\}$$
instead of $T^d$.
Proposition 6.2.12 $\nu^d_L(T^d \setminus T^d_{\neq}) = 0$.
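Proof Fix an $n \in \mathbb{N}$ and set $T^d_{n,\neq} := T^d_{\neq} \cap T_n^d$. Since $|T_n^d| = (n \cdot H)^d$ and $|T^d_{n,\neq}| = \frac{(n\cdot H)!}{(n\cdot H - d)!}$, we obtain with $m := n \cdot H$
$$n^{-d} \cdot \nu_n^d(T_n^d \setminus T^d_{n,\neq}) = \frac{1}{m^d}\big(m^d - m\cdot(m-1)\cdots(m-d+1)\big) = 1 - \frac{m}{m}\cdot\frac{m-1}{m}\cdots\frac{m-d+1}{m} \approx 0,$$
because $d$ is finite. Therefore, $(\nu_n^d)_L(T_n^d \setminus T^d_{n,\neq}) = 0$, thus $\nu^d_L(T^d \setminus T^d_{\neq}) = 0$.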
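The key quantity in the proof, $1 - \frac{m}{m}\cdot\frac{m-1}{m}\cdots\frac{m-d+1}{m}$, can be evaluated exactly for ordinary integers $m$ standing in for $n \cdot H$. The helper name `repeat_fraction` is a hypothetical illustration. The sketch also prints the elementary union bound $\binom{d}{2}/m$, which makes visible why the fraction is infinitesimal when $m$ is unlimited and $d$ is finite.

```python
from fractions import Fraction
from math import perm

def repeat_fraction(m: int, d: int) -> Fraction:
    """Fraction of the m**d tuples in {1,...,m}^d having two equal coordinates:
    1 - m(m-1)...(m-d+1)/m**d, which equals n**-d * nu_n^d(T_n^d minus T^d_{n,neq}) when m = n*H."""
    return 1 - Fraction(perm(m, d), m**d)

for m in (10, 100, 10_000):
    # union bound: d(d-1)/2 pairs of coordinates, each pair equal on a fraction 1/m of the tuples
    print(m, float(repeat_fraction(m, 3)), 3 * 2 / (2 * m))
```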
Therefore, we can assume, and we will assume in general, that the liftings $F : T^d \to {}^{*}\mathbb{R}$ of a Lebesgue measurable function have the property
$$F(t_1,\ldots,t_d) = 0 \quad \text{if } (t_1,\ldots,t_d) \notin T^d_{\neq}.$$
We also require the following definition:
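Here is a second example of an infinite $\sigma$-finite Loeb measure, the counting measure $c_L$ on $\mathbb{N}$. For each $n \in \mathbb{N}$ we define a finitely additive internal measure $c_n$ by setting
$$c_n : {}^{*}\mathcal{P}({}^{*}\mathbb{N}) \to \{0,1,\ldots,n\}, \quad A \mapsto |A \cap \{1,\ldots,n\}|.$$
Note that $A \in {}^{*}\mathcal{P}({}^{*}\mathbb{N})$ is a $(c_n)_L$-approximation of $B \subseteq {}^{*}\mathbb{N}$ if $A = B \cap \{1,\ldots,n\}$. Thus each subset $B \subseteq {}^{*}\mathbb{N}$, even an external one, is in $L_{c_n}({}^{*}\mathcal{P}({}^{*}\mathbb{N}))$ for all $n \in \mathbb{N}$. Define
$$c_L(B) := \lim_{n\to\infty} (c_n)_L(B) \in \mathbb{N}_0 \cup \{\infty\}.$$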
Since ${}^{*}\mathbb{N} \setminus \mathbb{N}$ is a $c_L$-nullset, $c_L$ is the usual counting measure on $\mathbb{N}$, denoted simply by $c$.
In probability theory, especially in the Malliavin calculus, symmetric functions are important. A function $g$ defined on a set $X^d$, $d \in \mathbb{N}$, is called symmetric if for each permutation $\sigma$ on $\{1,\ldots,d\}$ and all $(x_1,\ldots,x_d) \in X^d$:
$$g(x_1,\ldots,x_d) = g_\sigma(x_1,\ldots,x_d) := g(x_{\sigma(1)},\ldots,x_{\sigma(d)}).$$
Proposition 6.2.13 Fix a separable Banach space $B$ within the superstructure $V$. Let $f : [0,\infty[^d \to B$ be Lebesgue measurable. Then $f$ is symmetric $\lambda^d$-a.e. iff $f$ has a symmetric lifting. Note further that if $f$ is symmetric $\lambda^d$-a.e. and $F$ is a lifting of $f$, then $\frac{1}{d!}\sum_\sigma F_\sigma$ is a symmetric lifting of $f$, where $\sigma$ runs through the permutations of $\{1,\ldots,d\}$. Moreover, if $F$ is a symmetric lifting of $f$, then $f = \frac{1}{d!}\sum_\sigma f_\sigma$ $\lambda^d$-a.e., and, of course, $\frac{1}{d!}\sum_\sigma f_\sigma$ is symmetric.
Proof Let $\sigma$ be a permutation of $\{1,\ldots,d\}$. If $A \subseteq T^d$ is internal, we set $A_\sigma := \{(t_1,\ldots,t_d) \mid (t_{\sigma(1)},\ldots,t_{\sigma(d)}) \in A\}$. Since $|A_\sigma| = |A|$, $\nu^d(A) = \nu^d(A_\sigma)$. Fix a lifting $F : T^d \to {}^{*}B$ of $f$ and an $n \in \mathbb{N}$. Let $F^n := F \upharpoonright T_n^d$ and $f^n := f \upharpoonright [0,n]^d$. Then, by Sect. 6.2.1, for each $\varepsilon \in \mathbb{R}^+$ there exists an internal $A \subseteq T_n^d$ such that
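$$\{(t_1,\ldots,t_d) \in T_n^d \mid F^n(t_1,\ldots,t_d) \not\approx f^n({}^{\circ}t_1,\ldots,{}^{\circ}t_d)\} \subseteq A \quad\text{and}\quad \nu_n^d(A) < \varepsilon.$$
Since $\{(t_1,\ldots,t_d) \mid F^n_\sigma(t_1,\ldots,t_d) \not\approx f^n_\sigma({}^{\circ}t_1,\ldots,{}^{\circ}t_d)\} \subseteq A_\sigma$ and $\nu_n^d(A) = \nu_n^d(A_\sigma)$, $F^n_\sigma$ is a lifting of $f^n_\sigma$. Since this is true for all $n \in \mathbb{N}$, $F_\sigma$ is a lifting of $f_\sigma$.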
Let now $f$ be symmetric $\lambda^d$-a.e. Then for $\nu^d_L$-almost all $(t_1,\ldots,t_d) \in T^d$,
$$\frac{1}{d!}\sum_\sigma F_\sigma(t_1,\ldots,t_d) \approx \frac{1}{d!}\sum_\sigma f_\sigma({}^{\circ}t_1,\ldots,{}^{\circ}t_d) = f({}^{\circ}t_1,\ldots,{}^{\circ}t_d).$$
This proves that $\frac{1}{d!}\sum_\sigma F_\sigma$ is a symmetric lifting of $f$. For the converse, let $F$ be a symmetric lifting of $f$. Then we obtain for $\nu^d_L$-almost all $(t_1,\ldots,t_d) \in T^d$,
$$f_\sigma({}^{\circ}t_1,\ldots,{}^{\circ}t_d) \approx F_\sigma(t_1,\ldots,t_d) = F(t_1,\ldots,t_d) \approx f({}^{\circ}t_1,\ldots,{}^{\circ}t_d).$$
Therefore, $f = f_\sigma$ $\lambda^d$-a.e.
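The symmetrization $\frac{1}{d!}\sum_\sigma F_\sigma$ used in Proposition 6.2.13 is a purely finite operation. It can therefore be sketched directly for ordinary Python callables. The helper `symmetrize` and the test functions are hypothetical illustrations, not part of the text. The sketch shows that the result no longer depends on the order of its arguments, and that a function which is already symmetric is left unchanged.

```python
from itertools import permutations
from math import factorial

def symmetrize(F, d):
    """Return G = (1/d!) * sum_sigma F_sigma, where F_sigma(x_1..x_d) = F(x_sigma(1), ..., x_sigma(d))."""
    perms = list(permutations(range(d)))  # all d! permutations of the (0-based) positions

    def G(*x):
        if len(x) != d:
            raise ValueError(f"expected {d} arguments")
        return sum(F(*(x[i] for i in sigma)) for sigma in perms) / factorial(d)

    return G

F = lambda a, b, c: a + 2 * b + 3 * c  # a non-symmetric test function
G = symmetrize(F, 3)
print(F(1, 2, 3), F(3, 2, 1))  # differ
print(G(1, 2, 3), G(3, 2, 1))  # agree
```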
6.2.7 Adapted Loeb Spaces
Adapted Loeb spaces are used to establish martingale theory on Loeb spaces. Fix an internal probability space $(\Omega, \mathcal{C}, \mu)$. An internal filtration on $\mathcal{C}$ is an internal $H^2$-tuple $(\mathcal{C}_t)_{t\in T}$ of internal algebras $\mathcal{C}_t \subseteq \mathcal{C}$ such that $\mathcal{C}_t \subseteq \mathcal{C}_s$ for $t \le s$. Then $(\Omega, \mathcal{C}, (\mathcal{C}_t)_{t\in T}, \mu)$ is called an internal adapted probability space. Following Keisler's idea, we construct from the internal filtration $(\mathcal{C}_t)_{t\in T}$ on $\mathcal{C}$ a standard filtration $(\mathfrak{c}_r)_{r\in[0,\infty[}$ on $L_\mu(\mathcal{C})$; that is, $(\mathfrak{c}_r)_{r\in[0,\infty[}$ has the properties (1) and (2) below. Moreover, $(\mathfrak{c}_r)_{r\in[0,\infty[}$ satisfies the so-called Doob-Meyer conditions; that is, it has the properties (3) and (4) below. We use the following notation: If $X$ and $Y$ are subsets of $L_\mu(\mathcal{C})$, then $X \vee Y$ denotes the $\sigma$-subalgebra of $L_\mu(\mathcal{C})$ generated by $X \cup Y$.
Lemma 6.2.14 Fix a $\sigma$-algebra $S \subseteq L_\mu(\mathcal{C})$. Then
$$S \vee \mathcal{N}_{\mu_L} = \{B \in L_\mu(\mathcal{C}) \mid \exists A \in S\ (A \,\triangle\, B \in \mathcal{N}_{\mu_L})\}. \qquad (+)$$
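Proof “⊇” Fix $B \in L_\mu(\mathcal{C})$ such that $A \,\triangle\, B \in \mathcal{N}_{\mu_L}$ for some $A \in S$. Then $B = (A \setminus (A \setminus B)) \cup (B \setminus A) \in S \vee \mathcal{N}_{\mu_L}$. “⊆” The proof that $X := \{B \in L_\mu(\mathcal{C}) \mid \exists A \in S\ (A \,\triangle\, B \in \mathcal{N}_{\mu_L})\}$ is a $\sigma$-algebra is similar to the proof of Part (2) of Theorem 6.2.2; instead of saturation use the fact that $S$ is a $\sigma$-algebra. Since $S \cup \mathcal{N}_{\mu_L} \subseteq X$, the result follows.
Fix an internal filtration $(\mathcal{C}_t)_{t\in T}$ on $\mathcal{C}$. For each $r \in [0,\infty[$ set
$$\mathfrak{c}_r := \bigcup_{t\in T,\, t\approx r} \big(\mathcal{C}_t \vee \mathcal{N}_{\mu_L}\big) = \bigcup_{t\in T,\, t\approx r} \big(L_\mu(\mathcal{C}_t) \vee \mathcal{N}_{\mu_L}\big).$$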
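Theorem 6.2.15 (Keisler [12])
(1) For each $r \in [0,\infty[$, $\mathfrak{c}_r \subseteq L_\mu(\mathcal{C})$ is a $\sigma$-algebra.
(2) $\mathfrak{c}_t \subseteq \mathfrak{c}_s$ for all $t < s$ in $[0,\infty[$.
(3) $\mathfrak{c}_t = \bigcap_{s>t} \mathfrak{c}_s$, that is, the filtration $(\mathfrak{c}_t)_{t\in[0,\infty[}$ is right continuous.
(4) If $N \in \mathcal{N}_{\mu_L}$, then $N \in \mathfrak{c}_0$; thus $N \in \mathfrak{c}_r$ for each $r \in [0,\infty[$.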
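Proof (1) Fix $t \in T$ with $t \approx r$. Since $\mathcal{C}_t \subseteq \mathcal{C}$ and $L_\mu(\mathcal{C})$ is complete, $\mathfrak{c}_r \subseteq L_\mu(\mathcal{C})$. It is easy to see that $\mathfrak{c}_r$ is an algebra. To prove that it is a $\sigma$-algebra, fix a nondecreasing sequence $(B_n)_{n\in\mathbb{N}}$ in $\mathfrak{c}_r$. Using (+) and Sect. 6.2.1, for each $n \in \mathbb{N}$ there exist a $k_n \in T$ with $k_n \approx r$ and an $A_n \in \mathcal{C}_{k_n}$ such that $\mu_L(A_n \,\triangle\, B_n) = 0$. For $A'_n := A_1 \cup \cdots \cup A_n$ we also have $\mu_L(A'_n \,\triangle\, B_n) = 0$, because $B_1 \subseteq \cdots \subseteq B_m \subseteq \cdots$. Set $s := \lim_{n\to\infty} {}^{\circ}\mu(A'_n)$. By saturation, there exist a $k \in T$ with $k \approx r$ and $k \ge k_n$ for all $n \in \mathbb{N}$, and an $A \in \mathcal{C}_k$ with $A'_n \subseteq A$ and $\mu(A) < s + \frac{1}{n}$ for each $n \in \mathbb{N}$. Therefore, $\mu(A) \approx s$. It follows that $\mu_L\big(A \,\triangle \bigcup_{n\in\mathbb{N}} A'_n\big) = 0$. Since
$$A \,\triangle \bigcup_{n\in\mathbb{N}} B_n \subseteq \Big(A \,\triangle \bigcup_{n\in\mathbb{N}} A'_n\Big) \cup \bigcup_{n\in\mathbb{N}} (A'_n \,\triangle\, B_n),$$
$\mu_L\big(A \,\triangle \bigcup_{n\in\mathbb{N}} B_n\big) = 0$; thus $\bigcup_{n\in\mathbb{N}} B_n \in \mathfrak{c}_r$.
(2) This is true, because $\mathcal{C}_t \subseteq \mathcal{C}_s$ for $t < s$ in $T$.
(3) By (2), $\mathfrak{c}_t \subseteq \bigcap_{s>t} \mathfrak{c}_s$. Now let $C \in \mathfrak{c}_s$ for all $s > t$. Then there exists a decreasing sequence $(k_n)_{n\in\mathbb{N}}$ in $T$ with $t < {}^{\circ}k_n$ and $\lim_{n\to\infty} {}^{\circ}k_n = t$ such that for each $n \in \mathbb{N}$ there is an $A_n \in \mathcal{C}_{k_n}$ with $\mu_L(A_n \,\triangle\, C) = 0$. By saturation, there exist a $k \in T$ with $t \le k \le k_n$ for all $n \in \mathbb{N}$ and an $A \in \mathcal{C}_k$ such that $\mu(A \,\triangle\, A_1) < \frac{1}{n}$ for each $n \in \mathbb{N}$, thus $\mu(A \,\triangle\, A_1) \approx 0$. Since $t \approx k$ and $\mu_L(A \,\triangle\, C) \le \mu_L(A \,\triangle\, A_1) + \mu_L(A_1 \,\triangle\, C) = 0$, we obtain $C \in \mathfrak{c}_t$.
(4) It is obvious that each $N \in \mathcal{N}_{\mu_L}$ belongs to $\mathfrak{c}_0$.
The filtration $(\mathfrak{c}_r)_{r\in[0,\infty[}$ on $L_\mu(\mathcal{C})$ is called the standard part of the internal filtration $(\mathcal{C}_t)_{t\in T}$ on $\mathcal{C}$, and the quadruple $(\Omega, L_\mu(\mathcal{C}), \mu_L, (\mathfrak{c}_r)_{r\in[0,\infty[})$ is called the adapted Loeb space over $(\Omega, \mathcal{C}, \mu, (\mathcal{C}_t)_{t\in T})$.
Problems
Let $\mu_1$ be an internal Borel probability measure on ${}^{*}\mathbb{R}$ and let $\mu$ be the $H^2$-fold product of $\mu_1$ on the set $\mathcal{B}$ of internal Borel sets on ${}^{*}\mathbb{R}^T$. We assume that $\mu$ is absolutely continuous with respect to the internal Lebesgue measure $\lambda^T$, that is, $\lambda^T(B) = 0$ implies $\mu(B) = 0$ for each $B \in \mathcal{B}$. Define for each $t \in T$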
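$$\mathcal{B}_t := \big\{B \times {}^{*}\mathbb{R}^{T \setminus T_t} \;\big|\; B \text{ is an internal Borel set in } {}^{*}\mathbb{R}^{T_t}\big\}.$$
Let $(\mathfrak{b}_r)_{r\in[0,\infty[}$ be the standard part of the internal filtration $(\mathcal{B}_t)_{t\in T}$.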
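Fix $r \in [0,\infty[$. Following Keisler's idea [12], we define an equivalence relation $\sim_r$ on ${}^{*}\mathbb{R}^T$ by setting
$$X \sim_r Y :\Leftrightarrow \forall s \in T\ \big(s \approx r \Rightarrow (X_i)_{i\le s} = (Y_i)_{i\le s}\big).$$
Now define
$$B \in \mathfrak{b}_r^{\sim} :\Leftrightarrow B \in L_\mu(\mathcal{B}) \text{ and } \forall X, Y \in {}^{*}\mathbb{R}^T\ \big(X \sim_r Y \Rightarrow (X \in B \Leftrightarrow Y \in B)\big).$$
For each subset $D \subseteq {}^{*}\mathbb{R}^T$ and each $t \in T$ we define
$$D^t := \pi_t[D] \times {}^{*}\mathbb{R}^{\{s\in T \mid t < s\}}, \quad \text{where } \pi_t\big((X_s)_{s\in T}\big) := (X_s)_{s\le t}.$$
(1) Prove that $\mathfrak{b}_r^{\sim}$ is a $\sigma$-algebra.
(2) Prove: If $s \le t$ in $T$, then $D \subseteq D^t \subseteq D^s$.
(3) Prove: If $r \in [0,\infty[$, $r < {}^{\circ}s$ and $D \in \mathfrak{b}_r^{\sim}$, then $D = D^s$.
(4) Prove: For each $r \in [0,\infty[$, $\mathfrak{b}_r = \mathfrak{b}_r^{\sim} \vee \mathcal{N}_{\mu_L}$.
(5) Let $f : {}^{*}\mathbb{R}^T \to \mathbb{R}$ be $L_\mu(\mathcal{B})$-measurable. Prove that $f$ is $\mathfrak{b}_r$-measurable if $f(X) = f(Y)$ for all $X, Y \in {}^{*}\mathbb{R}^T$ with $X \sim_r Y$.
(6) Prove that for each $\mathfrak{b}_r$-measurable $f : {}^{*}\mathbb{R}^T \to \mathbb{R}$ there exists a $\mathfrak{b}_r^{\sim}$-measurable $g : {}^{*}\mathbb{R}^T \to \mathbb{R}$ with $f = g$ $\mu_L$-a.s.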
6.3 Standard Integrability for Internal Measures
In this paragraph we establish the integration theory on Loeb spaces. Fix an internal measure space $(\Omega, \mathcal{C}, \mu)$ such that $\mu(\Omega)$ is limited, and fix the corresponding Loeb space $(\Omega, L_\mu(\mathcal{C}), \mu_L)$. Obviously, if $\mathcal{C}$ is a ${}^{*}\sigma$-algebra, we have the notion of “$\mu$-integrability”, which is nothing more than the usual integrability “copied” from the standard model to the nonstandard model by the Transfer Principle.
6.3.1 The Definition of S-integrability and Equivalent Conditions
Fix an internal Banach space $B$ with internal norm $|\cdot|$. A $\mathcal{C}$-measurable function $F : \Omega \to B$ (by definition, $F$ is then internal) is called $S_\mu$-integrable if for all unlimited $K \in {}^{*}\mathbb{N}$
$$\int_{\{|F|\ge K\}} |F|\, d\mu \approx 0.$$
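The definition can be illustrated with a finite stand-in. Take a uniform probability on $H$ points and use an ordinary threshold $K \le H$ in place of an unlimited one; both numbers are illustrative assumptions. The name `tail_integral` is a hypothetical helper. A bounded function has a vanishing tail. The “spike” $F = H$ on one point has integral $1$, yet its whole mass sits above every threshold $K \le H$. In the nonstandard model such an $F$ has a limited integral but is not $S_\mu$-integrable.

```python
from fractions import Fraction

def tail_integral(values, K):
    """int_{|F| >= K} |F| dmu for the uniform probability mu on len(values) points."""
    return Fraction(sum(abs(v) for v in values if abs(v) >= K), len(values))

H = 10_000                    # finite stand-in for an unlimited hyperfinite size
K = 100                       # finite stand-in for an unlimited threshold K <= H
bounded = [1] * H             # a limited function
spike = [H] + [0] * (H - 1)   # integral 1, concentrated on one point of mass 1/H
print(tail_integral(bounded, K), tail_integral(spike, K))
```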
For example, if $F$ is $\mathcal{C}$-measurable and limited, i.e., $|F|$ is limited, then $F$ is $S_\mu$-integrable. Fix a standard $p \in [1,\infty[$ and define
$$SL^p(\mu, B) := \{F : \Omega \to B \mid |F|^p \text{ is } S_\mu\text{-integrable}\}.$$
In the case $p = 2$, we call the elements of $SL^2(\mu,B)$ $S_\mu$-square integrable. If $B = {}^{*}\mathbb{R}$, then we write $SL^p(\mu)$ instead of $SL^p(\mu, {}^{*}\mathbb{R})$.
Lemma 6.3.1 Assume that $F : \Omega \to {}^{*}\mathbb{R}$ is $\mathcal{C}$-measurable and $F \ge 0$.
(a) If $\int F^p\, d\mu$ is limited, then $F$ is limited $\mu_L$-a.e.
(b) If $\int F^p\, d\mu \approx 0$, then $F \approx 0$ $\mu_L$-a.e.
Proof (a) Since $U := \{F \text{ is unlimited}\} = \bigcap_{n\in\mathbb{N}} \{F \ge n\}$, we see that $U \in L_\mu(\mathcal{C})$. Assume that $\varepsilon := \mu_L(U) > 0$. Then $\mu(\{F \ge n\}) > \frac{\varepsilon}{2}$ for each $n \in \mathbb{N}$. By the Spillover Principle, there is an unlimited $K \in {}^{*}\mathbb{N}$ with $\mu(\{F \ge K\}) > \frac{\varepsilon}{2}$. We obtain
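$$\int F^p\, d\mu \ge \int_{\{F\ge K\}} F^p\, d\mu > K^p \cdot \frac{\varepsilon}{2},$$
which is unlimited. This proves (a).
(b) Now assume that $\int F^p\, d\mu \approx 0$. Then $n \cdot \int F^p\, d\mu < \frac{1}{n}$ for all $n \in \mathbb{N}$. By the Spillover Principle, there is an unlimited $K \in {}^{*}\mathbb{N}$ such that $\int (K^{1/p} F)^p\, d\mu = K \cdot \int F^p\, d\mu < \frac{1}{K}$; in particular, $\int (K^{1/p} F)^p\, d\mu$ is limited. By (a), $K^{1/p} \cdot F$ is limited $\mu_L$-a.e. Since $K^{1/p}$ is unlimited, $F \approx 0$ $\mu_L$-a.e.
Proposition 6.3.2 (Anderson [2], Loeb [15]) Let $F : \Omega \to {}^{*}\mathbb{R}^+$ be $\mathcal{C}$-measurable. The following statements (1),…,(5) are equivalent:
(1) $F$ is $S_\mu$-integrable.
(2) $\lim_{n\to\infty} {}^{\circ}\int_{\{n\le F\}} F\, d\mu = 0$.
(3) For each $A \in \mathcal{C}$, $\int_A F\, d\mu$ is limited, and $\int_A F\, d\mu \approx 0$ if $\mu(A) \approx 0$.
(4) $\int F\, d\mu$ is limited, and for each $\varepsilon \in \mathbb{R}^+$ there exists a $\delta \in \mathbb{R}^+$ such that $\int_A F\, d\mu < \varepsilon$ for all $A \in \mathcal{C}$ with $\mu(A) < \delta$.
(5) There exists a function $g : \mathbb{N} \to \mathbb{N}$ such that for all $n \in \mathbb{N}$
$$\int_{\{g(n)\le F\}} F\, d\mu < \frac{1}{n}.$$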
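Proof “(1) ⇒ (2)” Assume that (2) is not true. Then there is an $\varepsilon \in \mathbb{R}^+$ such that $\{n \in \mathbb{N} \mid \int_{\{n\le F\}} F\, d\mu \ge \varepsilon\}$ is unbounded. By the Spillover Principle, there is a $K \in {}^{*}\mathbb{N} \setminus \mathbb{N}$ with $\int_{\{K\le F\}} F\, d\mu \ge \varepsilon$, contradicting (1).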
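“(2) ⇒ (3)” Suppose that (2) is true. Fix a standard $\varepsilon > 0$. Then $\int_{\{n\le F\}} F\, d\mu < \varepsilon$ for some $n \in \mathbb{N}$. Then we obtain for all $A \in \mathcal{C}$
$$\int_A F\, d\mu \le \int_{\{n\le F\}} F\, d\mu + \int_{A\cap\{F<n\}} F\, d\mu < \varepsilon + n\mu(A),$$
which is limited in any case, and $< 2\varepsilon$ if $\mu(A) \approx 0$.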
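Since this result is true for all $\varepsilon \in \mathbb{R}^+$, $\int_A F\, d\mu \approx 0$ if $\mu(A) \approx 0$. This proves (3).
“(3) ⇒ (4)” By (3), $\int F\, d\mu$ is limited. Assume that (4) is not true. Then there exists an $\varepsilon \in \mathbb{R}^+$ such that for each $n \in \mathbb{N}$ there is an $A_n \in \mathcal{C}$ with $\mu(A_n) < \frac{1}{n}$ and $\int_{A_n} F\, d\mu \ge \varepsilon$. By saturation, there exists an $A \in \mathcal{C}$ such that $\mu(A) \approx 0$ and $\int_A F\, d\mu \ge \varepsilon$, which contradicts (3).
“(4) ⇒ (5)” Suppose that (4) is true. Since $\int F\, d\mu$ is limited, by the preceding Lemma, $\mu(\{K \le F\}) \approx 0$ for each unlimited $K \in {}^{*}\mathbb{N}$. Therefore, and by (4), $\int_{\{K\le F\}} F\, d\mu < \frac{1}{n}$ for all unlimited $K \in {}^{*}\mathbb{N}$ and all $n \in \mathbb{N}$. For each $n \in \mathbb{N}$ set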
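$$g(n) := \min\Big\{k \in {}^{*}\mathbb{N} \;\Big|\; \int_{\{k\le F\}} F\, d\mu < \frac{1}{n}\Big\}.$$
Since $\int_{\{K\le F\}} F\, d\mu < \frac{1}{n}$ for all unlimited $K \in {}^{*}\mathbb{N}$, $g(n) \in \mathbb{N}$. This proves (5).
“(5) ⇒ (1)” By (5), for each $n \in \mathbb{N}$ and each unlimited $K \in {}^{*}\mathbb{N}$,
$$\int_{\{F\ge K\}} F\, d\mu \le \int_{\{F\ge g(n)\}} F\, d\mu < \frac{1}{n}.$$
Since $n$ is arbitrary, $\int_{\{F\ge K\}} F\, d\mu \approx 0$.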
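The modulus $g$ in condition (5) is defined by a minimum over thresholds, exactly as in the step “(4) ⇒ (5)”. For a finite measure with point masses it can be computed directly. In this Python sketch the helper names `tail` and `g` are hypothetical. The example function $F(j) = j$ with masses $2^{-j}$ on $j = 1,\ldots,40$ is an arbitrary integrable test case, not taken from the text.

```python
from fractions import Fraction

def tail(values, weights, k):
    """int_{k <= F} F dmu for a finite measure with point masses `weights`."""
    return sum((v * w for v, w in zip(values, weights) if v >= k), Fraction(0))

def g(n, values, weights):
    """Smallest k >= 1 with int_{k <= F} F dmu < 1/n, as in condition (5)."""
    k = 1
    while tail(values, weights, k) >= Fraction(1, n):
        k += 1  # terminates: the tail is 0 once k > max(values)
    return k

values = list(range(1, 41))                    # F(j) = j on the points j = 1, ..., 40
weights = [Fraction(1, 2**j) for j in values]  # point masses 2^-j
print([g(n, values, weights) for n in (1, 10, 100, 1000)])
```

Because the tail integral decreases in $k$ and the bound $1/n$ decreases in $n$, the computed $g$ is nondecreasing. This is the behaviour the proof exploits when it replaces an unlimited $K$ by the standard number $g(n)$.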
Notice that, since the integral of an $S_\mu$-integrable function is limited, $S_\mu$-integrability implies $\mu$-integrability. On the other hand, we note, as an example, that a constant function on $\Omega$ having an unlimited value is $\mu$-integrable, but not $S_\mu$-integrable (provided $\mu(\Omega)$ is not infinitesimal). The next result is a simple application of the previous proposition.
Corollary 6.3.3 Fix $p \in [1,\infty[$.
(α) Fix $G, F \in SL^p(\mu, B)$ and a limited $a \in {}^{*}\mathbb{R}$. Then $F + G \in SL^p(\mu,B)$ and $a \cdot F \in SL^p(\mu,B)$.
(β) Fix $1 \le q < p$ and let $\int |F|^p\, d\mu$ be limited. Then $F \in SL^q(\mu,B)$; in particular, $SL^p(\mu,B) \subseteq SL^q(\mu,B)$.
(γ) A $\mathcal{C}$-measurable function $F$ belongs to $SL^p(\mu,B)$ iff there exists a sequence $(G_n)_{n\in\mathbb{N}}$ in $SL^p(\mu,B)$ with $\int |F - G_n|^p\, d\mu < \frac{1}{n}$ for each $n \in \mathbb{N}$.
Proof (α) Define $\|F\|_p := \big(\int |F|^p\, d\mu\big)^{1/p}$. By the triangle inequality for the internal $L^p$-norm $\|\cdot\|_p$, we obtain for each $A \in \mathcal{C}$
$$\|1_A(F+G)\|_p \le \|1_A F\|_p + \|1_A G\|_p,$$
which is limited in any case, and $\approx 0$ if $\mu(A) \approx 0$.
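Since $a$ is limited and therefore $|a|^p$ is limited, we obtain
$$\int_A |aF|^p\, d\mu = |a|^p \int_A |F|^p\, d\mu,$$
which is limited in any case, and $\approx 0$ if $\mu(A) \approx 0$. The assertions now follow from (3) of Proposition 6.3.2.
(β) Let $\int |F|^p\, d\mu$ be limited. Then, by Hölder's inequality, for each $A \in \mathcal{C}$
$$\int_A |F|^q\, d\mu \le \Big(\int_A |F|^{q\cdot\frac{p}{q}}\, d\mu\Big)^{\frac{q}{p}} \cdot \Big(\int_A 1\, d\mu\Big)^{1-\frac{q}{p}},$$
which is limited in any case, and $\approx 0$ if $\mu(A) \approx 0$.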
Now apply (3) again.
(γ) “⇒” is obvious. “⇐” Suppose that there exists a sequence $(G_n)$ in $SL^p(\mu,B)$ with $\int |F - G_n|^p\, d\mu < \frac{1}{n}$ for each $n \in \mathbb{N}$. Since for each $A \in \mathcal{C}$
$$\|1_A F\|_p \le \|1_A(F - G_n)\|_p + \|1_A G_n\|_p,$$
which is limited in any case, and $< \frac{2}{\sqrt[p]{n}}$ if $\mu(A) \approx 0$, it follows that $F \in SL^p(\mu,B)$ (again by (3)).
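The Hölder estimate in (β) is what makes $\int_A |F|^q\, d\mu$ small together with $\mu(A)$. The factor $\mu(A)^{1-q/p}$ tends to $0$ while $\int_A |F|^p\, d\mu$ stays bounded. This can be checked numerically on a finite probability space. In the following sketch the function name `holder_sides`, the test values and the sets $A$ are arbitrary assumptions made only for illustration.

```python
def holder_sides(F, mu, A, p, q):
    """Return (lhs, rhs) of int_A |F|^q dmu <= (int_A |F|^p dmu)^(q/p) * mu(A)^(1 - q/p), 1 <= q < p."""
    lhs = sum(abs(F[x]) ** q * mu[x] for x in A)
    mass = sum(mu[x] for x in A)
    rhs = sum(abs(F[x]) ** p * mu[x] for x in A) ** (q / p) * mass ** (1 - q / p)
    return lhs, rhs

N = 60
F = [((7 * x) % 11) - 5 for x in range(N)]  # arbitrary test values
mu = [1 / N] * N                            # uniform probability on N points
for A in (range(N), range(0, N, 3), range(0, N, 20)):
    lhs, rhs = holder_sides(F, mu, A, p=3, q=1.5)
    print(len(A), round(lhs, 6), round(rhs, 6))  # both sides shrink as mu(A) shrinks
```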
6.3.2 $\mu_L$-integrability and $S_\mu$-integrability
We use the notation of the preceding section. In this section we will prove slight modifications of the Loeb-Anderson lifting theorem for Bochner integrable functions (see Loeb [15] and Anderson [2] and also [18]). It shows that Bochner integrable functions can be characterized by their S-integrable liftings. We fix a separable Banach space $B$ within the superstructure $V$ with norm $|\cdot|$. Fix the Loeb space $(\Omega, L_\mu(\mathcal{C}), \mu_L)$ over an internal space $(\Omega, \mathcal{C}, \mu)$ where $\mu(\Omega)$ is limited.
Recall that an $L_\mu(\mathcal{C})$-simple function is a function of the form $\sum_{i=1}^k \alpha_i 1_{A_i}$, where $k \in \mathbb{N}$, $(\alpha_1,\ldots,\alpha_k)$ and $(A_1,\ldots,A_k)$ are $k$-tuples in $B$ and $L_\mu(\mathcal{C})$, respectively, and $1_{A_i}$ is the indicator function of $A_i$, that is, $1_{A_i}(x) := 1$ if $x \in A_i$, and $1_{A_i}(x) := 0$ if $x \notin A_i$. Internal $\mathcal{C}$-simple functions are defined in a similar way. Define
$$\int \sum_{i=1}^k \alpha_i 1_{A_i}\, d\mu_L := \sum_{i=1}^k \alpha_i \mu_L(A_i).$$
Notice that the integral is well defined.
Assume that $(f_n)_{n\in\mathbb{N}_0}$ is a sequence of $L_\mu(\mathcal{C})$-measurable functions. Then $(f_n)$ converges to $f := f_0$ in measure if for each $\varepsilon \in \mathbb{R}^+$,
$$\lim_{n\to\infty} \mu_L(\{|f_n - f| \ge \varepsilon\}) = 0.$$
An $L_\mu(\mathcal{C})$-measurable function $f : \Omega \to B$ is called $\mu_L$-integrable if there exists a sequence $(f_n)_{n\in\mathbb{N}}$ of $L_\mu(\mathcal{C})$-simple functions $f_n$ such that
(I1) $(f_n)_{n\in\mathbb{N}}$ converges to $f$ in measure.
(I2) $\lim_{n,m\to\infty} \int |f_n - f_m|\, d\mu_L = 0$.
The sequence $(f_n)_{n\in\mathbb{N}}$ is called a witness for the integrability of $f$. (I2) means that for each $\delta \in \mathbb{R}^+$ there is an $n_0 \in \mathbb{N}$ such that $\int |f_n - f_m|\, d\mu_L < \delta$ for all $n, m \in \mathbb{N}$ with $n, m \ge n_0$. If (I1) and (I2) are true, we set
$$\int f\, d\mu_L := \lim_{n\to\infty} \int f_n\, d\mu_L.$$
Notice that $\int f\, d\mu_L$ is well defined. Set $\int_A f\, d\mu_L := \int 1_A f\, d\mu_L$ for $A \in L_\mu(\mathcal{C})$. If $\mu_L$ is a probability measure, then it is common practice to write $\mathbb{E}_{\mu_L}(f)$ instead of $\int f\, d\mu_L$. The index $\mu_L$ can be omitted if it is clear which measure is meant.
Theorem 6.3.4 Fix an $L_\mu(\mathcal{C})$-measurable function $f : \Omega \to B$ and $p \in [1,\infty[$.
(a) Then $f$ is Bochner integrable iff $f$ has an $S_\mu$-integrable $\mathcal{C}$-${}^{*}$simple lifting $F : \Omega \to {}^{*}B$, in which case
$$\int f\, d\mu_L = {}^{\circ}\!\int F\, d\mu.$$
(b) Let $F : \Omega \to {}^{*}B$ be a $\mathcal{C}$-measurable lifting of $f : \Omega \to B$. Suppose that $|f|^p$ is $\mu_L$-integrable. Then there exists an unlimited $N \in {}^{*}\mathbb{N}$ such that $1_{\{|F|\le N\}} \cdot F \in SL^p(\mu, {}^{*}B)$. This function remains a lifting of $f$.
(c) $f$ is Bochner $\mu_L$-integrable iff $|f|$ is $\mu_L$-integrable.
(d) $f \in L^p(\mu_L, B)$ iff $f$ has a lifting $F \in SL^p(\mu, {}^{*}B)$, in which case
$$\int |f|^p\, d\mu_L = {}^{\circ}\!\int |F|^p\, d\mu.$$
Proof (a) “⇒” Choose a witness $(f_k)_{k\in\mathbb{N}}$ for the integrability of $f$ with $f_k = a_1 1_{B_1} + \cdots + a_n 1_{B_n}$. Let $A_i \in \mathcal{C}$ be a $\mu$-approximation of $B_i$, $i = 1,\ldots,n$. Then $F_k := a_1 1_{A_1} + \cdots + a_n 1_{A_n}$ is a lifting of $f_k$, and $\int F_k\, d\mu \approx \int f_k\, d\mu_L$. Since $|F_k|$ is limited, $F_k \in SL^1(\mu, {}^{*}B)$. Let $G : \Omega \to {}^{*}B$ be a $\mathcal{C}$-${}^{*}$simple lifting of $f$ according to Theorem 6.2.5. Then there exists a subsequence $(F_{g(m)})_{m\in\mathbb{N}}$ of $(F_m)_{m\in\mathbb{N}}$ such that for all $m \in \mathbb{N}$ and all $k \ge g(m)$
$${}^{\circ}\!\int |F_k - F_{g(m)}|\, d\mu < \frac{1}{m} \qquad (+)$$
and
$${}^{\circ}\mu\Big(\Big\{|F_k - G| \ge \frac{1}{m}\Big\}\Big) < \frac{1}{m}. \qquad (+)$$
Let $(F_k)_{k\in{}^{*}\mathbb{N}}$ be an internal extension of $(F_k)_{k\in\mathbb{N}}$ such that all $F_k : \Omega \to {}^{*}B$ are $\mathcal{C}$-${}^{*}$simple. By the Spillover Principle, for each $m \in \mathbb{N}$ there exists an unlimited $K_m$ such that for all unlimited $K \le K_m$ the inequalities marked (+) become true when we drop ${}^{\circ}$ and replace $k$ by $K$. By saturation, there exists an unlimited $K_\infty \le K_m$ for all $m \in \mathbb{N}$. It follows that (+) becomes true for all $m \in \mathbb{N}$ if we drop ${}^{\circ}$ and replace $k$ with $K_\infty$. Note that $F := F_{K_\infty}$ is a lifting of $f$, and by Corollary 6.3.3 (γ), $F \in SL^1(\mu, {}^{*}B)$. Since
$$\int F_k\, d\mu \approx \int f_k\, d\mu_L \xrightarrow[k\to\infty]{} \int f\, d\mu_L,$$
we obtain
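$$\Big|\int f\, d\mu_L - {}^{\circ}\!\int F\, d\mu\Big| \le \int |f - f_{g(k)}|\, d\mu_L + \Big|\int f_{g(k)}\, d\mu_L - {}^{\circ}\!\int F_{g(k)}\, d\mu\Big| + {}^{\circ}\!\int |F_{g(k)} - F|\, d\mu,$$
which can be made arbitrarily small from the standard point of view. This proves that $\int f\, d\mu_L = {}^{\circ}\!\int F\, d\mu$.
“⇐” The proof is similar to the proof of Lemma 6.1 in [18]. Assume that $f$ has an $S_\mu$-integrable lifting $F$. Since $F \approx f$ $\mu_L$-a.s., by Theorem 6.2.2 (5), there exists a decreasing sequence $(A_n)_{n\in\mathbb{N}}$ in $\mathcal{C}$ such that for each $n \in \mathbb{N}$, $\mu(A_n) < 1/n$ and $F(x) \approx f(x)$ for all $x \in \Omega \setminus A_n$. Fix $n \in \mathbb{N}$. Since $\{F(x) : x \in \Omega \setminus A_n\}$ is an internal set of nearstandard points of ${}^{*}B$,
$$K_n = \{f(x) : x \in \Omega \setminus A_n\} = \{{}^{\circ}F(x) : x \in \Omega \setminus A_n\}$$
is compact. We fix a covering of $K_n$ by a finite collection of standard open balls $O_i$, $1 \le i \le m$, each having radius $r_i < \frac{1}{2n}$ and center $y_i$. Since $\{F(x) : x \in \Omega \setminus A_n\} \subseteq \bigcup_{i=1}^m {}^{*}O_i$, we may construct $L_\mu(\mathcal{C})$-simple functions $f_n$ as follows: For each $x \in A_n$ set $f_n(x) := 0$. For each $x \notin A_n$, set $f_n(x) := y_i$ if $F(x) \in {}^{*}O_i$ and $F(x) \notin {}^{*}O_1 \cup \cdots \cup {}^{*}O_{i-1}$, $1 \le i \le m$. We thus obtain for all $x \in \Omega \setminus A_n$
$$|f(x) - f_n(x)| \le |f(x) - {}^{\circ}F(x)| + |{}^{\circ}F(x) - f_n(x)| < \frac{1}{n}.$$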
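The construction of $f_n$ sends each point to the center of the first ball of a fixed finite cover that contains its value. It can be sketched for $B = \mathbb{R}$, where a compact set lies in a bounded interval and a grid of overlapping open intervals gives such a cover. In this sketch the function name, the grid of centers $h\mathbb{Z}$ with radius $h = \frac{1}{4n} < \frac{1}{2n}$, and the sample values (standing in for the standard parts ${}^{\circ}F(x)$) are illustrative assumptions.

```python
import math
from fractions import Fraction

def first_ball_assignment(values, n):
    """Assign to each value the centre y_i of the first ball O_i (in a fixed finite list) containing it.
    Balls: open intervals of radius h = 1/(4n) < 1/(2n), centred on the grid h*Z, covering the range."""
    h = Fraction(1, 4 * n)
    lo = math.floor(min(values) / h) - 1
    hi = math.ceil(max(values) / h) + 1
    centres = [i * h for i in range(lo, hi + 1)]  # the finite cover O_1, ..., O_m
    return [next(c for c in centres if abs(v - c) < h) for v in values]

n = 3
values = [Fraction(k, 7) for k in range(-20, 21)]  # stand-ins for the standard parts of F(x)
fn = first_ball_assignment(values, n)
print(max(abs(v - y) for v, y in zip(values, fn)), len(set(fn)))
```

The resulting function takes only finitely many values and stays within $\frac{1}{2n}$ of the input. This matches the role of the radius bound $r_i < \frac{1}{2n}$ in the estimate above.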
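It follows that
$$\mu_L\Big(\Big\{|f_n - f| \ge \frac{1}{n}\Big\}\Big) \le \mu_L(A_n) = {}^{\circ}\mu(A_n) \le \frac{1}{n}.$$
Thus $(f_n)$ converges to $f$ in measure. Notice that $f_n$ is an internal $\mathcal{C}$-measurable function. To show that $\int |f_n - f_m|\, d\mu_L \to 0$ as $n,m\to\infty$, fix an $\varepsilon \in \mathbb{R}^+$. Since $F \in SL^1(\mu)$, by Proposition 6.3.2 (4), there exists a $\delta \in \mathbb{R}^+$ such that $\int_A |F|\, d\mu < \varepsilon$ for each $A \in \mathcal{C}$ with $\mu(A) < \delta$. Choose $n_0 \in \mathbb{N}$ with $\frac{1}{n_0} < \delta$. Then we have for each $n \in \mathbb{N}$ with $n \ge n_0$
$$\int |f_n - F|\, d\mu = \int_{A_n} |f_n - F|\, d\mu + \int_{\Omega\setminus A_n} |f_n - F|\, d\mu = \int_{A_n} |F|\, d\mu + \int_{\Omega\setminus A_n} |f_n - F|\, d\mu < \varepsilon + \frac{1}{n}\mu(\Omega),$$
because $\mu(A_n) < \delta$, $|F(x) - f_n(x)| < \frac{1}{n}$ for $x \notin A_n$, and $f_n \upharpoonright A_n = 0$. Therefore, for $n, m \in \mathbb{N}$ with $n, m \ge n_0$ we obtain
$$\int |f_n - f_m|\, d\mu_L = {}^{\circ}\!\int |f_n - f_m|\, d\mu \le 2\varepsilon + \Big(\frac{1}{n} + \frac{1}{m}\Big){}^{\circ}\mu(\Omega),$$
whence $\int |f_n - f_m|\, d\mu_L \to 0$ as $n,m\to\infty$.
(b) Set $a := |f|^p$ and $A := |F|^p$. Then $A$ is a lifting of $a$. Since $A$ is therefore limited $\mu_L$-a.e., $\lim_{n\to\infty} 1_{\{A\le n\}} a = a$ $\mu_L$-a.e. It follows that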
$$\lim_{n\to\infty} \int |a - 1_{\{A\le n\}} a|\, d\mu_L = 0 \quad\text{and}\quad \lim_{n,m\to\infty} \int |1_{\{A\le n\}} a - 1_{\{A\le m\}} a|\, d\mu_L = 0.$$
Therefore, and since $1_{\{A\le n\}} A$ is an $S_\mu$-integrable lifting of $1_{\{A\le n\}} a$ for all $n \in \mathbb{N}$, by (a),
$$\lim_{n,m\to\infty} {}^{\circ}\!\int |1_{\{A\le n\}} A - 1_{\{A\le m\}} A|\, d\mu = 0.$$
There exists a strictly monotone increasing function $g : \mathbb{N} \to \mathbb{N}$ such that for all $m \in \mathbb{N}$ and all $n \ge g(m)$
$$\int |1_{\{A\le n\}} A - 1_{\{A\le g(m)\}} A|\, d\mu < \frac{1}{m}.$$
By the Spillover Principle, for each $m \in \mathbb{N}$ there exists an unlimited $N_m$ such that for all unlimited $N \le N_m$ the preceding formula is true when we replace $n$ by $N$. By saturation, there exists an unlimited $M \in {}^{*}\mathbb{N}$ with $M \le N_m$ for all $m \in \mathbb{N}$. It follows that $\int |1_{\{A\le M\}} A - 1_{\{A\le g(m)\}} A|\, d\mu < \frac{1}{m}$ for all $m \in \mathbb{N}$. By Corollary 6.3.3 (γ), $1_{\{A\le M\}} A$ is $S_\mu$-integrable; thus $1_{\{|F|\le N\}} F \in SL^p(\mu, {}^{*}B)$, where $N = \sqrt[p]{M}$. This function remains a lifting of $f$, because $N$ is unlimited and $f$ is standard.
(c) Obviously, $|f|$ is $\mu_L$-integrable if $f$ is Bochner $\mu_L$-integrable. Conversely, assume that $|f|$ is $\mu_L$-integrable. Since $f$ is $L_\mu(\mathcal{C})$-measurable, $f$ has a $\mathcal{C}$-measurable lifting $F$. By (b), there exists an unlimited $N \in {}^{*}\mathbb{N}$ such that $1_{\{|F|\le N\}} F$ is an $S_\mu$-integrable lifting of $f$. By (a), $f$ is Bochner $\mu_L$-integrable.
(d) Let $f$ be $p$-times Bochner integrable and let $G$ be a $\mathcal{C}$-${}^{*}$simple lifting of $f$. Then, by (b), there exists an unlimited $N \in {}^{*}\mathbb{N}$ such that $F := 1_{\{|G|\le N\}} \cdot G$ is a lifting of $f$ in $SL^p(\mu, {}^{*}B)$. Therefore,
$$\int |f|^p\, d\mu_L = {}^{\circ}\!\int |F|^p\, d\mu.$$
If $F$ is a lifting of $f$ in $SL^p(\mu, {}^{*}B)$, then $f \in L^p(\mu_L, B)$ by Parts (a) and (c).
The following corollary tells us that an internal ${}^{*}\mathbb{R}^+_0$-valued function $F$ is S-integrable if and only if the standard part of the integral of $F$ equals the integral of the standard part of $F$:
Corollary 6.3.5 (Anderson [2], Loeb [15]) Let $F : \Omega \to {}^{*}\mathbb{R}^+_0$ be $\mathcal{C}$-measurable. Set ${}^{\circ}F(x) := a$ if $F(x)$ is limited and $a \in \mathbb{R}^+_0$ is the standard part of $F(x)$, and set ${}^{\circ}F(x) := \infty$ if $F(x)$ is unlimited. Then
(a) $\int {}^{\circ}F\, d\mu_L \le {}^{\circ}\!\int F\, d\mu$.
(b) $F$ is $S_\mu$-integrable iff ${}^{\circ}\!\int F\, d\mu = \int {}^{\circ}F\, d\mu_L < \infty$.
Proof (a) First assume that $\mu_L(\{F \text{ is unlimited}\}) = \varepsilon > 0$. Then $\int {}^{\circ}F\, d\mu_L = \infty$ and we have for all $n \in \mathbb{N}$
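$${}^{\circ}\!\int F\, d\mu \ge {}^{\circ}\!\int_{\{F\ge n\}} F\, d\mu \ge n \cdot {}^{\circ}\mu(\{F \ge n\}) \ge n \cdot \varepsilon,$$
thus ${}^{\circ}\!\int F\, d\mu = \infty$. Now assume that $F$ is limited $\mu_L$-a.e. Then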
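$$\int {}^{\circ}F\, d\mu_L = \lim_{n\to\infty} \int 1_{\{F<n\}}\, {}^{\circ}F\, d\mu_L = \lim_{n\to\infty} {}^{\circ}\!\int 1_{\{F<n\}} F\, d\mu \le {}^{\circ}\!\int F\, d\mu.$$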
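(b) “⇒” follows from Theorem 6.3.4. To prove “⇐”, suppose that ${}^{\circ}\!\int F\, d\mu = \int {}^{\circ}F\, d\mu_L < \infty$. Then, applying “⇒” to the $S_\mu$-integrable function $1_{\{F<n\}} F$, we obtain
$$\lim_{n\to\infty} {}^{\circ}\!\int \big(F - 1_{\{F<n\}} F\big)\, d\mu = \int {}^{\circ}F\, d\mu_L - \lim_{n\to\infty} \int 1_{\{F<n\}}\, {}^{\circ}F\, d\mu_L = \int {}^{\circ}F\, d\mu_L - \int {}^{\circ}F\, d\mu_L = 0.$$
Therefore, $F \in SL^1(\mu)$, by Corollary 6.3.3 (γ).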
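The gap in Corollary 6.3.5 (a) can be seen in a finite stand-in. The truncated integrals $\int 1_{\{F<n\}} F\, d\mu$ approximate $\int {}^{\circ}F\, d\mu_L$ in the proof. For the spike function ($F = H$ on one of $H$ equally weighted points) they stay $0$, while the full integral is $1$. For a bounded function both agree. The helper names and the value of $H$ are illustrative assumptions. The ordinary integer $H$ only mimics an unlimited one.

```python
from fractions import Fraction

def integral(values):
    """int F dmu for the uniform probability on len(values) points."""
    return Fraction(sum(values), len(values))

def truncated_integral(values, n):
    """int 1_{F < n} F dmu; its limit over standard n is what int °F dmu_L is compared with."""
    return Fraction(sum(v for v in values if v < n), len(values))

H = 100_000
spike = [H] + [0] * (H - 1)  # integral 1 but not S-integrable
flat = [2] * H               # limited, hence S-integrable
for name, F in (("spike", spike), ("flat", flat)):
    print(name, integral(F), [truncated_integral(F, n) for n in (1, 10, 1000)])
```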
Let us end this section with the following remark concerning the connection between $S_\nu$-integrability and S-continuity, which will play an important role later. Recall that an internal function $G : T \to {}^{*}\mathbb{R}$ is S-continuous if $G(t)$ is limited for all limited $t \in T$ and if $G(s) \approx G(t)$ for all limited $s, t \in T$ with $s \approx t$.
Corollary 6.3.6 Fix an internal $F : T \to {}^{*}\mathbb{R}$ such that $F \upharpoonright T_k$ is $S_{\nu_k}$-integrable for all $k \in \mathbb{N}$.
(a) Then $F_I : t \mapsto \int_{\{s\in T \mid s\le t\}} F\, d\nu$ is S-continuous. Therefore, we can define a continuous function $\int F\, d\nu : [0,\infty[ \to \mathbb{R}$ by setting
$$\int F\, d\nu : {}^{\circ}t \mapsto {}^{\circ}F_I(t)$$
for all limited $t \in T$.
(b) If $F$ is a lifting of a Lebesgue measurable function $f : [0,\infty[ \to \mathbb{R}$, then $\int_0^r f\, d\lambda = \int F\, d\nu\,(r)$ for all $r \in [0,\infty[$.
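Since $\nu$ gives each point of $T$ mass $1/H$, the function $F_I$ is a running sum of $F(s)/H$ over the grid points $s \le t$. With an ordinary integer $H$ in place of an unlimited one, this is a Riemann-type sum. The following sketch illustrates (b) for the continuous test function $f = \cos$. Its restriction to the grid serves as the lifting $F$, and $\int_0^r \cos\, d\lambda = \sin r$. The function name `F_I`, the value of $H$ and the choice of $f$ are assumptions made only for illustration.

```python
import math

def F_I(F, H, t):
    """Sum of F(s) * (1/H) over the grid points s = j/H, j >= 1, with s <= t."""
    k = math.floor(t * H)  # number of grid points s = j/H with s <= t
    return sum(F(j / H) for j in range(1, k + 1)) / H

H = 10_000  # finite stand-in for the unlimited H
for r in (0.5, 1.0, 2.0):
    # F = cos on the grid lifts f = cos; compare F_I(r) with int_0^r cos = sin(r)
    print(r, F_I(math.cos, H, r), math.sin(r))
```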
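6.3.3 Integrable Functions Defined on $\mathbb{N}^n \times \Omega \times [0,\infty[^m$
We extend Theorem 6.3.4 and Lemma 6.2.8 to functions $f : \mathbb{N}^n \times \Omega \times [0,\infty[^m \to B$, which play an important role in the Malliavin calculus for Lévy processes. Here $B$ is again a separable Banach space inside the superstructure $V$. This function $f$ is called $p$-summable if
$$\sum_{k\in\mathbb{N}^n} \int_{\Omega\times[0,\infty[^m} |f_k|^p\, d\mu_L \otimes \lambda^m = \int_{\mathbb{N}^n\times\Omega\times[0,\infty[^m} |f|^p\, dc^n \otimes \mu_L \otimes \lambda^m < \infty.$$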
Then we write $f \in L^p(c^n \otimes \mu_L \otimes \lambda^m, B)$. If $\mathcal{D}$ is a $\sigma$-subalgebra of $L_\mu(\mathcal{C})$ and $\mathcal{E}$ a $\sigma$-subalgebra of $\mathrm{Leb}^m$, and if all the $f_k$ are, in addition, $\mathcal{D} \otimes \mathcal{E}$-measurable, then we write $f \in L^p_{\mathcal{D}\otimes\mathcal{E}}(c^n \otimes \mu_L \otimes \lambda^m, B)$. According to Corollary 6.2.10, we can identify Lebesgue-measurable functions $f$ with $\mathcal{L}^m$-measurable functions $g$ in the following sense:
Corollary 6.3.7 Fix $p \in [1,\infty[$. Then
$$\iota : L^p_{L_\mu(\mathcal{C})\otimes\mathrm{Leb}^m}(c^n \otimes \mu_L \otimes \lambda^m, B) \to L^p_{L_\mu(\mathcal{C})\otimes\mathcal{L}^m}(c^n \otimes (\mu \otimes \nu^m)_L, B),$$
defined by $\iota(f)(k, X, t) = f(k, X, {}^{\circ}t)$ for all $k \in \mathbb{N}^n$ and $(\mu \otimes \nu^m)_L$-almost all $(X,t) \in \Omega \times T^m$, is a canonical isometric isomorphism between both spaces. Recall that “canonical” means that the isometry does not depend on a basis.
In view of the following definitions, notice that if $A$ is an internal subset of the unlimited part of $T^n$, then there exists an unlimited $S \in {}^{*}\mathbb{N}$ with $A \subseteq (T \setminus T_S)^n$.
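If $A$ is an internal subset of the limited elements of $T^n$, then there exists an $S \in \mathbb{N}$ with $A \subseteq T_S^n$. Since non-finite measures on $\mathbb{N}^n$ and $[0,\infty[^m$ are involved, we need an extension of the notion of S-integrability. Fix an internal Banach space $B$ with norm $|\cdot|$ and fix a $\mathcal{C}$-measurable internal $F : {}^{*}\mathbb{N}^n \times \Omega \times T^m \to B$, i.e., $F(k,\cdot,t)$ is $\mathcal{C}$-measurable for all $k \in {}^{*}\mathbb{N}^n$, $t \in T^m$. Fix $p \in [1,\infty[$.
Definition 6.3.8 We call $F$ locally $S^p_{c^n\otimes\mu\otimes\nu^m}$-integrable if
$$1_{\Omega\times T^m_\sigma} \cdot F_k \in SL^p(\mu \otimes \nu^m_\sigma, B)$$
for all $k \in \mathbb{N}^n$ and all $\sigma \in \mathbb{N}$. We call $F$ $S^p_{c^n\otimes\mu\otimes\nu^m}$-integrable, in which case we write $F \in SL^p(c^n \otimes \mu \otimes \nu^m, B)$, if the following two conditions hold:
(i) $F$ is locally $S^p_{c^n\otimes\mu\otimes\nu^m}$-integrable.
(ii) For all unlimited $S \in {}^{*}\mathbb{N}$
$$\int_{({}^{*}\mathbb{N}^n\setminus S^n)\times\Omega\times T^m} |F|^p\, dc^n \otimes \mu \otimes \nu^m \approx 0 \quad\text{and}\quad \int_{{}^{*}\mathbb{N}^n\times\Omega\times(T^m\setminus T^m_S)} |F|^p\, dc^n \otimes \mu \otimes \nu^m \approx 0,$$
where we have identified $S$ with $\{1,\ldots,S\}$.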
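Lemma 6.3.9 Assume that $F$ is $S^p_{c^n\otimes\mu\otimes\nu^m}$-integrable. Then
(a) $\int_{{}^{*}\mathbb{N}^n\times\Omega\times T^m} |F|^p\, dc^n \otimes \mu \otimes \nu^m$ is limited.
(b) $|F|$ is limited $(c^n \otimes \mu \otimes \nu^m)_L$-a.e.
(c) Assume that $\int_{{}^{*}\mathbb{N}^n\times\Omega\times T^m} |F|^p\, dc^n \otimes \mu \otimes \nu^m \approx 0$. Then $|F| \approx 0$ a.e.
Proof Set $\rho := c^n \otimes \mu \otimes \nu^m$ and $A := {}^{*}\mathbb{N}^n \times \Omega \times T^m$.
(a) Assume that $\int_A |F|^p\, d\rho$ is unlimited. Then we obtain for all standard $\sigma \in \mathbb{N}$
$$\Big(\int_A |F|^p\, d\rho\Big)^{\frac1p} \le \Big(\int_{A\setminus(\sigma^n\times\Omega\times T^m)} |F|^p\, d\rho\Big)^{\frac1p} + \Big(\int_{\sigma^n\times\Omega\times T^m} |F|^p\, d\rho\Big)^{\frac1p}.$$
There exist infinitely many $\sigma \in \mathbb{N}$ such that the first summand is unlimited or infinitely many $\sigma \in \mathbb{N}$ such that the second one is unlimited. In the first case the Spillover Principle implies that there exists an unlimited $\sigma \in {}^{*}\mathbb{N}$ such that $\int_{A\setminus(\sigma^n\times\Omega\times T^m)} |F|^p\, d\rho \ge 1$, which contradicts the $S^p_\rho$-integrability of $F$. Now assume that $\int_{\sigma^n\times\Omega\times T^m} |F|^p\, d\rho$ is unlimited. Then for all standard $\tau \in \mathbb{N}$
$$\Big(\int_{\sigma^n\times\Omega\times T^m} |F|^p\, d\rho\Big)^{\frac1p} \le \Big(\int_{\sigma^n\times\Omega\times(T^m\setminus T^m_\tau)} |F|^p\, d\rho\Big)^{\frac1p} + \Big(\int_{\sigma^n\times\Omega\times T^m_\tau} |F|^p\, d\rho\Big)^{\frac1p}.$$
Since $\int_{\sigma^n\times\Omega\times T^m_\tau} |F|^p\, d\rho$ is limited because of the local $S^p_\rho$-integrability of $F$, the first summand is unlimited for all $\tau \in \mathbb{N}$. Again by the Spillover Principle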
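there exists an unlimited $\tau \in {}^{*}\mathbb{N}$ such that $\int_{\sigma^n\times\Omega\times(T^m\setminus T^m_\tau)} |F|^p\, d\rho \ge 1$, which contradicts the $S^p_\rho$-integrability of $F$.
(b) By the definition of $(c^n \otimes \mu \otimes \nu^m)_L$ in Sect. 6.2.6 and Lemma 6.3.1,
$$(c^n \otimes \mu \otimes \nu^m)_L\big(\{(a,b,c) \in A \mid |F(a,b,c)|^p \text{ is unlimited}\}\big) = \lim_{k\to\infty} (c^n_k \otimes \mu \otimes \nu^m_k)_L\big(\{(a,b,c) \in k^n \times \Omega \times T^m_k \mid |F(a,b,c)|^p \text{ is unlimited}\}\big) = 0,$$
because $F$ is locally $S^p_{c^n\otimes\mu\otimes\nu^m}$-integrable.
(c) The proof of (c) is similar to the proof of (b).
The following result is again a slight extension of the Loeb-Anderson lifting theorem for Banach space valued functions, defined on $\mathbb{N}^n \times \Omega \times [0,\infty[^m$. We only prove the theorem for functions $f : \Omega \times [0,\infty[^m \to B$, because only this special case will be used later. Here $B$ is a separable Banach space inside the superstructure $V$ with norm $|\cdot|$.
Theorem 6.3.10 Fix $f : \mathbb{N}^n \times \Omega \times [0,\infty[^m \to B$.
(a) $f_l$ is $L_\mu(\mathcal{C}) \otimes \mathrm{Leb}^m$-measurable for all $l \in \mathbb{N}^n$ iff $f$ has a $\mathcal{C}$-measurable lifting $F : {}^{*}\mathbb{N}^n \times \Omega \times T^m \to {}^{*}B$, i.e., $F_l$ is a lifting of $f_l$ for all $l \in \mathbb{N}^n$.
(b) $f$ belongs to $L^p_{L_\mu(\mathcal{C})\otimes\mathrm{Leb}^m}(c^n \otimes \mu_L \otimes \lambda^m, B)$ iff $f$ has a $\mathcal{C}$-measurable lifting $F$ in $SL^p(c^n \otimes \mu \otimes \nu^m, {}^{*}B)$, in which case
$$\sum_{l\in{}^{*}\mathbb{N}^n} \int_{\Omega\times T^m} |F_l|^p\, d\mu \otimes \nu^m \approx \sum_{l\in\mathbb{N}^n} \int_{\Omega\times[0,\infty[^m} |f_l|^p\, d\mu_L \otimes \lambda^m.$$
If $p = 1$, then $\sum_{l\in{}^{*}\mathbb{N}^n} \int_{\Omega\times T^m} F_l\, d\mu \otimes \nu^m \approx \sum_{l\in\mathbb{N}^n} \int_{\Omega\times[0,\infty[^m} f_l\, d\mu_L \otimes \lambda^m$.
Notice that (a) and (b) are equivalent to the statements that result from replacing $\mathrm{Leb}^m$ with $\mathcal{L}^m$ and $\mu_L \otimes \lambda^m$ with $(\mu \otimes \nu^m)_L$.
Proof Assume that $n = 0$ and set $A := \Omega \times T^m$. Let $f_k := 1_{\Omega\times[0,k]^m} \cdot f$ for all $k \in \mathbb{N}$. If $F : \Omega \times T^m \to {}^{*}B$ is internal, then set $F_k := 1_{\Omega\times T^m_k} \cdot F$ for all $k \in {}^{*}\mathbb{N}$.
(a) “⇒” Let $f$ be $L_\mu(\mathcal{C}) \otimes \mathrm{Leb}^m$-measurable. Then for all $k \in \mathbb{N}$, $f_k$ has a lifting $F_k : \Omega \times T^m_k \to {}^{*}B$. We may assume that $F_k(a) = F_{k+1}(a)$ for all $k \in \mathbb{N}$ and all $a \in \Omega \times T^m_k$, and that $F_k(a) = 0$ if $a \notin \Omega \times T^m_k$. Let $(F_k)_{k\in{}^{*}\mathbb{N}}$ be an internal extension of $(F_k)_{k\in\mathbb{N}}$. Then there exists an unlimited $K \in {}^{*}\mathbb{N}$, $K \le H$, such that $F_k(a) = F_{k+1}(a)$ for all $a \in \Omega \times T^m_k$ and all $k < K$. Then $F := F_K$ is a lifting of $f$, because
$$\lim_{k\to\infty} (\mu \otimes \nu^m_k)_L\big(\{(X,t) \in \Omega \times T^m_k \mid F(X,t) \not\approx f_k(X, {}^{\circ}t)\}\big) = 0.$$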
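“⇐” If $F$ is a $\mathcal{C}$-measurable lifting of $f$, then $F_k$ is a $\mathcal{C}$-measurable lifting of $f_k$; thus $f_k$ is $L_\mu(\mathcal{C}) \otimes \mathrm{Leb}^m$-measurable for all $k \in \mathbb{N}$. Therefore, $f$ is $L_\mu(\mathcal{C}) \otimes \mathrm{Leb}^m$-measurable.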
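(b) Fix $f \in L^p_{L_\mu(\mathcal{C})\otimes\mathrm{Leb}^m}(\mu_L \otimes \lambda^m, B)$ and $k \in \mathbb{N}$. By Theorem 6.3.4, there exists a $\mathcal{C}$-measurable lifting $F_k : \Omega \times T^m_k \to {}^{*}B$ in $SL^p(\mu \otimes \nu^m_k, {}^{*}B)$ of $f_k$. We may assume that for all $k \in \mathbb{N}$
(i) $F_k(x) = F_{k+1}(x)$ for all $x \in \Omega \times T^m_k$,
(ii) $F_k(x) = 0$ if $x \notin \Omega \times T^m_k$.
Since
$$\int_A |F_k|^p\, d\mu \otimes \nu^m \approx \int_{\Omega\times[0,\infty[^m} |f_k|^p\, d\mu_L \otimes \lambda^m \le \int_{\Omega\times[0,\infty[^m} |f|^p\, d\mu_L \otimes \lambda^m,$$
we have in addition to (i) and (ii)
(iii) $\int_A |F_k|^p\, d\mu \otimes \nu^m \le \int_{\Omega\times[0,\infty[^m} |f|^p\, d\mu_L \otimes \lambda^m + \frac{1}{k}$.
Let $(F_k)_{k\in{}^{*}\mathbb{N}}$ be an internal extension of $(F_k)_{k\in\mathbb{N}}$. By the Spillover Principle, there exists an unlimited $K_\infty \in {}^{*}\mathbb{N}$ with $K_\infty \le H$ such that (i), (ii) and (iii) are true for all unlimited $k \in {}^{*}\mathbb{N}$ with $k \le K_\infty$; in particular, $\int_A |F_k|^p\, d\mu \otimes \nu^m \approx \int_{\Omega\times[0,\infty[^m} |f|^p\, d\mu_L \otimes \lambda^m$. Set $F := F_{K_\infty}$. Note that $F$ is a lifting of $f$ and $F \in SL^p(\mu \otimes \nu^m, {}^{*}B)$.
Now assume that $f$ has a $\mathcal{C}$-measurable lifting $F \in SL^p(\mu \otimes \nu^m, {}^{*}B)$. By Lemma 6.3.9, $\int_A |F|^p\, d\mu \otimes \nu^m$ is limited. Moreover, $F_k$ is a $\mathcal{C}$-measurable lifting of $f_k$ and
$$\lim_{k\to\infty} \int_{\Omega\times[0,\infty[^m} |f_k|^p\, d\mu_L \otimes \lambda^m = \lim_{k\to\infty} {}^{\circ}\!\int_A |F_k|^p\, d\mu \otimes \nu^m \le {}^{\circ}\!\int_A |F|^p\, d\mu \otimes \nu^m.$$
This proves that $f \in L^p_{L_\mu(\mathcal{C})\otimes\mathrm{Leb}^m}(\mu_L \otimes \lambda^m, B)$.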
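Finally assume that $f$ has a $\mathcal{C}$-measurable lifting $F \in SL^1(\mu \otimes \nu^m, {}^{*}B)$. Then
$$\int_{\Omega\times[0,\infty[^m} f\, d\mu_L \otimes \lambda^m = \lim_{k\to\infty} \int_{\Omega\times[0,\infty[^m} f_k\, d\mu_L \otimes \lambda^m = \lim_{k\to\infty} {}^{\circ}\!\int_A F_k\, d\mu \otimes \nu^m = {}^{\circ}\!\int_A F\, d\mu \otimes \nu^m,$$
where the last equality can be seen as follows. Assume that this equality is not true. Then there exists a standard $\varepsilon > 0$ such that for infinitely many $k$
$$\varepsilon \le \Big|\int_A (F - F_k)\, d\mu \otimes \nu^m\Big| \le \int_A |F - F_k|\, d\mu \otimes \nu^m.$$
By the Spillover Principle, there exists an unlimited $k \in {}^{*}\mathbb{N}$ such that
$$\varepsilon \le \int_A |F - F_k|\, d\mu \otimes \nu^m = \int_{\Omega\times(T^m\setminus T^m_k)} |F|\, d\mu \otimes \nu^m,$$
which contradicts the S-integrability of $F$.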
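The preceding theorem implies that any Cauchy sequence $(F_n)_{n\in\mathbb{N}}$ of nearstandard elements in a complete space produces a limit ${}^{\circ}F$, where $F = F_N$ for a suitable unlimited $N$ and an internal extension $(F_n)_{n\in{}^{*}\mathbb{N}}$ of $(F_n)_{n\in\mathbb{N}}$. It is important that $F$ inherits the common internal properties of the $F_n$. Here are more details. A nice example of the following result is Theorem 7.2.10.
Corollary 6.3.11 Fix a $\sigma$-algebra $\mathcal{F} \subseteq L_\mu(\mathcal{C}) \otimes \mathcal{L}^m$ containing all $(\mu \otimes \nu^m)_L$-nullsets. Let $(F_i)_{i\in\mathbb{N}}$ be a sequence of $\mathcal{C} \otimes {}^{*}\mathcal{P}(T^m)$-measurable functions $F_i : {}^{*}\mathbb{N}^n \times \Omega \times T^m \to {}^{*}B$ in $SL^p(c^n \otimes \mu \otimes \nu^m, {}^{*}B)$ such that $F_i$ is a lifting of a function ${}^{\circ}F_i$ in $L^p_{\mathcal{F}}(c^n \otimes (\mu \otimes \nu^m)_L, B)$. Moreover, suppose that
$$\lim_{i,j\to\infty} {}^{\circ}\!\sum_{k\in{}^{*}\mathbb{N}^n} \int_{\Omega\times T^m} |F_i(k,\cdot) - F_j(k,\cdot)|^p\, d\mu \otimes \nu^m = 0.$$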
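Let $(F_i)_{i\in{}^{*}\mathbb{N}}$ be an internal extension of $(F_i)_{i\in\mathbb{N}}$. Then there exists an unlimited $I_\infty \in {}^{*}\mathbb{N}$ such that for all unlimited $I \in {}^{*}\mathbb{N}$ with $I \le I_\infty$:
(a) $F := F_I \in SL^p(c^n \otimes \mu \otimes \nu^m, {}^{*}B)$,
(b) $F$ is nearstandard $c^n \otimes (\mu \otimes \nu^m)_L$-a.e. and ${}^{\circ}F$ is $\mathcal{F}$-measurable,
(c) $({}^{\circ}F_i)_{i\in\mathbb{N}}$ converges to ${}^{\circ}F$ in $L^p_{\mathcal{F}}(c^n \otimes (\mu \otimes \nu^m)_L, B)$.
Moreover, $F$ inherits the common internal properties of the $F_i$.
Proof In order to save indices, set $n = m = 1$. By the assumption and saturation, there exist a subsequence $(F_{g(i)})_{i\in\mathbb{N}}$ of $(F_i)$ and an unlimited $I \in {}^{*}\mathbb{N}$ such that for $F := F_I$
$$\sum_{k\in{}^{*}\mathbb{N}} \int_{\Omega\times T} |F(k,\cdot) - F_{g(i)}(k,\cdot)|^p\, d\mu \otimes \nu < \frac{1}{i} \qquad (+)$$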
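for all $i \in \mathbb{N}$. Using the triangle inequality and the fact that the $F_{g(i)}$ belong to $SL^p(c \otimes \mu \otimes \nu, {}^{*}B)$, it is easy to see that $F$ is also in $SL^p(c \otimes \mu \otimes \nu, {}^{*}B)$. Since
$$\sum_{k\in\mathbb{N}} \int_{\Omega\times T} |{}^{\circ}F_i(k,\cdot) - {}^{\circ}F_j(k,\cdot)|^p\, d(\mu \otimes \nu)_L = {}^{\circ}\!\sum_{k\in{}^{*}\mathbb{N}} \int_{\Omega\times T} |F_i(k,\cdot) - F_j(k,\cdot)|^p\, d\mu \otimes \nu \xrightarrow[i,j\to\infty]{} 0,$$
there is a limit $f$ of $({}^{\circ}F_i)_{i\in\mathbb{N}}$ in $L^p_{\mathcal{F}}(c \otimes (\mu \otimes \nu)_L, B)$. Let $G \in SL^p(c \otimes \mu \otimes \nu, {}^{*}B)$ be a lifting of $f$, according to Theorem 6.3.10. Using (+),
$$\sum_{k\in{}^{*}\mathbb{N}} \int_{\Omega\times T} |F(k,\cdot) - G(k,\cdot)|^p\, d\mu \otimes \nu \approx 0;$$
thus $F$ is a lifting of $f$.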
H. Osswald
6.3.4 Standard Part of the Conditional Expectation In this section we are concerned with the conditional expectation of random variables with values in a separable Hilbert space $H$ with orthonormal basis $(e_i)_{i\in\mathbb N}$ and norm $\|\cdot\|$. In particular, we answer the question: What is the standard part of the internal conditional expectation of $F:\Omega\to{}^*H$ if $f:\Omega\to H$ is the standard part of $F$? Recall that, if $(\Omega,\mathcal C,\mu)$ is a probability space, $\mathcal D\subseteq\mathcal C$ is a sub-σ-algebra and $f:\Omega\to\mathbb R$ is a $\mathcal C$-measurable $\mu$-integrable function, then $E^{\mathcal D}f$ denotes the conditional expectation of $f$ under $\mathcal D$. It is the $\mu$-a.s. well-defined $\mathcal D$-measurable function $g:\Omega\to\mathbb R$ such that for all $A\in\mathcal D$
$$\int_Af\,d\mu=\int_Ag\,d\mu.$$
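On a finite probability space whose sub-σ-algebra $\mathcal D$ is generated by a finite partition into atoms of positive measure (an illustrative assumption, not the general situation of the text), $E^{\mathcal D}f$ is simply the $\mu$-weighted average of $f$ on each atom. The following minimal Python sketch, with the hypothetical helper `conditional_expectation`, computes it with exact rational arithmetic, so that the defining identity $\int_Af\,d\mu=\int_Ag\,d\mu$ for $A\in\mathcal D$ and the conditional Jensen inequality $(E^{\mathcal D}f)^2\le E^{\mathcal D}f^2$ can be checked directly.

```python
from fractions import Fraction


def conditional_expectation(prob, f, partition):
    """E^D f on a finite probability space, assuming D is generated by a finite
    partition of Omega (a list of blocks) into atoms of positive probability."""
    g = {}
    for block in partition:
        mass = sum(prob[w] for w in block)
        # On each atom of D, E^D f is the prob-weighted average of f over that atom.
        avg = sum(prob[w] * f[w] for w in block) / mass
        for w in block:
            g[w] = avg
    return g
```

For instance, with six equally likely points, $f(w)=w^2$ and atoms $\{0,1\},\{2,3,4\},\{5\}$, the value on the first atom is $\frac12$.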
Now let $(\Omega,L_\mu(\mathcal C),\mu_L)$ be a Loeb probability space and let $\mathcal F$ be a sub-σ-algebra of $L_\mu(\mathcal C)$. Reducing the conditional expectation for $H$-valued random variables to scalar-valued ones, we define for any $\varphi\in L^2(\mu_L,H)$
$$E^{\mathcal F}(\varphi):=\sum_{i=1}^\infty E^{\mathcal F}\langle\varphi,e_i\rangle\,e_i.$$
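Note that $E^{\mathcal F}(\varphi)\in L^2_{\mathcal F}(\mu_L,H)$ is well defined; in particular, it does not depend on the orthonormal basis.
Theorem 6.3.12 Let $p\ge1$, let $\mathcal F$ be an internal subalgebra of $\mathcal C$ and assume that $F\in SL^p(\mu,{}^*H)$.
(a) Then the conditional expectation $E^{\mathcal F}F$ of $F$ under $\mathcal F$ belongs to $SL^p(\mu|_{\mathcal F},{}^*H)$.
(b) Let $F\in SL^1(\mu,{}^*H)$ be a lifting of $f\in L^1(\mu_L,H)$. Then $E^{\mathcal F}F$ is a lifting of $E^{\mathcal F\vee\mathcal N_{\mu_L}}f$.
Proof (a) By Jensen's inequality, $\|E^{\mathcal F}F\|^p\le E^{\mathcal F}\|F\|^p$. Part (a) follows.
(b) Let $M$ be the set of all $g\in L^1(\mu_L,H)$ such that $g$ has a lifting $G\in SL^1(\mu,{}^*H)$ for which $E^{\mathcal F}G$ is a lifting of $E^{\mathcal F\vee\mathcal N_{\mu_L}}g$. Obviously, $M$ is a linear space. Fix $D\in L_\mu(\mathcal C)$ and $a\in H$. Let $\tilde D\in\mathcal C$ be a $\mu$-approximation of $D$. Then $L:=1_{\tilde D}\cdot{}^*a:\Omega\to{}^*H\in SL^1(\mu,{}^*H)$ is a lifting of $l:=1_D\cdot a:\Omega\to H\in L^1(\mu_L,H)$. To prove that $E^{\mathcal F}1_{\tilde D}$ is a lifting of $E^{\mathcal F\vee\mathcal N_{\mu_L}}1_D$, first note that ${}^\circ E^{\mathcal F}1_{\tilde D}$ is $\mathcal F\vee\mathcal N_{\mu_L}$-measurable. Fix $E\in\mathcal F\vee\mathcal N_{\mu_L}$ and a $\mu$-approximation $\tilde E\in\mathcal F$ of $E$. Then we obtain
$$\int_EE^{\mathcal F\vee\mathcal N_{\mu_L}}1_D\,d\mu_L=\mu_L(D\cap E)={}^\circ\mu(\tilde D\cap\tilde E)={}^\circ\!\int_{\tilde E}E^{\mathcal F}(1_{\tilde D})\,d\mu=\int_{\tilde E}{}^\circ E^{\mathcal F}(1_{\tilde D})\,d\mu_L=\int_E{}^\circ E^{\mathcal F}(1_{\tilde D})\,d\mu_L.$$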
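It follows that $E^{\mathcal F\vee\mathcal N_{\mu_L}}1_D={}^\circ E^{\mathcal F}(1_{\tilde D})$ $\mu_L$-a.s.; thus $E^{\mathcal F}(L)$ is a lifting of $E^{\mathcal F\vee\mathcal N_{\mu_L}}l$. Now we prove that $M$ is complete. Let $(g_n)$ be a Cauchy sequence in $M$ with $\lim_{n\to\infty}g_n=g\in L^1(\mu_L,H)$. Let $G_n\in SL^1(\mu,{}^*H)$ be a lifting of $g_n$ such that $E^{\mathcal F}G_n$ is a lifting of $E^{\mathcal F\vee\mathcal N_{\mu_L}}g_n$, and let $G\in SL^1(\mu,{}^*H)$ be a lifting of $g$. By Jensen's inequality, we obtain
$${}^\circ E_\mu\|E^{\mathcal F}G_n-E^{\mathcal F}G\|\le{}^\circ E_\mu\|G_n-G\|=E_{\mu_L}\|{}^\circ G_n-{}^\circ G\|=E_{\mu_L}\|g_n-g\|\xrightarrow[n\to\infty]{}0$$
and in the same way
$$\lim_{n\to\infty}E_{\mu_L}\|E^{\mathcal F\vee\mathcal N_{\mu_L}}g_n-E^{\mathcal F\vee\mathcal N_{\mu_L}}g\|=0.$$
Recall that, if $f:\Omega\to H$, then ${}^*f:\Omega\to{}^*H$ with ${}^*f(\alpha):={}^*(f(\alpha))$. It follows that
$$\mu_L^{\rm outer}\Big(\|E^{\mathcal F}G-{}^*E^{\mathcal F\vee\mathcal N_{\mu_L}}g\|\ge\frac1m\Big)\le\alpha+\beta+\gamma$$
can be made arbitrarily small, where
$$\alpha={}^\circ\mu\Big(\|E^{\mathcal F}G-E^{\mathcal F}G_n\|\ge\frac1{3m}\Big)\le3m\cdot{}^\circ E_\mu\|E^{\mathcal F}G_n-E^{\mathcal F}G\|,$$
$$\beta=\mu_L\Big(\|E^{\mathcal F}(G_n)-{}^*E^{\mathcal F\vee\mathcal N_{\mu_L}}g_n\|\ge\frac1{3m}\Big)=0,$$
$$\gamma=\mu_L\Big(\|E^{\mathcal F\vee\mathcal N_{\mu_L}}g_n-E^{\mathcal F\vee\mathcal N_{\mu_L}}g\|\ge\frac1{3m}\Big)\le3m\cdot E_{\mu_L}\|E^{\mathcal F\vee\mathcal N_{\mu_L}}g_n-E^{\mathcal F\vee\mathcal N_{\mu_L}}g\|.$$
It follows that $E^{\mathcal F}G$ is a lifting of $E^{\mathcal F\vee\mathcal N_{\mu_L}}g$; thus $M$ is complete. Since $M$ is a complete linear subspace of $L^1(\mu_L,H)$ containing all simple functions, which are dense in $L^1(\mu_L,H)$, we obtain $M=L^1(\mu_L,H)$.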
6.3.5 Characterization of S-integrability The following powerful characterization of S-integrability, using convex functions, is due to Hoover and Perkins [10]. This characterization together with the famous Burkholder Davis Gundy inequalities will be extensively used in Chap. 7. Let us call
an internal function $\Phi:{}^*[0,\infty[\,\to{}^*[0,\infty[$ with $\Phi(0)=0$ strongly increasing if there exists an internal sequence $(a_n)_{n\in{}^*\mathbb N_0}$ with the following properties:
SI 1 $a_0=0$, $1\le a_1$, $4a_n<a_{n+1}$ for all $n\in{}^*\mathbb N$,
SI 2 $a_n$ is limited iff $n\in\mathbb N$,
SI 3 $\Phi(x)=(n+1)\cdot x-(a_1+\dots+a_n)$ for all $x\in[a_n,a_{n+1}[$.
Note that $\Phi(x)=\int_{[0,x]}\varphi(t)\,dt$ with $\varphi:=\sum_{n\in{}^*\mathbb N_0}(n+1)1_{[a_n,a_{n+1}[}$; thus $\Phi$ is ${}^*$continuous. Moreover, $\Phi$ is convex, which means that for all $x\le y$ in ${}^*[0,\infty[$ and all $\alpha\in{}^*[0,1]$
$$\Phi(x+\alpha(y-x))\le\Phi(x)+\alpha\big(\Phi(y)-\Phi(x)\big).$$
The following lemma shows that $\Phi$ really is strongly increasing.
Lemma 6.3.13 For all unlimited $H\in{}^*\mathbb N$
$$\sup_{H\le x}\frac x{\Phi(x)}\approx0.$$
Proof We obtain for each unlimited $H\in{}^*\mathbb N$ and all $x\ge H$ with $x\in[a_n,a_{n+1}[$ (then $n$ is unlimited) that
$$\frac{\Phi(x)}x=\frac{(n+1)\cdot x-(a_1+\dots+a_n)}x\ge n+1-\frac{a_1+\dots+a_n}{a_n}\ge n+1-\sum_{k\in{}^*\mathbb N_0}\Big(\frac14\Big)^k$$
is unlimited; thus $\frac x{\Phi(x)}\approx0$.
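SI 3 and the identity $\Phi(x)=\int_{[0,x]}\varphi(t)\,dt$ can be checked on a finite, standard sequence $a_0<a_1<\dots<a_N$ satisfying SI 1 (SI 2 has no finite analogue). In the Python sketch below, the names `make_phi` and `integral_of_step` are hypothetical, and the last linear piece is simply extended to $[a_N,\infty[$. Besides the integral formula, it checks midpoint convexity and the estimate $\Phi(x)/x\ge n+1-\frac43$ for $x\in[a_n,a_{n+1}[$ used in the proof of Lemma 6.3.13 (here $\sum_k4^{-k}=\frac43$).

```python
from fractions import Fraction


def make_phi(a):
    """Phi(x) = (n+1)*x - (a_1 + ... + a_n) for a_n <= x < a_{n+1} (SI 3);
    a = [0, a_1, ..., a_N] is finite, and the last piece extends to infinity."""
    def phi(x):
        n = max(i for i, ai in enumerate(a) if ai <= x)
        return (n + 1) * x - sum(a[1:n + 1])
    return phi


def integral_of_step(a, x):
    """int_0^x phi(t) dt for the step function phi = sum_n (n+1) 1_[a_n, a_{n+1})."""
    total = Fraction(0)
    for n in range(len(a)):
        left = a[n]
        if x <= left:
            break
        right = a[n + 1] if n + 1 < len(a) else x
        total += (n + 1) * (min(x, right) - left)
    return total
```

With $a=(0,1,5,21,100)$, for example, $\Phi(3)=2\cdot3-1=5$, which equals $\int_0^3\varphi=1\cdot1+2\cdot2$.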
Theorem 6.3.14 (Hoover and Perkins [10]) Fix an internal measure space $(\Omega,\mathcal C,\mu)$ such that $\mu(\Omega)$ is limited and a $\mathcal C$-measurable function $F:\Omega\to{}^*\mathbb R$ with $F\ge0$.
(a) The function $F$ is $S_\mu$-integrable iff there exists a strongly increasing $\Phi:{}^*[0,\infty[\,\to{}^*[0,\infty[$ such that $\int\Phi\circ F\,d\mu$ is limited.
(b) The function $F$ is $S_\mu$-integrable iff there exists an internal convex monotone increasing function $\Phi:{}^*[0,\infty[\,\to{}^*[0,\infty[$ with $\Phi(0)=0$ such that $\int\Phi\circ F\,d\mu$ is limited, under the important assumption that $\sup_{H\le x}\frac x{\Phi(x)}\approx0$ for all unlimited $H\in{}^*\mathbb N$.
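Proof (a) First we prove that the conditions on $\Phi$ are sufficient for the $S_\mu$-integrability of $F$. Fix an unlimited $H\in{}^*\mathbb N$. Then
$$\int_{\{F\ge H\}}F\,d\mu=\int_{\{F\ge H\}}\frac F{\Phi\circ F}\cdot\Phi\circ F\,d\mu\le\sup_{H\le x}\frac x{\Phi(x)}\cdot\int\Phi\circ F\,d\mu\approx0.$$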
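Now assume that $F$ is $S_\mu$-integrable. Since $\lim_{k\to\infty}{}^\circ\int_{\{k\le F\}}F\,d\mu=0$, there exists a standard sequence $(a_n)_{n\in\mathbb N_0}$ with $a_0=0$, $1\le a_1$ and such that for all $n\ge1$
$$4a_{n-1}<a_n\quad\text{and}\quad\int_{\{a_n\le F\}}F\,d\mu\le\frac1{2^n}\Big(\int F\,d\mu+1\Big).\qquad(+)$$
Now $(a_n)_{n\in\mathbb N_0}$ can be extended to an internal sequence $(a_n)_{n\in{}^*\mathbb N_0}$. There exists an unlimited $K\in{}^*\mathbb N$ such that $(+)$ is true for all $n\le K$. Since $F$ is $\mu$-integrable, we may extend $(a_n)_{n\le K}$ to an internal $(a_n)_{n\in{}^*\mathbb N_0}$ such that $(+)$ is true for all $n\in{}^*\mathbb N$. Now define $\Phi(x)=(n+1)\cdot x-(a_1+\dots+a_n)$ if $x\in[a_n,a_{n+1}[$. It remains to prove that $\int\Phi\circ F\,d\mu$ is limited. Indeed,
$$\int\Phi\circ F\,d\mu=\sum_{n\in{}^*\mathbb N_0}\int_{\{a_n\le F<a_{n+1}\}}\big((n+1)\cdot F-(a_1+\dots+a_n)\big)\,d\mu\le\sum_{n\in{}^*\mathbb N_0}\int_{\{a_n\le F\}}(n+1)\cdot F\,d\mu\le\sum_{n\in{}^*\mathbb N_0}(n+1)\frac1{2^n}\Big(\int F\,d\mu+1\Big).$$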
Since $\sum_{n\in{}^*\mathbb N_0}(n+1)\frac1{2^n}=\sum_{n\in\mathbb N_0}(n+1)\frac1{2^n}=4$ and $\int F\,d\mu+1$ is limited, $\int\Phi\circ F\,d\mu$ is limited.
(b) The proof of (b) is similar.
For example, if $\Phi(x)=x^p$ and $\int\Phi\circ F\,d\mu=\int F^p\,d\mu$ is limited for some standard $p>1$ and $F\ge0$, then $F$ is $S_\mu$-integrable by Part (b). (This result could also be obtained by using Hölder's inequality, but Part (b) is substantially stronger.) Let us call a strongly increasing function $\Phi$ such that $\int\Phi\circ F\,d\mu$ is limited a witness for the $S_\mu$-integrability of $F$.
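The necessity part of the proof builds the sequence $(a_n)$ from condition $(+)$. A finite, standard analogue of that construction can be run on a measure with finitely many atoms (an illustrative assumption): the hypothetical `witness_sequence` below picks each $a_n>4a_{n-1}$ so that $(+)$ holds and stops once $a_n$ exceeds $\max F$, since later pieces of $\Phi$ are never used. The check on the result mirrors the displayed estimate $\int\Phi\circ F\,d\mu\le\sum_n(n+1)2^{-n}\big(\int F\,d\mu+1\big)=4\big(\int F\,d\mu+1\big)$.

```python
from fractions import Fraction


def witness_sequence(values, weights):
    """F takes the nonnegative value values[j] with mass weights[j].  Returns
    a_0 = 0 < a_1 < ... with a_1 >= 1, 4 a_{n-1} < a_n and, for n >= 1,
        int_{a_n <= F} F dmu <= 2^{-n} (int F dmu + 1)      (condition (+)),
    stopping once a_n exceeds max F."""
    total = sum(w * v for v, w in zip(values, weights))

    def tail(c):  # int_{c <= F} F dmu
        return sum(w * v for v, w in zip(values, weights) if v >= c)

    a = [Fraction(0)]
    while a[-1] <= max(values):
        bound = (total + 1) / 2 ** len(a)  # len(a) is the index n of the next a_n
        lower = 4 * a[-1]
        # Candidates lie strictly above 4 a_{n-1}; any c > max F has tail(c) = 0.
        candidates = sorted({lower + 1} | {v + 1 for v in values if v + 1 > lower})
        a.append(next(c for c in candidates if tail(c) <= bound))
    return a


def phi_from(a):
    """Phi(x) = (n+1) x - (a_1 + ... + a_n) for a_n <= x < a_{n+1} (SI 3)."""
    def phi(x):
        n = max(i for i, ai in enumerate(a) if ai <= x)
        return (n + 1) * x - sum(a[1:n + 1])
    return phi
```

For $F$ taking the values $4^k$ with masses $8^{-k}$ ($k=0,\dots,7$), the sequence starts $0,2,9,37$.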
6.3.6 Keisler's Fubini Theorem A nice application of Theorem 6.3.14 is a simple proof of the internal version of Keisler's extremely useful Fubini Theorem, which will be used several times. Since the Loeb space over the internal product of two internal measure spaces is in general
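a strict extension of the usual product of the associated Loeb spaces, a new type of Fubini theorem for the "Loeb product" was required. This was achieved by H. Jerome Keisler. Let $(\Omega,\mathcal C,\mu)$ and $(\Omega',\mathcal C',\mu')$ be internal measure spaces such that $\mu(\Omega)$ and $\mu'(\Omega')$ are limited. Recall that the Loeb space over $(\Omega\times\Omega',\mathcal C\otimes\mathcal C',\mu\otimes\mu')$ is denoted by $\big(\Omega\times\Omega',L_{\mu\otimes\mu'}(\mathcal C\otimes\mathcal C'),(\mu\otimes\mu')_L\big)$.
Theorem 6.3.15 (Keisler [12]) Let $F:\Omega\times\Omega'\to{}^*\mathbb R$ be $S_{\mu\otimes\mu'}$-integrable. Then:
(1) For $\mu_L$-almost all $\omega\in\Omega$, $F(\omega,\cdot)$ is $S_{\mu'}$-integrable.
(2) The $\mu_L$-a.e. defined function $\omega\mapsto\int F(\omega,\cdot)\,d\mu'$ is $S_\mu$-integrable.
(3) $\displaystyle\int_{\Omega\times\Omega'}F\,d\mu\otimes\mu'=\int_\Omega\int_{\Omega'}F(\omega,\omega')\,d\mu'(\omega')\,d\mu(\omega)=\int_{\Omega'}\int_\Omega F(\omega,\omega')\,d\mu(\omega)\,d\mu'(\omega')$.
Proof We may assume that $0\le F$ and that $\mu(\Omega),\mu'(\Omega')\not\approx0$. By normalization, we may also assume that $\mu$ and $\mu'$ are probability measures. Part (3) follows from Fubini's theorem. Let $\Phi$ be a witness for the $S_{\mu\otimes\mu'}$-integrability of $F$. Since $\Phi$ is ${}^*$continuous, $\Phi\circ F$ is $\mathcal C\otimes\mathcal C'$-measurable. Since
$$\int_\Omega\int_{\Omega'}\Phi\circ F(\omega,\omega')\,d\mu'(\omega')\,d\mu(\omega)=\int_{\Omega\times\Omega'}\Phi\circ F\,d\mu\otimes\mu'\ \text{ is limited,}$$
by Lemma 6.3.1(a), for $\mu_L$-almost all $\omega\in\Omega$, $\int\Phi\circ F(\omega,\cdot)\,d\mu'$ is limited; thus $\Phi$ is a witness for the $S_{\mu'}$-integrability of $F(\omega,\cdot)$. This proves Part (1). Since $\Phi$ is ${}^*$convex and $\mu'$ is a probability measure, by Jensen's inequality (see Ash [3]),
$$\int_\Omega\Phi\Big(\int_{\Omega'}F(\cdot,\omega')\,d\mu'(\omega')\Big)\,d\mu\le\int_\Omega\int_{\Omega'}\Phi\circ F(\cdot,\omega')\,d\mu'(\omega')\,d\mu\ \text{ is limited.}$$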
This proves Part (2).
Now we will prove Keisler's Fubini Theorem for internal tensor products of Hilbert spaces. In Sect. 6.3.7 it will be shown that tensor products of separable Hilbert spaces can be represented by tensor products of ${}^*$finite-dimensional spaces. Fix a ${}^*$finite-dimensional Hilbert space $\mathbb F$ with norm $\|\cdot\|$. The $d$-fold tensor product $\mathbb F^{\otimes d}$ of $\mathbb F$, $d\in\mathbb N$, is the set of all internal functions $F:\mathbb F^d\to{}^*\mathbb R$ such that $F$ is linear in each component. The norm on $\mathbb F^{\otimes d}$ is also denoted by $\|\cdot\|$ and is defined by
$$\|F\|:=\Big(\sum_{e\in\mathbf E^d}F(e)^2\Big)^{1/2},$$
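where $\mathbf E$ is an internal orthonormal basis of $\mathbb F$. Fix a second, $\delta$-fold tensor product $\mathbb F^{\otimes\delta}$ of $\mathbb F$. The norm on $\mathbb F^{\otimes\delta}$ is also denoted by $\|\cdot\|$. Now the product $F\otimes G$ of functions $F\in\mathbb F^{\otimes d}$, $G\in\mathbb F^{\otimes\delta}$ is defined on $\mathbb F^{d+\delta}$ by setting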
$$F\otimes G(a_1,\dots,a_d,b_1,\dots,b_\delta)=F(a_1,\dots,a_d)\cdot G(b_1,\dots,b_\delta).$$
If $F=G$, then we write $F^{\otimes2}$ instead of $F\otimes F$.
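In coordinates with respect to an orthonormal basis, an element of $\mathbb F^{\otimes d}$ is a $d$-index coefficient array, $\|F\|$ is the Euclidean (Frobenius) norm of that array, and $F\otimes G$ is the outer product of arrays. The following Python sketch (finite-dimensional, floating point; the dictionary encoding and the helper names are illustrative assumptions) checks that $\|F\otimes G\|=\|F\|\cdot\|G\|$ and that the norm of a bilinear form is unchanged under an orthogonal change of basis.

```python
import math
from itertools import product


def tensor(F, G):
    """F ⊗ G for coefficient dicts keyed by basis-index tuples:
    (F ⊗ G)(i + j) = F(i) * G(j), i.e. (F ⊗ G)(a, b) = F(a) * G(b) on basis vectors."""
    return {i + j: F[i] * G[j] for i in F for j in G}


def norm(F):
    """||F|| = (sum over e in E^d of F(e)^2)^(1/2) in a fixed orthonormal basis."""
    return math.sqrt(sum(v * v for v in F.values()))


def change_basis(F, Q, dim):
    """Coefficients of a bilinear form F in the orthonormal basis formed by the
    columns q_i of the orthogonal matrix Q: F(q_i, q_j) = sum_{k,l} Q[k][i] Q[l][j] F(k, l)."""
    return {(i, j): sum(Q[k][i] * Q[l][j] * F[(k, l)] for k in range(dim) for l in range(dim))
            for i, j in product(range(dim), repeat=2)}
```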
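Lemma 6.3.16 (a) Suppose that $N\subseteq\Omega\times\Omega'$ is a $(\mu\otimes\mu')_L$-nullset. Then for $\mu_L$-almost all $\omega\in\Omega$, the cut $N(\omega,\cdot):=\{\omega'\mid(\omega,\omega')\in N\}$ is a $\mu'_L$-nullset.
(b) Let $F:\Omega\to\mathbb F^{\otimes d}$ and $G:\Omega'\to\mathbb F^{\otimes\delta}$ be $S_\mu$-integrable and $S_{\mu'}$-integrable, respectively. Then
$$F\otimes G:\Omega\times\Omega'\to\mathbb F^{\otimes(d+\delta)},\quad(\omega,\omega')\mapsto F(\omega)\otimes G(\omega')$$
is $S_{\mu\otimes\mu'}$-integrable.
Proof (a) We have to prove that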
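$$\mu_L^{\rm outer}\big(\{\omega\mid\mu'^{\,\rm outer}_L(N(\omega,\cdot))\ne0\}\big)=0;$$
thus it suffices to prove that $\mu_L^{\rm outer}\big(\{\omega\mid\mu'^{\,\rm outer}_L(N(\omega,\cdot))\ge\frac1k\}\big)\le\frac1k$ for all $k\in\mathbb N$. Fix $k\in\mathbb N$. Since $N$ is a $(\mu\otimes\mu')_L$-nullset, there exists an $A\in\mathcal C\otimes\mathcal C'$ with $N\subseteq A$ and $\mu\otimes\mu'(A)<\frac1{k^2}$. Then, by Theorem 6.3.15(3),
$$\mu_L^{\rm outer}\Big(\Big\{\omega\ \Big|\ \mu'^{\,\rm outer}_L(N(\omega,\cdot))\ge\frac1k\Big\}\Big)\le{}^\circ\mu\Big(\Big\{\omega\ \Big|\ \mu'(A(\omega,\cdot))\ge\frac1k\Big\}\Big)\le{}^\circ\!\int_{\{\omega\mid k\mu'(A(\omega,\cdot))\ge1\}}k\cdot\mu'(A(\omega,\cdot))\,d\mu$$
$$\le{}^\circ\!\int_\Omega\int_{A(\omega,\cdot)}k\,d\mu'\,d\mu(\omega)={}^\circ\Big(k\int_A1\,d\mu\otimes\mu'\Big)={}^\circ\big(k\,\mu\otimes\mu'(A)\big)\le\frac1k.$$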
(b) We first prove the $\mathcal C\otimes\mathcal C'$-measurability of $F\otimes G$, using the well-known fact that the ${}^*\sigma$-algebra generated by the cylinder sets on $\mathbb F^{\otimes d}$ coincides with the set of internal Borel sets on $\mathbb F^{\otimes d}$ (see the standard Proposition 4.6.3 in [21]). Recall that the cylinder sets $Z$ on $\mathbb F^{\otimes d}$ are of the form
$$\{f\in\mathbb F^{\otimes d}\mid f(e_1,\dots,e_d)<c\},$$
where $e_1,\dots,e_d$ form an internal orthonormal set in $\mathbb F$ and $c\in{}^*\mathbb R$. To avoid indices, we assume that $d=1=\delta$. Now
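$$(F\otimes G)^{-1}[Z]=\big\{(\omega,\omega')\in\Omega\times\Omega'\ \big|\ \big(F(\omega)(e_1),G(\omega')(e_2)\big)\in\cdot^{-1}\,]-\infty,c[\,\big\},$$
where $\cdot:{}^*\mathbb R^2\to{}^*\mathbb R$ denotes multiplication. Since $\cdot^{-1}\,]-\infty,c[$ is an open set in ${}^*\mathbb R^2$, it has the form
$$\cdot^{-1}\,]-\infty,c[\,=\bigcup_{m\in{}^*\mathbb N}(I_m\times J_m),$$
where $I_m,J_m$ are bounded internal open intervals. It follows that $(F\otimes G)^{-1}[Z]$ is a ${}^*$countable union of sets of the form $X\times Y$ with $X\in\mathcal C$ and $Y\in\mathcal C'$; thus $F\otimes G$ is $\mathcal C\otimes\mathcal C'$-measurable.
Let $N\in\mathcal C\otimes\mathcal C'$ be a $(\mu\otimes\mu')_L$-nullset. Then, by (a), $N_\omega$ is a $\mu'_L$-nullset for $\mu_L$-almost all $\omega\in\Omega$. Since $\|F\|$ is limited $\mu_L$-a.e. and $\|G\|$ is $S_{\mu'}$-integrable,
$$\alpha:\Omega\to{}^*\mathbb R,\quad\omega\mapsto\int_{N_\omega}\|G\|\,d\mu'\cdot\|F(\omega)\|\approx0\quad\mu_L\text{-a.e.}$$
In order to show that $\alpha$ is $S_\mu$-integrable, set $s:=\max\{1,\int\|G\|\,d\mu'\}$. Then $s$ is limited and thus for all unlimited $K\in{}^*\mathbb N$
$$\int_{\{\omega\in\Omega\mid\alpha(\omega)\ge K\}}\alpha\,d\mu\le\int_{\{\omega\in\Omega\mid\|F(\omega)\|\ge K/s\}}s\cdot\|F\|\,d\mu\approx0,$$
because $F$ is $S_\mu$-integrable. This proves the $S_\mu$-integrability of $\alpha$. We obtain by Theorem 6.3.15(3) and Corollary 6.3.5(b)
$$\int_N\|F\otimes G\|\,d\mu\otimes\mu'=\int_\Omega\int_{N_\omega}\|G\|\,d\mu'\,\|F(\omega)\|\,d\mu=\int\alpha\,d\mu\approx0.$$
Note that $\int_{\Omega\times\Omega'}\|F\otimes G\|\,d\mu\otimes\mu'$ is limited.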
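The core of the proof of Lemma 6.3.16(a) is a Markov-type estimate for cuts, $\mu\{\omega\mid\mu'(A(\omega,\cdot))\ge\frac1k\}\le k\cdot\mu\otimes\mu'(A)$, obtained via Fubini. A minimal Python sketch, assuming finite spaces $\Omega=\{0,\dots,n-1\}$ and $\Omega'=\{0,\dots,m-1\}$ with normalized counting measures as a stand-in for the internal spaces (the helper `slice_bound_check` is hypothetical), returns both sides of this estimate exactly.

```python
from fractions import Fraction


def slice_bound_check(A, n, m, k):
    """Omega = range(n), Omega' = range(m), both with normalized counting measure;
    A is a set of pairs (w, v).  Returns the exact values of
        mu{w : mu'(A(w, .)) >= 1/k}   and   k * (mu x mu')(A)."""
    cut_mass = {w: Fraction(sum(1 for v in range(m) if (w, v) in A), m) for w in range(n)}
    bad = Fraction(sum(1 for w in range(n) if cut_mass[w] >= Fraction(1, k)), n)
    return bad, k * Fraction(len(A), n * m)
```

For $A=\{0\}\times\{0,\dots,4\}\cup\{(1,0)\}$ in a $10\times10$ grid and $k=2$, only the cut at $\omega=0$ has mass $\ge\frac12$, so the two sides are $\frac1{10}\le\frac3{25}$.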
Using Lemma 6.3.16 and Theorems 6.3.4, 6.3.15, we obtain
Corollary 6.3.17 (Keisler [12]) Suppose that $f:\Omega\times\Omega'\to\mathbb R$ is $(\mu\otimes\mu')_L$-integrable. Then
(a) For $\mu_L$-almost all $\omega\in\Omega$, $f(\omega,\cdot)$ is $\mu'_L$-integrable.
(b) $\omega\mapsto\int f(\omega,\cdot)\,d\mu'_L$ is $\mu_L$-integrable.
(c) $\int_\Omega\int_{\Omega'}f(\omega,\cdot)\,d\mu'_L\,d\mu_L(\omega)=\int_{\Omega\times\Omega'}f\,d(\mu\otimes\mu')_L$.
The final result in this section is a Keisler-Fubini Theorem for Bochner integrable functions with values in a separable Hilbert space $H$. By Theorem 6.3.4 and Keisler's Fubini Theorem (Corollary 6.3.17), we obtain
Theorem 6.3.18 Fix a Bochner integrable $f:\Omega\times\Omega'\to H$. By Theorem 6.3.4, there exists a lifting $F:\Omega\times\Omega'\to{}^*H$, $F\in SL^1(\mu\otimes\mu',{}^*H)$, of $f$.
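(a) For $\mu_L$-almost all $X\in\Omega$, $F(X,\cdot)\in SL^1(\mu',{}^*H)$ is a lifting of $f(X,\cdot)$; thus $f(X,\cdot)$ is Bochner integrable.
(b) The function $X\mapsto\int f(X,\cdot)\,d\mu'_L$ is Bochner integrable, and the internal function $X\mapsto\int F(X,X')\,d\mu'(X')$ belongs to $SL^1(\mu,{}^*H)$ and is a lifting of $X\mapsto\int f(X,\cdot)\,d\mu'_L$.
(c) $\int_{\Omega\times\Omega'}f\,d(\mu\otimes\mu')_L=\int_\Omega\int_{\Omega'}f(X,\cdot)\,d\mu'_L\,d\mu_L(X)$.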
6.3.7 Hyperfinite Representation of the Tensor Product Fix a real separable Hilbert space $H$ with scalar product $\langle\cdot,\cdot\rangle$ and norm $\|\cdot\|$. For $d\in\mathbb N$ let $H^{\otimes d}$ be the $d$-fold tensor product of $H$, which means that $H^{\otimes d}$ is the set of all continuous functions $f:H^d\to\mathbb R$ such that $f$ is linear in each argument and $\|f\|^2:=\sum_{e\in\mathbf E^d}f^2(e)<\infty$, where $\mathbf E$ is an orthonormal basis of $H$. It is well known that $\sum_{e\in\mathbf E^d}f^2(e)$ does not depend on the orthonormal basis of $H$. Let us identify numbers $d\in\mathbb N$ with the set $\{1,\dots,d\}$. If $d=0$, then $H^d=\{\emptyset\}$; thus $H^{\otimes0}$ can be identified with $\mathbb R$. Only the case that $H$ is infinite-dimensional is of interest. Recall that $\mathcal E(H)$ denotes the set of all finite-dimensional subspaces of $H$. If $E\in\mathcal E(H)$ has dimension $d$, then ${}^*E$ is also $d$-dimensional: $({}^*e_1,\dots,{}^*e_d)$ is an (orthonormal) basis of ${}^*E$ if $(e_1,\dots,e_d)$ is an (orthonormal) basis of $E$. Fix $\mathbb F\in{}^*\mathcal E(H)$ with internal orthonormal basis $\mathbf F$. For each $d\in\mathbb N$ and each $F\in{}^*H^{\otimes d}$, $f\in H^{\otimes d}$ we define
$$\|F\|_{\mathbb F^d}:=\Big(\sum_{e\in\mathbf F^d}F^2(e)\Big)^{1/2},\qquad\|f\|_{H^d}:=\Big(\sum_{e\in\mathbf E^d}f^2(e)\Big)^{1/2}.$$
Note that these values do not depend on the orthonormal bases of $\mathbb F$ and $H$, respectively. The index $\mathbb F^d$ or $H^d$ can be dropped if it is clear in which space we are working.
Theorem 6.3.19 There exists an $\mathbb F\in{}^*\mathcal E(H)$ such that
(a) ${}^*E\subseteq\mathbb F\subseteq{}^*H$ for all $E\in\mathcal E(H)$,
(b) $\|f\|_{\mathbb F^d}:=\|{}^*f\|_{\mathbb F^d}\approx\|f\|_{H^d}$ for all $d\in\mathbb N$ and all $f\in H^{\otimes d}$,
(c) each orthonormal basis $(e_i)_{i\in\mathbb N}$ of $H$ can be extended to an internal orthonormal basis $(b_i)_{i\le\omega}$ of $\mathbb F$, i.e., $(b_i)_{i\le\omega}$ is an orthonormal basis of $\mathbb F$ and $b_i={}^*e_i$ for all $i\in\mathbb N$.
Proof We first show that there exists an $\mathbb F\in{}^*\mathcal E(H)$ such that (a) and $(+)$ are true, where
$$\|{}^*f\|^2_{\mathbb F^d}\le\|f\|^2_{H^d}\quad\text{for all }d\in\mathbb N\text{ and all }f\in H^{\otimes d}.\qquad(+)$$
By saturation, it suffices to show that there exists an $\mathbb F\in{}^*\mathcal E(H)$ such that $(+)$ is true and ${}^*E_1,\dots,{}^*E_k\subseteq\mathbb F$ for finitely many $E_1,\dots,E_k\in\mathcal E(H)$: Let $(e_1,\dots,e_m)$
be an orthonormal basis of $E_1+\dots+E_k$ and set $\mathbb F:={}^*E_1+\dots+{}^*E_k$. Then for each $d\in\mathbb N$ and each $f\in H^{\otimes d}$
$$\|{}^*f\|^2_{\mathbb F^d}=\sum_{(i_1,\dots,i_d)\in m^d}{}^*f^2({}^*e_{i_1},\dots,{}^*e_{i_d})=\sum_{(i_1,\dots,i_d)\in m^d}f^2(e_{i_1},\dots,e_{i_d})\le\|f\|^2_{H^d}.$$
Now fix an $\mathbb F\in{}^*\mathcal E(H)$ such that (a) and $(+)$ are true, and fix an orthonormal basis $(e_i)_{i\in\mathbb N}$ of $H$. Since for all $m\in\mathbb N$, $({}^*e_1,\dots,{}^*e_m)$ can be extended to an internal orthonormal basis of $\mathbb F$, by saturation, $(e_i)_{i\in\mathbb N}$ can be extended to an internal orthonormal basis $(b_i)_{i\le\omega}$ of $\mathbb F$. This proves (c). We now prove that (b) is true for $\mathbb F$: By $(+)$, $\|{}^*f\|_{\mathbb F^d}$ is nearstandard for all $d\in\mathbb N$ and all $f\in H^{\otimes d}$. Moreover, since for all $m\in\mathbb N$
$$\sum_{(i_1,\dots,i_d)\in m^d}f^2(e_{i_1},\dots,e_{i_d})\le{}^\circ\!\!\sum_{(i_1,\dots,i_d)\in\omega^d}{}^*f^2(b_{i_1},\dots,b_{i_d}),$$
we obtain $\|f\|^2_{H^d}\le{}^\circ\|{}^*f\|^2_{\mathbb F^d}$. The assertion (b) follows.
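Inequality $(+)$ and the last step of the proof rest on the fact that the squared norm of $f$ over finitely many basis directions never exceeds $\|f\|^2_{H^d}$ and increases to it. The short exact computation below illustrates this for standard finite truncations only (not for the hyperfinite $\mathbb F$); the kernel $f(e_i,e_j)=2^{-(i+j)}$, for which $\|f\|^2_{H^2}=\frac19$, and the helper names are hypothetical choices for the example.

```python
from fractions import Fraction
from itertools import product


def truncated_norm_sq(coeff, m, d=2):
    """Sum of f(e_{i1}, ..., e_{id})^2 over (i1, ..., id) in {1, ..., m}^d,
    i.e. the squared norm of f restricted to span(e_1, ..., e_m)."""
    return sum(coeff(idx) ** 2 for idx in product(range(1, m + 1), repeat=d))


def example_coeff(idx):
    """Hypothetical kernel f(e_i, e_j) = 2^{-(i+j)}; here ||f||^2_{H^2} = 1/9."""
    return Fraction(1, 2 ** sum(idx))
```

Here the truncation to $m$ directions equals $\frac19\big(1-4^{-m}\big)^2$ exactly, which increases to $\frac19$.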
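Fix $d\in\mathbb N$. Recall that $\mathbb F^{\otimes d}$ is the space of all internal ${}^*$real-valued multilinear forms on $\mathbb F^d$, endowed with the norm $\|\cdot\|_{\mathbb F^d}$. The corresponding scalar product is denoted by $\langle\cdot,\cdot\rangle_{\mathbb F^d}$, that is,
$$\langle F,G\rangle_{\mathbb F^d}:=\sum_{e\in\mathbf E^d}F(e)\cdot G(e),$$
where $\mathbf E$ is an internal orthonormal basis of $\mathbb F$. Note that $\mathbb F^{\otimes1}=\mathbb F$ and $\|\cdot\|_{\mathbb F^1}=\|\cdot\|$ on $\mathbb F$. For $F,G\in{}^*H^{\otimes d}\cup\mathbb F^{\otimes d}$ we define
$$F\approx_{\mathbb F^d}G\quad\text{if}\quad\sum_{e\in\mathbf E^d}(F-G)^2(e)\approx0.$$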
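An element $G\in\mathbb F^{\otimes d}$ is called nearstandard in $H^{\otimes d}$ if there exists a $g\in H^{\otimes d}$ such that ${}^*g\approx_{\mathbb F^d}G$. Then $g$ is called the standard part of $G$. Let us write $g\approx_{\mathbb F^d}G$ instead of ${}^*g\approx_{\mathbb F^d}G$. By Theorem 6.3.19(b), each $G$ has at most one standard part. Therefore, we may denote the standard part of $G$ by ${}^\circ G$ in case it exists. We often drop the index $\mathbb F^d$ on $\approx$.
Corollary 6.3.20 For each orthonormal basis $\mathbf E$ of $\mathbb F$,
$$\sum_{e\in\mathbf E^d}\big({}^*({}^\circ G)-G\big)^2(e)\approx0\quad\text{and}\quad\|{}^\circ G\|_{H^d}\approx\|G\|_{\mathbb F^d}.$$
Since the scalar product can be defined in terms of the norm, we obtain for nearstandard $F,G\in\mathbb F^{\otimes d}$
$$\langle{}^\circ G,{}^\circ F\rangle_{H^d}\approx\langle G,F\rangle_{\mathbb F^d}.$$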
Note that the nearstandard multilinear forms are closed under addition and under multiplication by limited scalars in ${}^*\mathbb R$. We will now show that the nearstandard elements are closed under tensor products. Recall that for any $f\in H^{\otimes d}$ and $g\in H^{\otimes m}$ the tensor product $f\otimes g:H^{d+m}\to\mathbb R$ is defined by setting $f\otimes g(x,y):=f(x)\cdot g(y)$ for $x\in H^d$, $y\in H^m$. Obviously, $f\otimes g\in H^{\otimes(d+m)}$. The tensor product of functions in $\mathbb F^{\otimes d}$ and $\mathbb F^{\otimes m}$ is defined in the same way. We obtain the following simple fact:
Proposition 6.3.21 If $F\in\mathbb F^{\otimes d}$ and $G\in\mathbb F^{\otimes m}$ are nearstandard, then $F\otimes G$ is nearstandard and
$${}^\circ(F\otimes G)={}^\circ F\otimes{}^\circ G.$$
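Proof To save indices, we assume that $d=m=1$. Let $g={}^\circ G$ and $f={}^\circ F$. Let $\mathbf E$ be an orthonormal basis of $H$ and $\mathbf F$ an extension of $\mathbf E$ to an internal orthonormal basis of $\mathbb F$. Writing ${}^*f(e)\,{}^*g(e')-F(e)\,G(e')={}^*f(e)\big({}^*g-G\big)(e')+G(e')\big({}^*f-F\big)(e)$, we obtain by Minkowski's inequality (the triangle inequality in $\ell^2$)
$$\Big(\sum_{(e,e')\in\mathbf F^2}\big({}^*(f\otimes g)-(F\otimes G)\big)^2(e,e')\Big)^{1/2}=\Big(\sum_{(e,e')\in\mathbf F^2}\big({}^*f(e)\cdot{}^*g(e')-F(e)\cdot G(e')\big)^2\Big)^{1/2}$$
$$\le\Big(\sum_{(e,e')\in\mathbf F^2}{}^*f^2(e)\cdot\big({}^*g-G\big)^2(e')\Big)^{1/2}+\Big(\sum_{(e,e')\in\mathbf F^2}G^2(e')\cdot\big({}^*f-F\big)^2(e)\Big)^{1/2}$$
$$=\|{}^*g-G\|_{\mathbb F}\cdot\|{}^*f\|_{\mathbb F}+\|G\|_{\mathbb F}\cdot\|{}^*f-F\|_{\mathbb F}\approx0,$$
because $\|{}^*f\|_{\mathbb F}\approx\|f\|_H$, $\|G\|_{\mathbb F}\approx\|{}^*g\|_{\mathbb F}\approx\|g\|_H$, ${}^*g\approx_{\mathbb F}G$ and ${}^*f\approx_{\mathbb F}F$. Recall that $\|F\|^2_{\mathbb F}=\sum_{i\le\omega}F^2(b_i)$ for an internal orthonormal basis $(b_i)_{i\le\omega}$ of $\mathbb F$.
Let us mention and prove the following consequence of the close connection between $\mathbb F^{\otimes d}$ and $H^{\otimes d}$ in a slightly informal way.
Corollary 6.3.22 Fix a measurable function $f$ on a finite Loeb space with values in $H^{\otimes d}$. Then there exists a lifting $F$ of $f$ with values in $\mathbb F^{\otimes d}$. If $f$ is integrable, we may assume that $F$ is S-integrable.
Proof First assume that $f=a\cdot1_B$, where $a\in H^{\otimes d}$ and $B$ is measurable in a finite Loeb space. Let $C$ be an internal approximation of $B$ and define $A\in\mathbb F^{\otimes d}$ by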
setting $A:={}^*a|_{\mathbb F^d}$. Then $a\approx_{\mathbb F^d}A$; thus $A\cdot1_C$ is a lifting of $a\cdot1_B$. Therefore, each simple Loeb measurable function has a lifting. Let $(f_n)_{n\in\mathbb N}$ be a sequence of Loeb measurable simple functions $f_n$ with values in $H^{\otimes d}$ converging to $f$ in measure, and let $F_n$ be a lifting of $f_n$ with values in $\mathbb F^{\otimes d}$. By saturation, we extend the sequence $(F_n)_{n\in\mathbb N}$ to an internal sequence $(F_n)_{n\in{}^*\mathbb N}$ with values in $\mathbb F^{\otimes d}$ and find an unlimited $M\in{}^*\mathbb N$ such that $F_M$ is a lifting of $f$. The proof for integrable $f$ is similar.
6.3.8 On Symmetric Functions Recall that the functions in $\mathbb F^{\otimes d}$ and in $H^{\otimes d}$ need not be symmetric. The following subtle notion of symmetry for functions $F:T^d\to\mathbb F^{\otimes d}$ or $f:T^d\to H^{\otimes d}$ is important for the Malliavin calculus later. Since the difference of $T^d$ and $T^d_{\neq}$ is a $\nu^d_L$-nullset (see Proposition 6.2.12), we may tacitly assume that functions defined on $T^d$ are identically $0$ outside of $T^d_{\neq}$. Fix an internal function $F:T^d\to\mathbb F^{\otimes d}$ and define for each permutation $\sigma$ of $\{1,\dots,d\}$
$$F_\sigma(t_1,\dots,t_d)(a_1,\dots,a_d)=F(t_{\sigma1},\dots,t_{\sigma d})(a_{\sigma1},\dots,a_{\sigma d}).$$
$F$ is called symmetric if $F_\sigma=F$ for all permutations $\sigma$ of $\{1,\dots,d\}$. Analogous notations are used for functions with values in $H^{\otimes d}$ or ${}^*H^{\otimes d}$. We have the following examples of symmetric functions:
Examples (I) Let $F:T\to\mathbb F$ be internal. Then $F^{\otimes d}:T^d_{\neq}\to\mathbb F^{\otimes d}$ is symmetric. Recall that
$$F^{\otimes d}_{(t_1,\dots,t_d)}(a_1,\dots,a_d):=F_{t_1}(a_1)\cdot\ldots\cdot F_{t_d}(a_d).$$
(II) Suppose that $F$ is constant, i.e., $F(t)=G$ for all $t\in T^d$ with $G\in\mathbb F^{\otimes d}$. Then $F$ is symmetric iff $G$ is symmetric, i.e., $G(a_1,\dots,a_d)=G(a_{\sigma(1)},\dots,a_{\sigma(d)})$ for all permutations $\sigma$ of $\{1,\dots,d\}$.
(III) Let $F:T^d\to\mathbb F^{\otimes d}$ be internal. If $F$ is symmetric, then the function $t\mapsto\|F(t)\|_{\mathbb F^d}\in{}^*\mathbb R$ is symmetric.
Proposition 6.3.23 Fix a standard $d\in\mathbb N$. Let $f:T^d\to H^{\otimes d}$ be $\mathcal L(T^d)$-measurable. Then $f$ is symmetric $\nu^d_L$-a.e. iff $f$ has a symmetric lifting $F:T^d\to\mathbb F^{\otimes d}$. Indeed, fix a lifting $F$ of $f$. Then we obtain: If $f$ is symmetric $\nu^d_L$-a.e., then $\frac1{d!}\sum_\sigma F_\sigma$ is a symmetric lifting of $f$, where $\sigma$ runs through the permutations of $\{1,\dots,d\}$. Moreover, if $F$ is symmetric, then $f=\frac1{d!}\sum_\sigma f_\sigma$ $\nu^d_L$-a.e., and of course $\frac1{d!}\sum_\sigma f_\sigma$ is symmetric.
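The symmetrization $\frac1{d!}\sum_\sigma F_\sigma$ in Proposition 6.3.23 permutes the time arguments $t_i$ and the form arguments $a_i$ jointly. In the Python sketch below, a function $F:T^d\to\mathbb F^{\otimes d}$ is encoded (as a hypothetical convenience) by a Python function of the pair $(t,a)$ of $d$-tuples, where $a$ lists basis indices; `permuted` builds $F_\sigma$ and `symmetrize` averages over all permutations with exact rational arithmetic.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def permuted(F, sigma):
    """F_sigma(t)(a) = F(t_sigma(1), ..., t_sigma(d))(a_sigma(1), ..., a_sigma(d))."""
    def F_sigma(t, a):
        return F(tuple(t[i] for i in sigma), tuple(a[i] for i in sigma))
    return F_sigma


def symmetrize(F, d):
    """(1/d!) * sum over all permutations sigma of {0, ..., d-1} of F_sigma."""
    perms = list(permutations(range(d)))

    def F_sym(t, a):
        return sum(Fraction(permuted(F, s)(t, a)) for s in perms) / factorial(d)
    return F_sym
```

For the non-symmetric $F(t,a)=t_1a_2+2t_2$ ($d=2$), the symmetrized value at $t=(1,3)$, $a=(2,5)$ is $\frac12(11+8)=\frac{19}2$, and a function that is already symmetric is left unchanged.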
Proof It suffices to prove the results for functions defined on $T_k^d$ with $k\in\mathbb N$. Let $\sigma$ be a permutation of $\{1,\dots,d\}$. For each internal $A\subseteq T_k^d$ we set $A_\sigma:=\{(t_1,\dots,t_d)\mid(t_{\sigma(1)},\dots,t_{\sigma(d)})\in A\}$. Since $|A_\sigma|=|A|$, we have $\nu_k^d(A)=\nu_k^d(A_\sigma)$. Now fix a lifting $F:T_k^d\to\mathbb F^{\otimes d}$ of $f$. Then, by Theorem 6.2.5(e), for each $\varepsilon\in\mathbb R^+$ there exists an internal $A\subset T_k^d$ such that
$$\{(t_1,\dots,t_d)\in T_k^d\mid F(t_1,\dots,t_d)\not\approx_{\mathbb F^d}f(t_1,\dots,t_d)\}\subset A\quad\text{and}\quad\nu_k^d(A)<\varepsilon.$$
Since
$$\{(t_1,\dots,t_d)\in T_k^d\mid F_\sigma(t_1,\dots,t_d)\not\approx_{\mathbb F^d}f_\sigma(t_1,\dots,t_d)\}\subset A_\sigma$$
and $\nu_k^d(A)=\nu_k^d(A_\sigma)$, $F_\sigma$ is a lifting of $f_\sigma$. Now let $f$ be symmetric $\nu^d_{kL}$-a.e. Then for $\nu^d_{kL}$-almost all $(t_1,\dots,t_d)\in T_k^d$
$$\frac1{d!}\sum_\sigma F_\sigma(t_1,\dots,t_d)\approx_{\mathbb F^d}\frac1{d!}\sum_\sigma f_\sigma(t_1,\dots,t_d)=f(t_1,\dots,t_d).$$
This proves that $\frac1{d!}\sum_\sigma F_\sigma$ is a symmetric lifting of $f$. For the converse, let $F$ be symmetric. Then we obtain for $\nu^d_{kL}$-almost all $(t_1,\dots,t_d)\in T_k^d$
$$f_\sigma(t_1,\dots,t_d)\approx_{\mathbb F^d}F_\sigma(t_1,\dots,t_d)=F(t_1,\dots,t_d)\approx_{\mathbb F^d}f(t_1,\dots,t_d).$$
Problems
(1) Prove that a sequence $(F_n)_{n\in\mathbb N}$ of $S_\mu$-integrable functions $F_n$ has an extension to an internal sequence $(F_n)_{n\in{}^*\mathbb N}$ in $SL^1(\mu)$ iff
$$\lim_{k\to\infty}{}^\circ\!\int_{\{|F_n|\ge k\}}|F_n|\,d\mu=0\quad\text{uniformly in }n\in\mathbb N$$
(see Hoover and Perkins [10]).
(2) Use (1) to prove Vitali's extension of the dominated convergence theorem: Let $(f_n)_{n\in\mathbb N}$ be a sequence of $\mu_L$-integrable functions converging in measure to an $L_\mu(\mathcal C)$-measurable function $f$. Suppose that $\lim_{k\to\infty}\int_{\{|f_n|\ge k\}}|f_n|\,d\mu_L=0$ uniformly in $n\in\mathbb N$. Then $f$ is $\mu_L$-integrable and
$$\lim_{n\to\infty}\int f_n\,d\mu_L=\int f\,d\mu_L.$$
(3) Use (2) to prove the dominated convergence theorem.
(4) If $F\in SL^2(\nu)$, $G:T\to{}^*\mathbb R$ is internal and $\int_T(F-G)^2\,d\nu\approx0$, then $G\in SL^2(\nu)$ and $F\approx G$ $\nu_L$-a.e.
6.4 Internal and Standard Martingales Martingales are an important tool in stochastic analysis; therefore we give a brief introduction to this theory. Let us study internal martingales on the internal time line $T$. The results in this section are ${}^*$-transforms of well-established results on standard discrete martingales defined on standard finite time lines. Since a standard discrete approach to martingales is used, we refer to the book [21] for all details. Internal martingales on the time line $T$ are very close to martingales defined on the continuous time line $[0,\infty[$. We shall describe some techniques for converting processes defined on $T$ into processes defined on the continuous time line $[0,\infty[$ and vice versa. The reader is also referred to the fundamental articles of Keisler [12], Hoover and Perkins [10] and Lindstrøm [13]. Moreover, we will see below that martingales have càdlàg versions.
Fix an internal adapted probability space $(\Omega,\mathcal C,\mu,(\mathcal C_t)_{t\in T})$. For internal $F:T\to{}^*\mathbb R$ and $t\in T$ the difference $\Delta F_t:=F(t)-F(t^-)$ is called the increment of $F$ at $t$. Here $t^-$ is the immediate predecessor of $t$. Set $F\big((\tfrac1H)^-\big):=0$, so that $\Delta F_{\frac1H}=F_{\frac1H}$ is the first "jump" of $F$. As usual, we write $F_t$ instead of $F(t)$. Set $\mathcal C_0:=\{\emptyset,\Omega\}$. Then $E^{\mathcal C_{t^-}}F=EF$ for $t=\frac1H$ and $F\in L^1(\mu)$. An internal process $M:\Omega\times T\to{}^*\mathbb R$ is called a $(\mathcal C_t)_{t\in T}$-martingale if the following conditions are fulfilled.
(a) $(M_t)_{t\in T}$ is $(\mathcal C_t)_{t\in T}$-adapted, i.e., $M_t$ is $\mathcal C_t$-measurable for all $t\in T$.
(b) $M$ is $\mu$-integrable, which means that $M_t$ is $\mu$-integrable for all $t\in T$.
(c) $E^{\mathcal C_s}M_{s+\frac1H}=M_s$ $\mu$-a.s. if $s\in T$ and $s<H$.
If in (c) we have "≥" instead of "=", then $M$ is called a $(\mathcal C_t)_{t\in T}$-submartingale. By Jensen's inequality, $|M|^p$ with $1\le p<\infty$ is a $(\mathcal C_t)_{t\in T}$-submartingale if $M$ is a $(\mathcal C_t)_{t\in T}$-martingale and $|M_t|^p$ is integrable for all $t\in T$. If we understand $M_t(X)$ as the result of the chance $X$ at time $t\in T$, then condition (a) means that the result at time $t$ does not depend on what will happen after time $t$. Condition (c) means that, given the present state $\mathcal C_t$ of information, the expected result at the next time $t^+$ equals the result achieved at the present time $t$. Let us drop $(\mathcal C_t)_{t\in T}$ in the phrases martingale or submartingale if it is clear which filtration we mean. The quadratic variation $[M]:\Omega\times T\to{}^*\mathbb R$ of a martingale $M$ is defined by
$$[M]_t:=\sum_{s\le t}(\Delta M_s)^2=\sum_{s\le t}(M_s-M_{s^-})^2.$$
Recall that $[M]_{\frac1H}=M^2_{\frac1H}$.
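For a concrete discrete picture of conditions (a) to (c) and of $[M]$, here is a minimal Python sketch using a simple symmetric random walk on a finite time line $\{1,\dots,n\}$, a standard stand-in for $T$ (time $t/H$ relabelled as $t$, and $M_0=0$ playing the role of the value at the predecessor of the first time). All $2^n$ equally likely paths are enumerated exactly; the martingale property can then be checked by conditioning on path prefixes, which generate the natural filtration, together with $E\big(M_t^2-[M]_t\big)=0$. The helper names are hypothetical.

```python
from itertools import product


def walk_paths(n):
    """All 2^n equally likely paths (M_0, M_1, ..., M_n) of a simple symmetric
    random walk with M_0 = 0."""
    for steps in product((-1, 1), repeat=n):
        path = [0]
        for x in steps:
            path.append(path[-1] + x)
        yield path


def quadratic_variation(path):
    """[M]_t = sum_{s <= t} (M_s - M_{s-})^2 for t = 1, ..., n (list index t - 1)."""
    qv, out = 0, []
    for s in range(1, len(path)):
        qv += (path[s] - path[s - 1]) ** 2
        out.append(qv)
    return out
```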
6.4.1 Stopping Times and Doob's Upcrossing Result Stopping time techniques provide a powerful tool to truncate martingales without losing the martingale property. Define $\overline T:=T\cup\{H+\frac1H\}$ and $\mathcal C_{H+\frac1H}:=\mathcal C_H$. A function $\tau:\Omega\to\overline T$ is called a $(\mathcal C_t)_{t\in T}$-stopping time if $\{\tau\le t\}\in\mathcal C_t$ for each $t\in T$. Note that $\tau$ is a $(\mathcal C_t)_{t\in T}$-stopping time iff $\{\tau=t\}\in\mathcal C_t$ for each $t\in T$. We drop "$(\mathcal C_t)_{t\in T}$-" if it is clear which filtration we mean. Here is a typical example of a stopping time:
Example 6.4.1 Let $A:\Omega\times T\to{}^*\mathbb R$ be $(\mathcal C_t)_{t\in T}$-adapted. Define for $c\in{}^*\mathbb R$ and $X\in\Omega$
$$\tau(X):=\inf\{t\in T\mid|A_t(X)|\ge c\}.$$
(Recall that $\inf\emptyset=H+\frac1H$.) Then $\tau$ is a stopping time.
Proof We have for each $t\in T$
$$\{\tau\le t\}=\{|A_{\frac1H}|\ge c\}\cup\dots\cup\{|A_t|\ge c\}\in\mathcal C_t.$$
Proposition 6.4.2 Let $M$ be a $(\mathcal C_t)_{t\in T}$-martingale and let $\tau$ be a stopping time. Then the truncated process
$$M^\tau:\Omega\times T\to{}^*\mathbb R,\quad(X,t)\mapsto M(X,\tau(X)\wedge t)$$
is a martingale, where $\tau(X)\wedge t:=\min\{\tau(X),t\}$.
Proof For all $t\in T$, $M^\tau_t$ is $\mathcal C_t$-measurable, because for all $c\in{}^*\mathbb R$
$$\{M^\tau_t\le c\}=\bigcup_{i\in T_t}\big(\{\tau=i\}\cap\{M_i\le c\}\big)\cup\big(\{\tau>t\}\cap\{M_t\le c\}\big)\in\mathcal C_t.$$
In order to prove the martingale property, fix $t\in T$ with $t<H$ and $A\in\mathcal C_t$. Then, by the rules for conditional expectation and the martingale property,
$$E1_A\big(M^\tau_{t+\frac1H}-M^\tau_t\big)=E1_{A\cap\{\tau>t\}}\big(M_{t+\frac1H}-M_t\big)=EE^{\mathcal C_t}1_{A\cap\{\tau>t\}}\big(M_{t+\frac1H}-M_t\big)=E1_{A\cap\{\tau>t\}}E^{\mathcal C_t}\big(M_{t+\frac1H}-M_t\big)=0.$$
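Example 6.4.1 and Proposition 6.4.2 can be checked exactly on a finite random-walk model, a hypothetical standard stand-in for $T$ in which $n+1$ plays the role of $H+\frac1H$. The sketch below stops the walk at $\tau=\inf\{t\mid|M_t|\ge c\}$; the checks confirm that $\{\tau\le s\}$ depends only on the path up to time $s$ and that $M^\tau$ keeps the martingale property. The helper names are illustrative.

```python
from itertools import product


def walk_paths(n):
    """All 2^n equally likely paths (M_0, ..., M_n) of a simple symmetric random walk."""
    for steps in product((-1, 1), repeat=n):
        path = [0]
        for x in steps:
            path.append(path[-1] + x)
        yield path


def stopped_path(path, c):
    """(M^tau, tau) for tau = inf{t >= 1 : |M_t| >= c}; if the level is never
    reached, tau = n + 1 plays the role of inf(empty set) = H + 1/H."""
    n = len(path) - 1
    tau = next((t for t in range(1, n + 1) if abs(path[t]) >= c), n + 1)
    return [path[min(tau, t)] for t in range(n + 1)], tau
```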
Later we will use Doob’s upcrossing result, which is also an application of stopping times in connection with martingales. Fix again a (Ct )t∈T -martingale M and numbers a < b in ∗ R. Define stopping times τ1 , . . . , τn , . . . as follows:
$$\tau_1(X):=\inf\{t\in T\mid M_t(X)\le a\},\qquad\tau_2(X):=\inf\{t\in T,\ \tau_1(X)<t\mid M_t(X)\ge b\},$$
$$\tau_3(X):=\inf\{t\in T,\ \tau_2(X)<t\mid M_t(X)\le a\},\qquad\tau_4(X):=\inf\{t\in T,\ \tau_3(X)<t\mid M_t(X)\ge b\},$$
and so on. Recall that $\inf\{\cdot\}=H+\frac1H$ if $\{\cdot\}=\emptyset$. Let $N(X)$ be the number of elements $i\in{}^*\mathbb N$ such that $\tau_i(X)\le H$. Now
$$U_{[a,b]}(X):=\begin{cases}\dfrac{N(X)}2&\text{if }N(X)\text{ is even},\\[4pt]\dfrac{N(X)-1}2&\text{if }N(X)\text{ is odd},\end{cases}$$
is called the number of upcrossings of the interval $[a,b]$ by $M$. The proof in standard terms of the following result, due to Doob, can be found in the book of Ash [3], Theorem 7.4.2. We transfer the standard result into the nonstandard setting.
Theorem 6.4.3 (Doob [8])
$$EU_{[a,b]}\le\frac1{b-a}E(M_H-a)^+,$$
where $y^+:=\max\{y,0\}$.
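The alternating times $\tau_1<\tau_2<\dots$ and the count $U_{[a,b]}=\lfloor N/2\rfloor$ translate directly into a single pass over a path. The Python sketch below (hypothetical helper names; a simple random walk on a finite time line as a standard stand-in for $T$) counts upcrossings this way and makes it possible to compare $EU_{[a,b]}$ with $\frac1{b-a}E(M_n-a)^+$ by exact enumeration of all paths.

```python
from itertools import product


def walk_paths(n):
    """All 2^n equally likely paths (M_0, ..., M_n) of a simple symmetric random walk."""
    for steps in product((-1, 1), repeat=n):
        path = [0]
        for x in steps:
            path.append(path[-1] + x)
        yield path


def upcrossings(path, a, b):
    """U_[a,b]: odd-indexed tau_i are the next times with M <= a, even-indexed ones
    the next later times with M >= b; N counts the finite tau_i and U = N // 2."""
    count, looking_for_low = 0, True
    for x in path:
        if looking_for_low and x <= a:
            count += 1
            looking_for_low = False
        elif not looking_for_low and x >= b:
            count += 1
            looking_for_low = True
    return count // 2
```

For the path $(0,-1,1,2,-1,3)$ and $[a,b]=[0,2]$ there are four finite times $\tau_i$, hence two upcrossings.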
6.4.2 The Maximum Inequality The maximum inequality can be used to prove Doob's inequality (see the standard proof of Proposition 2.3.1 and Theorem 2.4.2 in [21]).
Proposition 6.4.4 (Doob [8]) Fix $p\in[1,\infty[$ and a non-negative submartingale $N$ such that $N_t\in L^p(\mu)$ for all $t\in T$. Then for each $c\ge0$ and for each $t\in T$
$$c\cdot\mu\Big(\max_{s\in T_t}N^p_s\ge c\Big)\le E\,1_{\{\max_{s\in T_t}N^p_s\ge c\}}N^p_t.$$
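Proposition 6.4.4 can be checked exactly for the nonnegative submartingale $N=|M|$, where $M$ is a simple symmetric random walk on a finite time line (an illustrative standard stand-in, with hypothetical helper names). The sketch enumerates all paths and returns both sides of the inequality.

```python
from fractions import Fraction
from itertools import product


def walk_paths(n):
    """All 2^n equally likely paths (M_0, ..., M_n) of a simple symmetric random walk."""
    for steps in product((-1, 1), repeat=n):
        path = [0]
        for x in steps:
            path.append(path[-1] + x)
        yield path


def maximal_inequality_sides(n, t, c, p=1):
    """Exact values of c * mu(max_{1<=s<=t} N_s^p >= c) and E[1_{max >= c} N_t^p]
    for the nonnegative submartingale N = |M|."""
    prob = Fraction(1, 2 ** n)
    lhs = rhs = Fraction(0)
    for path in walk_paths(n):
        if max(abs(x) ** p for x in path[1:t + 1]) >= c:
            lhs += c * prob
            rhs += prob * abs(path[t]) ** p
    return lhs, rhs
```

With $n=t=2$ and $c=2$ the two sides coincide: both equal $1$.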
6.4.3 Doob’s Inequality Doob’s inequality and the Burkholder Davis Gundy inequalities in the next section belong to the most important tools in stochastic analysis. Here are the results.
Theorem 6.4.5 (Doob [8]) Fix an internal submartingale $M:\Omega\times T\to{}^*\mathbb R^+_0$ and $p>1$, and suppose that $M^p_H$ is integrable. Then
$$\Big\|\max_{t\in T}M_t\Big\|_p\le\frac p{p-1}\|M_H\|_p,$$
where $\|F\|_p$ is a shorthand for $\big(\int|F|^p\,d\mu\big)^{1/p}$.
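Raising both sides of Theorem 6.4.5 to the $p$-th power gives $E\big(\max_tM_t\big)^p\le\big(\frac p{p-1}\big)^pEM_H^p$, which can be verified exactly for the submartingale $|M|$ with $M$ a simple random walk. This is a minimal sketch on a standard finite time line; `doob_lp_sides` is a hypothetical name.

```python
from fractions import Fraction
from itertools import product


def walk_paths(n):
    """All 2^n equally likely paths (M_0, ..., M_n) of a simple symmetric random walk."""
    for steps in product((-1, 1), repeat=n):
        path = [0]
        for x in steps:
            path.append(path[-1] + x)
        yield path


def doob_lp_sides(n, p):
    """Exact E[(max_{1<=t<=n} |M_t|)^p] and (p/(p-1))^p * E|M_n|^p, i.e. the p-th
    powers of both sides of Doob's inequality for the submartingale |M|."""
    prob = Fraction(1, 2 ** n)
    lhs = sum(prob * max(abs(x) for x in path[1:]) ** p for path in walk_paths(n))
    rhs = Fraction(p, p - 1) ** p * sum(prob * abs(path[-1]) ** p for path in walk_paths(n))
    return lhs, rhs
```

For $n=1$ and $p=2$ the two sides are $1$ and $4$.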
6.4.4 The Burkholder Davis Gundy Inequalities Recall that we call a function $\Phi:{}^*[0,\infty[\,\to{}^*[0,\infty[$ with $\Phi(0)=0$ strongly increasing if there exists a sequence $(a_n)_{n\in{}^*\mathbb N}$ in $[0,\infty[$ with $a_0=0$, $1\le a_1$ and $4a_n<a_{n+1}$ such that for all $x\in[0,\infty[$ and all $n\in{}^*\mathbb N$, $\Phi(x):=(n+1)x-(a_1+\dots+a_n)$ if $a_n\le x<a_{n+1}$. Moreover, we have used that $a_n$ is limited iff $n\in\mathbb N$; this is not necessary for the proof of the next result. For a standard proof with all details we refer to the book [21].
Theorem 6.4.6 (Burkholder, Davis and Gundy [5]) Fix $p\in[1,\infty[$. Then there exist standard real constants $c_p$ and $d_p$, depending on $p$, such that for all martingales $M$ on a discrete time line and all strongly increasing functions $\Phi$
$$E\,\Phi\circ\big(M^{\sim}_H\big)^p\le E\,\Phi\circ\big(c_p\cdot[M]_H^{p/2}\big)\quad\text{and}\quad E\,\Phi\circ[M]_H^{p/2}\le E\,\Phi\circ\big(d_p\cdot(M^{\sim}_H)^p\big),$$
where $(M^{\sim}_H)^p:=\max_{t\in T}|M_t|^p$.
6.4.5 S-integrability of Internal Martingales Let us use the notation of the previous section. It is important to apply the internal versions of Doob’s inequality and the Burkholder Davis Gundy inequalities to internal martingales concerning external properties, like S-integrability and S-continuity. Here is the first example: Theorem 6.4.7 For each internal (Ct )t∈T martingale M and each p ∈ [1, ∞[ we obtain for all σ ∈ N
$$[M]_\sigma^{p/2}\in SL^1(\mu)\quad\text{iff}\quad M^{\sim p}_\sigma:=\max_{s\in T_\sigma}|M_s|^p\in SL^1(\mu).$$
Proof Suppose that $[M]_\sigma^{p/2}\in SL^1(\mu)$. By Theorem 6.3.14, there exists a witness $\Phi$ for the $S_\mu$-integrability of $c_p\cdot[M]_\sigma^{p/2}$. By Theorem 6.4.6,
$$E\,\Phi\circ M^{\sim p}_\sigma\le E\,\Phi\circ\big(c_p\cdot[M]_\sigma^{p/2}\big)\ \text{ is limited.}$$
Therefore, $\Phi$ is also a witness for the $S_\mu$-integrability of $M^{\sim p}_\sigma$. It follows that $M^{\sim p}_\sigma\in SL^1(\mu)$. The proof of the reverse implication is similar.
6.4.6 S-continuity of Internal Martingales In this section we will mention an important result, due to Hoover and Perkins, namely that, under a mild condition, a martingale is S-continuous a.s. if its quadratic variation is S-continuous a.e. To formulate this important theorem is quite simple, but the proof is not at all simple. See the proof of Theorem 10.14.2 in [21]. Theorem 6.4.8 (Hoover and Perkins [10]) Fix a (Ct )t∈T -martingale M such that EMt2 is limited for all limited t ∈ T . If [M] is S-continuous μ L -a.s., then M is S-continuous μ L -a.s. Hoover and Perkins also prove the reverse implication, but we do not need this result.
6.4.7 The Standard Part of Internal Martingales Our aim now is to convert, under mild conditions, internal martingales defined on the ${}^*$finite set $T$ into càdlàg standard martingales defined on the continuous time line $[0,\infty[$. Conversely, we lift standard martingales to internal ones. Recall that a function $f$ defined on $[0,\infty[$ is called càdlàg if the following conditions hold:
(C 1) $f$ is continuous from the right, which means that $\lim_{s\downarrow r}f_s=f_r$ for all $r\in[0,\infty[$;
(C 2) $f$ has left-hand limits, which means that $\lim_{s\uparrow r}f_s$ exists for all $r\in\,]0,\infty[$.
We start with a lemma which constructs from a certain internal function $F:T\to{}^*\mathbb R$ a càdlàg function ${}^\circ F:[0,\infty[\,\to\mathbb R$, which may be called the standard part of $F$.
Lemma 6.4.9 Fix an internal $F:T\to{}^*\mathbb R$ such that $F_t$ is limited for all limited $t\in T$. Set $F_0:=0$. Fix $r\in[0,\infty[$.
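(a) Then $\lim_{{}^\circ s\downarrow r}{}^\circ(F_s)$ exists iff
$$\lim_{k\to\infty}\sup_{r<{}^\circ s\le r+\frac1k}{}^\circ(F_s)\le\lim_{k\to\infty}\inf_{r<{}^\circ s\le r+\frac1k}{}^\circ(F_s).$$
In the following assertions assume now that $\lim_{{}^\circ s\downarrow r}{}^\circ(F_s)$ exists. Define ${}^\circ F:[0,\infty[\,\to\mathbb R$ by
$$({}^\circ F)_r:=\lim_{{}^\circ s\downarrow r}{}^\circ(F_s).$$
A result similar to (a) can be obtained for the left-hand limits, where we define for $r>0$, in case $\lim_{{}^\circ s\uparrow r}{}^\circ(F_s)$ exists,
$$({}^{\circ l}F)_{r-}:=\lim_{{}^\circ s\uparrow r,\,s\in T}{}^\circ(F_s)\quad\text{and}\quad({}^{\circ l}F)_{0-}:=0.$$
Pay attention for a moment to the difference between
$$({}^\circ F)_{r-}:=\lim_{s\uparrow r,\,s\in[0,r[}({}^\circ F)_s\quad\text{and}\quad({}^{\circ l}F)_{r-}.$$
(b) There exists an $\tilde r\in{}^*]r,\infty[$, $\tilde r\approx r$, such that $({}^\circ F)_r\approx F_s$ for all $s\in T$ with $s\ge\tilde r$ and $s\approx r$. If $({}^{\circ l}F)_{r-}$ exists with $r>0$, then there exists an $\tilde r\in{}^*]0,r[$, $\tilde r\approx r$, such that $({}^{\circ l}F)_{r-}\approx F_s$ for all $s\in T$ with $s\le\tilde r$ and $s\approx r$.
(c) ${}^\circ F$ is càdlàg, provided ${}^{\circ l}F$ also exists for all $r\in\,]0,\infty[$. Then $({}^{\circ l}F)_{r-}=({}^\circ F)_{r-}$.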
Proof The proof of Part (a) is left to the reader.
(b) By the assumption and (a), there exists a strictly monotone increasing function $g:\mathbb N\to\mathbb N$ with $g<h$ such that for all $n\in\mathbb N$ and for all $k\in\mathbb N$ with $k>g(n)={}^*g(n)$
$$\forall s\in T\ \Big(r+\frac1k\le s\le r+\frac1{{}^*g(n)}\ \Rightarrow\ \big|F_s-({}^\circ F)(r)\big|<\frac1n\Big).\qquad(+)$$
By the Spillover Principle, for each $n\in\mathbb N$ there exists an unlimited $K_n$ such that $(+)$ is true for all unlimited $K\le K_n$ when we replace $k$ by $K$. There exists an unlimited $K_\infty\le K_n$ for all $n\in\mathbb N$. It follows that $(+)$ is true for all $n\in\mathbb N$ when we replace $k$ by $K_\infty$. Set $\tilde r:=r+\frac1{K_\infty}$. Note that (b) is true for this $\tilde r$. The proof of the second statement of Part (b) is similar.
(c) To prove that ${}^\circ F$ is right continuous in $r$, fix a sequence $(r_n)_{n\in\mathbb N}$ in $]r,\infty[$ converging to $r$. Fix $\tilde r\approx r$ as in (b). By (b), there are $t_n\in T$ with $t_n\approx r_n$ and $F_{t_n}\approx({}^\circ F)_{r_n}$. By Theorem 2.10.18, there exists an internal extension $(t_n)_{n\in{}^*\mathbb N}$ of
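$(t_n)_{n\in\mathbb N}$ with $\tilde r\le t_n$ for all $n\in{}^*\mathbb N$. There exists an unlimited $N_\infty\in{}^*\mathbb N$ such that $t_n\approx r$ for all unlimited $n\le N_\infty$. Assume that $\big(({}^\circ F)_{r_n}\big)_{n\in\mathbb N}$ does not converge to $({}^\circ F)_r$. Then there exists a $k\in\mathbb N$ such that $|F_{t_n}-({}^\circ F)_r|\ge\frac1k$ for infinitely many $n\in\mathbb N$. By the Spillover Principle there exists an unlimited $N\le N_\infty$ with $|F_{t_N}-({}^\circ F)_r|\ge\frac1k$ and $t_N\approx r$. However, $\tilde r\le t_N$; thus $F_{t_N}\approx({}^\circ F)_r$, which is a contradiction. The proof that ${}^\circ F$ has left-hand limits in $]0,\infty[$ with $({}^{\circ l}F)_{r-}=({}^\circ F)_{r-}$ is similar.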
Theorem 6.4.10 (Hoover and Perkins [10]) Let $M:\Omega\times T\to{}^*\mathbb R$ be an internal $(\mathcal C_t)_{t\in T}$-martingale such that $\mathbb E_\mu|M_t|$ is limited for all limited $t\in T$.
(a) There exists a set $U$ of $\mu_L$-measure 1 such that $({}^\circ M)_r(X)$ and ${}^{\circ l}M_{r-}(X)$ exist for all $X\in U$ and all $r\in[0,\infty[$. Moreover, ${}^\circ M$ is a càdlàg process.
(b) Fix $r\in[0,\infty[$. Then there exist $r_l,r_r\in{}^*[0,\infty[$, $r_l,r_r\approx r$, with the following two properties. $M_t$ is a lifting of $({}^\circ M)_r$ for all $t\in T$ with $t\ge r_r$, $t\approx r$. $M_t$ is a lifting of $({}^\circ M)_{r-}$ for all $t\in T$ with $t\le r_l$, $t\approx r$.

Proof (a) Fix $\sigma\in\mathbb N$. By Proposition 6.4.4, we have for all $n\in\mathbb N$
$$\mu\Big(\max_{s\in T_\sigma}|M_s|\ge n\Big)\le\frac1n\,\mathbb E_\mu|M_\sigma|.$$
It follows that $\mu_L\big(\max_{s\in T_\sigma}|M_s|\text{ is unlimited}\big)=0$. Let $U_\sigma$ be the set of all $X$ such that $M_s(X)$ is limited for all $s\le\sigma$. Fix $a<b$ in $\mathbb Q$ and let $U_{[a,b]}$ be the number of upcrossings of $[a,b]$ by $(M_s)_{s\le\sigma}$. By Theorem 6.4.3,
$$\mathbb E_\mu U_{[a,b]}\le\frac1{b-a}\,\mathbb E_\mu(M_\sigma-a)^+,$$
with $(M_\sigma-a)^+=(M_\sigma-a)\vee0$. Since $\mathbb E_\mu(M_\sigma-a)^+$ is limited, $U_{[a,b]}$ is limited $\mu_L$-a.s. Therefore, we may choose $U_\sigma$ such that $U_{[a,b]}(X)$ is limited for all $X\in U_\sigma$ and all $a<b$ in $\mathbb Q$. Set $U:=\bigcap_{\sigma\in\mathbb N}U_\sigma$.

Now fix $r\in[0,\infty[$ and $X\in U$, and assume that $\lim_{{}^\circ s\downarrow r}{}^\circ(M_s(X))$ does not exist. By Lemma 6.4.9 there exist $a<b$ in $\mathbb Q$ such that
$$\lim_{k\to\infty}\ \inf_{r<{}^\circ s\le r+\frac1k}{}^\circ(M_s(X))\ <\ a\ <\ b\ <\ \lim_{k\to\infty}\ \sup_{r<{}^\circ s\le r+\frac1k}{}^\circ(M_s(X)).$$
It follows that $U_{[a,b]}(X)$ is unlimited for every $\sigma\in\mathbb N$ with $\sigma>r$, which is a contradiction. In the same way we see that $\lim_{{}^\circ s\uparrow r}{}^\circ(M_s)$ exists $\mu_L$-a.s. for $r>0$. Lemma 6.4.9(c) implies that ${}^\circ M$ is a càdlàg process.
(b) Let $Y$ be a $\mathcal C$-measurable lifting of $({}^\circ M)_r$. Since $\lim_{{}^\circ s\downarrow r}{}^\circ(M_s)={}^\circ Y$ $\mu_L$-a.s., we have for all $m\in\mathbb N$
$$\lim_{{}^\circ s\downarrow r}\mu_L\Big(|{}^\circ Y-{}^\circ(M_s)|\ge\frac1m\Big)=0.\qquad\text{It follows that}\qquad\lim_{{}^\circ s\downarrow r}{}^\circ\mu\Big(|Y-M_s|\ge\frac1m\Big)=0.$$
Now we proceed in the same way as in the proof of Lemma 6.4.9. We find a strictly monotone increasing function $g:\mathbb N\to\mathbb N$ and an unlimited $N_\infty$ such that for all $m\in\mathbb N$
$$r+\frac1{N_\infty}<s\le r+\frac1{g(m)}\ \Rightarrow\ \mu\Big(|Y-M_s|\ge\frac1m\Big)<\frac1m.$$
It follows that $\mu_L\{Y\not\approx M_s\}=0$ for all $s>r+\frac1{N_\infty}$ with $s\approx r$. All these $M_s$ are liftings of $({}^\circ M)_r$, so $r_r:=r+\frac1{N_\infty}$ fulfils Part (b). The proof of the existence of $r_l$ is similar.

Under the assumptions of Theorem 6.4.10, the process
$${}^\circ M:\Omega\times[0,\infty[\to\mathbb R,\quad(X,r)\mapsto\lim_{{}^\circ s\downarrow r}{}^\circ(M(X,s))$$
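is called the standard part of $M$. By Theorem 6.4.10, ${}^\circ M(X,\cdot)$ exists for $\mu_L$-almost all $X$.

Corollary 6.4.11 Let $M:\Omega\times T\to{}^*\mathbb R$ be an internal $(\mathcal C_t)_{t\in T}$-martingale such that $M_t\in SL^1(\mu)$ for all limited $t$. Let $(c_r)_{r\in[0,\infty[}$ be the standard part of $(\mathcal C_t)_{t\in T}$ (see Theorem 6.2.15). Then ${}^\circ M$ is a càdlàg $(c_r)_{r\in[0,\infty[}$-martingale.

Proof Fix $r\in[0,\infty[$ and an $\tilde r\approx r$ such that $M_s$ is a lifting of ${}^\circ M_r$ for all $s\approx r$ with $\tilde r\le s$. Now ${}^\circ M_r$ is $\mu_L$-integrable, because all these $M_s$ are S-integrable. It also follows that ${}^\circ M_r$ is $c_r$-measurable. In order to prove the martingale property, fix $u\in[0,\infty[$ with $u>r$, and $B\in c_r$. We have to prove that $\int_B{}^\circ M_u\,d\mu_L=\int_B{}^\circ M_r\,d\mu_L$. We may choose $s$ with $s\approx r$, $\tilde r\le s$, such that there is an $A\in\mathcal C_s$ with $\mu_L(A\,\triangle\,B)=0$. Fix $t\approx u$ such that $M_t$ is a lifting of ${}^\circ M_u$. Then we obtain
$$\int_B{}^\circ M_u\,d\mu_L=\int_A{}^\circ M_t\,d\mu_L\approx\int_AM_t\,d\mu=\int_AM_s\,d\mu\approx\int_A{}^\circ M_s\,d\mu_L=\int_B{}^\circ M_r\,d\mu_L.$$
The equality in the middle is the martingale property of $M$, since $s<t$ and $A\in\mathcal C_s$. Since both outer terms are standard, they are equal.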
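The upcrossing argument in the proof of Theorem 6.4.10 can be tried out in a finite, standard setting. The following Python sketch is only an illustration, and its assumptions are not made in the text. The internal martingale is replaced by a simple symmetric random walk of small standard length $n$, and the helper names `count_upcrossings` and `exact_upcrossing_check` are hypothetical. The code counts completed upcrossings of $[a,b]$. It then verifies Doob's upcrossing inequality $\mathbb E\,U_{[a,b]}\le\frac1{b-a}\mathbb E(M_n-a)^+$ exactly, by enumerating all $2^n$ equally likely paths with rational arithmetic.

```python
from fractions import Fraction
from itertools import product


def count_upcrossings(path, a, b):
    """Number of completed upcrossings of [a, b] by a finite sequence (a < b).

    An upcrossing starts once the sequence is <= a and is completed at the
    first later index where the sequence is >= b.
    """
    count = 0
    below = False  # True once the path has been <= a since the last completed upcrossing
    for x in path:
        if not below and x <= a:
            below = True
        elif below and x >= b:
            count += 1
            below = False
    return count


def exact_upcrossing_check(n, a, b):
    """Exact E[U_[a,b]] and Doob's bound E[(S_n - a)^+] / (b - a) for the simple
    symmetric random walk S_0 = 0, S_k = S_{k-1} +/- 1, obtained by enumerating
    all 2**n equally likely paths (only feasible for small n)."""
    total_up = 0
    total_pos = 0
    for steps in product((-1, 1), repeat=n):
        path = [0]
        for step in steps:
            path.append(path[-1] + step)
        total_up += count_upcrossings(path, a, b)
        total_pos += max(path[-1] - a, 0)
    n_paths = 2 ** n
    return Fraction(total_up, n_paths), Fraction(total_pos, n_paths) / (b - a)


mean_up, bound = exact_upcrossing_check(12, 0, 2)
print(mean_up, bound, mean_up <= bound)
```

Exact enumeration is used instead of sampling so that the inequality is checked without statistical error. This is only practical for small $n$.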
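Now we construct, from certain $(c_r)_{r\in[0,\infty[}$-martingales $m$, internal martingales $M$ whose standard part ${}^\circ M$ is $m$. In particular, ${}^\circ M$ is a càdlàg version of $m$. That is, for each $r\in[0,\infty[$ there exists a set $U_r$ of $\mu_L$-measure 1 such that $m_r(X)=({}^\circ M)_r(X)$ for all $X\in U_r$.

Theorem 6.4.12 Suppose that $m:\Omega\to\mathbb R$ is integrable. Let $M\in SL^1(\mu)$ be a lifting of $m$. Define $m_r:=\mathbb E^{c_r}m$ for $r\in[0,\infty[$ and $M_t:=\mathbb E^{\mathcal C_t}M$ for $t\in T$. Then for all $r\in[0,\infty[$
$${}^\circ M_r:=\lim_{{}^\circ s\downarrow r}{}^\circ M_s=m_r\quad\mu_L\text{-a.s.}$$

Proof Note that $(m_r)_{r\in[0,\infty[}$ is a $(c_r)_{r\in[0,\infty[}$-martingale and $(M_t)_{t\in T}$ is an internal $(\mathcal C_t)_{t\in T}$-martingale. Moreover, $M_t\in SL^1(\mu)$ for all $t\in T$. By Theorem 6.4.10, $\lim_{{}^\circ s\downarrow r}{}^\circ(M_s)=({}^\circ M)_r$ exists for all $r\in[0,\infty[$ $\mu_L$-a.s. and defines a càdlàg martingale. It remains for us to prove that $({}^\circ M)_r=m_r$ $\mu_L$-a.s. Let $B\in c_r$. In the proof of Corollary 6.4.11 we have seen that there is an $s\approx r$ with two properties: $M_s$ is a lifting of $({}^\circ M)_r$, and there exists an $A\in\mathcal C_s$ with $\mu_L(A\,\triangle\,B)=0$. We obtain
$$\int_B({}^\circ M)_r\,d\mu_L\approx\int_AM_s\,d\mu=\int_AM\,d\mu\approx\int_Am\,d\mu_L=\int_Bm_r\,d\mu_L.$$
Since both outer terms are standard, they are equal. As $({}^\circ M)_r$ and $m_r$ are both $c_r$-measurable and $B\in c_r$ was arbitrary, $({}^\circ M)_r=m_r$ $\mu_L$-a.s.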
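Theorem 6.4.12 starts from martingales of the form $M_t:=\mathbb E^{\mathcal C_t}M$. The following Python sketch illustrates this in a finite setting, under assumptions not made in the text. The Loeb space is replaced by $n$ fair coin flips with the uniform product measure, and $\mathcal C_t$ is generated by the first $t$ coordinates. The terminal variable `f` is an arbitrary, hypothetical choice. The code computes the conditional expectations exactly and checks the martingale property $\mathbb E^{\mathcal C_t}M_{t+1}=M_t$.

```python
from fractions import Fraction
from itertools import product


def conditional_expectation(f, n, t):
    """M_t := E[f | x_1, ..., x_t] on {0, 1}^n under the uniform product measure.

    Returns a dict mapping each prefix (x_1, ..., x_t) to the conditional mean.
    """
    tails = list(product((0, 1), repeat=n - t))
    return {
        prefix: sum(Fraction(f(prefix + tail)) for tail in tails) / len(tails)
        for prefix in product((0, 1), repeat=t)
    }


def is_martingale(f, n):
    """Check E[M_{t+1} | x_1, ..., x_t] == M_t exactly for t = 0, ..., n - 1."""
    for t in range(n):
        m_t = conditional_expectation(f, n, t)
        m_next = conditional_expectation(f, n, t + 1)
        for prefix, value in m_t.items():
            if (m_next[prefix + (0,)] + m_next[prefix + (1,)]) / 2 != value:
                return False
    return True


def f(x):
    # an arbitrary integrable terminal variable, chosen only for illustration
    return sum(x) ** 2 + x[0] * x[-1]


print(is_martingale(f, 5), conditional_expectation(f, 5, 0)[()])
```

The check is exact because all conditional expectations are computed with rational arithmetic. Here it simply confirms the tower property of conditional expectations.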
Much to my regret, applications of measure and probability theory on Loeb spaces cannot be covered exhaustively. Many beautiful areas where Loeb spaces have been applied successfully will not be considered here. Examples are:
- Hoover and Perkins' general martingale theory and stochastic differential equations;
- Perkins' approach to local time;
- Cutland, Kopp and Willinger's stochastic approach to financial mathematics (see, however, Chap. 9);
- Lindstrøm's work on Brownian motion on fractals and on finite-dimensional Lévy processes;
- Hoover and Keisler's probability logic.

In the next chapter we proceed to applications of Nonstandard Analysis to the foundations of the Malliavin calculus on abstract Wiener spaces. As an example, we also treat symmetric Poisson processes, instead of the more general Lévy processes considered in [21]. Using the ∗-extension ${}^*\mathbb N$ of the set $\mathbb N$ of natural numbers, we have extended the notion of "finiteness". As a result, stochastic analysis, even in the infinite-dimensional case, is very similar to the elementary calculus on finite-dimensional Euclidean spaces. This extends the approach of Cutland and Ng [7] to one-dimensional Brownian motion.

Problems
Recall the notation of Sect. 6.4.7.
(1) Prove Part (a) of Lemma 6.4.9.
(2) Prove that ${}^\circ F$ has left hand limits in $]0,\infty[$ with ${}^{\circ l}F_{r-}=({}^\circ F)_{r-}$.
(3) Assume that ${}^{\circ l}F_{r-}$ exists with $r>0$. Prove that there exists an $\tilde r\in{}^*]0,r[$, $\tilde r\approx r$, such that ${}^{\circ l}F_{r-}\approx F_s$ for all $s\in T$ with $s\le\tilde r$ and $s\approx r$.
(4) Let $l^1$ and $l$ be the measures defined in Sect. 6.1. Prove that for $l_L$-almost all $X\in{}^*\mathbb R^T$ and all $r\in[0,\infty[$
$$B_l(X,r):=\lim_{{}^\circ s\downarrow r}{}^\circ B(X,s)$$
exists and is a càdlàg process. Hint: $(X,t)\mapsto\sum_{s\le t}\big(X_s-\mathbb E_{l^1}x\big)$ is an $l$-square integrable $(\mathcal B_t)_{t\in T}$-martingale, where $(\mathcal B_t)_{t\in T}$ is defined in the problems to Sect. 6.2.
References
1. S. Albeverio, J.E. Fenstad, R. Høegh-Krohn, T. Lindstrøm, Nonstandard Methods in Stochastic Analysis and Mathematical Physics (Academic Press, Orlando, 1986)
2. R.M. Anderson, A nonstandard representation of Brownian motion and Itô integration. Isr. J. Math. 25, 15–46 (1976)
3. R.B. Ash, Real Analysis and Probability (Academic Press, New York, 1972)
4. J. Berger, H. Osswald, Y. Sun, J.L. Wu, On nonstandard product measure spaces. Ill. J. Math. 46, 319–330 (2002)
5. D.L. Burkholder, B.J. Davis, R.F. Gundy, Integral inequalities for convex functions of operators on martingales, in Proceedings of 6th Berkeley Symposium, vol. 2 (University of California Press, Berkeley, 1970), pp. 223–240
6. N. Cutland, Infinitesimals in action. J. Lond. Math. Soc. 35, 202–216 (1987)
7. N. Cutland, S.-A. Ng, A nonstandard approach to the Malliavin calculus, in Advances in Analysis, Probability and Mathematical Physics: Contributions of Nonstandard Analysis, ed. by S. Albeverio, W.A.J. Luxemburg, M.P.H. Wolff (Kluwer Academic Publishers, Dordrecht, 1995), pp. 149–170
8. J.L. Doob, Stochastic Processes (Wiley, New York, 1965)
9. L. Gross, Abstract Wiener spaces, in Proceedings of 5th Berkeley Symposium on Mathematical Statistics Probability Part I (University of California Press, Berkeley, 1965), pp. 31–41 (1988)
10. D.L. Hoover, E.A. Perkins, Nonstandard construction of the stochastic integral and applications to stochastic differential equations I and II. Trans. Am. Math. Soc. 275, 1–58 (1983)
11. A.E. Hurd, P.A. Loeb, An Introduction to Nonstandard Real Analysis (Academic Press, Orlando, 1985)
12. H.J. Keisler, An infinitesimal approach to stochastic analysis. Mem. Am. Math. Soc. 48 (1984)
13. T. Lindstrøm, Hyperfinite stochastic integration I, II, III, and addendum. Math. Scand. 46, 265–333 (1980)
14. T. Lindstrøm, Hyperfinite Lévy processes. Stochastics 76(6), 517–548 (2004)
15. P.A. Loeb, Conversion from nonstandard to standard measure spaces and applications in probability theory. Trans. Am. Math. Soc. 211, 113–122 (1975)
16. P.A. Loeb, Applications of nonstandard analysis to ideal boundaries in potential theory. Isr. J. Math. 25, 154–187 (1976)
17. P.A. Loeb, A functional approach to nonstandard measure theory. Contemp. Math. 26, 251–261 (1984)
18. P.A. Loeb, H. Osswald, Nonstandard integration theory in topological vector lattices. Mh. Math. 124, 53–82 (1997)
19. W.A.J. Luxemburg, A general theory of monads, in Application of Model Theory, Algebra, Analysis and Probability, ed. by W.A.J. Luxemburg (Holt, Rinehart and Winston, New York, 1969), pp. 18–86
20. H. Osswald, Vector valued Loeb measures and the Lewis integral. Math. Scand. 68, 247–268 (1991)
21. H. Osswald, Malliavin Calculus for Lévy Processes and Infinite-Dimensional Brownian Motion, vol. 191, Cambridge Tracts in Mathematics (Cambridge University Press, Cambridge, 2012)
22. K.D. Stroyan, J.M. Bayod, Foundations of Infinitesimal Stochastic Analysis. North-Holland Studies in Logic, vol. 119 (North-Holland Publishing Co., Amsterdam, 1986)
23. Y.N. Sun, A theory of hyperfinite processes: the complete removal of individual uncertainty via exact LLN. J. Math. Econ. 29, 419–503 (1998)
Chapter 7
Stochastic Analysis
Horst Osswald
7.1 Introduction
In this chapter we apply the Saturation Principle to the profound mathematical theory of stochastic analysis. We treat one-dimensional symmetric Poisson processes and Brownian motion with values in abstract Wiener spaces $(\mathbb H,\mathbb B)$. Here $\mathbb H$ is a separable Hilbert space inside the superstructure $\mathbb V$ with norm $\|\cdot\|$. $\mathbb B$ is the completion of $\mathbb H$ with respect to a Gross measurable norm $|\cdot|$ on $\mathbb H$ (see Sect. 6.2.2). The Borel measure on $\mathbb B$ is a Gaußian measure, induced by a Gaußian measure on the cylinder sets of $\mathbb H$. The prototype of an abstract Wiener space is the Fréchet space of convergent sequences, endowed with the topology of pointwise convergence, over the Hilbert space of square summable sequences. It is well known that for each separable Banach space $\mathbb B$ there exists a Hilbert space $\mathbb H$ such that $(\mathbb H,\mathbb B)$ becomes an abstract Wiener space (see the Lecture Notes of Kuo [22]).

Since the Hilbert space is the most important part of an abstract Wiener space, we have established ∗finite-dimensional representations of separable Hilbert spaces. We have already seen that for each separable Hilbert space $\mathbb H$ there exists a ∗finite-dimensional space $\mathbb F$ with ${}^*[\mathbb H]\subseteq\mathbb F\subseteq{}^*\mathbb H$. Let $F,G$ be functions in $\mathbb F^{\otimes d}$ with standard parts ${}^\circ F,{}^\circ G$. The scalar product $\langle{}^\circ F,{}^\circ G\rangle$ in $\mathbb H^{\otimes d}$ is infinitely close to the scalar product $\langle F,G\rangle$ in $\mathbb F^{\otimes d}$ (see Sect. 6.3.7). If $\mathbb H=\mathbb R$, then $\mathbb F$ is simply ${}^*\mathbb R$.

The following objects are always fixed: the time line $T:=\{\frac1H,\frac2H,\dots,H\}$, the sample space $\mathbb F^T$ for our probability spaces, and the internal stochastic process $B:\mathbb F^T\times T\to\mathbb F$ with $B(X,t):=\sum_{s\le t}X_s$. Recall that $\mathbb F^T$ denotes the set of all internal $H^2$-tuples of elements in $\mathbb F$. The internal filtration $(\mathcal B_t)_{t\in T}$ on $\mathbb F^T$ is also fixed forever, where
$$\mathcal B_t=\big\{A\times\mathbb F^{T\setminus T_t}\ \big|\ A\text{ is an internal Borel set in }\mathbb F^{T_t}\big\}.$$

H. Osswald, Mathematisches Institut der Universität München, Theresienstr. 39, 80333 Munich, Germany
© Springer Science+Business Media Dordrecht 2015. P.A. Loeb and M.P.H. Wolff (eds.), Nonstandard Analysis for the Working Mathematician, DOI 10.1007/978-94-017-7327-0_7
Recall that $T_t:=\{s\in T\mid s\le t\}$. Sometimes we shall use the filtration $(\mathcal B_{t^-})_{t\in T}$ on $\mathbb F^T$ with $\mathcal B_0=\{\mathbb F^T,\emptyset\}$. The standard part $(b_r)_{r\in[0,\infty[}$ of $(\mathcal B_t)_{t\in T}$ is defined in Sect. 6.2.7. Notice that $(b_r)_{r\in[0,\infty[}$ is also the standard part of $(\mathcal B_{t^-})_{t\in T}$.

Let us turn now to the case $\mathbb H=\mathbb R$ and $\mathbb F={}^*\mathbb R$. In Sect. 6.1 we have already mentioned that there exists a large class of internal Borel probability measures $p^1$ on ${}^*\mathbb R$ such that the standard part $B_p$ of the process $B$ under $p^1$ exists. This means the following. Let $p_L$ be the Loeb measure over the $H^2$-fold product $p$ of $p^1$ on ${}^*\mathbb R^T$. Then the process
$$B_p:{}^*\mathbb R^T\times[0,\infty[\to\mathbb R,\quad(X,r)\mapsto\lim_{{}^\circ s\downarrow r}{}^\circ(B(X,s))$$
is $p_L$-almost surely well defined and a one-dimensional càdlàg Lévy process. Sect. 15 of the book [36] adapts Lindstrøm's [25] beautiful approach to arbitrary finite-dimensional Lévy processes to our setting. There the reader can find a proof of the following fact. For any one-dimensional standard Lévy process $L:\Omega\times[0,\infty[\to\mathbb R$ on an arbitrary probability space, there exists an internal Borel probability measure $l^1$ on ${}^*\mathbb R$ such that $L$ coincides with the standard part of $B$ under $l^1$. We have already mentioned in Sect. 6.1 that we identify two Lévy processes if they have the same so-called Lévy triplet. Essentially, in his work [25] Lindstrøm proved the preceding result for Lévy processes of arbitrary finite dimension, not only for the one-dimensional case. However, in the book [36] and here, it is stochastic integration for infinite-dimensional Brownian motion that will be studied. The sample space for $d$-dimensional Lévy processes with $d\in\mathbb N$ would be $({}^*\mathbb R^d)^T$. In the case of infinite-dimensional Brownian motion, the sample space is $({}^*\mathbb R^\omega)^T$, where $\omega$ is an unlimited number in ${}^*\mathbb N$.

The nonstandard approach to Malliavin calculus for Lévy processes has a further interesting advantage. As we will see, it is possible to orthonormalize the increments of a Lévy process, not only the whole process itself as in the literature (see for example the work of Nualart and Schoutens [30]). For Brownian motion, being a continuous process, the infinitesimal increments are always 0 from the standard point of view. My five-year-old grandson Silas said "0 is nothing". However, if infinitesimals are used to represent 0, there are different kinds of 0: small infinitesimals and large infinitesimals, all of them, except 0 itself, different from 0. All these infinitesimals collapse to 0 in standard analysis. In the internal setting, however, they make it possible to orthonormalize the increments of a process. In his work [25] Lindstrøm uses large infinitesimals, which he called "splitting infinitesimals", for two purposes: to establish Lévy triplets, and to prove that each Lévy process can be split into its continuous and pure jump parts (see also [36] in the setting here). Here are two further examples of Lévy processes, in addition to the examples in Sect. 6.1.
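Examples (1) Fix $\mathbb H=\mathbb R$, $\mathbb F={}^*\mathbb R$ and a standard $\beta>0$. Let the internal Borel probability measure $\pi^1$ on ${}^*\mathbb R$ be concentrated on $\{0,1,-1\}$, defined by
$$\pi^1(\{0\}):=1-\frac\beta H,\qquad\pi^1(\{1\}):=\frac\beta{2H},\qquad\pi^1(\{-1\}):=\frac\beta{2H}.$$
Let $\pi$ be the $H^2$-fold product of $\pi^1$ on the Borel sets of ${}^*\mathbb R^T$. Let us prove that the process $B$ is a $(\mathcal B_t)_{t\in T}$-martingale. To this end fix $t\in T$ and $C\in\mathcal B_t$. Then $C=A\times{}^*\mathbb R^{T\setminus T_t}$ for some internal Borel set $A$ in ${}^*\mathbb R^{T_t}$. Now
$$\int_CB_{t+\frac1H}\,d\pi=\int_CB_t\,d\pi+\int_CX_{t+\frac1H}\,d\pi=\int_CB_t\,d\pi+\pi(C)\int_{{}^*\mathbb R}x\,d\pi^1(x),$$
with $\int_{{}^*\mathbb R}x\,d\pi^1(x)=0$. This proves that $\mathbb E^{\mathcal B_t}B_{t+\frac1H}=B_t$ $\pi$-a.s. Moreover, since $\mathbb E_{\pi^1}x^2=\frac\beta H$, we have for all $t\in T$
$$\mathbb E_\pi B_t^2=\mathbb E_\pi\Big(\sum_{s\in T_t}X_s^2+\sum_{s\ne r\in T_t}X_s\cdot X_r\Big)=t\cdot H\cdot\mathbb E_{\pi^1}x^2+t\cdot H\cdot(t\cdot H-1)\,\mathbb E_{\pi^1}x\cdot\mathbb E_{\pi^1}y=t\cdot H\cdot\mathbb E_{\pi^1}x^2=\beta\cdot t.$$
It follows that $\mathbb E_\pi B_t^2$ is limited for all limited $t$, so $B_t\in SL^1(\pi)$. By Corollary 6.4.11, the standard part $B_\pi$ of $B$ exists $\pi_L$-a.s. and is a càdlàg $(b_r)_{r\in[0,\infty[}$-martingale. This process $B_\pi$ is a symmetric Poisson process of rate $\beta$. The first nonstandard approach to Poisson processes is due to Loeb [26].

(2) Fix an abstract Wiener space $(\mathbb H,\mathbb B)$, a ∗finite-dimensional representation $\mathbb F$ of $\mathbb H$ and an orthonormal basis $(e_i)_{i\le\omega}$ of $\mathbb F$. The scalar product on $\mathbb F$ is denoted by $\langle\cdot,\cdot\rangle$. Let $\Gamma_\omega$ be the internal centered Gaußian measure on $\mathbb F$ of variance $\frac1H$, i.e., for all internal Borel sets $A$ in $\mathbb F$
$$\Gamma_\omega(A):=\int_Ae^{-\frac H2\langle X,X\rangle}\,d(X)_{(e_i)_{i\le\omega}}\Big(\frac H{2\pi}\Big)^{\frac\omega2}:=\int_{\{(x_i)_{i\le\omega}\in{}^*\mathbb R^\omega\,\mid\,\sum_{i=1}^\omega x_ie_i\in A\}}e^{-\frac H2\sum_{i\le\omega}x_i^2}\,d(x_i)_{i\le\omega}\Big(\frac H{2\pi}\Big)^{\frac\omega2}.$$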
Let $\Gamma$ be the internal $H^2$-fold product of $\Gamma_\omega$ on the internal Borel sets of $\mathbb F^T$. Now the standard part $\mathfrak b:\mathbb F^T\times[0,\infty[\to\mathbb B$ of $B$ is defined for all limited $t\in T$ by
$$\mathfrak b(X,{}^\circ t):={}^\circ(B(X,t)).$$
$\mathfrak b$ is $\Gamma_L$-almost surely well defined and a continuous Brownian motion with values in the Banach space $\mathbb B$. Here ${}^\circ(B(X,t))$ denotes the standard part of $B(X,t)$ with respect to the Gross measurable norm $|\cdot|$ on $\mathbb B$. Moreover, for each continuous function $f:[0,\infty[\to\mathbb B$ one can construct an $X\in\mathbb F^T$ such that $f=\mathfrak b(X,\cdot)$. The proofs of all these results can be found in Chap. 11 of the book [36]. This construction of a Brownian motion extends the work of Cutland and Ng [8] for one-dimensional Brownian motion. The first nonstandard approach to Brownian motion is due to Anderson [3].

Now we should answer the question: what is a Banach space valued Brownian motion? Recall the finite-dimensional case first. A process $b$ with values in a $d$-dimensional Hilbert space $\mathbb G$, $d\in\mathbb N$, is a $d$-dimensional Brownian motion if each component of $b$ is a Brownian motion and these components run independently on orthogonal axes. To be more precise, fix an orthonormal basis $(e_i)_{i\le d}$ of $\mathbb G$. Then $b$ is a Brownian motion if each $\langle b,e_i\rangle$ is a one-dimensional Brownian motion and $\langle b,e_i\rangle$ is independent of $\langle b,e_j\rangle$ for $i\ne j$.

Let us turn now to the infinite-dimensional case. $\mathbb H$ is dense in $\mathbb B$ under $|\cdot|$. In addition, the topological dual $\mathbb B'$ of $\mathbb B$ is dense in $\mathbb H'=\mathbb H$ under the original norm $\|\cdot\|$ on $\mathbb H$ (see Proposition 4.3.6 in [36]). Now a process $b$ with values in $\mathbb B$ is a Brownian motion if two conditions hold. First, $\varphi\circ b$ is a one-dimensional Brownian motion for each $\varphi\in\mathbb B'$ with $\|\varphi\|=1$. Second, $\varphi\circ b$ and $\psi\circ b$ are independent for $\varphi\perp\psi$. Note that this notion of infinite-dimensional Brownian motion really extends the finite-dimensional case. The one-dimensional and two-dimensional centered Gaußian measures on ${}^*\mathbb R$ and ${}^*\mathbb R^2$ of variance $\frac1H$ are denoted by $\Gamma_1$ and $\Gamma_2$, respectively.

Our main concern is to present, at moderate speed, an introduction to some basic facts of the beautiful and powerful theory of stochastic analysis. Following the book [36], we imagine a journey on a slowly moving Brownian particle through the Itô and iterated Itô integral to the chaos decomposition theorem, which is the key to the Malliavin calculus. We study the Brownian motion $\mathfrak b$ and the symmetric Poisson process $B_\pi$.
A detailed nonstandard approach to Malliavin calculus for $\mathfrak b$ and for more general Lévy processes can be found in the book [36]. In the standard approach to the Malliavin calculus, chaotic representations of Lévy functionals often serve as a basis for the Malliavin calculus for Lévy processes. We refer to the books or articles [5, 6, 22, 27, 29, 38, 41–43, 45] for infinite-dimensional Brownian motion, and to [9, 23, 29, 30, 40, 42] for finite-dimensional Lévy processes. Smolyanov and von Weizsäcker [39] and Bogachev [5] have developed an approach to the Malliavin calculus based on differentiability of measures. As far as I know, T. Lindstrøm [24], Keisler [21], and Hoover and Perkins [19] were the first to apply nonstandard methods to stochastic analysis. The standard theory of stochastic analysis for infinite-dimensional Brownian motion can be found, for example, in the work of Duncan and Varaiya and of Duncan (see [15, 16]).
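Example (1) of Sect. 7.1 can be checked numerically if the unlimited $H$ is replaced by a finite, standard $H$. This replacement is an assumption made only for this illustration. The Python sketch below computes the exact law of $B_t=\sum_{s\in T_t}X_s$ under $\pi$ by convolving $\pi^1$ with itself $t\cdot H$ times. It then confirms $\mathbb E_\pi B_t=0$ and $\mathbb E_\pi B_t^2=\beta\cdot t$, as in the computation in the example. The helper names are hypothetical.

```python
from fractions import Fraction


def step_law(beta, H):
    """pi^1: mass beta/(2H) at +1 and at -1, mass 1 - beta/H at 0."""
    p = Fraction(beta) / (2 * H)
    return {-1: p, 0: 1 - 2 * p, 1: p}


def law_of_B(beta, H, t):
    """Exact law of B_t, the sum of the t*H i.i.d. increments X_s, s in T_t."""
    n_steps = int(t * H)  # |T_t| = t*H; t is assumed to be a point of T
    step = step_law(beta, H)
    law = {0: Fraction(1)}
    for _ in range(n_steps):
        new_law = {}
        for value, prob in law.items():
            for jump, q in step.items():
                new_law[value + jump] = new_law.get(value + jump, 0) + prob * q
        law = new_law
    return law


def first_two_moments(law):
    mean = sum(v * p for v, p in law.items())
    second = sum(v * v * p for v, p in law.items())
    return mean, second


law = law_of_B(beta=2, H=50, t=Fraction(3, 2))
print(first_two_moments(law))
```

As $H$ grows, the computed law approaches that of the difference of two independent Poisson variables, each with mean $\beta t/2$. This is consistent with $B_\pi$ being a symmetric Poisson process of rate $\beta$.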
7.2 The Itô Integral for the Brownian Motion
We want to define the Itô integral with respect to the Brownian motion $\mathfrak b$ as a continuous process. To do so, we first introduce the internal Riemann–Stieltjes integral with respect to the internal Brownian motion $B$. We then prove that this integral is S-continuous, provided that the integrand is S-square integrable and $(\mathcal B_{t^-})_{t\in T}$-adapted. Therefore, it can be converted to a continuous process. We use the notation of the preceding chapter. We drop the index at the norm $\|\cdot\|$ and the scalar product $\langle\cdot,\cdot\rangle$ whenever it is clear in which space ($\mathbb H$, ${}^*\mathbb H$ or $\mathbb F$) we take the norm or scalar product. Fix an internal $\Phi:\mathbb F^T\times T\to\mathbb F$. We define $\Phi B:\mathbb F^T\times T\to{}^*\mathbb R$ by setting
$$\Phi B(X,t):=\sum_{s\in T_t}\Phi_s(X)(X_s):=\sum_{s\in T_t}\langle\Phi_s(X),X_s\rangle,$$
where we have identified $\mathbb F$ with its dual in the first equality. Note that $\Phi B(X,t)=\sum_{(s,i)\in T_t\times\omega}\langle\Phi_s(X),e_i\rangle\cdot\langle X_s,e_i\rangle$, where $(e_i)_{i\in\omega}$ is an internal orthonormal basis of $\mathbb F$. Here we identify $n\in{}^*\mathbb N$ with the set $\{1,\dots,n\}$. Let $\Phi$ be $(\mathcal B_{t^-})_{t\in T}$-adapted and S-square integrable. We will see that the standard part ${}^\circ\Phi B$ of $\Phi B$ is then the Itô integral of the standard part ${}^\circ\Phi$ of $\Phi$ with respect to the $\mathbb B$-valued Brownian motion $\mathfrak b$. Moreover, in the non-adapted case, ${}^\circ\Phi B$ becomes the Skorokhod integral.

We will also see the following. Let $\Phi$ be $(\mathcal B_{t^-})_{t\in T}$-adapted and locally in $SL^2(\Gamma\otimes\nu,\mathbb F)$. Then $\Phi B$ can be converted to a continuous process $\int\Phi\,d\mathfrak b$ on $\mathbb F^T\times[0,\infty[$, defined for all limited $s\in T$ by
$$\int\Phi\,d\mathfrak b\,(\cdot,{}^\circ s):={}^\circ(\Phi B(\cdot,s)).$$
This process is $\Gamma_L$-a.s. well defined. Suppose, in addition, that $\Phi$ is a lifting of a standard function $g:\mathbb F^T\times[0,\infty[\to\mathbb H$, i.e., $g(X,{}^\circ t)\approx\Phi(X,t)$ for $(\Gamma\otimes\nu)_L$-almost all $(X,t)$. Then the stochastic integral $\int g\,d\mathfrak b$ of $g$ is identical to $\int\Phi\,d\mathfrak b$.

Remark 7.2.1 There exist, however, many $\Phi$ which are not liftings of standard functions. For example, let $a\in\mathbb H$, $a\ne0$, and set
$$F(t):=\begin{cases}{}^*a&\text{if }t\cdot H\text{ is odd},\\0&\text{if }t\cdot H\text{ is even}.\end{cases}$$
This function $F$ is locally in $SL^2(\nu,\mathbb F)$, thus locally in $SL^2(\Gamma\otimes\nu,\mathbb F)$, and is $(\mathcal B_{t^-})$-adapted. However, $F$ is not a lifting of a standard function. Therefore, we have here an extension of the standard integration theory.
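The following Python sketch shows what Remark 7.2.1 is about, using a finite $H$ and an integer value `a_norm` standing for $\|a\|$ (both assumptions). The values $\|F(t)\|$ alternate between $\|a\|$ and $0$ at neighbouring points of $T$. Nevertheless, the $\nu$-integrals $\int_{T_t}\|F(s)\|^2\,d\nu(s)=\frac1H\sum_{s\in T_t}\|F(s)\|^2$ stay close to $t\|a\|^2/2$. For unlimited $H$, this is the behaviour that lets $F$ be locally S-square integrable without being a lifting of a standard function of ${}^\circ t$. The function names are hypothetical.

```python
from fractions import Fraction


def norm_F(k, a_norm):
    """||F(t)|| at t = k/H: a_norm if t*H = k is odd, 0 if it is even."""
    return a_norm if k % 2 == 1 else 0


def integral_of_square(t, H, a_norm):
    """nu-integral of ||F||^2 over T_t = {1/H, ..., t}, where nu({s}) = 1/H."""
    n = int(t * H)
    return Fraction(sum(norm_F(k, a_norm) ** 2 for k in range(1, n + 1)), H)


print([norm_F(k, 3) for k in range(1, 7)])
print(integral_of_square(1, 1000, 3), integral_of_square(1, 1001, 3))
```

The pointwise values never settle, while the integrals over $T_t$ converge to $t\|a\|^2/2$ as $H$ grows.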
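7.2.1 The S-Continuity of the Internal Integral
The following lemma will be used over and over again. Its proof uses the rules for working with the conditional expectation, together with the following facts for centered real Gaußian measures $\gamma$ of variance $\sigma$:
$$\int_{\mathbb R}x^{2k-1}\,d\gamma=0=\int_{\mathbb R^2}x\cdot y\,d\gamma^2\quad\text{and}\quad\int_{\mathbb R}x^{2k}\,d\gamma=\sigma^k(2k)!!.$$
Here $\gamma^2=\gamma\otimes\gamma$, and $n!!=(n-1)\cdot(n-3)\cdot(n-5)\cdots1$ for even $n\in\mathbb N$. Let us identify $X_t$ with the projection $\mathbb F^T\to\mathbb F$, $X\mapsto X_t$. Fix an internal orthonormal basis $E:=(e_i)_{i\in\omega}$ of $\mathbb F$.

Lemma 7.2.2 Fix $e,f\in E$. Then we obtain $\Gamma$-a.s. for each $t\in T$:
(a) $\mathbb E^{\mathcal B_{t^-}}\langle X_t,e\rangle^{2k-1}=0$ for all $k\in{}^*\mathbb N$.
(b) $\mathbb E^{\mathcal B_{t^-}}\big(\langle X_t,e\rangle\cdot\langle X_t,f\rangle\big)=0$ if $e\ne f$.
(c) $\mathbb E^{\mathcal B_{t^-}}\big(\langle X_t,e\rangle\cdot\langle X_s,f\rangle\big)=0$ if $s<t$. This is true for all $(e,f)\in E^2$.
(d) $\mathbb E^{\mathcal B_{t^-}}\langle X_t,e\rangle^{2n}=\frac{(2n)!!}{H^n}$.
(e) $\mathbb E^{\mathcal B_{t^-}}\big(\langle X_t,e\rangle^2-\frac1H\big)^2=\frac2{H^2}$.
(f) $\mathbb E^{\mathcal B_{t^-}}\Big(\big(\langle X_t,e\rangle^2-\frac1H\big)\big(\langle X_t,f\rangle^2-\frac1H\big)\Big)=0$ for $e\ne f$.
(g) $\mathbb E^{\mathcal B_{t^-}}\Big(\big(\langle X_t,e\rangle^2-\frac1H\big)\big(\langle X_s,f\rangle^2-\frac1H\big)\Big)=0$ for $s<t$.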
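Lemma 7.2.2 reduces to the Gaußian moment formula above. Note the text's convention: for even $n$, $n!!=(n-1)(n-3)\cdots1$, so its $(2k)!!$ is what is usually written $(2k-1)!!$. The following Python sketch uses a midpoint-rule quadrature to check three things: $\int x^{2k}\,d\gamma=\sigma^k(2k)!!$, the vanishing of an odd moment, and the identity behind Lemma 7.2.2(e), $\int(x^2-\sigma)^2\,d\gamma=2\sigma^2$. The finite value $H=100$ (so $\sigma=1/H$), the function names and the quadrature parameters are illustrative assumptions.

```python
import math


def double_factorial_text(n):
    """The text's n!! for even n: (n - 1)(n - 3)...1, usually written (n - 1)!!."""
    result = 1
    for j in range(n - 1, 0, -2):
        result *= j
    return result


def gaussian_integral(g, var, width=12.0, steps=20_000):
    """Midpoint-rule approximation of the integral of g against the centered
    Gaussian measure of variance `var`, truncated to +/- width standard deviations."""
    sd = math.sqrt(var)
    lo = -width * sd
    h = 2 * width * sd / steps
    total = 0.0
    for j in range(steps):
        x = lo + (j + 0.5) * h
        total += g(x) * math.exp(-x * x / (2 * var))
    return total * h / math.sqrt(2 * math.pi * var)


H = 100          # a finite stand-in for the unlimited H of the text
var = 1.0 / H    # variance of each coordinate <X_t, e>
for k in range(1, 5):
    print(2 * k, gaussian_integral(lambda x: x ** (2 * k), var),
          var ** k * double_factorial_text(2 * k))
```

Because the integrand is smooth and decays rapidly, the midpoint rule on a truncated interval is already very accurate here. The sketch uses no third-party library.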
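The following lemma, which will also be used over and over again, is a simple application of Lemma 7.2.2.

Lemma 7.2.3 Fix $(\mathcal B_{t^-})_{t\in T}$-adapted $\Phi,\Psi:\mathbb F^T\times T\to\mathbb F$ with $\Phi_t,\Psi_t\in L^2(\Gamma,\mathbb F)$ for all $t\in T$. Then
(a) for all $\sigma\in T$,
$$\mathbb E\big(\Phi B_\sigma\cdot\Psi B_\sigma\big)=\mathbb E\int_{T_\sigma}\langle\Phi_s,\Psi_s\rangle\,d\nu(s)\in{}^*\mathbb R;$$
in particular, $\mathbb E(\Phi B_\sigma)^2=\mathbb E\int_{T_\sigma}\|\Phi_s\|^2\,d\nu(s)<\infty$ in ${}^*\mathbb R$;
(b) $\Phi B$ is an internal $(\mathcal B_t)_{t\in T}$-martingale.

Proof (a) Using the preceding lemma, we obtain
$$\mathbb E\big(\Phi B_\sigma\cdot\Psi B_\sigma\big)=\mathbb E\sum_{s,t\in T_\sigma,\,(e,f)\in E^2}\langle\Phi_s,e\rangle\cdot\langle\Psi_t,f\rangle\cdot\langle X_s,e\rangle\cdot\langle X_t,f\rangle=\alpha+\beta+\gamma,$$
where
$$\alpha:=\mathbb E\sum_{s\ne t\in T_\sigma,\,(e,f)\in E^2}\langle\Phi_s,e\rangle\cdot\langle\Psi_t,f\rangle\cdot\langle X_s,e\rangle\cdot\langle X_t,f\rangle=0.$$
Indeed, for $s<t$ we have
$$\mathbb E\big(\langle\Phi_s,e\rangle\langle\Psi_t,f\rangle\langle X_s,e\rangle\langle X_t,f\rangle\big)=\mathbb E\big(\langle\Phi_s,e\rangle\langle\Psi_t,f\rangle\langle X_s,e\rangle\cdot\mathbb E^{\mathcal B_{t^-}}\langle X_t,f\rangle\big)=0,$$
and symmetrically for $t<s$. Further,
$$\beta:=\mathbb E\sum_{s\in T_\sigma,\,e\ne f\in E}\langle\Phi_s,e\rangle\langle\Psi_s,f\rangle\langle X_s,e\rangle\langle X_s,f\rangle=\sum_{s\in T_\sigma,\,e\ne f\in E}\mathbb E\Big(\langle\Phi_s,e\rangle\langle\Psi_s,f\rangle\cdot\mathbb E^{\mathcal B_{s^-}}\big(\langle X_s,e\rangle\langle X_s,f\rangle\big)\Big)=0,$$
$$\gamma:=\mathbb E\sum_{s\in T_\sigma,\,e\in E}\langle\Phi_s,e\rangle\langle\Psi_s,e\rangle\langle X_s,e\rangle^2=\sum_{s\in T_\sigma,\,e\in E}\mathbb E\Big(\langle\Phi_s,e\rangle\langle\Psi_s,e\rangle\cdot\mathbb E^{\mathcal B_{s^-}}\langle X_s,e\rangle^2\Big)=\sum_{s\in T_\sigma,\,e\in E}\mathbb E\,\langle\Phi_s,e\rangle\langle\Psi_s,e\rangle\frac1H=\mathbb E\int_{T_\sigma}\langle\Phi_s,\Psi_s\rangle\,d\nu(s).$$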
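This proves Part (a). The proof of Part (b) is similar.

We use the following trick, essentially due to Lindstrøm (see [2]). In order to handle the quadratic variation of $\Phi B$ successfully, we refine the timeline $T$ to the timeline
$$\hat T:=\{(s,i)\mid s\in T,\ i\in\{1,\dots,\omega\}\}.$$
On $\hat T$ we use the lexicographic order, denoted by $<$. We can identify $\mathbb F^T$ with ${}^*\mathbb R^{\hat T}$ via $X\mapsto(\langle X_s,e_i\rangle)_{i\in\omega,s\in T}$. Let $E_i$ be the internal linear space generated by $\{e_1,\dots,e_i\}$, and define for $s\in T$ and $i\in\omega$
$$\alpha_{s,i}:\mathbb F^T\to\mathbb F^{T_{s^-}}\times E_i,\quad X\mapsto\big(X_{\frac1H},\dots,X_{s^-},\pi_{E_i}(X_s)\big),$$
where $\pi_{E_i}$ denotes the projection from $\mathbb F$ onto $E_i$. Now define a new filtration $(\hat{\mathcal B}_{(s,i)})_{(s,i)\in\hat T}$ on the internal Borel sets $\mathcal B_{\mathbb F^T}$ of $\mathbb F^T$ by setting
$$\hat{\mathcal B}_{(s,i)}:=\big\{\alpha_{s,i}^{-1}[A]\ \big|\ A\in\mathcal B_{\mathbb F^{T_{s^-}}\times E_i}\big\}.$$
Note that $\alpha_{s,i}^{-1}[A]=\{X\in\mathbb F^T\mid(X_{\frac1H},\dots,X_{s^-},\pi_{E_i}(X_s))\in A\}$. Define $(\frac1H)^-:=0$ and recall that $\mathcal B_0:=\{\mathbb F^T,\emptyset\}$.

An internal Borel measurable function $F:\mathbb F^T\to\mathbb F$ is $\hat{\mathcal B}_{(s,i)}$-measurable if and only if $F(X)=F(Y)$ for all $X,Y\in\mathbb F^T$ with $X=_{(s,i)}Y$. Here $X=_{(s,i)}Y$ means $\langle X_t,e_j\rangle=\langle Y_t,e_j\rangle$ for all $(t,j)\le(s,i)$. Moreover, $\mathcal B_{s^-}\subseteq\hat{\mathcal B}_{(s,i)}\subseteq\hat{\mathcal B}_{(s,\omega)}=\mathcal B_s$. We modify each internal $\Phi:\mathbb F^T\times T\to\mathbb F$ to $\hat\Phi:\mathbb F^T\times\hat T\to{}^*\mathbb R$ by setting $\hat\Phi(X,(s,i)):=\langle\Phi(X,s),e_i\rangle$. Suppose that $\Phi$ is $(\mathcal B_{s^-})_{s\in T}$-adapted. Then $\hat\Phi(\cdot,(s,i))$ is $\mathcal B_{s^-}$-measurable for each $(s,i)\in\hat T$. If, in addition, $\Phi(\cdot,r)\in L^2(\Gamma,\mathbb F)$ for all $r\in T$, a new internal integral $\hat\Phi\hat B:\mathbb F^T\times\hat T\to{}^*\mathbb R$ is defined by setting
$$\hat\Phi\hat B(X,(\sigma,j)):=\sum_{(s,i)\le(\sigma,j)}\langle\Phi_s(X),e_i\rangle\cdot\langle X_s,e_i\rangle.$$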
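The refinement from $T$ to the lexicographically ordered timeline $\hat T$ only splits and reorders the sum $\Phi B(X,\sigma)=\sum_{s\in T_\sigma}\langle\Phi_s(X),X_s\rangle$. Its refinement is $\hat\Phi\hat B(X,(\sigma,j))=\sum_{(s,i)\le(\sigma,j)}\langle\Phi_s(X),e_i\rangle\langle X_s,e_i\rangle$. The following Python sketch makes three assumptions: $H$ is finite, a finite dimension $d$ replaces $\omega$, and $\Phi_s(X)$ and $X_s$ are represented by coordinate lists with respect to $(e_i)$. All names are hypothetical. The code computes the partial sums along $\hat T$ and checks that at $(\sigma,d)$ they coincide with $\Phi B(X,\sigma)$.

```python
def refined_integral(phi, x):
    """Partial sums of hat(Phi) hat(B) along the lexicographic order on hat(T).

    phi[s][i] and x[s][i] hold <Phi_s(X), e_i> and <X_s, e_i> for the s-th point
    of T (s = 0, 1, ...) and i = 0, ..., d - 1. Returns ((s, i), partial_sum) pairs;
    the coordinate <X_s, e_i> enters the sum exactly at step (s, i).
    """
    partial = []
    total = 0
    for s in range(len(x)):
        for i in range(len(x[s])):
            total += phi[s][i] * x[s][i]
            partial.append(((s, i), total))
    return partial


def coarse_integral(phi, x, sigma):
    """Phi B(X, sigma): sum over the points s <= sigma of <Phi_s(X), X_s>."""
    return sum(sum(p * q for p, q in zip(phi[s], x[s])) for s in range(sigma + 1))


phi = [[1, 2], [3, -1], [0, 4]]   # illustrative coordinates, d = 2
x = [[2, 1], [-1, 5], [3, 3]]
print(refined_integral(phi, x))
```

At the last index $i=d-1$ of each point $s$, the refined partial sum equals the coarse sum up to $s$. The refinement only adds intermediate times.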
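Note that $\hat\Phi\hat B$ is a $\Gamma$-square integrable $(\hat{\mathcal B}_{(s,i)})_{(s,i)\in\hat T}$-martingale. We call $(s,i)$ limited if $s$ is limited, and we call $(s,i)$ infinitely close to $(t,j)$ if $s\approx t$. For $\hat r\in\hat T$ set $\hat T_{\hat r}:=\{\hat t\in\hat T\mid\hat t\le\hat r\}$.

Lemma 7.2.4 Fix a $(\mathcal B_{t^-})$-adapted $\Phi$ locally in $SL^2(\Gamma\otimes\nu,\mathbb F)$ and limited $\sigma\in T$, $\hat r\in\hat T$. Then
$$\mathbb E\max_{t\in T_\sigma}(\Phi B_t)^2\quad\text{and}\quad\mathbb E\max_{\hat t\in\hat T_{\hat r}}(\hat\Phi\hat B_{\hat t})^2$$
are limited. There exists a set $U$ of $\Gamma_L$-measure 1 such that $(\Phi B(X,t))^2$ and $(\hat\Phi\hat B(X,\hat s))^2$ are limited for all $X\in U$ and all limited $t\in T$, $\hat s\in\hat T$.

Proof Fix $\sigma\in\mathbb N$. Then $\sigma\in T$ and $\rho:=(\sigma,\omega)\in\hat T$. By Lemma 7.2.3 and Doob's inequality,
$$\mathbb E\max_{t\in T_\sigma}(\Phi B_t)^2\le4\cdot\mathbb E(\Phi B_\sigma)^2=4\cdot\mathbb E\int_{T_\sigma}\|\Phi_s\|^2\,d\nu(s)$$
is limited. In the same way we obtain that $\mathbb E\max_{\hat t\in\hat T_\rho}(\hat\Phi\hat B_{\hat t})^2$ is limited. It follows that there exists a set $U_\sigma$ of $\Gamma_L$-measure 1 such that $\max_{t\in T_\sigma}(\Phi B_t(X))^2$ and $\max_{\hat t\in\hat T_\rho}(\hat\Phi\hat B_{\hat t}(X))^2$ are limited for all $X\in U_\sigma$. Set $U:=\bigcap_{\sigma\in\mathbb N}U_\sigma$. Then $\Gamma_L(U)=1$. For all limited $t\in T$, $\hat s\in\hat T$ there exists a $\sigma\in\mathbb N$ with $t\le\sigma$ and $\hat s\le(\sigma,\omega)$. This finishes the proof.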
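Lemma 7.2.3(a), together with the use of Doob's inequality in the proof of Lemma 7.2.4, can be illustrated by simulation of $\Phi B_t=\sum_{s\in T_t}\Phi_sX_s$. The Python sketch below makes several assumptions that are not in the text:
- $H$ is finite ($H=20$) and the dimension is 1;
- the adapted integrand is the hypothetical choice $\Phi_s:=B_{s^-}$, the value of $B$ just before the increment $X_s$ (with $B_0:=0$);
- the increments are drawn as $X_s\sim N(0,1/H)$.

The exact right-hand side of the isometry is $\mathbb E\int_{T_\sigma}\Phi_s^2\,d\nu(s)=\sum_{k<n}\frac kH\cdot\frac1H=\frac{n(n-1)}{2H^2}$ with $n=\sigma H$. The simulation compares this value with the sample mean of $(\Phi B_\sigma)^2$. It also compares the sample mean of $\max_{t\in T_\sigma}(\Phi B_t)^2$ with 4 times the sample mean of $(\Phi B_\sigma)^2$.

```python
import math
import random


def simulate_ito_isometry(H, sigma, n_paths, seed=0):
    """Monte Carlo for (Phi B)_t = sum_{s <= t} Phi_s X_s with Phi_s = B_{s-}, d = 1.

    Returns (mean of (Phi B)_sigma^2, mean of max_{t <= sigma} (Phi B)_t^2,
    exact value of E int_{T_sigma} Phi_s^2 d nu(s)).
    """
    rng = random.Random(seed)
    n = int(sigma * H)          # number of points of T_sigma
    sd = math.sqrt(1.0 / H)     # X_s ~ N(0, 1/H)
    sum_final_sq = 0.0
    sum_max_sq = 0.0
    for _ in range(n_paths):
        b = 0.0                 # B_{s-}: value of B before the current increment
        ito = 0.0               # running value of (Phi B)_t
        max_sq = 0.0
        for _ in range(n):
            x = rng.gauss(0.0, sd)
            ito += b * x        # Phi_s = B_{s-} is fixed before X_s is drawn
            b += x
            max_sq = max(max_sq, ito * ito)
        sum_final_sq += ito * ito
        sum_max_sq += max_sq
    exact = n * (n - 1) / (2 * H * H)
    return sum_final_sq / n_paths, sum_max_sq / n_paths, exact


mean_sq, mean_max_sq, exact = simulate_ito_isometry(H=20, sigma=1, n_paths=40_000)
print(mean_sq, exact, mean_max_sq, 4 * mean_sq)
```

Monte Carlo estimates carry sampling error, so the isometry can only be confirmed up to a tolerance. The seed is fixed for reproducibility.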
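In the following we fix a $(\mathcal B_{t^-})$-adapted $\Phi$ locally in $SL^2(\Gamma\otimes\nu,\mathbb F)$. The following result is a consequence of Lemmas 7.2.3 and 7.2.4 and Theorem 6.4.8. Its proof is left to the reader.

Lemma 7.2.5
(a) If $\hat\Phi\hat B(X,\cdot)$ is S-continuous, then $\Phi B(X,\cdot)$ is S-continuous.
(b) If the quadratic variation $[\hat\Phi\hat B]$ of $\hat\Phi\hat B$ is S-continuous $\Gamma_L$-a.s., then $\hat\Phi\hat B$ is S-continuous $\Gamma_L$-a.s. Here
$$[\hat\Phi\hat B]_{(\sigma,j)}(X)=\sum_{(s,i)\le(\sigma,j)}\langle\Phi_s(X),e_i\rangle^2\cdot\langle X_s,e_i\rangle^2.$$
(c) Define
$$\Lambda:\mathbb F^T\times T\to{}^*\mathbb R,\quad(X,\sigma)\mapsto\sum_{(s,i)\in T_\sigma\times\omega}\langle\Phi_s(X),e_i\rangle^2\langle X_s,e_i\rangle^2.$$
(d) If $\Lambda(X)$ is S-continuous, then $[\hat\Phi\hat B](X)$ is S-continuous.

From this lemma it follows that it suffices to show that $\Lambda$ is S-continuous $\Gamma_L$-a.s.; then $\Phi B$ is S-continuous $\Gamma_L$-a.s. To this end we define, for each $m\in{}^*\mathbb N$, stopping times $\tau_m:\mathbb F^T\to T\cup\{H+\frac1H\}$ by setting
$$\tau_m:=\inf\Big\{t\in T\ \Big|\ \int_{T_t}\|\Phi_s\|^2\,d\nu(s)\ge m\Big\},$$
where $\inf\emptyset:=H+\frac1H$. For $X\in\mathbb F^T$, $t\in T$ and $m\in{}^*\mathbb N$ set
$$\Phi^m_t:=1_{T_{\tau_m-\frac1H}}(t)\cdot\Phi_t,\qquad\Lambda^m_t:=\sum_{(s,i)\in T_t\times\omega}\langle\Phi^m_s,e_i\rangle^2\cdot\langle X_s,e_i\rangle^2.$$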
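The stopping times $\tau_m:=\inf\{t\in T\mid\int_{T_t}\|\Phi_s\|^2\,d\nu(s)\ge m\}$ and the truncated integrands $\Phi^m_t:=1_{T_{\tau_m-1/H}}(t)\,\Phi_t$ can be mimicked along a single fixed sample point $X$. The following Python sketch is a hypothetical helper with finite $H$, where the squared norms $\|\Phi_s(X)\|^2$ are given as a list of integers. It uses the convention $\inf\emptyset:=H+\frac1H$ and keeps $\|\Phi_t\|^2$ exactly for $t\le\tau_m-\frac1H$. The checks confirm that $\int_{T_t}\|\Phi^m_s\|^2\,d\nu(s)<m$ for every $t$, which is the bound used in Lemma 7.2.6.

```python
from fractions import Fraction


def stopping_time(sq_norms, H, m):
    """tau_m = inf{t in T : (1/H) * sum_{s <= t} ||Phi_s||^2 >= m} for one fixed X.

    sq_norms[k - 1] is ||Phi_{k/H}(X)||^2, so T is shortened to
    {1/H, ..., len(sq_norms)/H}; assumes len(sq_norms) <= H * H.
    Returns H + 1/H when the set is empty (inf of the empty set := H + 1/H).
    """
    running = Fraction(0)
    for k, q in enumerate(sq_norms, start=1):
        running += Fraction(q, H)
        if running >= m:
            return Fraction(k, H)
    return H + Fraction(1, H)


def truncate(sq_norms, H, m):
    """||Phi^m_t||^2 = 1_{T_{tau_m - 1/H}}(t) * ||Phi_t||^2: keep exactly the t < tau_m."""
    tau = stopping_time(sq_norms, H, m)
    return [q if Fraction(k, H) <= tau - Fraction(1, H) else 0
            for k, q in enumerate(sq_norms, start=1)]


print(stopping_time([5] * 30, 10, 3), truncate([3, 0, 7, 1, 9, 2, 8], 2, 5))
```

Truncating strictly before $\tau_m$, rather than at $\tau_m$, is what keeps the integrated squared norm of $\Phi^m$ below $m$ even when the step at $\tau_m$ is large.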
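The key to the main results in this and the subsequent section is the following lemma.

Lemma 7.2.6 Fix $m\in{}^*\mathbb N$, $k,\sigma\in\mathbb N$ and $r\in T_\sigma$.
(a) $\Phi^m_r$ is $\mathcal B_{r^-}$-measurable.
(b) The process
$$r\mapsto\Lambda^m_r-\int_{T_r}\|\Phi^m_s\|^2\,d\nu(s)=\sum_{i\in\omega,s\in T_r}\langle\Phi^m_s,e_i\rangle^2\cdot\big(\langle X_s,e_i\rangle^2-\tfrac1H\big)$$
is a $\Gamma$-square integrable $(\mathcal B_r)_{r\in T_\sigma}$-martingale, and $\mathbb E\big(\Lambda^k_r-\int_{T_r}\|\Phi^k_s\|^2\,d\nu(s)\big)^2\approx0$.
(c) $\mathbb E\max_{r\in T_\sigma}\big(\Lambda^k_r-\int_{T_r}\|\Phi^k_s\|^2\,d\nu(s)\big)^2\approx0$.
(d) $\Lambda^k_\sigma\in SL^1(\Gamma)$.
(e) $\lim_{k\to\infty}{}^\circ\mathbb E\Lambda^k_\sigma={}^\circ\mathbb E\Lambda_\sigma$.

Proof (a) is true, since $\Phi$ is $(\mathcal B_{r^-})_{r\in T_\sigma}$-adapted.
(b) We use Lemma 7.2.2 (see the proof of Lemma 7.2.3 Part (a)). The mixed terms vanish by Lemma 7.2.2 (f) and (g), and $\sum_i\langle\Phi^m_s,e_i\rangle^4\le\|\Phi^m_s\|^4$. We obtain
$$\mathbb E\Big(\Lambda^m_r-\int_{T_r}\|\Phi^m_s\|^2\,d\nu(s)\Big)^2=\mathbb E\Big(\sum_{i\in\omega,s\in T_r}\langle\Phi^m_s,e_i\rangle^2\big(\langle X_s,e_i\rangle^2-\tfrac1H\big)\Big)^2=\sum_{i\in\omega,s\in T_r}\mathbb E\Big(\langle\Phi^m_s,e_i\rangle^4\cdot\mathbb E^{\mathcal B_{s^-}}\big(\langle X_s,e_i\rangle^2-\tfrac1H\big)^2\Big)$$
$$=2\cdot\mathbb E\sum_{i\in\omega,s\in T_r}\langle\Phi^m_s,e_i\rangle^4\cdot\frac1{H^2}\le2\,\mathbb E\alpha,\qquad\text{where}\quad\alpha:=\int_{\{(s,t)\in T_r^2\mid s=t\}}\|\Phi^m_s\|^2\cdot\|\Phi^m_t\|^2\,d\nu^2(s,t).$$
Note that $\mathbb E\alpha$ is limited in ${}^*\mathbb R$. Now let $m:=k$. Since $\int_{T_r}\|\Phi^k_s\|^2\,d\nu_r(s)<k$, we have $0\le\alpha<k^2$, so $\alpha\in SL^1(\Gamma)$. Since $\|\Phi^k\|^2\in SL^1(\Gamma\otimes\nu_r)$, Theorem 6.3.15 gives $t\mapsto\|\Phi^k_t\|^2\in SL^1(\nu_r)$ $\Gamma_L$-a.s. Thus, by Lemma 6.3.16(b), $(s,t)\mapsto\|\Phi^k_s\|^2\cdot\|\Phi^k_t\|^2\in SL^1(\nu_r^2)$ $\Gamma_L$-a.s. It follows that $\alpha\approx0$ $\Gamma_L$-a.s., because $\nu^2\{(s,t)\in T_r^2\mid s=t\}\approx0$. Therefore, $\mathbb E\alpha\approx0$. The martingale properties follow from Lemma 7.2.2.
(c) follows from (b) and Doob's inequality.
(d) By (b), $\mathbb E\big|\Lambda^k_\sigma-\int_{T_\sigma}\|\Phi^k_s\|^2\,d\nu(s)\big|\approx0$. Therefore, for each $N\in\mathcal B$,
$$\int_N\Lambda^k_\sigma\,d\Gamma\le\mathbb E\Big|\Lambda^k_\sigma-\int_{T_\sigma}\|\Phi^k_s\|^2\,d\nu(s)\Big|+\int_{N\times T_\sigma}\|\Phi^k_s\|^2\,d\Gamma\otimes\nu(s).$$
The right-hand side is limited, and it is infinitesimal if $\Gamma(N)\approx0$.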
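It follows that $\Lambda^k_\sigma\in SL^1(\Gamma)$.
(e) Assume that (e) is not true. By saturation, there exist a standard $\varepsilon>0$ and an unlimited $M\in{}^*\mathbb N$ such that
$$\varepsilon\le A:=\mathbb E\big(\Lambda_\sigma-\Lambda^M_\sigma\big)=\sum_{i\in\omega,s\in T_\sigma}\mathbb E\Big(\big(\langle\Phi_s,e_i\rangle^2-\langle\Phi^M_s,e_i\rangle^2\big)\cdot\mathbb E^{\mathcal B_{s^-}}\langle X_s,e_i\rangle^2\Big)=\mathbb E\int_{T_\sigma}\big(\|\Phi_s\|^2-\|\Phi^M_s\|^2\big)\,d\nu(s).$$
By Theorem 6.3.15, $\int_{T_\sigma}\|\Phi_s\|^2\,d\nu(s)\in SL^1(\Gamma)$, so $\int_{T_\sigma}\|\Phi_s\|^2\,d\nu(s)$ is limited $\Gamma_L$-a.s. Therefore,
$$\int_{T_\sigma}\|\Phi_s\|^2\,d\nu(s)=\int_{T_\sigma}\|\Phi^M_s\|^2\,d\nu(s)\quad\Gamma_L\text{-a.s.}$$
Hence the integrand of $A$ vanishes outside an internal set of infinitesimal $\Gamma$-measure, and it is bounded by the S-integrable function $\int_{T_\sigma}\|\Phi_s\|^2\,d\nu(s)$. This proves that $A\approx0$, which is a contradiction.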
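The estimate in Lemma 7.2.6(b) becomes explicit for a constant, deterministic integrand $\Phi_s\equiv\varphi$ and finite $H$ and $d$. This is an assumption made only to obtain a closed form. Write $\Lambda_r=\sum_{(s,i)\in T_r\times d}\langle\varphi,e_i\rangle^2\langle X_s,e_i\rangle^2$ and $\varphi_i:=\langle\varphi,e_i\rangle$. The computation in the proof then gives $\mathbb E(\Lambda_r-r\|\varphi\|^2)^2=\sum_{s\in T_r,i}\varphi_i^4\cdot\frac2{H^2}=\frac{2r\sum_i\varphi_i^4}H$, which tends to 0 as $H$ grows. The Python sketch below evaluates this formula and compares it with a Monte Carlo estimate. The names, the dimension and the parameters are illustrative.

```python
import math
import random


def qv_error_exact(phi, H, r):
    """E(Lambda_r - r ||phi||^2)^2 = 2 r sum_i phi_i^4 / H for a constant phi,
    from E(x^2 - 1/H)^2 = 2/H^2 (x ~ N(0, 1/H)) and independence."""
    return 2 * r * sum(p ** 4 for p in phi) / H


def qv_error_simulated(phi, H, r, n_paths, seed=1):
    """Monte Carlo estimate of the same quantity, with <X_s, e_i> ~ N(0, 1/H)."""
    rng = random.Random(seed)
    n = int(r * H)                        # number of points of T_r
    sd = math.sqrt(1.0 / H)
    target = r * sum(p * p for p in phi)  # integral over T_r of ||phi||^2 d nu
    acc = 0.0
    for _ in range(n_paths):
        lam = 0.0
        for _ in range(n):
            for p in phi:
                x = rng.gauss(0.0, sd)
                lam += p * p * x * x      # <Phi_s, e_i>^2 <X_s, e_i>^2
        acc += (lam - target) ** 2
    return acc / n_paths


phi = [1.0, 2.0]   # a constant integrand in dimension d = 2 (illustrative)
for H in (50, 500):
    print(H, qv_error_exact(phi, H, 1))
print(qv_error_simulated(phi, 50, 1, n_paths=5_000), qv_error_exact(phi, 50, 1))
```

The $1/H$ decay of the exact formula is the finite shadow of the statement $\mathbb E\big(\Lambda^k_r-\int_{T_r}\|\Phi^k_s\|^2\,d\nu(s)\big)^2\approx0$.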
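Theorem 7.2.7 $\Phi B$ is S-continuous $\Gamma_L$-a.s.

Proof We use Lemma 6.3.1 and Theorem 6.3.15. There exists a $U\in L_\Gamma(\mathcal B)$ with $\Gamma_L(U)=1$ such that for all $X\in U$ and all $\sigma\in\mathbb N$:
(i) $\Lambda^k_t(X)\approx\int_{T_t}\|\Phi^k_s(X)\|^2\,d\nu(s)$, and this value is limited, for all $k\in\mathbb N$ and $t\in T_\sigma$.
(ii) $\int_{T_\sigma}\|\Phi_s(X)\|^2\,d\nu(s)$ is limited, so there exists a $k\in\mathbb N$ with $\Phi_s(X)=\Phi^k_s(X)$ for all $s\in T_\sigma$.
(iii) $t\mapsto\|\Phi^k_t(X)\|^2\in SL^1(\nu_\sigma)$. Therefore, $\int_{T_s\setminus T_t}\|\Phi^k_r(X)\|^2\,d\nu(r)\approx0$ if $s\approx t$ and $t\le s\in T_\sigma$.
We obtain for all $X\in U$ and all limited $s,t\in T$ with $s\approx t$ and $t\le s$:
$$\Lambda_s(X)=\Lambda^k_s(X)\approx\int_{T_s}\|\Phi^k_r(X)\|^2\,d\nu(r)\approx\int_{T_t}\|\Phi^k_r(X)\|^2\,d\nu(r)\approx\Lambda^k_t(X)=\Lambda_t(X),$$
where we may assume that $s,t\in T_\sigma$. Moreover, $\Lambda_t(X)$ is nearstandard for all limited $t\in T$ and all $X\in U$. By Lemma 7.2.5, $\Phi B$ is S-continuous $\Gamma_L$-a.s.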
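7.2.2 The S-Square-Integrability of the Internal Itô Integral

Proposition 7.2.8 Fix a $(\mathcal B_{t^-})_{t\in T}$-adapted $\Phi:\mathbb F^T\times T\to\mathbb F$ locally in $SL^2(\Gamma\otimes\nu,\mathbb F)$ and $\sigma\in\mathbb N$. Then
(a) $\max_{t\in T_\sigma}|\Phi B(\cdot,t)|\in SL^2(\Gamma)$;
(b) $\Phi B|_{\mathbb F^T\times T_\sigma}\in SL^2(\Gamma\otimes\nu_\sigma)$.

Proof (a) We use the notation of the preceding section. Since $\Phi B_t=\hat\Phi\hat B_{(t,\omega)}$, it suffices to prove that $\max_{\hat s\in\hat T,\,\hat s\le(\sigma,\omega)}|\hat\Phi\hat B_{\hat s}|\in SL^2(\Gamma)$. $\hat\Phi\hat B$ is a $(\hat{\mathcal B}_{\hat s})_{\hat s\in\hat T}$-martingale and $\mathbb E(\hat\Phi\hat B_{(\sigma,\omega)})^2$ is limited. Hence, by Theorem 6.4.7, it suffices to show that $[\hat\Phi\hat B]_{(\sigma,\omega)}\in SL^1(\Gamma)$. Since $[\hat\Phi\hat B]_{(\sigma,\omega)}=\Lambda_\sigma$, it suffices to prove that $\Lambda_\sigma\in SL^1(\Gamma)$. Now, by Lemma 7.2.6 Parts (d) and (e), $\lim_{k\to\infty}{}^\circ\mathbb E(\Lambda-\Lambda^k)_\sigma=0$ and $\Lambda^k_\sigma\in SL^1(\Gamma)$. It follows that $\Lambda_\sigma\in SL^1(\Gamma)$ (see Corollary 6.3.3 Part ($\gamma$)).
(b) follows from (a).

Corollary 7.2.9 Fix a $(\mathcal B_{t^-})_{t\in T}$-adapted $\Phi:\mathbb F^T\times T\to\mathbb F$ in $SL^2(\Gamma\otimes\nu,\mathbb F)$. Then
$$\max_{t\in T}|\Phi B_t|\in SL^2(\Gamma);\quad\text{thus,}\quad\Phi B\in SL^2(\Gamma\otimes\nu).$$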
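Proof Note that, by Doob's inequality, for $\sigma\in\mathbb N$,
$${}^\circ\mathbb E\max_{t\in T\setminus T_\sigma}\big(\Phi B_t-\Phi B_\sigma\big)^2\le4\cdot{}^\circ\mathbb E\Big(\sum_{t\in T\setminus T_\sigma}\Phi_t(X)(X_t)\Big)^2=4\cdot{}^\circ\mathbb E\sum_{t\in T\setminus T_\sigma}\|\Phi_t\|^2\,\frac1H\ \xrightarrow[\sigma\to\infty]{}\ 0.$$
Assume that this convergence fails. Then there exist a standard $\varepsilon>0$ and infinitely many $\sigma\in\mathbb N$ with $\mathbb E\sum_{t\in T\setminus T_\sigma}\|\Phi_t\|^2\frac1H\ge\varepsilon$. By the Spillover Principle there exists an unlimited $S\in{}^*\mathbb N$ with $\mathbb E\sum_{t\in T\setminus T_S}\|\Phi_t\|^2\frac1H\ge\varepsilon$. This contradicts the S-square-integrability of $\Phi$ with respect to $\Gamma\otimes\nu$. Therefore, for all $\sigma\in\mathbb N$ and all $A\in\mathcal B$,
$$\Big(\int_A\max_{t\in T}(\Phi B_t)^2\,d\Gamma\Big)^{\frac12}\le\Big(\int_A\max_{t\in T_\sigma}(\Phi B_t)^2\,d\Gamma\Big)^{\frac12}+\Big(\int_A\max_{t\in T\setminus T_\sigma}(\Phi B_t)^2\,d\Gamma\Big)^{\frac12}.$$
By Proposition 7.2.8(a), the first summand is limited, and it is infinitesimal if $\Gamma(A)\approx0$. The second summand is bounded by
$$\Big(\int_A\max_{t\in T\setminus T_\sigma}\big(\Phi B_t-\Phi B_\sigma\big)^2\,d\Gamma\Big)^{\frac12}+\Big(\int_A(\Phi B_\sigma)^2\,d\Gamma\Big)^{\frac12}.$$
This bound is limited. If $\Gamma(A)\approx0$, it is smaller than any given standard $\varepsilon>0$ for sufficiently large $\sigma\in\mathbb N$. This proves the assertion.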
7.2.3 Adaptedness and Predictability Recall from Sect. 6.2.7 the construction of the standard part (br )r ∈[0,∞[ of the internal filtration (Bt )t∈T . Notice that (br )r ∈[0,∞[ is also the standard part of (Bt − )t∈T . In order to define the standard Itô integral for (br )r ∈[0,∞[ -adapted square integrable integrands with values in H we use the fact that (br )r ∈[0,∞[ -predictable processes coincide with (br )r ∈[0,∞[ -adapted processes (see Corollary 5.4.2. in [36]). Here are some details. Fix r, s in [0, ∞[ with r ≤ s. Sets of the form C×]r, s], C×]r, ∞[ with C ∈ br or sets C × [0, s], C × [0, ∞[ with C ∈ b0 are called (br )r ∈[0,∞[ -predictable rectangles. Let P be the σ-algebra, generated by the predictable rectangles. A process f , defined on FT × [0, ∞[, is called (br )r ∈[0,∞[ -predictable if f is P-measurable. A measurable set C ∈ L (B) ⊗ Leb[0, ∞[, where B is the internal Borel-algebra on FT , is called (br )r ∈[0,∞[-adapted if for each r ∈ [0, ∞[ the section C(·, r ) := X ∈ FT | (X, r ) ∈ C ∈ br . Let A be the σ-algebra of (br )r ∈[0,∞[ -adapted sets. A process f , defined on FT × [0, ∞[, is called (br )r ∈[0,∞[ -adapted if f is A-measurable. If the predictable or adapted sets are augmented by the L ⊗ λ-nullsets, then for each process f , measurable with respect to the extended σ-algebra, there exists a process, measurable with respect to the coarser σ-algebra, such that f = g L ⊗ λa.e. (See Proposition 5.1.2 in [36]). The proof of the following lifting result uses the equivalence of “adapted” and “predictable” for the filtration (br )r ∈[0,∞[ (see Corollary 5.4.2 in [36]). It is an application of Corollary 6.3.11. Let us prove the theorem in detail, because it is an example of the usefulness of that corollary. Theorem 7.2.10 Let ϕ : FT × [0, ∞[→ H be (br )r ∈[0,∞[ -adapted and (locally) in L 2 ( L ⊗ λ, H). Then ϕ has a (Bt − )t∈T -adapted lifting : FT × T → F (locally) in SL2 ( ⊗ ν, F).
H. Osswald
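Proof By the standard Corollary 5.4.2 in [36], we may assume that $\varphi$ is $(b_r)_{r\in[0,\infty[}$-predictable. In order to prove the result for the local $L^2$-space, fix $\sigma\in\mathbb N$. Let $L_\sigma$ be the Hilbert space of $(b_r)_{r\in[0,\sigma]}$-predictable square integrable functions $f:\mathbb F^T\times[0,\sigma]\to\mathbb H$. Let $M$ be the set of all these functions having a $(\mathcal B_{t^-})_{t\in T_\sigma}$-adapted lifting $F:\mathbb F^T\times T_\sigma\to\mathbb F$ in $SL^2(\Gamma\otimes\nu_\sigma,\mathbb F)$. Obviously, $M$ is a linear space. In order to prove that $M$ is complete, fix a Cauchy sequence $(f_n)_{n\in\mathbb N}$ in $M$ and let $F_n$ be a $(\mathcal B_{t^-})_{t\in T_\sigma}$-adapted lifting of $f_n$ in $SL^2(\Gamma\otimes\nu_\sigma,\mathbb F)$. Let $G\in SL^2(\Gamma\otimes\nu_\sigma,\mathbb F)$ be a lifting of $g:=\lim_{n\to\infty}f_n$ in $L^2(\Gamma_L\otimes\lambda)$. Using the shorthand $\rho:=\Gamma\otimes\nu_\sigma$, we obtain
(a) ${}^\circ\int_{\mathbb F^T\times T_\sigma}\|F_n-G\|^2_{\mathbb F}\,d\rho=\int_{\mathbb F^T\times[0,\sigma]}\|f_n-g\|^2_{\mathbb H}\,d\Gamma_L\otimes\lambda\to_{n\to\infty}0$,
(b) ${}^\circ\int_{\mathbb F^T\times T_\sigma}\|F_n-F_m\|^2_{\mathbb F}\,d\rho=\int_{\mathbb F^T\times[0,\sigma]}\|f_n-f_m\|^2_{\mathbb H}\,d\Gamma_L\otimes\lambda\to_{n,m\to\infty}0$.
Part (a) implies that for all $\varepsilon\in\mathbb R^+$,
(c) ${}^\circ\rho\{\|F_n-G\|_{\mathbb F}\ge\varepsilon\}={}^\circ\rho\{\tfrac1{\varepsilon^2}\|F_n-G\|^2_{\mathbb F}\ge1\}\le\tfrac1{\varepsilon^2}\,{}^\circ\int_{\mathbb F^T\times T_\sigma}\|F_n-G\|^2_{\mathbb F}\,d\rho\to_{n\to\infty}0$.
This transition (c) from measure to integral is usually called Chebyshev's inequality. It follows that there exists a strictly increasing $h:\mathbb N\to\mathbb N$ such that for all $k\in\mathbb N$ and all $n\ge h(k)$:
(i) $\int_{\mathbb F^T\times T_\sigma}\|F_n-G\|^2_{\mathbb F}\,d\Gamma\otimes\nu_\sigma<\frac1k$,
(ii) $\int_{\mathbb F^T\times T_\sigma}\|F_n-F_{h(k)}\|^2_{\mathbb F}\,d\Gamma\otimes\nu_\sigma<\frac1k$,
(iii) $\Gamma\otimes\nu_\sigma\{\|F_n-G\|_{\mathbb F}\ge\frac1k\}<\frac1k$.
By Theorem 2.10.18, there exists an internal extension $(F_n)_{n\in{}^*\mathbb N}$ of $(F_n)_{n\in\mathbb N}$ such that all $F_n$ are $(\mathcal B_{t^-})_{t\in T_\sigma}$-adapted. By the Spillover Principle, for all $k\in\mathbb N$ there exists an unlimited $n_k\in{}^*\mathbb N$ such that (i), (ii) and (iii) hold for all unlimited $n\le n_k$. There exists an unlimited $N$ with $N\le n_k$ for all $k\in\mathbb N$. Set $F:=F_N$. Then (i), (ii) and (iii) hold for all $k\in\mathbb N$ when $F_n$ is replaced by $F$. By (ii) and Corollary 6.3.3 ($\gamma$), $F\in SL^2(\Gamma\otimes\nu_\sigma)$. By (iii), $F$ is a lifting of $g$. Since $F$ is $(\mathcal B_{t^-})_{t\in T_\sigma}$-adapted, $g\in M$; thus $M$ is complete.
To prove that $M=L_\sigma$, let $B\times\,]r,u]$ be a predictable rectangle with $B\in b_r$ and $r<u\in[0,\sigma]$. Then there exist an $s\approx r$, $s\in T_\sigma$, and a $\Gamma$-approximation $A\in\mathcal B_{s^-}$ of $B$. Let $v\approx u$, $v\in T_\sigma$. Set $Y:=A\times(]s,v]\cap T_\sigma)$ and $X:=B\times\big(\mathrm{st}^{-1}[\,]r,u]\,]\cap T_\sigma\big)$. Then
$$X\,\Delta\,Y\subseteq\big((A\,\Delta\,B)\times T_\sigma\big)\cup\big(\mathbb F^T\times(\mathbf r\cup\mathbf u)\big)$$
is a $(\Gamma\otimes\nu)_L$-nullset. Recall that $\mathbf r=\{t\in T_\sigma\mid t\approx r\}$, and analogously for $\mathbf u$. It follows that ${}^*a\cdot1_Y$ belongs to $SL^2(\Gamma\otimes\nu,\mathbb F)$ and is a $(\mathcal B_{t^-})_{t\in T_\sigma}$-adapted lifting of $a\cdot1_{B\times]r,u]}$ for $a\in\mathbb H$. Thus $a\cdot1_{B\times]r,u]}\in M$. In the same way one can see that $a\cdot1_{B\times[0,u]}\in M$ if $B\in b_0$. By the standard Proposition 5.6.1 in [36], $M=L_\sigma$. It follows that $1_{\mathbb F^T\times[0,\sigma]}\cdot\varphi$ has a $(\mathcal B_{t^-})_{t\in T_\sigma}$-adapted lifting $\Phi_\sigma$ in $SL^2(\Gamma\otimes\nu_\sigma,\mathbb F)$ for all $\sigma\in\mathbb N$. We may assume that $\Phi_\sigma(a)=\Phi_{\sigma+1}(a)$ for all $a\in\mathbb F^T\times T_\sigma$ and that $\Phi_\sigma(a)=0$ for $a\notin\mathbb F^T\times T_\sigma$. It follows that $\Phi_\sigma$ is $(\mathcal B_{t^-})_{t\in T}$-adapted.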
7 Stochastic Analysis
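Let $(\Phi_\sigma)_{\sigma\in{}^*\mathbb N}$ be an internal extension of $(\Phi_\sigma)_{\sigma\in\mathbb N}$ such that all $\Phi_\sigma$ are $(\mathcal B_{t^-})_{t\in T}$-adapted. Then there exists an unlimited $N\in{}^*\mathbb N$ such that $\Phi_\sigma=1_{\mathbb F^T\times T_\sigma}\cdot\Phi_N$ for all $\sigma\in{}^*\mathbb N$ with $\sigma\le N$. Note that $1_{\mathbb F^T\times T_\sigma}\cdot\Phi_N$ is a lifting of $\varphi$, locally in $SL^2(\Gamma\otimes\nu,\mathbb F)$, for all unlimited $\sigma\in{}^*\mathbb N$ with $\sigma\le N$. This proves the following: if $\varphi:\mathbb F^T\times[0,\infty[\to\mathbb H$ is $(b_r)_{r\in[0,\infty[}$-predictable and locally in $L^2(\Gamma_L\otimes\lambda,\mathbb H)$, then $\varphi$ has a $(\mathcal B_{t^-})_{t\in T}$-adapted lifting locally in $SL^2(\Gamma\otimes\nu,\mathbb F)$.
Now assume that $\varphi\in L^2(\Gamma_L\otimes\lambda,\mathbb H)$ is $(b_r)_{r\in[0,\infty[}$-predictable. By Theorem 6.3.10 and Corollary 6.3.22, $\varphi$ has a lifting $\Psi\in SL^2(\Gamma\otimes\nu,\mathbb F)$. For $\sigma\in\mathbb N$,
$${}^\circ\int_{\mathbb F^T\times T}\big\|1_{\mathbb F^T\times T_\sigma}\cdot\Phi_N-\Psi\big\|^2_{\mathbb F}\,d\Gamma\otimes\nu=\int_{\mathbb F^T\times[0,\infty[}\big\|1_{\mathbb F^T\times[0,\sigma]}\cdot\varphi-\varphi\big\|^2_{\mathbb H}\,d\Gamma_L\otimes\lambda\to_{\sigma\to\infty}0.$$
Hence
$$\int_{\mathbb F^T\times T}\big\|1_{\mathbb F^T\times T_{N_\infty}}\cdot\Phi_N-\Psi\big\|^2_{\mathbb F}\,d\Gamma\otimes\nu\approx0$$
for some unlimited $N_\infty\in{}^*\mathbb N$ with $N_\infty\le N$. Set $\Phi:=1_{\mathbb F^T\times T_{N_\infty}}\cdot\Phi_N$. Then $\Phi$ is a $(\mathcal B_{t^-})_{t\in T}$-adapted lifting of $\varphi$ in $SL^2(\Gamma\otimes\nu,\mathbb F)$ (use the triangle inequality for $L^2$-spaces).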
7.2.4 The Standard Itô Integral
Fix a $(\mathcal B_{t^-})_{t\in T}$-adapted $\Phi$ locally in $SL^2(\Gamma\otimes\nu,\mathbb F)$. By Theorem 7.2.7, we may convert $\Phi B$ to a continuous stochastic process $\int\Phi\,dB$ on the time line $[0,\infty[$. For $\Gamma_L$-almost all $X\in\mathbb F^T$ and all limited $t\in T$, define
$$\int\Phi\,dB:\mathbb F^T\times[0,\infty[\,\ni(X,{}^\circ t)\mapsto{}^\circ(\Phi B)(X,t).$$
Now fix $\varphi$ and $\Phi$ as in Theorem 7.2.10. We define $\int\varphi\,dB:=\int\Phi\,dB$.
This process $\int\varphi\,dB$ is called the Itô integral of $\varphi$. One has to prove that $\int\varphi\,dB$ is well defined $\Gamma_L$-a.s., i.e., that it does not depend on the chosen lifting. This follows from:
Lemma 7.2.11 Suppose that $\Phi$ is $(\mathcal B_{t^-})_{t\in T}$-adapted, locally in $SL^2(\Gamma\otimes\nu,\mathbb F)$ and a lifting of the 0-function. Then, $\Gamma_L$-a.s., $(\Phi B)_t\approx0$ for all limited $t\in T$.
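Proof Fix $\sigma\in\mathbb N$ and an orthonormal basis $E$ of $\mathbb F$. By Lemma 7.2.2 and Doob's inequality, we obtain
$$E\max_{t\in T_\sigma}(\Phi B)_t^2\le4\cdot E\,(\Phi B)_\sigma^2=4\cdot E\Big(\sum_{s\in T_\sigma,\,e\in E}\langle\Phi_s,e\rangle\cdot\langle X_s,e\rangle\Big)^2=4\cdot E\sum_{s\in T_\sigma,\,e\in E}\langle\Phi_s,e\rangle^2\cdot\langle X_s,e\rangle^2$$
$$=4\,E\sum_{s\in T_\sigma,\,e\in E}\langle\Phi_s,e\rangle^2\,E^{\mathcal B_{s^-}}\langle X_s,e\rangle^2=4\,E\int_{T_\sigma}\|\Phi_s\|^2\,d\nu(s)\approx0.$$
By Lemma 6.3.1 (b), $\max_{t\in T_\sigma}(\Phi B)_t^2\approx0$ $\Gamma_L$-a.s. Since $\sigma\in\mathbb N$ was arbitrary, $(\Phi B)_t^2\approx0$ for all limited $t\in T$, $\Gamma_L$-a.s.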
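The two estimates in this proof can be observed in a finite toy model. The first is Doob's inequality $E\max_{t\in T_\sigma}(\Phi B)_t^2\le4\,E\,(\Phi B)_\sigma^2$. The second is the identity $E\,(\Phi B)_\sigma^2=E\int_{T_\sigma}\|\Phi_s\|^2\,d\nu(s)$. The following Python sketch is only an illustration under stated assumptions, not part of the nonstandard argument. It assumes $\mathbb F=\mathbb R$, a finite H in place of an unlimited one, and increments $X_s$ drawn independently from $N(0,1/H)$. The adapted integrand $\Phi_s=B_{s^-}$ (the value of the walk strictly before s) is a hypothetical choice.

```python
import random

def simulate_path(H, sigma, rng):
    """Simulate one path on T_sigma = {1/H, 2/H, ..., sigma} (toy model, F = R).

    Returns (max_t (Phi B)_t^2, (Phi B)_sigma^2, sum_s Phi_s^2 / H) for the
    hypothetical adapted integrand Phi_s = B_{s-}.
    """
    b_before = 0.0   # B_{s-}: sum of the increments strictly before s
    phi_b = 0.0      # running value of (Phi B)_t = sum_{s <= t} Phi_s * X_s
    max_sq = 0.0     # running maximum of (Phi B)_t^2
    quad = 0.0       # sum_s Phi_s^2 / H, i.e. the nu-integral of Phi^2
    for _ in range(int(H * sigma)):
        x = rng.gauss(0.0, (1.0 / H) ** 0.5)   # increment X_s ~ N(0, 1/H)
        phi = b_before                          # depends only on X_r with r < s
        phi_b += phi * x
        quad += phi * phi / H
        max_sq = max(max_sq, phi_b * phi_b)
        b_before += x
    return max_sq, phi_b * phi_b, quad

def estimate(H=200, sigma=1, paths=2000, seed=1):
    """Monte Carlo averages of the three quantities returned by simulate_path."""
    rng = random.Random(seed)
    totals = [0.0, 0.0, 0.0]
    for _ in range(paths):
        for i, value in enumerate(simulate_path(H, sigma, rng)):
            totals[i] += value
    return [t / paths for t in totals]

e_max, e_final, e_quad = estimate()
print(f"E max (Phi B)^2 ~ {e_max:.3f} <= 4 E (Phi B)_sigma^2 ~ {4 * e_final:.3f}")
print(f"E (Phi B)_sigma^2 ~ {e_final:.3f} vs E int Phi^2 dnu ~ {e_quad:.3f}")
```

With σ = 1 the continuous-time value of $E\int_0^1 B_s^2\,ds$ is 1/2. So the last two averages should be close to 1/2, and the maximum should stay well below the Doob bound.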
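We want to define the Itô integral as a random variable for $(b_r)_{r\in[0,\infty[}$-adapted processes $\varphi:\mathbb F^T\times[0,\infty[\to\mathbb H$ in $L^2(\Gamma_L\otimes\lambda,\mathbb H)$. If $\Phi:\mathbb F^T\times T\to\mathbb F$ is a $(\mathcal B_{t^-})_{t\in T}$-adapted lifting of $\varphi$ in $SL^2(\Gamma\otimes\nu,\mathbb F)$, define
$$\int_0^\infty\varphi\,dB:\mathbb F^T\to\mathbb R,\quad X\mapsto{}^\circ\sum_{s\in T}\langle\Phi(X,s),X_s\rangle.$$
Note that $\int_0^\infty\varphi\,dB$ is $\Gamma_L$-a.s. well defined. Now suppose that $\varphi:[0,\infty[\to\mathbb H$ belongs to $L^2(\lambda,\mathbb H)$, i.e., $\varphi$ is deterministic. We set
$$I(\varphi):=\int_0^\infty\varphi\,dB$$
and call $I(\varphi)$ the Wiener integral of $\varphi$. For internal $\Phi:T\to\mathbb F$ define
$$I(\Phi):=\sum_{s\in T}\langle\Phi(s),X_s\rangle.$$
$I(\Phi)$ is called the internal Wiener integral of $\Phi$.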
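For a deterministic integrand, $I(\Phi)$ is a centered Gaussian variable whose variance is $\int_T\|\Phi\|^2\,d\nu$. This assumes, as the computation in the proof of Lemma 7.2.11 indicates, that under $\Gamma$ the coordinates of each $X_s$ with respect to an orthonormal basis are independent $N(0,1/H)$ variables. The following minimal Python sketch compares the sample variance of $I(\Phi)$ with $\int_T\|\Phi\|^2\,d\nu$. It is a finite toy model, assuming $\mathbb F=\mathbb R^3$, a finite H, time horizon 1 and the hypothetical integrand $\Phi(s)=(1,s,\cos 3s)$.

```python
import math
import random

H, HORIZON, N_SAMPLES = 100, 1, 3000        # finite stand-ins, chosen for illustration
GRID = [k / H for k in range(1, int(H * HORIZON) + 1)]

def phi(s):
    """Hypothetical deterministic integrand Phi: T -> R^3."""
    return (1.0, s, math.cos(3.0 * s))

def internal_wiener_integral(rng):
    """One sample of I(Phi) = sum_s <Phi(s), X_s>, X_s having i.i.d. N(0, 1/H) coordinates."""
    sd = math.sqrt(1.0 / H)
    return sum(c * rng.gauss(0.0, sd) for s in GRID for c in phi(s))

# int ||Phi||^2 d nu = sum_s ||Phi(s)||^2 / H
exact = sum(sum(c * c for c in phi(s)) for s in GRID) / H

rng = random.Random(2024)
samples = [internal_wiener_integral(rng) for _ in range(N_SAMPLES)]
sample_mean = sum(samples) / N_SAMPLES
sample_var = sum((v - sample_mean) ** 2 for v in samples) / (N_SAMPLES - 1)
print(sample_mean, sample_var, exact)
```

Here `exact` is close to the Lebesgue value $\int_0^1(1+s^2+\cos^2 3s)\,ds\approx1.81$. The sample variance should match it up to Monte Carlo error.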
7.2.5 Integrability of the Itô Integral
Fix a $(b_r)_{r\in[0,\infty[}$-adapted $\varphi$ locally in $L^2(\Gamma_L\otimes\lambda,\mathbb H)$ and a $(\mathcal B_{t^-})_{t\in T}$-adapted lifting $\Phi$ locally in $SL^2(\Gamma\otimes\nu,\mathbb F)$, according to Theorem 7.2.10.
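Theorem 7.2.12 Fix $\sigma\in\mathbb N$.
(a) The Itô integral of $\varphi$ is a continuous $(b_r)_{r\in[0,\infty[}$-martingale with
$$E\sup_{r\in[0,\sigma]}\Big(\int\varphi\,dB\Big)_r^2<\infty.$$
(b) Suppose that $(\varphi_k)_{k\in\mathbb N}$ is a sequence of $(b_r)_{r\in[0,\infty[}$-adapted functions $\varphi_k:\mathbb F^T\times[0,\sigma]\to\mathbb H$, converging to $\varphi:\mathbb F^T\times[0,\sigma]\to\mathbb H$ in $L^2(\Gamma_L\otimes\lambda,\mathbb H)$. Then the Itô integral of $\varphi$ exists and
$$\lim_{k\to\infty}E\sup_{r\in[0,\sigma]}\Big(\int(\varphi_k-\varphi)\,dB\Big)_r^2=0.$$
(c) Suppose that $(\varphi_k)_{k\in\mathbb N}$ is a sequence of $(b_r)_{r\in[0,\infty[}$-adapted functions $\varphi_k:\mathbb F^T\times[0,\infty[\to\mathbb H$, converging to $\varphi:\mathbb F^T\times[0,\infty[\to\mathbb H$ in $L^2(\Gamma_L\otimes\lambda,\mathbb H)$. Then the Itô integral of $\varphi$ exists and
$$\lim_{k\to\infty}E\Big(\int_0^\infty(\varphi_k-\varphi)\,dB\Big)^2=0.$$
Proof (a) We have already seen that $\int\varphi\,dB$ is continuous $\Gamma_L$-a.s. By Proposition 7.2.8,
$$E\sup_{r\in[0,\sigma]}\Big(\int\varphi\,dB\Big)_r^2={}^\circ E\max_{t\in T_\sigma}(\Phi B)_t^2,$$
which is limited. For limited $t\in T$ we have $\big(\int\varphi\,dB\big)_{{}^\circ t}={}^\circ(\Phi B)_t$ $\Gamma_L$-a.s., and $(\Phi B)_t$ is $\mathcal B_t$-measurable. Hence $\big(\int\varphi\,dB\big)_{{}^\circ t}$ is $\mathcal B_t\vee\mathcal N_{\Gamma_L}$-measurable, thus $b_{{}^\circ t}$-measurable. To prove the martingale property, fix $r<u$ in $[0,\infty[$ and $C\in b_r$. Then there exist an $s\in T$, $s\approx r$, and a $\Gamma$-approximation $A\in\mathcal B_s$ of $C$. Let $t\approx u$. By Propositions 7.2.8, 7.2.3 (b) and Theorem 6.3.4,
$$\int_C\Big(\int\varphi\,dB\Big)_u\,d\Gamma_L={}^\circ\int_A(\Phi B)_t\,d\Gamma={}^\circ\int_A(\Phi B)_s\,d\Gamma=\int_C\Big(\int\varphi\,dB\Big)_r\,d\Gamma_L.$$
It follows that $E^{b_r}\big(\int\varphi\,dB\big)_u=\big(\int\varphi\,dB\big)_r$.
(b) Assume that $\Phi_k:\mathbb F^T\times T_\sigma\to\mathbb F$ in $SL^2(\Gamma\otimes\nu_\sigma,\mathbb F)$ is a $(\mathcal B_{t^-})_{t\in T}$-adapted lifting of $\varphi_k$. Then $\lim_{k,l\to\infty}{}^\circ\int_{\mathbb F^T\times T_\sigma}\|\Phi_k-\Phi_l\|^2\,d\Gamma\otimes\nu=0$. As in the proof of Lemma 7.2.11, we see that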
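$$\lim_{k,l\to\infty}{}^\circ E\max_{t\in T_\sigma}\big((\Phi_k-\Phi_l)B\big)_t^2=0.$$
By Corollary 6.3.11, there is a $(\mathcal B_{t^-})_{t\in T}$-adapted function $\Phi:\mathbb F^T\times T_\sigma\to\mathbb F$ in $SL^2(\Gamma\otimes\nu_\sigma,\mathbb F)$ such that
(i) $\lim_{k\to\infty}{}^\circ\int_{\mathbb F^T\times T_\sigma}\|\Phi_k-\Phi\|^2\,d\Gamma\otimes\nu=0$ and
(ii) $\lim_{k\to\infty}{}^\circ E\max_{t\in T_\sigma}\big((\Phi_k-\Phi)B\big)_t^2=0$.
Use the proof of “(a) ⇒ (c)” in the proof of Theorem 7.2.10 to see that $\Phi$ is a lifting of $\varphi$ and
$$E\sup_{r\in[0,\sigma]}\Big(\int(\varphi_k-\varphi)\,dB\Big)_r^2={}^\circ E\max_{t\in T_\sigma}\big((\Phi_k-\Phi)B\big)_t^2\to_{k\to\infty}0.$$
The proof of (c) is left to the reader.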
7.2.6 The Wiener Measure
We take the σ-algebra generated by the Lévy and Wiener integrals as a basis for the chaos expansion. First we study the Wiener case. Therefore, recall the notation and the results of Sect. 7.1. Let $C_{\mathbb B}$ be the Fréchet space of continuous functions from $[0,\infty[$ into $\mathbb B$. In the introduction, Example (2), we mentioned that the function $\kappa:\mathbb F^T\to C_{\mathbb B}$, $X\mapsto\mathbf B(X,\cdot)$, is a surjective mapping from $\mathbb F^T$ onto $C_{\mathbb B}$. Here $\mathbf B$ denotes the $\mathbb B$-valued Brownian motion of Sect. 7.1. The Borel algebra on the Banach space $\mathbb B$ is generated by the cylinder sets of the form $\{a\in\mathbb B\mid\varphi(a)<c\}$, where $\varphi$ belongs to the topological dual $\mathbb B'$ of $\mathbb B$ and $c\in\mathbb R$. The Borel σ-algebra $\mathcal B(C_{\mathbb B})$ is generated by sets of the form $\{f\in C_{\mathbb B}\mid f(r)\in D\}$, where $r\in[0,\infty[$ and $D$ is a cylinder set in $\mathbb B$. Let $\mathcal W_{C_{\mathbb B}}$ be the σ-algebra on $\mathbb F^T$ generated by κ, augmented by the $\Gamma_L$-nullsets, i.e.,
$$\mathcal W_{C_{\mathbb B}}=\{\kappa^{-1}[D]\mid D\text{ is a Borel set in }C_{\mathbb B}\}\vee\mathcal N_{\Gamma_L}.$$
By Proposition 11.8.6 in [36], $\mathcal W_{C_{\mathbb B}}$ is generated by the real random variables ${}^\circ\langle a,B_t\rangle$. Here $a\in\mathbb H$ and $B$ is the fixed internal process defined in the introduction. We may identify $a\in\mathbb H$ with ${}^*a\in\mathbb F$. Therefore, $\mathcal W_{C_{\mathbb B}}$ does not depend on $\mathbb B$; it depends only on the Hilbert space part $\mathbb H$ of the abstract Wiener space $(\mathbb H,\mathbb B)$. Therefore we may define $\mathcal W_{C_{\mathbb H}}:=\mathcal W_{C_{\mathbb B}}$.
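The image measure $W$ of $\Gamma_L$ under κ is called the Wiener measure on $\mathcal B(C_{\mathbb B})$. The preceding results confirm that the Hilbert space part of an abstract Wiener space is its most important part. Consider κ as a measure preserving mapping from $\mathbb F^T$, endowed with the σ-algebra $\mathcal W_{C_{\mathbb H}}$, onto $C_{\mathbb B}$, endowed with the Borel σ-algebra. Since $\mathcal W_{C_{\mathbb H}}$ is generated by κ up to nullsets, we may identify the $L^p$-spaces $L^p_{\mathcal W_{C_{\mathbb H}}}(\Gamma_L)$ and $L^p_{\mathcal B(C_{\mathbb B})}(W)$, because they are canonically isometrically isomorphic. The same holds for $L^p_{\mathcal W_{C_{\mathbb H}}\otimes\mathcal L_n}(\Gamma_L\otimes(\nu^n)_L)$ and $L^p_{\mathcal B(C_{\mathbb B})\otimes\mathrm{Leb}[0,\infty[^n}(W\otimes\lambda^n)$. However, the space $L^p_{L_\Gamma(\mathcal B(\mathbb F^T))}(\Gamma_L)$ is much bigger than $L^p_{\mathcal W_{C_{\mathbb H}}}(\Gamma_L)$.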
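Proposition 7.2.13 Let $J$ be the set of Lebesgue-measurable functions $f:[0,\infty[\to\mathbb H$ in $L^2(\lambda,\mathbb H)$ with compact support. Then $\mathcal W_{C_{\mathbb H}}$ is the σ-algebra generated by $\{I(f)\mid f\in J\}$, augmented by the $\Gamma_L$-nullsets.
Proof We first prove “⊆”. Recall that the sets $\{\varphi\circ\mathbf B(\cdot,r)<c\}$ generate $\mathcal W_{C_{\mathbb H}}$, where $\varphi\in\mathbb B'\subseteq\mathbb H$, $r\in[0,\infty[$ and $c\in\mathbb R$. Set $g(s):=1_{[0,r]}(s)\cdot\varphi$. Let $t\in T$ with $t\approx r$ and define $G(s):=1_{T_t}(s)\cdot{}^*\varphi$. Then $G:T\to\mathbb F$ is a lifting of $g$ in $SL^2(\nu_t,\mathbb F)$, and $g\in J$. Now, $\Gamma_L$-a.s.,
$$I(g)={}^\circ I(G)={}^\circ\sum_{s\le t}\langle{}^*\varphi,X_s\rangle={}^\circ\langle{}^*\varphi,B_t\rangle=\varphi\circ\mathbf B(\cdot,r).$$
For the reverse inclusion we prove that each $I(g)$ with $g\in J$ is $\mathcal W_{C_{\mathbb H}}$-measurable. Since $g$ is the limit in $L^2(\lambda,\mathbb H)$ of simple functions, and the Borel sets of $[0,\infty[$ are generated by sets of the form $[0,r]$, it suffices to treat $g$ with $g(s)=1_{[0,r]}(s)\cdot\varphi$, where $\varphi\in\mathbb H$. We must show that each such $I(g)$ is $\mathcal W_{C_{\mathbb H}}$-measurable. Since $\mathbb B'$ is dense in $\mathbb H$ under the Hilbert space norm, $\varphi\in\mathbb H$ can be approximated by elements of $\mathbb B'$. By Theorem 7.2.12 (b), $I$ is continuous. Therefore it suffices to prove that $I(g)$ is $\mathcal W_{C_{\mathbb H}}$-measurable for each $g$ with $g(s)=1_{[0,r]}(s)\cdot\varphi$ and $\varphi\in\mathbb B'$. But we have already seen that $I(g)=\varphi\circ\mathbf B(\cdot,r)$ $\Gamma_L$-a.s.
Problems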
(1) Prove that $B$ is an internal $(\mathcal B_t)_{t\in T}$-martingale.
(2) Prove that the function $F$, defined in Remark 7.2.1, is not a lifting of a standard function $f:[0,\infty[\to\mathbb R$.
(3) Suppose that $(\varphi_k)_{k\in\mathbb N}$ is a sequence of $(b_t)_{t\in[0,\infty[}$-adapted functions $\varphi_k:\mathbb F^T\times[0,\infty[\to\mathbb H$, converging to $\varphi:\mathbb F^T\times[0,\infty[\to\mathbb H$ in $L^2(\Gamma_L\otimes\lambda,\mathbb H)$. Prove that the Itô integral of $\varphi$ exists and
$$\lim_{k\to\infty}E\Big(\int_0^\infty(\varphi_k-\varphi)\,dB\Big)^2=0.$$
(4) Prove Lemma 7.2.5.
7.3 The Iterated Integral
Following [36] to a large extent, we introduce the iterated Itô integral as the standard part of an internal iterated Itô integral, which is easy to define. Methods for constructing standard objects from internal ones are sometimes called pushing-down techniques. Recall that, conversely, lifting results construct internal objects from standard ones. Several constructions in Chap. 6 are examples of pushing-down results: Brownian motion, the Itô integral, and martingales on the continuous time line obtained from internal martingales on the discrete time line $T$. Now we push down the internal iterated integral to the standard iterated Itô integral, following the construction of the Itô integral in Sect. 7.2. We will later see that each functional in $L^2(\mathcal W_{C_{\mathbb B}})$ can be written as an orthogonal series of iterated Itô integrals. This result is the key to introducing the Malliavin calculus. In the following we use the notation of Sect. 6.3.7.
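7.3.1 The Definition of the Iterated Integral
First of all we introduce a mixture of deterministic and random functions in connection with iterated integrals. Fix an internal $F:T^{n+m}\to\mathbb F^{\otimes(n+m)}$ with $n,m\in\mathbb N_0$. Define $I_{n,m}(F):\mathbb F^T\times T^m\to\mathbb F^{\otimes m}$ by setting
$$I_{n,m}(F)(X,\mathbf s):=\sum_{\mathbf t\in T^n_<}F_{\mathbf t,\mathbf s}(X_{t_1},\dots,X_{t_n},\cdot)=\sum_{\mathbf t\in T^n_<}F(\mathbf t,\mathbf s,X_{t_1},\dots,X_{t_n},\cdot),$$
where $T^n_<=\{\mathbf t=(t_1,\dots,t_n)\in T^n\mid t_1<\dots<t_n\}$. Fix an orthonormal basis $E$ of $\mathbb F$. The following characterization of $I_{n,m}(F)(X,\mathbf s)$ will often be used:
$$I_{n,m}(F)(X,\mathbf s)=\sum_{\mathbf t\in T^n_<}\ \sum_{e_1,\dots,e_n\in E}F_{\mathbf t,\mathbf s}(e_1,\dots,e_n,\cdot)\cdot\langle X_{t_1},e_1\rangle\cdots\langle X_{t_n},e_n\rangle.$$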
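If $n=0$, then $I_{n,m}(F)=F$. If $n=m=0$, then $F\in{}^*\mathbb R$. The following result is an application of Lemma 7.2.2:
Proposition 7.3.1 Fix internal functions $F:T^{n+m}\to\mathbb F^{\otimes(n+m)}$ and $G:T^{k+m}\to\mathbb F^{\otimes(k+m)}$. Then
$$\int_{\mathbb F^T\times T^m}\langle I_{n,m}(F),I_{k,m}(G)\rangle\,d\Gamma\otimes\nu^m=\begin{cases}0&\text{if }n\ne k,\\\int_{T^n_<\times T^m}\langle F,G\rangle\,d\nu^{n+m}&\text{if }n=k.\end{cases}$$
The function $I_{n,m}(F)$ is called the internal iterated integral of $F$ with $m$ parameters. If $m=0$, then we write $I_n(F)$ instead of $I_{n,0}(F)$ and call $I_n(F)$ the internal iterated integral of $F$. Recall from Sect. 7.2 that for $n=1$ we have $I(F):=I_1(F)=(FB)(\cdot,H)$. Now we will show that all standard moments of $I_n(F):\mathbb F^T\to{}^*\mathbb R$ are limited if $\int_{T^n_<}\|F\|^2\,d\nu^n$ is limited.
Theorem 7.3.3 Fix an internal $F:T^n_<\to\mathbb F^{\otimes n}$ and an even $s\in\mathbb N$. Then
$$E\,(I_n(F))^s\le(s!!)^{n+1}\Big(\sum_{\mathbf t\in T^n_<}\|F_{\mathbf t}\|^2\frac1{H^n}\Big)^{\frac s2}.$$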
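Proof Fix an internal orthonormal basis $(e_i)_{i\in\omega}$ of $\mathbb F$. We use the following shorthand: for $\mathbf t\in T^n_<$ and $\mathbf i\in\omega^n$ set $F_{\mathbf t}(e_{\mathbf i}):=F_{t_1,\dots,t_n}(e_{i_1},\dots,e_{i_n})$ and $\langle X_{\mathbf t},e_{\mathbf i}\rangle:=\langle X_{t_1},e_{i_1}\rangle\cdots\langle X_{t_n},e_{i_n}\rangle$. Let ≺ be the lexicographic order on $T^n_<\times\omega^n$. We apply Lemma 7.2.2, in particular Part (d), and the hyperfinite-dimensional multinomial formula (the corresponding standard result can be found in the book of Heuser [18], page 69):
$$E\,(I_n(F))^s=E\Big(\sum_{(\mathbf t,\mathbf i)\in T^n_<\times\omega^n}\langle X_{\mathbf t},e_{\mathbf i}\rangle\cdot F_{\mathbf t}(e_{\mathbf i})\Big)^s$$
$$=\sum_{k=1}^s\ \sum_{\substack{m_1,\dots,m_k\in\mathbb N\\ m_1+\dots+m_k=s}}\ \sum_{(\mathbf t^1,\mathbf i^1)\prec\dots\prec(\mathbf t^k,\mathbf i^k)}\frac{s!}{m_1!\cdots m_k!}\,F_{\mathbf t^1}(e_{\mathbf i^1})^{m_1}\cdots F_{\mathbf t^k}(e_{\mathbf i^k})^{m_k}\cdot E\Big(\langle X_{\mathbf t^1},e_{\mathbf i^1}\rangle^{m_1}\cdots\langle X_{\mathbf t^k},e_{\mathbf i^k}\rangle^{m_k}\Big).$$
We compute the expected value in the preceding equality. It is 0 if one of the $m_j$ is odd. So we may assume that all $m_j$ are even; then $s$ is also even. The factors $\langle X_{\mathbf t^1},e_{\mathbf i^1}\rangle^{m_1},\dots,\langle X_{\mathbf t^k},e_{\mathbf i^k}\rangle^{m_k}$ are independent if the pairs $(\mathbf t^j,\mathbf i^j)$ are pairwise different. Using this, we obtain
$$E\,(I_n(F))^s\le\sum_{k=1}^s\ \sum_{\substack{m_1,\dots,m_k\in2\mathbb N\\ m_1+\dots+m_k=s}}\ \sum_{(\mathbf t^1,\mathbf i^1)\prec\dots\prec(\mathbf t^k,\mathbf i^k)}\frac{s!\,(s!!)^n}{m_1!\cdots m_k!}\Big(\frac1H\Big)^{\frac{m_1}2n}\cdots\Big(\frac1H\Big)^{\frac{m_k}2n}F_{\mathbf t^1}(e_{\mathbf i^1})^{m_1}\cdots F_{\mathbf t^k}(e_{\mathbf i^k})^{m_k}$$
$$=\Big(\frac1H\Big)^{\frac s2n}\sum_{k=1}^{s/2}\ \sum_{\substack{m_1,\dots,m_k\in\mathbb N\\ m_1+\dots+m_k=s/2}}\ \sum_{(\mathbf t^1,\mathbf i^1)\prec\dots\prec(\mathbf t^k,\mathbf i^k)}\frac{s!\,(s!!)^n}{(2m_1)!\cdots(2m_k)!}\big(F^2_{\mathbf t^1}(e_{\mathbf i^1})\big)^{m_1}\cdots\big(F^2_{\mathbf t^k}(e_{\mathbf i^k})\big)^{m_k}$$
$$\le(s!!)^{n+1}\Big(\frac1{H^n}\Big)^{\frac s2}\sum_{k=1}^{s/2}\ \sum_{\substack{m_1,\dots,m_k\in\mathbb N\\ m_1+\dots+m_k=s/2}}\ \sum_{(\mathbf t^1,\mathbf i^1)\prec\dots\prec(\mathbf t^k,\mathbf i^k)}\frac{(\frac s2)!}{m_1!\cdots m_k!}\big(F^2_{\mathbf t^1}(e_{\mathbf i^1})\big)^{m_1}\cdots\big(F^2_{\mathbf t^k}(e_{\mathbf i^k})\big)^{m_k}$$
$$=(s!!)^{n+1}\Big(\sum_{(\mathbf t,\mathbf i)\in T^n_<\times\omega^n}F^2_{\mathbf t}(e_{\mathbf i})\frac1{H^n}\Big)^{\frac s2}=(s!!)^{n+1}\Big(\int_{T^n_<}\|F_{\mathbf t}\|^2\,d\nu^n\Big)^{\frac s2}.$$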
Corollary 7.3.4 Assume that $\int_{T^n_<}\|F_{\mathbf t}\|^2\,d\nu^n$ is limited. Then $I_n(F)\in SL^p(\Gamma)$ for all $p\in[1,\infty[$.
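Proposition 7.3.1 with $m=0$ says two things. Internal iterated integrals of different orders are orthogonal, and $E\,I_n(F)^2=\int_{T^n_<}\|F\|^2\,d\nu^n$. The following Monte Carlo sketch checks both for $n=2$, $k=1$ in a finite toy model. It assumes $\mathbb F=\mathbb R$, a finite H and independent $N(0,1/H)$ increments. The kernels F and G are hypothetical.

```python
import random

H = 10                                      # finite stand-in for the hyperfinite H
GRID = [k / H for k in range(1, H + 1)]     # time points 1/H, ..., 1

def F(t1, t2):
    """Hypothetical kernel on increasing pairs t1 < t2."""
    return 1.0 + t1 * t2

def G(t):
    """Hypothetical one-variable kernel."""
    return 1.0

def I1(X):
    """Internal iterated integral of order 1: sum_t G(t) X_t."""
    return sum(G(t) * x for t, x in zip(GRID, X))

def I2(X):
    """Internal iterated integral of order 2: sum_{t1 < t2} F(t1, t2) X_{t1} X_{t2}."""
    total = 0.0
    for i in range(H):
        for j in range(i + 1, H):
            total += F(GRID[i], GRID[j]) * X[i] * X[j]
    return total

# Right-hand side of Proposition 7.3.1 for n = k = 2, m = 0.
exact_I2_sq = sum(F(GRID[i], GRID[j]) ** 2
                  for i in range(H) for j in range(i + 1, H)) / H ** 2

rng = random.Random(3)
N = 20000
sum_sq = sum_cross = 0.0
for _ in range(N):
    X = [rng.gauss(0.0, (1.0 / H) ** 0.5) for _ in range(H)]
    a, b = I1(X), I2(X)
    sum_sq += b * b
    sum_cross += a * b
est_I2_sq, est_cross = sum_sq / N, sum_cross / N
print(est_I2_sq, exact_I2_sq)   # should agree up to Monte Carlo error
print(est_cross)                # should be near 0 (orthogonality, n != k)
```

The cross moment vanishes exactly here, because it involves only products of three independent centered Gaussians.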
Theorem 7.3.6 below is crucial for defining the iterated integral with parameters. Its proof uses the following lemma, which is a straightforward application of Jensen's inequality.
Lemma 7.3.5 Fix $D:\mathbb F^T\times T\to\mathbb F$ in $SL^2(\Gamma\otimes\nu,\mathbb F)$. Then $A:(\cdot,s)\mapsto E^{\mathcal B_{s^-}}D(\cdot,s)$ belongs to $SL^2(\Gamma\otimes\nu,\mathbb F)$.
Moreover, the close relationship between Loeb and Lebesgue measure is used. Recall the definition of $\mathcal L_m\subseteq L_{\nu^m}$ in Sect. 6.2.6. The elements in $L^2_{\mathcal L_m}\big((\Gamma\otimes\nu^m)_L,\mathbb H^{\otimes m}\big)$ can be identified with the elements in $L^2\big(\Gamma_L\otimes\lambda^m,\mathbb H^{\otimes m}\big)$. Recall Corollary 6.3.22, where we have shown that liftings of $\mathbb H^{\otimes d}$-valued functions may be taken with values in $\mathbb F^{\otimes d}$.
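Theorem 7.3.6 Fix $f:[0,\infty[^{n+m}\to\mathbb H^{\otimes(n+m)}$ in $L^2\big(\lambda^{n+m},\mathbb H^{\otimes(n+m)}\big)$. Let $F:T^{n+m}\to\mathbb F^{\otimes(n+m)}$ in $SL^2\big(\nu^{n+m},\mathbb F^{\otimes(n+m)}\big)$ be a lifting of $f$ (see Theorem 6.3.10).
(a) Then $I_{n,m}(F)\in SL^2\big(\Gamma\otimes\nu^m,\mathbb F^{\otimes m}\big)$, and the standard part ${}^\circ I_{n,m}(F)$ exists in $L^2\big(\Gamma_L\otimes\lambda^m,\mathbb H^{\otimes m}\big)=L^2_{\mathcal L_m}\big((\Gamma\otimes\nu^m)_L,\mathbb H^{\otimes m}\big)$.
(b) Let $m=1$. Then $t\mapsto E^{\mathcal B_{t^-}}I_{n,1}(F)(\cdot,t)\in SL^2(\Gamma\otimes\nu,\mathbb F)$.
Proof (a) To save indices, let $n=m=1$. Let $M$ be the set of all $g\in L^2\big(\lambda^2,\mathbb H^{\otimes2}\big)$ such that there exists a lifting $G\in SL^2\big(\nu^2,\mathbb F^{\otimes2}\big)$ of $g$ with $I_{1,1}(G)\in SL^2(\Gamma\otimes\nu,\mathbb F)$ such that ${}^\circ I_{1,1}(G)$ exists in $L^2(\Gamma_L\otimes\lambda,\mathbb H)$. Then $M$ is a complete linear space; this follows from Theorem 6.3.10, Corollary 6.3.11 and Proposition 7.3.1 (see also Theorem 7.5.5 later). To prove that $M=L^2\big(\lambda^2,\mathbb H^{\otimes2}\big)$, let $B_1,B_2$ be bounded Lebesgue-measurable subsets of $[0,\infty[$ and $a\in\mathbb H^2$. By Theorems 6.2.9 and 6.2.5, there exist limited internal subsets $A_1,A_2$ of $T$ with $\nu_L\big(A_i\,\Delta\,\mathrm{st}^{-1}[B_i]\big)=0$. Set $A:=A_1\times A_2$, $B:=B_1\times B_2$ and ${}^*a:=({}^*a_1,{}^*a_2)$. Then
$$G:=1_A\otimes[{}^*a]:T^2\to\mathbb F^{\otimes2},\quad(\mathbf t,b)\mapsto1_A(\mathbf t)\cdot\langle{}^*a_1,b_1\rangle\cdot\langle{}^*a_2,b_2\rangle$$
is in $SL^2\big(\nu^2,\mathbb F^{\otimes2}\big)$ and a lifting of $g:=1_B\otimes[a]\in L^2\big(\lambda^2,\mathbb H^{\otimes2}\big)$. Moreover,
$$I_{1,1}(G)(X,s)=I_1\big(1_{A_1}\otimes{}^*a_1\big)(X)\cdot\big(1_{A_2}\otimes{}^*a_2\big)(s).$$
By Corollary 7.3.4, $I_1\big(1_{A_1}\otimes{}^*a_1\big)\in SL^2(\Gamma)$; therefore, its standard part exists in $L^2(\Gamma_L)$. Moreover, $1_{A_2}\otimes{}^*a_2$ is in $SL^2(\nu,\mathbb F)$ and is a lifting of $1_{B_2}\otimes[a_2]$. By Lemma 6.3.16 (b), $I_{1,1}(G)\in SL^2(\Gamma\otimes\nu,\mathbb F)$, and its standard part exists. $M$ is a linear subspace of $L^2\big(\lambda^2,\mathbb H^{\otimes2}\big)$ and contains the simple functions over measurable rectangles. Hence $M=L^2\big(\lambda^2,\mathbb H^{\otimes2}\big)$ (see the standard Proposition 5.6.1 in [36]). Now fix a lifting $G\in SL^2\big(\nu^2,\mathbb F^{\otimes2}\big)$ of $f$ with $I_{1,1}(G)\in SL^2(\Gamma\otimes\nu,\mathbb F)$ such that ${}^\circ I_{1,1}(G)$ exists in $L^2(\Gamma_L\otimes\lambda,\mathbb H)$. By Proposition 7.3.1,
$$\int_{\mathbb F^T\times T}\|I_{1,1}(F-G)\|^2\,d\Gamma\otimes\nu=\int_{T^2}\|F-G\|^2\,d\nu^2\approx0.$$
Part (a) follows. Part (b) follows from Part (a) and Lemma 7.3.5.
Now fix $f\in L^2\big(\lambda^{n+m},\mathbb H^{\otimes(n+m)}\big)$ and a lifting $F\in SL^2\big(\nu^{n+m},\mathbb F^{\otimes(n+m)}\big)$ of $f$, and define
$$I_{n,m}(f)=I_{n,m}({}^\circ F):={}^\circ I_{n,m}(F):\mathbb F^T\times[0,\infty[^m\to\mathbb H^{\otimes m}\in L^2\big(\Gamma_L\otimes\lambda^m,\mathbb H^{\otimes m}\big).$$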
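By Theorem 7.3.6, $I_{n,m}(f)$ is well defined. From Proposition 7.3.1 we obtain the following result. Its proof uses S-square-integrable liftings of $f$ and $g$ and the simple fact that $\langle a,b\rangle_{\mathbb H^{\otimes m}}=\frac12\big(\|a\|^2_{\mathbb H^{\otimes m}}+\|b\|^2_{\mathbb H^{\otimes m}}-\|a-b\|^2_{\mathbb H^{\otimes m}}\big)$.
Proposition 7.3.7 Fix $f\in L^2\big(\lambda^{n+m},\mathbb H^{\otimes(n+m)}\big)$ and $g\in L^2\big(\lambda^{k+m},\mathbb H^{\otimes(k+m)}\big)$. Then
$$\int_{\mathbb F^T\times[0,\infty[^m}\langle I_{n,m}(f),I_{k,m}(g)\rangle_{\mathbb H^{\otimes m}}\,d\Gamma_L\otimes\lambda^m=\begin{cases}\int_{[0,\infty[^n_\le\times[0,\infty[^m}\langle f,g\rangle_{\mathbb H^{\otimes(n+m)}}\,d\lambda^{n+m}&\text{if }n=k,\\0&\text{if }n\ne k.\end{cases}$$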
Corollary 7.3.8 The iterated integral with $m$ parameters is a continuous operator. That is, if $(f_j)_{j\in\mathbb N}$ is a sequence converging to $f$ in $L^2\big(\lambda^{n+m},\mathbb H^{\otimes(n+m)}\big)$, then $\lim_{j\to\infty}I_{n,m}(f_j)=I_{n,m}(f)$ in $L^2\big(\Gamma_L\otimes\lambda^m,\mathbb H^{\otimes m}\big)$.
The function $I_{n,m}(f)$ is called the iterated integral with $m$ parameters. Recall that this process is equivalent to a process $g:\mathbb F^T\times T^m\to\mathbb H^{\otimes m}$ in $L^2_{L_\Gamma(\mathcal B)\otimes\mathcal L_m}\big((\Gamma\otimes\nu^m)_L,\mathbb H^{\otimes m}\big)$, which is also denoted by $I_{n,m}(f)$. Now assume that $m=0$, i.e., $f\in L^2\big(\lambda^n,\mathbb H^{\otimes n}\big)$. By Theorem 6.3.10, $f$ has a lifting $F:T^n\to\mathbb F^{\otimes n}$ in $SL^2\big(\nu^n,\mathbb F^{\otimes n}\big)$. Then we set
$$I_n(f)=I_n({}^\circ F):={}^\circ I_n(F)$$
and call $I_n(f)$ the $n$-fold iterated Itô integral of $f$. Note that it is well defined $\Gamma_L$-a.s.
7.3.2 On Products of Iterated Integrals
In this section fix an internal symmetric function $F:T^n_{\neq}\to\mathbb F^{\otimes n}$ and an internal $G:T\to\mathbb F$. We use the following common notation. If $K:T^{n+1}_{\neq}\to\mathbb F^{\otimes(n+1)}$ is internal and symmetric in the first $n$ arguments, define
$$\widetilde K_{(t_1,\dots,t_{n+1})}(a_1,\dots,a_{n+1}):=\sum_{j=1}^{n+1}K_{(t_1,\dots,t_{j-1},t_{j+1},\dots,t_{n+1},t_j)}(a_1,\dots,a_{j-1},a_{j+1},\dots,a_{n+1},a_j).$$
Then $\widetilde K$ is symmetric. Define $\int_T\langle F_s,G_s\rangle\,d\nu(s):T^{n-1}_{\neq}\to\mathbb F^{\otimes(n-1)}$ by setting
$$\Big(\int_T\langle F_s,G_s\rangle\,d\nu(s)\Big)_{(t_1,\dots,t_{n-1})}(a_1,\dots,a_{n-1}):=\int_T\big\langle F_{(t_1,\dots,t_{n-1},s)}(a_1,\dots,a_{n-1},\cdot),G_s(\cdot)\big\rangle\,d\nu(s)=\int_T\sum_{i\in\omega}F_{(t_1,\dots,t_{n-1},s)}(a_1,\dots,a_{n-1},e_i)\cdot G_s(e_i)\,d\nu(s),$$
where $(e_i)_{i\in\omega}$ is an internal orthonormal basis of $\mathbb F$.
Theorem 7.3.9 Assume that $F\in SL^2\big(\nu^n,\mathbb F^{\otimes n}\big)$ is symmetric and $G$ belongs to $SL^2(\nu,\mathbb F)$. Then
$$I_n(F)\cdot I(G)\approx I_{n+1}\big(\widetilde{F\otimes G}\big)+I_{n-1}\Big(\int_T\langle F_s,G_s\rangle\,d\nu(s)\Big)\quad\text{in }L^2(\Gamma).\qquad(+)$$
In particular, for $F:=G^{\otimes n}:=G\otimes\dots\otimes G$ ($n$ times) we have
$$I_n\big(G^{\otimes n}\big)\cdot I_1(G)\approx(n+1)\,I_{n+1}\big(G^{\otimes(n+1)}\big)+I_{n-1}\big(G^{\otimes(n-1)}\big)\cdot\int_T\|G\|^2\,d\nu.$$
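For $n=1$ and $F=G$, the structure of (+) can be seen exactly. The identity $I_1(G)^2=2\,I_2(G\otimes G)+\sum_{t\in T}\langle G_t,X_t\rangle^2$ is purely algebraic. The last sum has expectation $\int_T\|G\|^2\,d\nu$, and (+) says it may be replaced by that integral in $L^2(\Gamma)$. The following Python sketch assumes $\mathbb F=\mathbb R$, a finite H, independent $N(0,1/H)$ increments, and a hypothetical kernel $G_t=1+t$. In this scalar case the variance of the diagonal sum is $\frac2H\int_T G^4\,d\nu$.

```python
import random

def I1(G, X):
    """I_1(G)(X) = sum_t G_t X_t in the scalar case F = R."""
    return sum(g * x for g, x in zip(G, X))

def I2_of_square(G, X):
    """I_2(G (x) G)(X) = sum_{t1 < t2} G_{t1} X_{t1} G_{t2} X_{t2}, in O(|T|) steps."""
    total, prefix = 0.0, 0.0
    for g, x in zip(G, X):
        total += prefix * g * x   # all pairs whose larger time is the current t
        prefix += g * x
    return total

H = 2000                                         # finite stand-in for H
G = [1.0 + k / H for k in range(1, H + 1)]       # hypothetical kernel G_t = 1 + t on {1/H, ..., 1}
rng = random.Random(7)
X = [rng.gauss(0.0, (1.0 / H) ** 0.5) for _ in range(H)]

lhs = I1(G, X) ** 2
diagonal = sum((g * x) ** 2 for g, x in zip(G, X))   # sum_t <G_t, X_t>^2
nu_integral = sum(g * g for g in G) / H              # int ||G||^2 d nu
print(lhs - 2 * I2_of_square(G, X) - diagonal)       # zero up to rounding
print(diagonal, nu_integral)                         # close for large H
```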
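Proof Since the proof is quite technical, we recommend first studying the case $n=2$. We may assume that $F:T^n_{\neq}\to\mathbb F^{\otimes n}$. Fix $S\in\mathbb N$ and set $F^S:=F\restriction T^n_S$ and $G^S:=G\restriction T_S$. Note that
$$I_n(F^S)(X)\cdot I(G^S)(X)=A+\sum_{j=1}^nB_j+\sum_{j=1}^nC_j,$$
where
$$A=\sum_{\substack{\mathbf t\in T^n_<,\,s\in T\\ s\notin\{t_1,\dots,t_n\}}}F^S_{\mathbf t}(X_{t_1},\dots,X_{t_n})\,G^S_s(X_s)=\sum_{j=1}^{n+1}\sum_{\mathbf t\in T^{n+1}_<}F^S_{t_1,\dots,[t_j],\dots,t_{n+1}}(X_{t_1},\dots,[X_{t_j}],\dots,X_{t_{n+1}})\,G^S_{t_j}(X_{t_j})=I_{n+1}\big(\widetilde{F^S\otimes G^S}\big),$$
with $[\cdot]$ indicating an omitted entry, and
$$B_j=\sum_{\mathbf t\in T^n_<}\ \sum_{\substack{\mathbf i\in\omega^n,\,k\in\omega\\ k\ne i_j}}\langle X_{t_1},e_{i_1}\rangle\cdots\langle X_{t_n},e_{i_n}\rangle\,F^S_{\mathbf t}(e_{i_1},\dots,e_{i_n})\,\langle X_{t_j},e_k\rangle\,G^S_{t_j}(e_k).$$
Note that
$$E\,B_j^2\le\sum_{\mathbf t\in T^n_<}\|F^S_{\mathbf t}\|^2\,\|G^S_{t_j}\|^2\,\frac1{H^{n+1}}=\sum_{\{(\mathbf t,s)\in T^n_<\times T\mid s=t_j\}}\|F^S_{\mathbf t}\|^2\,\|G^S_s\|^2\,\frac1{H^{n+1}}\approx0,$$
because $\{(\mathbf t,s)\in T^n_<\times T\mid s=t_j\}$ is a $(\nu^{n+1})_L$-nullset. Moreover,
$$C_j=\sum_{\mathbf t\in T^n_<,\,\mathbf i\in\omega^n}\langle X_{t_1},e_{i_1}\rangle\cdots\langle X_{t_j},e_{i_j}\rangle^2\cdots\langle X_{t_n},e_{i_n}\rangle\,F^S_{\mathbf t}(e_{i_1},\dots,e_{i_n})\,G^S_{t_j}(e_{i_j}).$$
Define
$$\widehat C_j=\sum_{\mathbf t\in T^n_<,\,\mathbf i\in\omega^n}\langle X_{t_1},e_{i_1}\rangle\cdots[\langle X_{t_j},e_{i_j}\rangle]\cdots\langle X_{t_n},e_{i_n}\rangle\,F^S_{\mathbf t}(e_{i_1},\dots,e_{i_n})\,G^S_{t_j}(e_{i_j})\,\frac1H.$$
Since $E\big(\langle X_s,e_i\rangle^2-\frac1H\big)^2=\frac2{H^2}$, we have $E\,(C_j-\widehat C_j)^2\approx0$. Since $F^S$ is symmetric, we see that in $L^2(\Gamma)$
$$\sum_{j=1}^nC_j\approx\sum_{j=1}^n\widehat C_j=I_{n-1}\Big(\int_T\langle F^S_s,G^S_s\rangle_{\mathbb F}\,d\nu(s)\Big).$$
It follows that Equation (+) holds for $F:=F^S$ and $G:=G^S$ with $S\in\mathbb N$. By the Spillover Principle, there exists an unlimited $S\in{}^*\mathbb N$ such that Equation (+) holds for $F=F^S$ and $G=G^S$. The Cauchy-Schwarz inequality, Theorem 7.3.3, the S-square integrability of $F$ and $G$ and finite combinatorics yield
$$E\big(I_n(F)\cdot I(G)-I_n(F^S)\cdot I(G^S)\big)^2\approx0,$$
$$E\big(I_{n+1}\big(\widetilde{F\otimes G}\big)-I_{n+1}\big(\widetilde{F^S\otimes G^S}\big)\big)^2\approx0,$$
$$E\Big(I_{n-1}\Big(\int_T\langle F_s,G_s\rangle_{\mathbb F}\,d\nu(s)\Big)-I_{n-1}\Big(\int_T\langle F^S_s,G^S_s\rangle_{\mathbb F}\,d\nu(s)\Big)\Big)^2\approx0.$$
The proof of Equation (+) is finished. It should be mentioned that we need the preceding result only in the case $F=F^S$ and $G=G^S$ with $S\in\mathbb N$.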
Using Theorem 7.3.9 and the standard part map, one can prove the following crucial result by induction on the degree of polynomials:
Corollary 7.3.10 Fix a real polynomial $p$ and $S\in\mathbb N$.
(a) Suppose that $G:T_S\to\mathbb F$ is in $SL^2(\nu_S,\mathbb F)$. Then $p(I(G))$ is infinitely close in $L^2(\Gamma)$ to a standard finite linear combination of internal iterated integrals of the form $I_n\big(G^{\otimes n}\big)$ with limited scalars.
(b) Suppose that $g\in L^2(\lambda,\mathbb H)$. Then $p(I(g))$ is a linear combination of iterated Itô integrals whose integrands are tensor products of $g$.
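Part (b) can be made concrete. Assume that the special case of Theorem 7.3.9 holds at the standard level in the form $I_n(g^{\otimes n})\cdot I(g)=(n+1)\,I_{n+1}(g^{\otimes(n+1)})+v\,I_{n-1}(g^{\otimes(n-1)})$, with $v:=\int\|g\|^2\,d\lambda$ and $I_{-1}:=0$; Corollary 7.3.10 (b) suggests this. Then repeated multiplication by $I(g)$ gives the coefficients of $p(I(g))$ in the basis $I_n(g^{\otimes n})$. The following Python sketch does this with exact rational arithmetic. The helper names are hypothetical. For $v=1$ the numbers agree with the classical expansion of powers in Hermite polynomials, since $n!\,I_n(g^{\otimes n})=He_n(I(g))$ when $\|g\|=1$.

```python
from fractions import Fraction

def multiply_by_I1(coeffs, v):
    """Given c with p = sum_n c[n] I_n(g^{(x)n}), return the coefficients of p * I_1(g).

    Uses I_n * I_1 = (n+1) I_{n+1} + v I_{n-1}, where v = ||g||^2 and I_{-1} := 0.
    """
    out = [Fraction(0)] * (len(coeffs) + 1)
    for n, c in enumerate(coeffs):
        if c == 0:
            continue
        out[n + 1] += c * (n + 1)
        if n >= 1:
            out[n - 1] += c * v
    return out

def chaos_coefficients(poly, v):
    """poly[k] is the coefficient of x^k in p.

    Returns c with p(I(g)) = sum_n c[n] I_n(g^{(x)n}); c[0] is the zeroth chaos,
    i.e. the expectation of p(I(g)).
    """
    result = [Fraction(0)] * len(poly)
    power = [Fraction(1)]                 # I(g)^0 = 1 = I_0
    for k, a in enumerate(poly):
        if k > 0:
            power = multiply_by_I1(power, v)
        for n, c in enumerate(power):
            result[n] += Fraction(a) * c
    return result

# p(x) = x^3 and ||g||^2 = 2: I(g)^3 = 6 I_3(g^{(x)3}) + 6 I_1(g)
print(chaos_coefficients([0, 0, 0, 1], Fraction(2)))
```

The zeroth coefficient of $x^4$ is $3v^2$, the fourth moment of a centered Gaussian with variance $v$, which is a quick consistency check.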
7.3.3 The Continuity of the Standard Iterated Integral Process
Now we study the $n$-fold iterated Itô integral as a continuous stochastic process. Fix an internal $F:T^n\to\mathbb F^{\otimes n}$. The internal process $I^M_n(F):\mathbb F^T\times T\to{}^*\mathbb R$, defined by
$$I^M_n(F)(X,s)=\sum_{t_1<\dots<t_n\le s}F_{(t_1,\dots,t_n)}(X_{t_1},\dots,X_{t_n}),$$
is called the internal iterated integral process of $F$. Note that $I^M_n(F)$ is an internal stochastic integral with $(\mathcal B_{t^-})_{t\in T}$-adapted integrands:
$$I^M_n(F)(X,s)=\Big(\big(r\mapsto E^{\mathcal B_{r^-}}I_{n-1,1}(F)(\cdot,r)\big)B\Big)(X,s)=\sum_{r\le s}\Big(\sum_{t_1<\dots<t_{n-1}<r}F_{t_1,\dots,t_{n-1},r}(X_{t_1},\dots,X_{t_{n-1}},\cdot)\Big)(X_r).$$
By Theorem 7.3.6, Proposition 7.2.8 and Theorem 7.2.7, we obtain:
Theorem 7.3.11 Let $f$ be locally in $L^2\big(\lambda^n,\mathbb H^{\otimes n}\big)$, and let $F$ be a lifting of $f$ locally in $SL^2\big(\nu^n,\mathbb F^{\otimes n}\big)$. Then $I_{n-1,1}(F)$ and $r\mapsto E^{\mathcal B_{r^-}}I_{n-1,1}(F)(\cdot,r)$ are locally in $SL^2(\Gamma\otimes\nu,\mathbb F)$. Moreover, for all $\sigma\in\mathbb N$,
$$\max_{s\in T_\sigma}\big|I^M_n(F)(\cdot,s)\big|\in SL^2(\Gamma),\qquad I^M_n(F)\restriction\mathbb F^T\times T_\sigma\in SL^2(\Gamma\otimes\nu).$$
In addition, $I^M_n(F)$ is S-continuous $\Gamma_L$-a.s.
Fix $f,F$ as in Theorem 7.3.11. For all limited $t$, define
$$I^M_n(f)(\cdot,{}^\circ t):={}^\circ I^M_n(F)(\cdot,t).$$
Then, by Theorem 7.2.12, the process $I^M_n(f):\mathbb F^T\times[0,\infty[\to\mathbb R$ is a continuous square integrable $(b_r)_{r\in[0,\infty[}$-martingale. This result extends a result due to Bouleau and Hirsch [6] for the classical Wiener space. The process $I^M_n(f)$ is called the continuous iterated integral of $f$. Using suitable S-square integrable liftings and Proposition 7.3.1, we obtain:
Corollary 7.3.12 Fix $r\in[0,\infty[$, $f$ locally in $L^2\big(\lambda^n,\mathbb H^{\otimes n}\big)$ and $g$ locally in $L^2\big(\lambda^m,\mathbb H^{\otimes m}\big)$. Then
$$E\big(I^M_n(f)(\cdot,r)\cdot I^M_m(g)(\cdot,r)\big)=\begin{cases}\int_{[0,r]^n_\le}\langle f_{\mathbf s},g_{\mathbf s}\rangle_{\mathbb H^{\otimes n}}\,d\lambda^n(\mathbf s)&\text{if }m=n,\\0&\text{if }m\ne n.\end{cases}$$
7.3.4 The $\mathcal W_{C_{\mathbb H}}$-Measurability of the Iterated Itô Integral
Fix $r\in\mathbb N$ and $f\in L^2\big(\lambda^n_r,\mathbb H^{\otimes n}\big)$, the space of square integrable functions with support in $[0,r]^n$. Let $F$ be a lifting of $f$ in $SL^2\big(\nu^n_r,\mathbb F^{\otimes n}\big)$. We will now see that $I_n(f)$ is $\mathcal W_{C_{\mathbb H}}$-measurable. A modified iterated Itô integral $J_n(F):\mathbb F^T\to{}^*\mathbb R$ is defined by setting
$$J_n(F)(X):=\sum_{(t_1,\dots,t_n)\in T^n_{\neq}}F_{(t_1,\dots,t_n)}(X_{t_1},\dots,X_{t_n}).$$
Set $F^\sigma_{(t_1,\dots,t_n)}(X_{t_1},\dots,X_{t_n}):=F_{(t_{\sigma1},\dots,t_{\sigma n})}(X_{t_{\sigma1}},\dots,X_{t_{\sigma n}})$, where σ is a permutation of $\{1,\dots,n\}$. Note that $J_n(F)=I_n\big(\sum_\sigma F^\sigma\big)$, where σ runs through all permutations of $\{1,\dots,n\}$. We obtain results similar to those for $I_n(f)$. In particular, $J_n(F)\in SL^2(\Gamma)$, and $E\,(J_n(F-G))^2\approx0$ if $F\approx G$ in $L^2(\nu^n)$. Therefore, we may set $J_n(f):={}^\circ J_n(F)$. Moreover, $J_n$ is a continuous operator, and we obtain the following recursion formula for $J$:
Lemma 7.3.13 Fix $F\in SL^2\big(\nu^n_r,\mathbb F^{\otimes n}\big)$ and $G\in SL^2(\nu_r,\mathbb F)$, where $r$ is limited. Then in $L^2(\Gamma)$
$$J_{n+1}(F\otimes G)\approx J_n(F)\cdot J(G)-\sum_{i=1}^nJ_{n-1}\Big(\int_{T_r}\big\langle F^i(\cdot,t),G(t)\big\rangle_{\mathbb F}\,d\nu(t)\Big),$$
where $F^i_{(t_1,\dots,t_{n-1},t)}(a_1,\dots,a_{n-1},a):=F_{(t_1,\dots,t_{i-1},t,t_i,\dots,t_{n-1})}(a_1,\dots,a_{i-1},a,a_i,\dots,a_{n-1})$.
Using this result and suitable liftings of $f\in L^2\big(\lambda^n_r,\mathbb H^{\otimes n}\big)$ and $g\in L^2(\lambda_r,\mathbb H)$, we obtain
$$J_{n+1}(f\otimes g)=J_n(f)\cdot J(g)-\sum_{i=1}^nJ_{n-1}\Big(\int_T\big\langle f^i(\cdot,t),g(t)\big\rangle_{\mathbb H}\,d\nu_L(t)\Big)\quad\text{in }L^2(\Gamma_L).$$
Proposition 7.3.14 $J_n(f)$ is $\mathcal W_{C_{\mathbb H}}$-measurable.
Proof Fix $r\in[0,\infty[$. Let $M$ be the set of functions $f\in L^2\big(\lambda^n_r,\mathbb H^{\otimes n}\big)$ such that $J_n(f)$ is $\mathcal W_{C_{\mathbb H}}$-measurable. Since $J_n$ is a continuous operator on $L^2\big(\lambda^n_r,\mathbb H^{\otimes n}\big)$, $M$ is a complete linear space. We want to prove that $M=L^2\big(\lambda^n_r,\mathbb H^{\otimes n}\big)$. By Problem (4), it suffices to prove that $f\in M$ for functions $f$ of the following form:
$$f=1_B\otimes[h_1,\dots,h_n]:[0,r]^n\to\mathbb H^{\otimes n},\quad f_{(t_1,\dots,t_n)}(a_1,\dots,a_n)=1_B(t_1,\dots,t_n)\cdot\langle h_1,a_1\rangle\cdots\langle h_n,a_n\rangle.$$
Here $B=B_1\times\dots\times B_n$ with $B_i\in\mathrm{Leb}[0,r]$ and $h_1,\dots,h_n\in\mathbb H$. We prove this result by induction on $n$. For $n=1$ it is true by Proposition 7.2.13. For $i=1,\dots,n$ we set
$$1_{B_1\times\dots[B_i]\dots\times B_n}:(t_1,\dots,t_{n-1})\mapsto1_{B_1}(t_1)\cdots1_{B_{i-1}}(t_{i-1})\,1_{B_{i+1}}(t_i)\cdots1_{B_n}(t_{n-1})$$
and
$$[h_1,\dots,[h_i],\dots,h_n]:(a_1,\dots,a_{n-1})\mapsto h_1(a_1)\cdots h_{i-1}(a_{i-1})\,h_{i+1}(a_i)\cdots h_n(a_{n-1}).$$
We obtain
$$J_{n+1}(f)=J_{n+1}\big(1_{B_1\times\dots\times B_{n+1}}\otimes[h_1,\dots,h_{n+1}]\big)=J_n\big(1_{B_1\times\dots\times B_n}\otimes[h_1,\dots,h_n]\big)\cdot J_1\big(1_{B_{n+1}}\otimes[h_{n+1}]\big)$$
$$-\sum_{i=1}^nJ_{n-1}\big(1_{B_1\times\dots[B_i]\dots\times B_n}\otimes[h_1,\dots,[h_i],\dots,h_n]\big)\cdot\int_{[0,r]}1_{B_i}(t)\cdot1_{B_{n+1}}(t)\cdot\langle h_i,h_{n+1}\rangle_{\mathbb H}\,d\lambda(t),$$
which is $\mathcal W_{C_{\mathbb H}}$-measurable by the induction hypothesis. It follows that $J_n(f)$ is $\mathcal W_{C_{\mathbb H}}$-measurable for all $f\in L^2\big(\lambda^n_r,\mathbb H^{\otimes n}\big)$, thus for all $f\in L^2\big(\lambda^n,\mathbb H^{\otimes n}\big)$.
Corollary 7.3.15 $I_n(f)$ belongs to $L^2_{\mathcal W_{C_{\mathbb H}}}(\Gamma_L)$.
Proof Since only increasing $n$-tuples are involved in $I_n(f)$, we may assume that $f$ is symmetric. Now $I_n(f)=\frac1{n!}J_n(f)\in L^2_{\mathcal W_{C_{\mathbb H}}}(\Gamma_L)$.
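The identity $I_n(f)=\frac1{n!}J_n(f)$ for symmetric kernels already holds exactly at the internal level. For symmetric $F$, $\sum_\sigma F^\sigma=n!\,F$, so $J_n(F)=n!\,I_n(F)$. The following Python sketch checks this by brute force on a small grid. It assumes $\mathbb F=\mathbb R$, a finite H and $N(0,1/H)$ increments, and uses a hypothetical symmetric kernel. `math.prod` requires Python 3.8 or later.

```python
import itertools
import math
import random

def kernel(times):
    """Hypothetical symmetric kernel: its value is invariant under permuting the times."""
    return math.cos(sum(times)) + math.prod(times)

def _sum_over(tuples, X, grid):
    """Sum kernel(t_1, ..., t_n) * X_{t_1} ... X_{t_n} over the given index tuples."""
    return sum(kernel([grid[i] for i in ts]) * math.prod(X[i] for i in ts) for ts in tuples)

def I_n(X, grid, n):
    """Internal I_n: sum over increasing index tuples t_1 < ... < t_n (scalar case)."""
    return _sum_over(itertools.combinations(range(len(grid)), n), X, grid)

def J_n(X, grid, n):
    """Internal J_n: sum over pairwise distinct index tuples (t_1, ..., t_n)."""
    return _sum_over(itertools.permutations(range(len(grid)), n), X, grid)

H = 8
grid = [k / H for k in range(1, H + 1)]
rng = random.Random(11)
X = [rng.gauss(0.0, math.sqrt(1.0 / H)) for _ in grid]
for n in (1, 2, 3):
    print(n, J_n(X, grid, n), math.factorial(n) * I_n(X, grid, n))
```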
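Using the proof of Theorem 7.3.6 and Corollary 7.3.15, we obtain a measurability result for iterated integrals with parameters:
Corollary 7.3.16 Fix $f:[0,\infty[^{n+m}\to\mathbb H^{\otimes(n+m)}$ in $L^2\big(\lambda^{n+m},\mathbb H^{\otimes(n+m)}\big)$. Then $I_{n,m}(f)$ is in $L^2_{\mathcal W_{C_{\mathbb H}}\otimes\mathrm{Leb}^m}\big(\Gamma_L\otimes\lambda^m,\mathbb H^{\otimes m}\big)$. Equivalently, we obtain for the time line $T$: if $F$ is a lifting of $f$, then
$${}^\circ I_{n,m}(F):(X,\mathbf t)\mapsto I_{n,m}(f)(\cdot,{}^\circ\mathbf t)\in L^2_{\mathcal W_{C_{\mathbb H}}\otimes\mathcal L_m}\big((\Gamma\otimes\nu^m)_L,\mathbb H^{\otimes m}\big).$$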
7.3.5 $I^M_n(f)$ is a Continuous Version of the Standard Part of $I^M_n(F)$
In the preceding section we constructed a continuous $(b_r)_{r\in[0,\infty[}$-martingale $I^M_n(f)$ from $I^M_n(F)$. Here $F$ is locally in $SL^2\big(\nu^n,\mathbb F^{\otimes n}\big)$ and is a lifting of some $f$ locally in $L^2\big(\lambda^n,\mathbb H^{\otimes n}\big)$. $I^M_n(F)$ is a $(\mathcal B_t)_{t\in T}$-martingale, and by Theorem 7.3.11, $I^M_n(F)(\cdot,t)\in SL^2(\Gamma)$ for all limited $t$. Therefore we can also take the standard part $i^M_n(f)$ of $I^M_n(F)$, according to Corollary 6.4.11, to obtain a càdlàg $(b_r)_{r\in[0,\infty[}$-martingale. Recall that
$$i^M_n(f)(\cdot,r)=\lim_{{}^\circ t\downarrow r}{}^\circ I^M_n(F)(\cdot,t).$$
Let $f,g$ be processes defined on $\mathbb F^T\times[0,\infty[$. Recall that $f$ is called a version of $g$ if for all $r\in[0,\infty[$, $g_r=f_r$ $\Gamma_L$-a.s., where the exceptional nullset may depend on $r$.
Theorem 7.3.17 (a) $I^M_n(f)$ is a continuous version of $i^M_n(f)$.
(b) $E^{b_r}\big({}^\circ I_n(F)\big)=I^M_n(f)(\cdot,r)=i^M_n(f)(\cdot,r)$ $\Gamma_L$-a.s. for $r\in[0,\infty[$. The exceptional nullset depends on $r$.
Proof Fix $r\in[0,\infty[$. (a) By Theorem 6.4.10 (b), there exists a $t\in T$ with $t\approx r$ such that $I^M_n(F)(\cdot,t)$ is a lifting of $i^M_n(f)(\cdot,r)$. It follows that, $\Gamma_L$-a.s.,
$$I^M_n(f)(\cdot,r)={}^\circ I^M_n(F)(\cdot,t)=i^M_n(f)(\cdot,r).$$
(b) Let $(t_k)_{k\in\mathbb N}$ be a decreasing sequence in the limited part of $T$ such that $\lim_{k\to\infty}{}^\circ t_k=r$ and ${}^\circ t_k>r$. By Theorem 6.2.15, $b_r=\bigcap_{k\in\mathbb N}\mathcal B_{t_k}\vee\mathcal N_{\Gamma_L}$. By the Martingale Convergence Theorem,
$$\lim_{k\to\infty}E^{\mathcal B_{t_k}\vee\mathcal N_{\Gamma_L}}\,{}^\circ I_n(F)=E^{b_r}\,{}^\circ I_n(F).$$
On the other hand, $\lim_{k\to\infty}{}^\circ I^M_n(F)(\cdot,t_k)=i^M_n(f)(\cdot,r)$ $\Gamma_L$-a.s. To finish the proof of (b), note that ${}^\circ I^M_n(F)(\cdot,t_k)={}^\circ E^{\mathcal B_{t_k}}\big(I_n(F)\big)=E^{\mathcal B_{t_k}\vee\mathcal N_{\Gamma_L}}\big({}^\circ I_n(F)\big)$ $\Gamma_L$-a.s.
Instead of $I^M_n(F)(\cdot,t)$ we could have taken $I^M_n(F)(\cdot,t^-)$ and the filtration $(\mathcal B_{t^-})_{t\in T}$. This leads to a slight modification of Theorem 7.3.11. Recall that we have defined $G\big(\tfrac1H^-\big):=0$.
Lemma 7.3.18 Fix a lifting $F$, locally in $SL^2\big(\nu^n,\mathbb F^{\otimes n}\big)$, of some $f$ locally in $L^2\big(\lambda^n,\mathbb H^{\otimes n}\big)$. The mapping $T\ni t\mapsto I^M_n(F)(\cdot,t^-)$ is S-continuous $\Gamma_L$-a.s.
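Proof We have to prove $I^M_n(F)(\cdot,\frac1H)\approx0$ $\Gamma_L$-a.s. Since $I^M_n(F)(\cdot,\frac1H)=0$ for all $n>1$, we may assume that $n=1$. Now, since $F$ is locally in $SL^2(\nu,\mathbb F)$,
$$E\,I^M_1(F)\big(\cdot,\tfrac1H\big)^2=\frac1H\big\|F_{\frac1H}\big\|^2_{\mathbb F}\approx0.$$
This proves that $I^M_n(F)(\cdot,\frac1H)\approx0$ $\Gamma_L$-a.s.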
7.3.6 Continuous Versions of Iterated Integral Processes
In the preceding two sections we have extensively used the fact that the integrand $F$ in $I^M_n(F)$ is a lifting of a standard function. Now we take an arbitrary $F$, locally in $SL^2\big(\nu^n,\mathbb F^{\otimes n}\big)$. We will show that $I^M_n(F)$ has a continuous version. The techniques in this section can be used to obtain continuous versions of Skorokhod integral processes on finite chaos levels (see [36]). Define for all $t\in T$
$$\alpha(t):=\sum_{t_1<\dots<t_n\le t}\|F_{t_1,\dots,t_n}\|^2_{\mathbb F^{\otimes n}}\frac1{H^n}.$$
Since $F$ is locally S-square integrable, α is S-continuous. Thus we may define a continuous function ${}^\circ\alpha:[0,\infty[\to[0,\infty[$ by setting ${}^\circ\alpha({}^\circ t):={}^\circ(\alpha(t))$ for limited $t\in T$.
By Theorem 7.3.3, $E\,I^M_n(F)^2(\cdot,r)$ is limited for all limited $r\in T$. Since $I^M_n(F)$ is a $(\mathcal B_t)_{t\in T}$-martingale, we may apply Theorem 6.4.10 and Corollary 6.4.11. This gives the standard part $m={}^\circ M$ of $M:=I^M_n(F)$. Recall that $m$ is a càdlàg $(b_r)_{r\in[0,\infty[}$-martingale. Our aim is to prove that $m$ has a continuous version. By Theorem 6.4.10, for each $r\in[0,\infty[$ there exists an $s_r\in T$, $s_r\approx r$, such that $M_{s_r}$ is a lifting of $m_r$. Moreover, there exists a $t_r\in T$, $t_r\approx r$, such that $M_{t_r}$ is a lifting of $m_{r^-}$. Since $M_{s_r}\in SL^{2p}(\Gamma)$ (see Corollary 7.3.4), we have $E_{\Gamma_L}(m_{r'}-m_r)^{2p}={}^\circ E\,(M_{s_{r'}}-M_{s_r})^{2p}$ for all $p\in\mathbb N$.
Lemma 7.3.19 Fix limited $r,r'\in T$ with $r<r'$ and $p\in\mathbb N$. Then
$$E\,(M_{r'}-M_r)^{2p}\le n^{2p}\,((2p)!!)^{n+1}\,(\alpha_{r'}-\alpha_r)^p.$$
In particular, if $r\approx r'$, then $E\,(M_{r'}-M_r)^{2p}\approx0$; thus $M_{r'}\approx M_r$ $\Gamma_L$-a.s.
Proof By the triangle inequality in $L^{2p}(\Gamma)$,
$$\big(E\,(M_{r'}-M_r)^{2p}\big)^{\frac1{2p}}\le\sum_{i=1}^n\big(E\,A_i^{2p}\big)^{\frac1{2p}},$$
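where $A_i$ is the sum of $F_{t_1,\dots,t_n}(X_{t_1},\dots,X_{t_n})$ over all $t_1<\dots<t_n\le r'$ with $t_{i-1}\le r<t_i$ (with $t_0:=0$). By Theorem 7.3.3,
$$E\,A_i^{2p}\le((2p)!!)^{n+1}\Big(\sum_{\substack{t_1<\dots<t_n\le r'\\ t_{i-1}\le r<t_i}}\|F_{t_1,\dots,t_n}\|^2_{\mathbb F^{\otimes n}}\frac1{H^n}\Big)^p\le((2p)!!)^{n+1}\,(\alpha_{r'}-\alpha_r)^p.$$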
This proves that $E\,(M_{r'}-M_r)^{2p}\le n^{2p}\,((2p)!!)^{n+1}\,(\alpha_{r'}-\alpha_r)^p$.
A process $f:\mathbb F^T\times[0,\infty[\to\mathbb R$ is called continuous in probability at $r\in[0,\infty[$ if the following holds. For each $\varepsilon>0$ there exists a $\delta>0$ such that $\Gamma_L\{|f_s-f_r|\ge\varepsilon\}<\varepsilon$ for all $s\in[0,\infty[$ with $|s-r|<\delta$.
Corollary 7.3.20 (a) For all $r<r'$ in $[0,\infty[$ and $p\in\mathbb N$,
$$E_{\Gamma_L}(m_{r'}-m_r)^{2p}\le n^{2p}\,((2p)!!)^{n+1}\big({}^\circ\alpha_{r'}-{}^\circ\alpha_r\big)^p.$$
(b) $m$ has a continuous version.
(c) $m$ is continuous $\Gamma_L$-a.s. on each countable subset $D\subseteq[0,\infty[$.
(d) $m$ is continuous in probability.
Proof (a) follows from Lemma 7.3.19. (b) follows from (a) and a slight modification of the Kolmogorov Continuity Theorem (see I. Karatzas and S. Shreve [20]). For the details see Theorem 13.7.2 in [36]. (c) For $r\in D$ choose $s_r,t_r\in T$ with $s_r,t_r\approx r$ such that $M_{s_r}$ is a lifting of $m_r$ and $M_{t_r}$ is a lifting of $m_{r^-}$. By Lemma 7.3.19, $E_{\Gamma_L}|m_r-m_{r^-}|={}^\circ E\,|M_{s_r}-M_{t_r}|=0$. Hence $m_r=m_{r^-}$ $\Gamma_L$-a.s., so $m$ is continuous at $r$ $\Gamma_L$-a.s. Since $D$ is countable, (c) follows. The proof of Part (d) is left to the reader.
Problems
(1) Prove Proposition 7.3.1.
(2) Prove Lemma 7.3.2.
(3) Prove that $m$ is continuous in probability.
(4) Let $(e_i)_{i\in\mathbb N}$ be an orthonormal basis of $\mathbb H$. Prove that the functions of the form $[e_{i_1},\dots,e_{i_n}]$ form an orthonormal basis of $\mathbb H^{\otimes n}$, where for all $(a_1,\dots,a_n)\in\mathbb H^n$, $[e_{i_1},\dots,e_{i_n}](a_1,\dots,a_n):=\langle e_{i_1},a_1\rangle\cdots\langle e_{i_n},a_n\rangle$.
7.4 Beginning of Malliavin Calculus
The nonstandard approach to Stochastic Analysis for the Brownian motion $B$ ends with some first basic ideas of the Malliavin calculus. For more details we refer to the book [36]. We use the so-called chaos decomposition of square integrable random variables under the Wiener measure. This means that these random variables can be completely characterized by a sequence of deterministic functions. These deterministic functions are infinitely close to smooth functions; in fact, they are close to polynomials in several variables.
7.4.1 Chaos Decomposition

Norbert Wiener called measures with values in random variables chaos. For example, if $B\colon\mathbb F^T\times[0,\infty[\,\to\mathbb B$ is the Brownian motion introduced in Sect. 7.1, then we can define a kind of measure $\rho$ on the intervals $[s,r]$ in $[0,\infty[$ by setting $\rho([s,r]):=B(\cdot,r)-B(\cdot,s)$. This “measure” depends on chance and thus, in Wiener's terminology, $\rho$ is an example of chaos. Since random variables are characterized by deterministic functions in the chaos decomposition theorem, this result should rather be called the chaos homogenization theorem, following Wiener's famous article [44] with the title “On homogeneous chaos”. Therefore, the chaos representation property is in fact a chaos avoiding property. By the way, the paintings of Jackson Pollock remind me of homogeneous chaos. Recall the terminology in Sect. 7.1. Although we shall work on the ${}^*$finite-dimensional space $\mathbb F^T$, endowed with the measure $\Gamma_L$, we obtain standard results for the equivalent space $C_{\mathbb B}$ of continuous functions $f\colon[0,\infty[\,\to\mathbb B$, endowed with the Wiener measure $W$. In order to get this equivalence, we choose on $\mathbb F^T$ the $\sigma$-algebra $\mathcal W=\mathcal W_{CH}$ generated by the Brownian motion. Note that $C_{\mathbb B}$ is not a Banach space, but a separable Fréchet space. As we have already mentioned, Malliavin calculus is based on the chaos homogenization property: Wiener random variables $\varphi\in L^2(W)$ can be characterized by a sequence $(f_n)_{n\in\mathbb N_0}$ of square integrable deterministic functions $f_n\colon[0,\infty[^n\to\mathbb H^{\otimes n}$. It should be mentioned that each function $F\colon T$
266
H. Osswald
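(W 3) $\mathcal H_n$ is a linear space over $\mathbb R$.
(W 4) $\mathcal H_n\subseteq\big\{F\in SL^2(\nu^n,\mathbb F^{\otimes n})\;\big|\;{}^\circ F\in L^2_{\mathcal L^n}((\nu^n)_L,\mathbb H^{\otimes n})=L^2(\lambda^n,\mathbb H^{\otimes n})\big\}$.
(W 5) ${}^\circ\mathcal H_n:=\{{}^\circ F\mid F\in\mathcal H_n\}$ is a closed subspace of $L^2_{\mathcal L^n}((\nu^n)_L,\mathbb H^{\otimes n})$.
(W 6) If $F\in\mathcal H_1$ is limited and the support of $F$ is a subset of $T_\sigma$ for some $\sigma\in\mathbb N$, then $1_T$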
$$\sum_{n=0}^{\infty}{}^\circ I_n(F_n)\ \text{in } L^2(\Gamma_L)\quad\text{with } {}^\circ I_n(F_n)\in L^2_{\mathcal W}(\Gamma_L).\qquad(+)$$
Moreover, ${}^\circ I_0(F_0)={}^\circ F_0=E_{\Gamma_L}(\varphi)$. Recall that ${}^\circ I_n(F_n)=I_n({}^\circ F_n)$. It is crucial to assume that $1_T$
B
where $\varphi^+=\varphi\vee0$ and $\varphi^-=(-\varphi)\vee0$. (Recall that $a\vee b:=\max\{a,b\}$.) We have to prove the equality of the finite measures $\Gamma_L^+$ and $\Gamma_L^-$, defined on $\mathcal W$. By Proposition 7.2.13, $\mathcal W$ is generated by the linear space of functions $I(f)$, where $f\in L^2(\lambda,\mathbb H)$ is bounded with compact support. Therefore, we have to prove that the image measures $\Gamma_L^{+I(f)}$ and $\Gamma_L^{-I(f)}$ of $\Gamma_L^+$ and $\Gamma_L^-$ under each $I(f)$ coincide. Recall that $\Gamma_L^{+I(f)}(C):=\Gamma_L^+\big(I(f)^{-1}[C]\big)$ for all Borel sets $C$ in $\mathbb R$. By Corollary 7.3.10, each polynomial $p(I(f))$ is a linear combination of iterated integrals with integrands of the form $f\otimes\cdots\otimes f$ ($n$ times). Let $F\in SL^2(\nu,\mathbb F)$ be a bounded lifting of $f$. Since $F\in\mathcal H_1$ and therefore, by (W 6), $1_T$
$$\int_{\mathbb F^T}\varphi^+\cdot p(I(f))\,d\Gamma_L=\int_{\mathbb F^T}\varphi^-\cdot p(I(f))\,d\Gamma_L.\qquad(++)$$
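We obtain
$$\begin{aligned}\int_{\mathbb R}e^{i\cdot x}\,d\Gamma_L^{+I(f)}(x)&=\lim_{n\to\infty}\int_{\mathbb R}\sum_{k=0}^{n}\frac{i^k\cdot x^k}{k!}\,d\Gamma_L^{+I(f)}(x)=\lim_{n\to\infty}\int_{\mathbb F^T}\sum_{k=0}^{n}\frac{i^k\cdot I(f)^k}{k!}\,\varphi^+\,d\Gamma_L\\&\overset{(++)}{=}\lim_{n\to\infty}\int_{\mathbb F^T}\sum_{k=0}^{n}\frac{i^k\cdot I(f)^k}{k!}\,\varphi^-\,d\Gamma_L=\int_{\mathbb R}e^{i\cdot x}\,d\Gamma_L^{-I(f)}(x).\end{aligned}$$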
This proves that $\Gamma_L^{+I(f)}=\Gamma_L^{-I(f)}$, because the Fourier transforms of these measures are the same; thus $\varphi=0$ $\Gamma_L$-a.s.

Let $\varphi=\sum_{n=0}^{\infty}{}^\circ I_n(F_n)\in L^2_{\mathcal W}(\Gamma_L)$. Since $E_{\Gamma_L}\,{}^\circ I_n(F_n)={}^\circ E\,I_n(F_n)=0$ for each $n\ge1$, we have $E_{\Gamma_L}(\varphi)={}^\circ I_0(F_0)={}^\circ F_0\in\mathbb R$.
Recall that $W$ denotes the Wiener measure on $C_{\mathbb B}$. Since the space $L^2(W)$ can be identified with the space $L^2_{\mathcal W}(\Gamma_L)$ via the canonical mapping $\varphi\mapsto\big(X\mapsto\varphi(b_{\mathbb B}(X,\cdot))\big)$, we also have a chaos decomposition for the standard space of square integrable functionals $\varphi\colon C_{\mathbb B}\to\mathbb R$. The functions $F_n$ and ${}^\circ F_n$ in the chaos decomposition (+) of $\varphi$ are called the kernels of $\varphi$; they are uniquely determined by $\varphi$ in the following sense:

Proposition 7.4.2 If in $L^2(\Gamma_L)$
$$\varphi=\sum_{n=0}^{\infty}{}^\circ I_n(F_n)\quad\text{with symmetric } F_n\in SL^2(\nu^n,\mathbb F^{\otimes n})$$
and
$$\varphi=\sum_{n=0}^{\infty}{}^\circ I_n(G_n)\quad\text{with symmetric } G_n\in SL^2(\nu^n,\mathbb F^{\otimes n}),$$
then $G_n-F_n\approx0$ in $L^2(\nu^n)$ and $(\nu^n)_L$-a.e. Therefore, ${}^\circ G_n-{}^\circ F_n=0$ in $L^2(\lambda^n)$ if $F_n$ or $G_n$ has a standard part.
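Proof By Proposition 7.3.1 and Corollary 7.3.4, we obtain
$$\begin{aligned}0&=E\Big(\sum_{n=0}^{\infty}{}^\circ I_n(F_n)-\sum_{n=0}^{\infty}{}^\circ I_n(G_n)\Big)^2=E\Big(\sum_{n=0}^{\infty}{}^\circ I_n(F_n-G_n)\Big)^2\\&=\sum_{n=0}^{\infty}E\big({}^\circ I_n(F_n-G_n)\big)^2=\sum_{n=0}^{\infty}{}^\circ E\big(I_n(F_n-G_n)\big)^2=\sum_{n=0}^{\infty}{}^\circ\!\int_{T^n_<}\|F_n-G_n\|^2\,d\nu^n.\end{aligned}$$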
By the symmetry of $F_n$ and $G_n$, $G_n-F_n\approx0$ in $L^2(\nu^n)$ for each $n\in\mathbb N_0$. It follows that $\nu^n_L\{a\in T^n\mid G_n(a)\not\approx F_n(a)\}=0$, which proves the uniqueness.

Proposition 7.4.2 shows that the sequence $(\mathcal H_n)_{n\in\mathbb N_0}$ with the properties (W 1),…,(W 6) is uniquely determined in the following sense:

Corollary 7.4.3 Suppose that $(\mathcal H_n)$ and $(\mathcal H'_n)$ both fulfil the conditions (W 1),…,(W 6). Then for each $F\in\mathcal H_n$ there exists an $F'\in\mathcal H'_n$ with $F-F'\approx0$ in $L^2(\nu^n)$, thus $(\nu^n)_L$-a.e.

Proof Fix $F\in\mathcal H_n$. Since ${}^\circ I_n(F)\in L^2_{\mathcal W}(\Gamma_L)$, by Theorem 7.4.1, ${}^\circ I_n(F)$ has a chaos decomposition ${}^\circ I_n(F)=\sum_{m=0}^{\infty}{}^\circ I_m(F'_m)$ with $F'_m\in\mathcal H'_m$. By Proposition 7.4.2, $F-F'_n\approx0$ in $L^2(\nu^n)$ and therefore $(\nu^n)_L$-a.e.
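A small numerical sketch can make the isometry used in this proof, $E(I_n(F))^2=\int_{T^n_<}\|F\|^2\,d\nu^n$, and the orthogonality of different chaos levels concrete. It is only a finite toy analogue built on assumptions that are not in the text: the hyperfinite time line is replaced by $H$ ordinary time points, $\mathbb F$ is taken to be $\mathbb R$, the increments $X_t$ are independent centered Gaussians with variance $1/H$, and the kernels `F1`, `F2` are hypothetical choices made for the example.

```python
import itertools
import random

def iterated_integral(kernel, X, n):
    """Toy analogue of I_n(F): sum over t_1 < ... < t_n of F(t) * X[t_1] * ... * X[t_n]."""
    total = 0.0
    for t in itertools.combinations(range(len(X)), n):
        term = kernel(t)
        for s in t:
            term *= X[s]
        total += term
    return total

def isometry_check(H=5, samples=50_000, seed=0):
    rng = random.Random(seed)
    F1 = lambda t: 1.0 + t[0] / H                    # hypothetical first-chaos kernel
    F2 = lambda t: (t[0] + 1) * (t[1] + 2) / H**2    # hypothetical second-chaos kernel
    second_moment, cross = 0.0, 0.0
    for _ in range(samples):
        # Increments with variance 1/H: nu gives each time point the weight 1/H.
        X = [rng.gauss(0.0, (1.0 / H) ** 0.5) for _ in range(H)]
        i1 = iterated_integral(F1, X, 1)
        i2 = iterated_integral(F2, X, 2)
        second_moment += i2 * i2
        cross += i1 * i2
    # Right-hand side of the isometry: F2^2 summed over t_1 < t_2 with weight (1/H)^2.
    exact = sum(F2(t) ** 2 for t in itertools.combinations(range(H), 2)) / H**2
    return second_moment / samples, exact, cross / samples

mc, exact, cross = isometry_check()
```

Because the sums run over strictly increasing tuples of independent increments, the isometry and the orthogonality hold exactly for the expectations in this finite model; the estimates differ from `exact` and from 0 only by Monte Carlo error.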
We end this section with the following remark and an example of a sequence $(\mathcal G_n)$ fulfilling the properties (W 1),…,(W 6).

Remark 7.4.4 Of course, we obtain a simpler approach to the Malliavin calculus when we take the compact time line $[0,\sigma]$, $\sigma\in\mathbb N$, instead of $[0,\infty[$. Let $\Gamma_\sigma$ be the centered Gaussian measure of variance $\frac1H$ on $\mathbb F^{T_\sigma}$ and denote by $C^\sigma_{\mathbb B}$ the Banach space of continuous functions from $[0,\sigma]$ into $\mathbb B$. In analogy to the Brownian motion $B$ there exists a continuous Brownian motion $B^\sigma\colon\mathbb F^{T_\sigma}\times[0,\sigma]\to\mathbb B$, thus $B^\sigma\colon\mathbb F^{T_\sigma}\to C^\sigma_{\mathbb B}$. Moreover, $B^\sigma$ is surjective onto $C^\sigma_{\mathbb B}$. Let $\mathcal W_\sigma$ be the $\sigma$-algebra on $\mathbb F^{T_\sigma}$ generated by $B^\sigma$ and augmented by the $(\Gamma_\sigma)_L$-nullsets (see the beginning of Sect. 7.2.6). We obtain the following chaos decomposition result: each $\varphi\in L^2_{\mathcal W_\sigma}((\Gamma_\sigma)_L)$ has the decomposition
$$\varphi=\sum_{n=0}^{\infty}{}^\circ I_n(F_n),$$
where now $F_n\in SL^2(\nu^n_\sigma,\mathbb F^{\otimes n})$ and $F_n$ is a lifting of a Lebesgue square integrable function $f_n\colon[0,\sigma]^n\to\mathbb H^{\otimes n}$. Note that here
$$I_n(F_n)\colon\mathbb F^{T_\sigma}\to{}^*\mathbb R,\quad X\mapsto\sum_{t_1<\cdots<t_n}F_n(t_1,\dots,t_n)(X_{t_1},\dots,X_{t_n}).$$
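Example 7.4.5 Fix a sequence $(\mathcal H_n)_{n\in\mathbb N_0}$ with the properties (W 1),…,(W 6). Set $\mathcal G_0:=\mathbb R$. For $n\ge1$ let $\mathcal G_n$ be the set of all $F\in\mathcal H_n$ such that in $L^2_{\mathcal W\otimes\mathcal L}((\Gamma\otimes\nu)_L,\mathbb H)$
$${}^\circ I_{n-1,1}(F)=\Big((X,t)\mapsto E^{\mathfrak b_{{}^\circ t}}\,{}^\circ I_{n-1,1}(\overleftrightarrow{F})(X,t)\Big).$$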
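Then $(\mathcal G_n)_{n\in\mathbb N_0}$ fulfils (W 1),…,(W 6). It follows that $E^{\mathfrak b_{{}^\circ t}}\,{}^\circ I_{n-1,1}(\overleftrightarrow{F})(\cdot,t)$ is $\mathcal W$-measurable for $\nu_L$-almost all $t\in T$. Moreover, $I_{n-1,1}(F)$ is a $(\mathcal B_{t^-})_{t\in T}$-adapted lifting in $SL^2(\Gamma\otimes\nu,\mathbb F)$ of $r\mapsto E^{\mathfrak b_r}\,{}^\circ I_{n-1,1}(\overleftrightarrow{F})(\cdot,r)$.

Recall that, since ${}^\circ F\in L^2(\lambda^n)=L^2_{\mathcal L^n}(\nu^n_L)$ exists for $F\in\mathcal H_n$, ${}^\circ\overleftrightarrow{F}$ exists in $L^2(\lambda^n)$ too. By Theorem 7.3.6, ${}^\circ I_{n-1,1}(\overleftrightarrow{F})$ exists. This means that there exists a function $g$ in $L^2_{\mathcal W\otimes\mathrm{Leb}}(\Gamma_L\otimes\lambda)$, identified with $L^2_{\mathcal W\otimes\mathcal L}((\Gamma\otimes\nu)_L)$, such that
$$(\Gamma\otimes\nu)_L\big\{(X,t)\;\big|\;I_{n-1,1}(\overleftrightarrow{F})(X,t)\not\approx g(X,{}^\circ t)=g(X,t)\big\}=0.$$
Now $g(X,r)={}^\circ I_{n-1,1}(\overleftrightarrow{F})(X,r)$ for $r\in[0,\infty[$ $\Gamma_L\otimes\lambda$-a.e., or $g(X,t)={}^\circ I_{n-1,1}(\overleftrightarrow{F})(X,t)$ for $t\in T$ $(\Gamma\otimes\nu)_L$-a.e.

Proof Obviously, (W 1),…,(W 4) are true. To prove (W 5), fix a Cauchy sequence $({}^\circ F_m)_{m\in\mathbb N}$ in ${}^\circ\mathcal G_n$ with limit ${}^\circ F$, where $F\in\mathcal H_n$ for the moment. Then by Corollary 7.3.8,
$$\lim_{m\to\infty}{}^\circ I_{n-1,1}(F_m)={}^\circ I_{n-1,1}(F)\ \text{in } L^2_{\mathcal W\otimes\mathcal L}((\Gamma\otimes\nu)_L,\mathbb H).$$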
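Moreover, by Jensen's inequality and Theorems 6.3.10 and 7.3.6,
$$\begin{aligned}&\int_{\mathbb F^T\times T}\Big\|E^{\mathfrak b_{{}^\circ r}}\,{}^\circ I_{n-1,1}(\overleftrightarrow{F})(\cdot,r)-E^{\mathfrak b_{{}^\circ r}}\,{}^\circ I_{n-1,1}(\overleftrightarrow{F}_m)(\cdot,r)\Big\|^2_{\mathbb H}\,d(\Gamma\otimes\nu)_L(\cdot,r)\\&\le\int_{\mathbb F^T\times T}\Big\|{}^\circ I_{n-1,1}(\overleftrightarrow{F})(\cdot,r)-{}^\circ I_{n-1,1}(\overleftrightarrow{F}_m)(\cdot,r)\Big\|^2_{\mathbb H}\,d(\Gamma\otimes\nu)_L\\&={}^\circ\!\int_{\mathbb F^T\times T}\Big\|I_{n-1,1}(\overleftrightarrow{F})(\cdot,r)-I_{n-1,1}(\overleftrightarrow{F}_m)(\cdot,r)\Big\|^2_{\mathbb F}\,d\,\Gamma\otimes\nu\\&={}^\circ\!\int_{T^{n-1}_<\times T}\big\|\overleftrightarrow{F}-\overleftrightarrow{F}_m\big\|^2_{\mathbb F^n}\,d\nu^n=\int_{T^{n-1}_<\times T}\big\|{}^\circ\overleftrightarrow{F}-{}^\circ\overleftrightarrow{F}_m\big\|^2_{\mathbb H^n}\,d(\nu^n)_L\xrightarrow[m\to\infty]{}0.\end{aligned}$$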
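Since ${}^\circ I_{n-1,1}(F_m)(\cdot,r)=E^{\mathfrak b_{{}^\circ r}}\,{}^\circ I_{n-1,1}(\overleftrightarrow{F}_m)(\cdot,r)$ in $L^2_{\mathcal W\otimes\mathcal L}((\Gamma\otimes\nu)_L,\mathbb H)$, we also have ${}^\circ I_{n-1,1}(F)(\cdot,r)=E^{\mathfrak b_{{}^\circ r}}\,{}^\circ I_{n-1,1}(\overleftrightarrow{F})(\cdot,r)$ in $L^2_{\mathcal W\otimes\mathcal L}((\Gamma\otimes\nu)_L,\mathbb H)$. To prove (W 6), fix a limited $F\in\mathcal G_1$ with support in $T_\sigma$ for some $\sigma\in\mathbb N$. Note that $1_T$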
$$\beta_i(X,r):=\sum_{t_1<\cdots}F^{\otimes(n-1)}(t,X_{t_1},\dots,X_{t_{n-1}}).$$
Then
$$E^{\mathfrak b_{{}^\circ r}}\,{}^\circ I_{n-1,1}(F^{\otimes n})(\cdot,r)=\sum_{i=1}^{n}{}^\circ F(r)\cdot E^{\mathfrak b_{{}^\circ r}}\,{}^\circ\beta_i(\cdot,r).$$
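We will prove that $E^{\mathfrak b_{{}^\circ r}}\,{}^\circ\beta_i(\cdot,r)=0$ for $i<n$. To this end fix $D\in\mathfrak b_{{}^\circ r}$. Then there exist an $s\approx r$ with $r<s^-$ and a $C\in\mathcal B_{s^-}$ with $\Gamma_L(C\,\Delta\,D)=0$. Recall that $C=A\times\mathbb F^{T\setminus T_{s^-}}$ for a certain Borel set $A$ in $\mathbb F^{T_{s^-}}$. Set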
$$\alpha_i(X,r,s):=\sum_{t_1<\cdots}F^{\otimes(n-1)}(t,X_{t_1},\dots,X_{t_{n-1}}).$$
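Then
$$\int_D{}^\circ\beta_i^2(\cdot,r)\,d\Gamma_L\approx\int_C\beta_i^2(\cdot,r)\,d\Gamma=\int_A\alpha_i^2(\cdot,r,s)\,d\Gamma\le E\,\alpha_i^2(\cdot,r,s)\le\int_{T^{n-2}\times(]r,s]\cap T)}\big\|F^{\otimes(n-1)}\big\|^2_{\mathbb F^{n-1}}\,d\nu^{n-1}\approx0,$$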
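because $T^{n-2}\times(]r,s]\cap T)$ is a $(\nu_\sigma^{n-1})_L$-nullset and $F^{\otimes(n-1)}$ is $S\nu_\sigma^{n-1}$-square integrable. We obtain in $L^2_{\mathcal W\otimes\mathcal L}((\Gamma\otimes\nu)_L,\mathbb H)$: $E^{\mathfrak b_{{}^\circ r}}\,{}^\circ I_{n-1,1}(F^{\otimes n})(\cdot,r)={}^\circ I_{n-1,1}(1_T$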
7.4.2 A Lifting Theorem for Functionals in $L^2_{\mathcal W}(\Gamma_L)$

We now apply the Chaos Decomposition Theorem to prove that each $\varphi\in L^2_{\mathcal W}(\Gamma_L)$ is infinitely close to an internally smooth function.

Theorem 7.4.6 Suppose that $\varphi\in L^2_{\mathcal W}(\Gamma_L)$ and $\varphi=\sum_{n=0}^{\infty}{}^\circ I_n(F_n)$ according to Theorem 7.4.1. Choose an internal extension $(F_n)_{n\in{}^*\mathbb N_0}$ of $(F_n)_{n\in\mathbb N_0}$ such that $F_n\colon T^n_{\neq}\to\mathbb F^{\otimes n}$ is symmetric for all $n\in{}^*\mathbb N$. Then there exists an unlimited number $K\in{}^*\mathbb N$ such that for each unlimited $M\in{}^*\mathbb N$, $M\le K$,
$$\Phi:=\sum_{n=0}^{M}I_n(F_n)\colon\mathbb F^T\to{}^*\mathbb R\ \in SL^2(\Gamma)\quad\text{and}\quad\Phi\approx\varphi\ \ \Gamma_L\text{-a.s.}$$
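Proof Let $\Psi$ be a lifting of $\varphi$ in $SL^2(\Gamma)$. Since
$$\lim_{m\to\infty}{}^\circ E\Big(\sum_{n=0}^{m}I_n(F_n)-\Psi\Big)^2=\lim_{m\to\infty}E_{\Gamma_L}\Big(\sum_{n=0}^{m}{}^\circ I_n(F_n)-\varphi\Big)^2=0,$$
there exists an unlimited number $K\in{}^*\mathbb N$ such that for each unlimited $M\in{}^*\mathbb N$, $M\le K$, $E\big(\sum_{n=0}^{M}I_n(F_n)-\Psi\big)^2\approx0$. Using the triangle inequality for $L^2$-spaces, $\Phi:=\sum_{n=0}^{M}I_n(F_n)\in SL^2(\Gamma)$. Since $\Phi\approx\Psi$ $\Gamma_L$-a.e. (see the proof of “(a) ⇒ (c)” in the proof of Theorem 7.2.10), $\Phi$ is a lifting of $\varphi$.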
7.4.3 Computation of the Kernels

Let us now provide an effective recipe for computing the kernels of the chaos decomposition. We will see that the kernel $F_n$ of a Wiener functional $\varphi$ at the point $(t_1,\dots,t_n)$ can be computed as the internal expected value of an appropriate lifting of $\varphi$ multiplied by the white noise $\dot B_{t_1}\cdots\dot B_{t_n}$ at $(t_1,\dots,t_n)$. Our results are straightforward extensions of corresponding results due to Cutland and Ng [8]. Fix an internal mapping $\Phi\colon\mathbb F^T\to{}^*\mathbb R\in L^2(\Gamma)$. Then we call the function $\Phi_n\colon T$
(+)
the $n$th white noise function of $\Phi$. Equation (+) has the following intuitive meaning. Set $\Delta t:=\frac1H$ and recall that $X_t$ is the increment $\Delta B_t$ at time $t$ of the internal process $B$. Then equation (+) can be written in the form
$$\Phi_n(t_1,\dots,t_n)(a_1,\dots,a_n)=E\Big(\Phi\cdot\Big\langle\frac{\Delta B_{t_1}}{\Delta t_1},a_1\Big\rangle\cdots\Big\langle\frac{\Delta B_{t_n}}{\Delta t_n},a_n\Big\rangle\Big)=E\big(\Phi\cdot\langle\dot B_{t_1},a_1\rangle\cdots\langle\dot B_{t_n},a_n\rangle\big),$$
where $\dot B_t$ may be understood as the “derivative” of $B$ at time $t$. Cutland and Ng [8] have pointed out that it was the intention of N. Wiener [44] to think of the kernels $f_n$ of $\varphi=\sum_{n=0}^{\infty}I_n(f_n)$ as being given by
$$f_n(t_1,\dots,t_n)=E\big(\varphi\cdot\dot b_{t_1}\cdots\dot b_{t_n}\big),\qquad(+)$$
although Brownian motion is not differentiable. By using the hyperfinite time line $T$, we have a way (without using Schwartz distributions) to give a correct mathematical meaning to Equality (+). N. Wiener used chaotic representations to find better models for telecommunications under noise (I refer the reader to the marvellous book of Masani [28]).

Theorem 7.4.7 Suppose that $\varphi\in L^2_{\mathcal W}(\Gamma_L)$. Let $\Phi\in SL^2(\Gamma)$ be a lifting of $\varphi$. Then
(a) $\Phi_n\in SL^2(\nu^n,\mathbb F^{\otimes n})$ and $\Phi_n$ is nearstandard,
(b) $\varphi=E_{\Gamma_L}\varphi+\sum_{n=1}^{\infty}{}^\circ I_n(\Phi_n)$.

Proof By Theorem 7.4.6, $\varphi=E_{\Gamma_L}\varphi+\sum_{n=1}^{\infty}{}^\circ I_n(F_n)$ has a lifting $\Psi\in SL^2(\Gamma)$ of the form
$$\Psi=\sum_{n=0}^{K}I_n(F_n)\quad\text{with } I_0(F_0)=E\Phi,$$
where $K\in{}^*\mathbb N_0$ and $F_n\colon T^n_{\neq}\to\mathbb F^{\otimes n}\in SL^2(\nu^n,\mathbb F^{\otimes n})$ is symmetric. Since
$$U:=\Big\{\sum_{n=0}^{K}I_n(G_n)\;\Big|\;G_n\colon T^n_<\to\mathbb F^{\otimes n}\text{ internal}\Big\}$$
is an internally closed subspace of $L^2(\Gamma)$, there exists an internal $(K+1)$-tuple $(G_n)_{0\le n\le K}$ of internal $G_n\colon T$
$$\Phi=\sum_{n=0}^{K}I_n(G_n)+\Theta.$$
We will now prove that $G_n=\Phi_n$ for all $n=0,\dots,K$: Fix an internal orthonormal basis $E$ of $\mathbb F$ and internal $n$-tuples $(e_1,\dots,e_n)\in E^n$ and $(t_1,\dots,t_n)\in T$
$$E\Big(\sum_{k=0}^{K}I_k(G_k)\cdot\langle X_{t_1},e_1\rangle\cdots\langle X_{t_n},e_n\rangle\Big)=E\big(G_n(t_1,\dots,t_n)(e_1,\dots,e_n)\cdot\langle X_{t_1},e_1\rangle^2\cdots\langle X_{t_n},e_n\rangle^2\big)=G_n(t_1,\dots,t_n)(e_1,\dots,e_n)\cdot\frac{1}{H^n},$$
and
$$\beta=E\big(\Theta\cdot\langle X_{t_1},e_1\rangle\cdots\langle X_{t_n},e_n\rangle\big)=0.$$
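This proves that $G_n=\Phi_n$. Now
$$\begin{aligned}0\approx E(\Phi-\Psi)^2&=E\Big(\sum_{n=0}^{K}I_n(G_n)+\Theta-\sum_{n=0}^{K}I_n(F_n)\Big)^2=\sum_{n=0}^{K}E\big(I_n(\Phi_n-F_n)\big)^2+E\,\Theta^2\\&\ge E\big(I_n(\Phi_n-F_n)\big)^2=\int_{T^n_<}\|\Phi_n-F_n\|^2\,d\nu^n\quad\text{for each } n\le K.\end{aligned}$$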
It follows that $F_n\approx\Phi_n$ in $L^2(\nu^n,\mathbb F^{\otimes n})$. Note that therefore $F_n-\Phi_n\approx0$ $\nu^n_L$-a.e. (see the proof of “(a) ⇒ (c)” in the proof of Theorem 7.2.10). Since $F_n$ belongs to $SL^2(\nu^n,\mathbb F^{\otimes n})$ and is nearstandard, $\Phi_n$ is a nearstandard element of $SL^2(\nu^n,\mathbb F^{\otimes n})$. This proves (a). Moreover, $E\big(I_n(\Phi_n-F_n)\big)^2\approx0$. Therefore, ${}^\circ I_n(F_n)={}^\circ I_n(\Phi_n)$ in $L^2(\Gamma_L)$. This proves (b).
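The white noise recipe can be tried out in a finite toy model. This is an illustration under assumptions that are not part of the text: finitely many ordinary time points, $\mathbb F=\mathbb R$, independent Gaussian increments with variance $1/H$, and hypothetical kernels `F1`, `F2`. For a lifting of the form $c_0+I_1(F_1)+I_2(F_2)$, the estimates of $H\,E(\Phi X_t)$ and $H^2E(\Phi X_sX_t)$ should reproduce $F_1(t)$ and $F_2(s,t)$ up to Monte Carlo error, in line with Theorem 7.4.7 (a).

```python
import itertools
import random

H = 4  # number of toy time points; each carries nu-weight 1/H

def F1(t):
    return 1.0 + 0.25 * t          # hypothetical first-chaos kernel

def F2(s, t):
    return 0.25 * (s + t + 1)      # hypothetical symmetric second-chaos kernel

def Phi(X, c0=1.0):
    """Toy lifting c0 + I_1(F1) + I_2(F2) with one-dimensional increments X."""
    first = sum(F1(t) * X[t] for t in range(H))
    second = sum(F2(s, t) * X[s] * X[t] for s, t in itertools.combinations(range(H), 2))
    return c0 + first + second

def white_noise_kernels(samples=100_000, seed=1):
    """Monte Carlo estimate of Phi_n(t) = H^n * E(Phi * X_{t_1} * ... * X_{t_n}) for n = 1, 2."""
    rng = random.Random(seed)
    pairs = list(itertools.combinations(range(H), 2))
    k1 = [0.0] * H
    k2 = dict.fromkeys(pairs, 0.0)
    for _ in range(samples):
        X = [rng.gauss(0.0, (1.0 / H) ** 0.5) for _ in range(H)]
        value = Phi(X)
        for t in range(H):
            k1[t] += value * X[t]
        for s, t in pairs:
            k2[(s, t)] += value * X[s] * X[t]
    return ([H * v / samples for v in k1],
            {p: H**2 * v / samples for p, v in k2.items()})

k1, k2 = white_noise_kernels()
```

In this finite model the expectations reproduce the kernels exactly, because products over distinct time points of independent centered increments are orthogonal; the code only adds sampling error.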
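Example 7.4.8 Let $\varphi=e^{I(f)-\frac12\int_T\|f\|^2_{\mathbb H}\,d\nu_L}$ with $f\in L^2_{\mathcal L^1}(\nu_L,\mathbb H)$. Fix a lifting $F\in SL^2(\nu,\mathbb F)$ of $f$. Then $\Phi:=e^{I(F)-\frac12\int_T\|F\|^2_{\mathbb F}\,d\nu}\in SL^2(\Gamma)$ is a lifting of $\varphi$. Let $E$ be an orthonormal basis of $\mathbb F$. Transfer of the substitution rule tells us that for all $m\in\mathbb N$, all $t\in T^m_<$ and all $e\in E^m$
$$H^m\,E\big(\Phi\cdot\langle X_{t_1},e_1\rangle\cdots\langle X_{t_m},e_m\rangle\big)=F^{\otimes m}(t)(e),$$
thus
$$e^{I(f)-\frac12\int_T\|f\|^2_{\mathbb H}\,d\nu_L}=\sum_{n=0}^{\infty}I_n(f^{\otimes n})\quad\text{with } I_0(f^{\otimes0})=1.$$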
The proof is left to the reader. In the following section one can find a second example, which will be used to prove the product and chain rules in a quite constructive way, applying the recipe for the computation of the kernels (see [35]).
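A Monte Carlo sketch of Example 7.4.8 in a one-dimensional toy model may help to see the recipe at work. The assumptions are ours: five ordinary time points, $\nu$-weight $1/H$ per point, and a hypothetical kernel `F`. The mean of $\Phi$ should be close to $1$, and $H\,E(\Phi X_t)$ should be close to $F_t$, the first kernel $F^{\otimes1}$.

```python
import math
import random

H = 5
F = [0.5 + 0.1 * t for t in range(H)]  # hypothetical one-dimensional kernel F_t

def Phi(X):
    """Toy version of exp(I(F) - 1/2 * integral of F^2 d nu), with nu-weight 1/H per point."""
    return math.exp(sum(f * x for f, x in zip(F, X)) - 0.5 * sum(f * f for f in F) / H)

def exponential_kernels(samples=100_000, seed=2):
    rng = random.Random(seed)
    mean, k1 = 0.0, [0.0] * H
    for _ in range(samples):
        X = [rng.gauss(0.0, (1.0 / H) ** 0.5) for _ in range(H)]
        v = Phi(X)
        mean += v
        for t in range(H):
            k1[t] += v * X[t]
    return mean / samples, [H * v / samples for v in k1]

mean, k1 = exponential_kernels()
```

In this finite model both facts hold exactly for the expectations by the Gaussian shift formula $E(e^{aZ-a^2\sigma^2/2}Z)=a\sigma^2$ for $Z\sim N(0,\sigma^2)$; the sketch only adds sampling error.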
7.4.4 The Kernels of the Product of Wiener Functionals

Our aim is to compute the kernels of the product $\varphi\cdot\psi$ with $\varphi,\psi\in L^2_{\mathcal W}(\Gamma_L)$. Since $\varphi\cdot\psi$ is, in general, not square integrable, we assume that $\varphi,\psi$ belong to finite chaos levels. By Theorem 7.4.6, $\varphi,\psi$ have liftings $\Phi,\Psi$ of the form
$$\Phi=\sum_{n=0}^{M}I_n(F_n),\qquad\Psi=\sum_{n=0}^{M}I_n(G_n)\quad\text{with } M\in\mathbb N,\qquad(+)$$
where the $F_n,G_n\colon T^n_{\neq}\to\mathbb F^{\otimes n}$ are symmetric and belong to $SL^2(\nu^n,\mathbb F^{\otimes n})$. By Corollary 7.3.4, $\Phi\cdot\Psi\in SL^2(\Gamma)$, thus $\varphi\cdot\psi\in L^2_{\mathcal W}(\Gamma_L)$. By Theorem 7.4.7, the kernels of $\varphi\cdot\psi$ are the standard parts of internal functions $K_m\colon T^m_<\to\mathbb F^{\otimes m}$, which can be computed by the recipe
$$K_m(r_1,\dots,r_m)(a_1,\dots,a_m)=H^m\cdot E\big(\Phi\cdot\Psi\cdot\langle X_{r_1},a_1\rangle\cdots\langle X_{r_m},a_m\rangle\big).$$
We may assume that $a_1,\dots,a_m$ are elements of an orthonormal basis $E=(e_i)_{i\in\{1,\dots,\omega\}}$ of $\mathbb F$. Let us use the shorthand $x_{s,i}:=\langle X_s,e_i\rangle$. In order to compute $K_m$, given by
$$K_m(r_1,\dots,r_m,e_{\rho_1},\dots,e_{\rho_m})=H^m\,E\big(\Phi\cdot\Psi\cdot x_{r_1,\rho_1}\cdots x_{r_m,\rho_m}\big),$$
we have to compute all possible
$$a:=E\big(x_{t_1,\tau_1}\cdots x_{t_j,\tau_j}\cdot x_{s_1,\sigma_1}\cdots x_{s_k,\sigma_k}\cdot x_{r_1,\rho_1}\cdots x_{r_m,\rho_m}\big)$$
with $t_1<\dots<t_j$, $s_1<\dots<s_k$, $r_1<\dots<r_m$. Here are some typical examples:

(i) Let $r_m<t:=t_j=s_k$. If $\sigma:=\tau_j=\sigma_k$, then
$$a=E\big(x_{t_1,\tau_1}\cdots x_{t_{j-1},\tau_{j-1}}\,x_{s_1,\sigma_1}\cdots x_{s_{k-1},\sigma_{k-1}}\,x_{r_1,\rho_1}\cdots x_{r_m,\rho_m}\cdot E^{\mathcal B_{t^-}}x^2_{t,\sigma}\big)=\frac1H\,E\big(x_{t_1,\tau_1}\cdots x_{t_{j-1},\tau_{j-1}}\,x_{s_1,\sigma_1}\cdots x_{s_{k-1},\sigma_{k-1}}\,x_{r_1,\rho_1}\cdots x_{r_m,\rho_m}\big),$$
because $E^{\mathcal B_{t^-}}x^2_{t,\sigma}=E_{\Gamma_1}x^2=\frac1H$. If $\tau_j\neq\sigma_k$, then
$$a=E\big(x_{t_1,\tau_1}\cdots x_{t_{j-1},\tau_{j-1}}\,x_{s_1,\sigma_1}\cdots x_{s_{k-1},\sigma_{k-1}}\,x_{r_1,\rho_1}\cdots x_{r_m,\rho_m}\cdot E^{\mathcal B_{t^-}}x_{t,\tau_j}\,x_{t,\sigma_k}\big)=0,$$
because $E^{\mathcal B_{t^-}}x_{t,\tau_j}\cdot x_{t,\sigma_k}=E_{\Gamma_2}\,x\cdot y=0$. We may continue in the same manner: for example, let $t_{j-1}<s_{k-1}=r_m$, $\sigma_{k-1}=\rho_m$, $r_m<t_j=s_k$ and $\tau_j=\sigma_k$. Then
$$a=\frac{1}{H^2}\,E\big(x_{t_1,\tau_1}\cdots x_{t_{j-1},\tau_{j-1}}\,x_{s_1,\sigma_1}\cdots x_{s_{k-2},\sigma_{k-2}}\,x_{r_1,\rho_1}\cdots x_{r_{m-1},\rho_{m-1}}\big).$$
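(ii) Let $t_j,s_k<r_m=:r$. Then
$$a=E\big(x_{t_1,\tau_1}\cdots x_{t_j,\tau_j}\,x_{s_1,\sigma_1}\cdots x_{s_k,\sigma_k}\,x_{r_1,\rho_1}\cdots x_{r_{m-1},\rho_{m-1}}\,E^{\mathcal B_{r^-}}x_{r,\rho_m}\big)=0,$$
because $E^{\mathcal B_{r^-}}x_{r,\rho_m}=E_{\Gamma_1}x=0$.

(iii) Let $t:=t_j=s_k=r_m$. Then
$$a=E\big(x_{t_1,\tau_1}\cdots x_{t_{j-1},\tau_{j-1}}\,x_{s_1,\sigma_1}\cdots x_{s_{k-1},\sigma_{k-1}}\,x_{r_1,\rho_1}\cdots x_{r_{m-1},\rho_{m-1}}\cdot E^{\mathcal B_{t^-}}x_{t,\tau_j}\,x_{t,\sigma_k}\,x_{t,\rho_m}\big)=0,$$
because $E_{\Gamma_1}x^3=E_{\Gamma_2}x^2\cdot y=E_{\Gamma_3}x\cdot y\cdot z=0$. Here is a difference to other Lévy processes: in general, $E_{\mu_1}x^3\neq0$.

To gain control of all these combinations of products, we introduce strictly monotone increasing functions $\sigma$, defined on subsets of $m=\{1,\dots,m\}$, by identifying $m\in\mathbb N_0$ with the set $\{1,\dots,m\}$. It follows that $0=\emptyset$. Let $k,m\in\mathbb N_0$ with $k\le m$. If $\sigma$ is a strictly monotone increasing function from $k$ into $m$, then we write $\sigma\colon k\uparrow m$. For $\sigma\colon k\uparrow m$, let $\bar\sigma\colon m-k\uparrow m\setminus\operatorname{range}(\sigma)$ be the strictly increasing enumeration of the complement of the range of $\sigma$. For example, if $k=0$, then $\sigma\colon k\uparrow m$ is $\emptyset$ and $\bar\sigma\colon m\uparrow m$ is $\mathrm{id}_m$. Since the $F_i,G_i$ are symmetric, we obtain from Theorem 7.4.7, using somewhat technical but elementary finite combinatorics:

Theorem 7.4.9 Assume that $\varphi,\psi\in L^2_{\mathcal W}(\Gamma_L)$ belong to finite chaos levels and let $\Phi,\Psi$ be liftings of $\varphi,\psi$ according to Equation (+). For all $(r_1,\dots,r_m)\in T$
$$\ldots\sum_{k=0}^{m}\sum_{\sigma\colon k\uparrow m}\sum_{n=0}^{M}\sum_{t\in T^n_<}\sum_{b\in E^n}\frac{1}{H^n}\,F_{n+k}(t,r_{\sigma_1},\dots,r_{\sigma_k},b,a_{\sigma_1},\dots,a_{\sigma_k})\cdot G_{n+m-k}(t,r_{\bar\sigma_1},\dots,r_{\bar\sigma_{m-k}},b,a_{\bar\sigma_1},\dots,a_{\bar\sigma_{m-k}}).$$
Then $K_m$ can be extended to a uniquely determined mapping from $T^m_<$ into $\mathbb F^{\otimes m}$, also denoted by $K_m$, and we have
$$\varphi\cdot\psi=\sum_{m\in{}^*\mathbb N_0}{}^\circ I_m(K_m)=\sum_{m\in\mathbb N_0}I_m({}^\circ K_m).$$
Recall that the $m\in{}^*\mathbb N_0$ are running through a standard finite set.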
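The bookkeeping with maps $\sigma\colon k\uparrow m$ and their complements $\bar\sigma$ can be enumerated mechanically. The following sketch is an illustration only: it represents each strictly increasing $\sigma$ from $\{1,\dots,k\}$ into $\{1,\dots,m\}$ by the tuple of its values and pairs it with $\bar\sigma$. For fixed $k$ there are $\binom mk$ such maps, so the two outer sums in Theorem 7.4.9 run over $2^m$ pairs $(\sigma,\bar\sigma)$ in total.

```python
from itertools import combinations

def increasing_maps(k, m):
    """Yield each strictly increasing sigma: {1..k} -> {1..m} (as the tuple of its values)
    together with sigma_bar, the increasing enumeration of {1..m} minus range(sigma)."""
    full = range(1, m + 1)
    for sigma in combinations(full, k):  # combinations are emitted in increasing order
        sigma_bar = tuple(i for i in full if i not in sigma)
        yield sigma, sigma_bar

for sigma, sigma_bar in increasing_maps(2, 4):
    print(sigma, sigma_bar)
```

For $k=0$ the generator yields the single pair `((), (1, ..., m))`, matching $\sigma=\emptyset$ and $\bar\sigma=\mathrm{id}_m$.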
7.4.5 The Malliavin Derivative

Since our sample space $\mathbb F^T$ is an internal finite-dimensional Euclidean space, the Malliavin derivative is infinitely close to the elementary derivative in finite-dimensional Euclidean spaces. Here are the details. In order to define the Malliavin derivative, we look at the derivative of the function $I_n(F)$, which is defined on the ${}^*$finite-dimensional space $\mathbb F^T$. In analogy to standard analysis, an internal function $G$, defined on $\mathbb F^T$ with values in ${}^*\mathbb R$ or $\mathbb F$, is called differentiable at $(X,t)\in\mathbb F^T\times T$ if there exists an internal linear function $L_{X,t}\colon\mathbb F\to{}^*\mathbb R$ or $L_{X,t}\colon\mathbb F\to\mathbb F$, respectively, such that
$$\lim_{h\to0}\frac{\big\|G(X_1,\dots,X_{t-1},X_t+h,X_{t+1},\dots,X_H)-G(X)-L_{X,t}(h)\big\|}{\|h\|}=0.$$
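If $G$ is differentiable at $(X,t)$ for each $(X,t)\in\mathbb F^T\times T$, then $DG\colon(X,t)\mapsto L_{X,t}$ is called the derivative of $G$. We see that $D\,I_n(F)=I_{n-1,1}(F)\colon\mathbb F^T\times T\to\mathbb F\ (=\mathbb F')$ for symmetric $F\colon T^n_{\neq}\to\mathbb F^{\otimes n}$. According to Theorem 7.4.1, fix $\varphi\in L^2_{\mathcal W}(\Gamma_L)$ with chaos decomposition
$$\varphi=E\varphi+\sum_{n=1}^{\infty}{}^\circ I_n(F_n)=E\varphi+\sum_{n=1}^{\infty}I_n({}^\circ F_n).$$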
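In a finite toy model the identity $D\,I_n(F)=I_{n-1,1}(F)$ is ordinary partial differentiation, which a finite difference can confirm. The sketch assumes $\mathbb F=\mathbb R$, ordinary time points, Gaussian increments with variance $1/H$, and a hypothetical symmetric kernel `kernel`; the helper names `I` and `I_n_minus_1_1` are ours. Since every tuple contains a given time $s$ at most once, $I_n(F)$ is affine in $X_s$, so the difference quotient agrees with $I_{n-1,1}(F)(X,s)$ up to rounding.

```python
import itertools
import random

def I(kernel, X, n, exclude=None):
    """Toy I_n(F) for one-dimensional increments X; optionally skip the time index `exclude`."""
    times = [t for t in range(len(X)) if t != exclude]
    total = 0.0
    for t in itertools.combinations(times, n):
        term = kernel(t)
        for s in t:
            term *= X[s]
        total += term
    return total

def kernel(t):
    return 1.0 + 0.1 * sum(t)  # hypothetical symmetric kernel: depends only on the set of times

def I_n_minus_1_1(X, n, s):
    """Toy I_{n-1,1}(F)(X, s): sum over t_1 < ... < t_{n-1} avoiding s of F(t, s) X_{t_1} ... X_{t_{n-1}}."""
    return I(lambda t: kernel(t + (s,)), X, n - 1, exclude=s)

def derivative_check(H=6, n=3, h=1e-3, seed=5):
    rng = random.Random(seed)
    X = [rng.gauss(0.0, (1.0 / H) ** 0.5) for _ in range(H)]
    errors = []
    for s in range(H):
        shifted = list(X)
        shifted[s] += h
        finite_difference = (I(kernel, shifted, n) - I(kernel, X, n)) / h
        errors.append(abs(finite_difference - I_n_minus_1_1(X, n, s)))
    return max(errors)
```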
By Theorem 7.3.6 and Corollary 7.3.16, we may assume that $I_{n-1,1}(F_n)\in SL^2(\Gamma\otimes\nu,\mathbb F)$ and ${}^\circ I_{n-1,1}(F_n)\in L^2_{\mathcal W\otimes\mathcal L^1}((\Gamma\otimes\nu)_L,\mathbb H)$. Now the Malliavin derivative $D$ of $\varphi$ is nothing but finite-dimensional differentiation under the standard part map and the sum, i.e., $D$ is defined on a dense subspace of $L^2_{\mathcal W}(\Gamma_L)$ by setting
$$D(\varphi):=\sum_{n=1}^{\infty}{}^\circ I_{n-1,1}(F_n)$$
for those $\varphi\in L^2_{\mathcal W}(\Gamma_L)$ such that $\sum_{n=1}^{\infty}{}^\circ I_{n-1,1}(F_n)$ converges in the Hilbert space $L^2_{\mathcal W\otimes\mathcal L^1}((\Gamma\otimes\nu)_L,\mathbb H)$. If this series converges, $\varphi$ is called Malliavin differentiable. Note that we may assume that $F_n$ is symmetric and that $F_n(t_1,\dots,t_n)\neq0$ implies $(t_1,\dots,t_n)\in T^n_{\neq}$. In order to characterize the domain of the Malliavin derivative, we need the following lemma.

Lemma 7.4.10 Suppose that $F=F_n$. Then
$$\int_{\mathbb F^T\times T}\big\|{}^\circ I_{n-1,1}(F)\big\|^2_{\mathbb H}\,d(\Gamma\otimes\nu)_L=n\cdot\int_{\mathbb F^T}\big({}^\circ I_n(F)\big)^2\,d\Gamma_L.$$
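Proof Using the fact that $I_{n-1,1}(F)$ belongs to $SL^2(\Gamma\otimes\nu,\mathbb F)$ and $I_n(F)$ to $SL^2(\Gamma)$, we obtain from Corollary 6.3.20 and the symmetry of $F$
$$\begin{aligned}\int_{\mathbb F^T\times T}\big\|{}^\circ I_{n-1,1}(F)\big\|^2_{\mathbb H}\,d(\Gamma\otimes\nu)_L&=:\big\|{}^\circ I_{n-1,1}(F)\big\|^2_{(\Gamma\otimes\nu)_L}={}^\circ\big\|I_{n-1,1}(F)\big\|^2_{\Gamma\otimes\nu}={}^\circ\!\int_{T^{n-1}_<\times T}\|F\|^2_{\mathbb F^n}\,d\nu^n\\&=n\cdot{}^\circ\!\int_{T^n_<}\|F\|^2_{\mathbb F^n}\,d\nu^n=n\cdot{}^\circ\!\int_{\mathbb F^T}(I_n(F))^2\,d\Gamma=n\cdot\int_{\mathbb F^T}\big({}^\circ I_n(F)\big)^2\,d\Gamma_L.\end{aligned}$$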
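The counting step behind Lemma 7.4.10 can be checked exactly in a one-dimensional toy model. The sketch assumes that both squared norms have already been replaced by kernel integrals via the isometry, so it only compares $\int_{T^{n-1}_<\times T}F^2\,d\nu^n$ with $n\int_{T^n_<}F^2\,d\nu^n$ for a hypothetical random symmetric kernel. Each increasing $n$-tuple is counted once for each of its $n$ entries, which is exactly where the factor $n$ comes from.

```python
import itertools
import random

def lemma_7_4_10_sides(n=3, H=6, seed=7):
    """Compare both sides of the toy identity behind Lemma 7.4.10 (one-dimensional F)."""
    rng = random.Random(seed)
    # A symmetric kernel is determined by its values on increasing n-tuples.
    values = {t: rng.uniform(-1.0, 1.0) for t in itertools.combinations(range(H), n)}
    F = lambda tup: values[tuple(sorted(tup))]
    # Left: integral over T^{n-1}_< x T of F^2, i.e. the squared norm of I_{n-1,1}(F).
    left = sum(F(t + (s,)) ** 2
               for s in range(H)
               for t in itertools.combinations([u for u in range(H) if u != s], n - 1)) / H**n
    # Right: n times the integral over T^n_< of F^2, i.e. n times the squared norm of I_n(F).
    right = n * sum(v * v for v in values.values()) / H**n
    return left, right

left, right = lemma_7_4_10_sides()
```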
Now, thanks to the orthogonality of ${}^\circ I_{n-1,1}(F)$ and ${}^\circ I_{m-1,1}(G)$ for $m\neq n$ (see Proposition 7.3.7), we obtain:

Proposition 7.4.11 A function $\varphi\in L^2_{\mathcal W}(\Gamma_L)$ is Malliavin differentiable if and only if $\sum_{n=1}^{\infty}\sqrt n\cdot{}^\circ I_n(F_n)$ converges in $L^2_{\mathcal W}(\Gamma_L)$. Therefore, $D$ is densely defined.

Since the spaces $L^2_{\mathcal W\otimes\mathcal L^1}((\Gamma\otimes\nu)_L,\mathbb H)$, $L^2_{\mathcal W\otimes\mathrm{Leb}[0,\infty[}(\Gamma_L\otimes\lambda,\mathbb H)$ and $L^2(W\otimes\lambda,\mathbb H)$ can be identified, and also $L^2(W)$ and $L^2_{\mathcal W}(\Gamma_L)$, the Malliavin derivative is also densely defined for functionals in $L^2(W)$ and takes its values in $L^2(W\otimes\lambda,\mathbb H)$. It is common practice to denote the domain of the Malliavin derivative by $\mathbb D^{1,2}$.
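As a worked illustration of Proposition 7.4.11 (the chaos norms below are chosen for the example and do not come from the text): by orthogonality and Lemma 7.4.10, $\|D\varphi\|^2=\sum_{n\ge1}n\,E_{\Gamma_L}({}^\circ I_n(F_n))^2$, so square integrability does not imply Malliavin differentiability.

```latex
% Case 1: E_{\Gamma_L}({}^\circ I_n(F_n))^2 = n^{-2}
\|\varphi\|^2_{L^2(\Gamma_L)} = (E\varphi)^2 + \sum_{n\ge 1}\frac{1}{n^2} < \infty,
\qquad
\|D\varphi\|^2_{(\Gamma\otimes\nu)_L} = \sum_{n\ge 1} n\cdot\frac{1}{n^2} = \sum_{n\ge 1}\frac{1}{n} = \infty .
% Case 2: E_{\Gamma_L}({}^\circ I_n(F_n))^2 = n^{-3}
\|D\varphi\|^2_{(\Gamma\otimes\nu)_L} = \sum_{n\ge 1} n\cdot\frac{1}{n^3} = \sum_{n\ge 1}\frac{1}{n^2} = \frac{\pi^2}{6} < \infty .
```

In the first case $\varphi\in L^2_{\mathcal W}(\Gamma_L)$ is not Malliavin differentiable; in the second case it is.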
7.4.6 A Commutation Rule for Derivative and Limit

The next result is reminiscent of a result in elementary analysis, which says that we can interchange derivative and limit if the sequence of derivatives converges uniformly and the original sequence converges in at least one point.

Theorem 7.4.12 Suppose that $(\varphi_k)$ is a sequence of functions in $\mathbb D^{1,2}$ such that $(D\varphi_k)$ converges in $L^2_{\mathcal W\otimes\mathcal L^1}((\Gamma\otimes\nu)_L,\mathbb H)$ and suppose that $(E\varphi_k)$ converges in the real numbers. Then $(\varphi_k)$ converges to a function in $\mathbb D^{1,2}$ and
$$D\big(\lim_{k\to\infty}\varphi_k\big)=\lim_{k\to\infty}D\varphi_k\ \text{in } L^2_{\mathcal W\otimes\mathcal L^1}((\Gamma\otimes\nu)_L,\mathbb H).$$

Proof According to Theorem 7.4.1, $\varphi_k$ has a chaos decomposition $\varphi_k=\sum_{n=0}^{\infty}I_n(f_{n,k})$. Then
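$$\begin{aligned}0&=\lim_{k,l\to\infty}\|D\varphi_k-D\varphi_l\|^2_{(\Gamma\otimes\nu)_L}=\lim_{k,l\to\infty}\int_{\mathbb F^T\times T}\Big\|\sum_{n=1}^{\infty}I_{n-1,1}(f_{n,k}-f_{n,l})\Big\|^2_{\mathbb H}\,d(\Gamma\otimes\nu)_L\\&=\lim_{k,l\to\infty}\sum_{n=1}^{\infty}\big\|I_{n-1,1}(f_{n,k}-f_{n,l})\big\|^2_{(\Gamma\otimes\nu)_L}=\lim_{k,l\to\infty}\sum_{n=1}^{\infty}n\cdot\int_{T^n_<}\|f_{n,k}-f_{n,l}\|^2_{\mathbb H^n}\,d\nu^n_L,\end{aligned}$$
where the second equality holds because of the orthogonality of the $I_{n-1,1}(f_{n,k}-f_{n,l})$, $n\in\mathbb N$ (see the standard proof of Proposition 5.6.5 in [36]).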
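Since all $f_{n,k}$ are symmetric, $(f_{n,k})_{k\in\mathbb N}$ is a Cauchy sequence in $L^2_{\mathcal L^n}(\nu^n_L,\mathbb H^{\otimes n})$ for all $n\in\mathbb N$. Let $\lim_{k\to\infty}f_{n,k}=f_n$ in $L^2_{\mathcal L^n}(\nu^n_L,\mathbb H^{\otimes n})$. It follows that
$$\lim_{k\to\infty}I_{n-1,1}(f_{n,k})=I_{n-1,1}(f_n)\ \text{in } L^2_{\mathcal W\otimes\mathcal L^1}((\Gamma\otimes\nu)_L,\mathbb H)$$
and
$$\lim_{k\to\infty}I_n(f_{n,k})=I_n(f_n)\ \text{in } L^2_{\mathcal W}(\Gamma_L).$$
Because of the orthogonality of the $I_n(f_n)$ and of the $I_{n-1,1}(f_n)$, the series $\sum_{n=1}^{\infty}I_n(f_n)$ and $\sum_{n=1}^{\infty}I_{n-1,1}(f_n)$ converge in $L^2_{\mathcal W}(\Gamma_L)$ and $L^2_{\mathcal W\otimes\mathcal L^1}((\Gamma\otimes\nu)_L,\mathbb H)$, respectively, and
$$\lim_{k\to\infty}\Big(E\varphi_k+\sum_{n=1}^{\infty}I_n(f_{n,k})\Big)=\lim_{k\to\infty}E\varphi_k+\sum_{n=1}^{\infty}I_n(f_n)=:\varphi$$
in $L^2_{\mathcal W}(\Gamma_L)$, and in $L^2_{\mathcal W\otimes\mathcal L^1}((\Gamma\otimes\nu)_L,\mathbb H)$ we have
$$\lim_{k\to\infty}\sum_{n=1}^{\infty}I_{n-1,1}(f_{n,k})=\sum_{n=1}^{\infty}I_{n-1,1}(f_n)=D\varphi.$$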
7.4.7 The Clark-Ocone Formula

The Clark-Ocone formula is a martingale representation of a large class of random variables: each Malliavin differentiable function $\varphi$ is the Itô integral of the conditional expectation of its derivative. More generally, if $\varphi$ is not Malliavin differentiable, then $\varphi$ can still be written as a stochastic integral plus a constant. This formula was proved by Clark [7], and more generally by Ocone [31], for the classical Wiener space $C_{\mathbb R}$. In this case, a simple proof using saturation can be found in the work of Cutland and Ng [8]. Berger [4] proved the Clark-Ocone formula for the abstract Wiener space, using the Üstünel-Zakai-Itô integral, based on a resolution of the identity on $\mathbb H$. In [32] there is a proof of this formula for the space $C_{\mathbb B}$. In analogy to Cutland and Ng's approach to the Clark-Ocone formula for the classical Wiener space, we will now prove this formula for our general setting in a quite simple way. Indeed, in the internal setting the Clark-Ocone formula is obvious, because for all internal functions $F\colon T^n_{\neq}\to\mathbb F^{\otimes n}$
$$I_n(F)=\int^{V}\big(t\mapsto E^{\mathcal B_{t^-}}I_{n-1,1}(F)(\cdot,t)\big)\,\Delta B,$$
where $\int^{V}$ was defined in Sect. 7.2.4. Note that $E^{\mathcal B_{t^-}}I_{n-1,1}(F)(\cdot,t)=I_{n-1,1}(1_T$
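$$\varphi=E_{\Gamma_L}\varphi+\int^{V}\Big(r\mapsto E^{\mathfrak b_r}\sum_{n=1}^{\infty}{}^\circ I_{n-1,1}(F_n)(\cdot,r)\Big)\,dB.$$
(b) If $\varphi$ is Malliavin differentiable, then $\varphi=E_{\Gamma_L}\varphi+\int^{V}E^{\mathfrak b_r}D\varphi(\cdot,r)\,dB$.

Since $E^{\mathfrak b_r}\,{}^\circ I_{n-1,1}(F_n)(\cdot,r)$ is $\mathcal W$-measurable, we may replace $\mathfrak b_r$ by $\mathfrak b_r\cap\mathcal W$.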
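Proof Using Theorem 7.3.6 and the results in Example 7.4.5, we obtain in $L^2(\Gamma_L)$
$$\begin{aligned}\sum_{n=1}^{\infty}{}^\circ I_n(F_n)&=\sum_{n=1}^{\infty}{}^\circ\!\int^{V}\big(t\mapsto I_{n-1,1}(1_T\ldots\big)\\&=\sum_{n=1}^{\infty}\int^{V}\big(r\mapsto E^{\mathfrak b_r}\,{}^\circ I_{n-1,1}(F_n)(\cdot,r)\big)\,dB=\int^{V}\Big(r\mapsto E^{\mathfrak b_r}\sum_{n=1}^{\infty}{}^\circ I_{n-1,1}(F_n)(\cdot,r)\Big)\,dB.\end{aligned}$$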
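If $\varphi$ is Malliavin differentiable, then
$$\int^{V}\Big(r\mapsto E^{\mathfrak b_r}\sum_{n=1}^{\infty}{}^\circ I_{n-1,1}(F_n)(\cdot,r)\Big)\,dB=\int^{V}\big(r\mapsto E^{\mathfrak b_r}D\varphi(\cdot,r)\big)\,dB.$$
We see that $r\mapsto E^{\mathfrak b_r}\sum_{n=1}^{\infty}{}^\circ I_{n-1,1}(F_n)(\cdot,r)$ exists in $L^2(\Gamma_L\otimes\lambda,\mathbb H)$, although $r\mapsto\sum_{n=1}^{\infty}{}^\circ I_{n-1,1}(F_n)(\cdot,r)$ need not be $\Gamma_L\otimes\lambda$-square integrable.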
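The internal identity behind this proof, that $I_n(F)$ equals the sum over $t$ of $E^{\mathcal B_{t^-}}I_{n-1,1}(F)(\cdot,t)$ times the increment $X_t$, can be checked path by path in a finite toy model. The sketch assumes $\mathbb F=\mathbb R$, ordinary time points, independent centered increments and a hypothetical random symmetric kernel; it reads the internal stochastic integral as the plain sum $\sum_t Y_tX_t$. Because the increments are independent and centered, conditioning on the past simply drops every term that contains an increment at a time later than $t$.

```python
import itertools
import random

def I(kernel, X, times, n):
    """Toy I_n: sum over increasing n-tuples drawn from `times` of kernel(t) * prod X[t_i]."""
    total = 0.0
    for t in itertools.combinations(times, n):
        term = kernel(t)
        for s in t:
            term *= X[s]
        total += term
    return total

def clark_ocone_sides(n=3, H=7, seed=11):
    rng = random.Random(seed)
    values = {t: rng.uniform(-1.0, 1.0) for t in itertools.combinations(range(H), n)}
    F = lambda tup: values[tuple(sorted(tup))]  # hypothetical symmetric kernel
    X = [rng.gauss(0.0, (1.0 / H) ** 0.5) for _ in range(H)]
    direct = I(F, X, range(H), n)
    # Conditional expectation of I_{n-1,1}(F)(., t) given the increments before t:
    # only the tuples lying strictly before t survive.
    integrand = [I(lambda u, t=t: F(u + (t,)), X, range(t), n - 1) for t in range(H)]
    stochastic_sum = sum(integrand[t] * X[t] for t in range(H))
    return direct, stochastic_sum
```

Every increasing $n$-tuple is recovered exactly once, through its largest entry $t$, which is why the two sides coincide up to rounding.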
7.4.8 A Lifting Theorem for the Derivative

Using saturation, we obtain the following lifting theorem for the Malliavin derivative. The proof is similar to the proof of Theorem 7.4.6.

Theorem 7.4.14 Suppose that $\varphi=\sum_{n=0}^{\infty}{}^\circ I_n(F_n)$ is Malliavin differentiable. Choose an internal extension $(F_n)_{n\in{}^*\mathbb N_0}$ of $(F_n)_{n\in\mathbb N_0}$. Then there exists an unlimited number $K\in{}^*\mathbb N$ such that for each unlimited $M\in{}^*\mathbb N$, $M\le K$,
$$\Phi:=\sum_{n=0}^{M}I_n(F_n)\in SL^2(\Gamma)\quad\text{and}\quad\Phi\approx\varphi\ \ \Gamma_L\text{-a.s.},$$
$$D\Phi\colon(X,t)\mapsto\sum_{n=1}^{M}I_{n-1,1}(F_n)(X,t)\in SL^2(\Gamma\otimes\nu,\mathbb F),$$
and (see Sect. 6.3.7) $D\Phi\approx D\varphi$ in $\mathbb F$ $(\Gamma\otimes\nu)_L$-a.e. We may assume that all the kernels $F_n$ of $\Phi$ are symmetric and that $F_n$ is defined on $T^n_{\neq}$.

Here is an example in which we compare the usual derivative with the Malliavin derivative.

Example 7.4.15 Recall the notation and results in Example 7.4.8. Then
$$\Phi:=e^{\sum_{t\in T,e'\in E}F_t(e')\cdot\langle X_t,e'\rangle-\frac12\int_T\|F\|^2_{\mathbb F}\,d\nu}$$
is a lifting of $\varphi:=e^{I(f)-\frac12\int_T\|f\|^2_{\mathbb H}\,d\nu_L}$. We obtain for the usual derivative
$$\frac{d\Phi}{d\langle X_s,e\rangle}=e^{\sum_{t\in T,e'\in E}F_t(e')\cdot\langle X_t,e'\rangle-\frac12\int_T\|F\|^2_{\mathbb F}\,d\nu}\cdot F_s(e)=\Phi(X)\cdot F_s(e).$$
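By Theorem 7.4.14 and Example 7.4.8, $\varphi$ and $D\varphi$ have liftings of the form
$$\Psi\colon X\mapsto\sum_{n\le M}\sum_{t\in T^n_<}F_{t_1}(X_{t_1})\cdots F_{t_n}(X_{t_n}),$$
$$D\Psi\colon(X,s,e)\mapsto\sum_{n\le M}\sum_{t\in(T\setminus\{s\})^n_<}F_{t_1}(X_{t_1})\cdots F_{t_n}(X_{t_n})\cdot F_s(e)\approx\Phi(X)\cdot F_s(e).$$
It follows that $\dfrac{d\Phi}{d\langle X_s,e\rangle}\approx D\Psi(X,s)(e)$. Thus, the standard parts of both derivatives are the same.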
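A numerical sketch of Example 7.4.15 in a one-dimensional toy model: a large but finite number $H$ of time points stands in for the hyperfinite line, and the kernel is the hypothetical constant $F_t=1$. The central difference of $\Phi$ in $X_s$ reproduces $\Phi(X)\,F_s$. The chaos lifting is taken with all levels (an assumption, $M=H$), in which case the sum over increasing tuples equals $\prod_t(1+F_tX_t)$ and its derivative in $X_s$ is $F_s\prod_{t\neq s}(1+F_tX_t)$; this differs from $\Phi(X)\,F_s$ only by a small relative error, expected to shrink like $H^{-1/2}$.

```python
import math
import random

def compare_derivatives(H=50_000, s=0, seed=3):
    """Toy version of Example 7.4.15 with F = R and the hypothetical kernel F_t = 1."""
    rng = random.Random(seed)
    F = [1.0] * H
    X = [rng.gauss(0.0, (1.0 / H) ** 0.5) for _ in range(H)]

    def phi(Y):
        # Exponential lifting: exp(sum F_t Y_t - 1/2 * sum F_t^2 / H).
        return math.exp(sum(f * y for f, y in zip(F, Y)) - 0.5 * sum(f * f for f in F) / H)

    # Usual derivative d Phi / d X_s via a central difference.
    h = 1e-6
    up, down = list(X), list(X)
    up[s] += h
    down[s] -= h
    usual = (phi(up) - phi(down)) / (2 * h)
    # Derivative of the full chaos sum prod_t (1 + F_t X_t) in the coordinate X_s.
    chaos_derivative = F[s] * math.prod(1.0 + F[t] * X[t] for t in range(H) if t != s)
    return usual, phi(X) * F[s], chaos_derivative

usual, closed_form, chaos_derivative = compare_derivatives()
```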
7.4.9 The Skorokhod Integral

In the preceding section we have seen that the Malliavin derivative $D$ is closely related to the usual derivative in finite-dimensional Euclidean spaces. In order to see that the Skorokhod integral can be defined as the standard part of a Riemann-Stieltjes integral, we will first prove that each $\psi\in L^2_{\mathcal W\otimes\mathcal L^1}((\Gamma\otimes\nu)_L,\mathbb H)$ has an orthogonal expansion $\psi=\sum_{n=0}^{\infty}{}^\circ I_{n,1}(F_n)$, where $F_n\in SL^2_{\neq}(\nu^{n+1},\mathbb F^{\otimes(n+1)})$ is symmetric in the first $n$ variables and nearstandard. Then for elements $\psi$ of a dense subspace of $L^2_{\mathcal W\otimes\mathcal L}((\Gamma\otimes\nu)_L,\mathbb H)$ the Skorokhod integral $\delta$ is simply defined by Riemann-Stieltjes integration under the standard part map and under the sum, i.e.
$$\delta\psi(X):=\sum_{n=0}^{\infty}{}^\circ\!\int^{V}I_{n,1}(F_n)\,\Delta B\,(X):=\sum_{n=0}^{\infty}{}^\circ\sum_{t\in T}\big(I_n(F_n(\cdot,t))(X)\big)(X_t).$$
In order to define the Skorokhod integral, which is the reversal of the derivative from several points of view, we now provide a suitable decomposition of the functionals in $L^2_{\mathcal W\otimes\mathcal L^1}((\Gamma\otimes\nu)_L,\mathbb H)$. Recall the definition of $\widetilde F$ for functions $F\colon T^{n+1}_{\neq}\to\mathbb F^{\otimes(n+1)}$ which are symmetric in the first $n$ arguments (see Sect. 7.3.2). Note that $\widetilde F\in SL^2(\nu^{n+1},\mathbb F^{\otimes(n+1)})$ if $F\in SL^2(\nu^{n+1},\mathbb F^{\otimes(n+1)})$, and note that $\widetilde F$ is symmetric. The function $\widetilde f$ for $f\in L^2((\nu^{n+1})_L,\mathbb H^{\otimes(n+1)})$ is defined in the same way. Moreover, by Corollary 7.3.15, ${}^\circ I_{n+1}(\widetilde F)=I_{n+1}({}^\circ\widetilde F)\in L^2_{\mathcal W}(\Gamma_L)$ if ${}^\circ F$ exists and belongs to $L^2_{\mathcal L^{n+1}}((\nu^{n+1})_L,\mathbb H^{\otimes(n+1)})$. We define for each $r\in T$
$$\big(1_{T_r}\cdot_{n+1}F\big)(t_1,\dots,t_{n+1}):=1_{T_r}(t_{n+1})\cdot F(t_1,\dots,t_{n+1}).$$
Since $\nu^n_L(T^n\setminus T^n_{\neq})=0$, we may assume, in order to study S-integrability, that $T^n_{\neq}$ is the support of internal functions defined on $T^n$. Then
$$\int^{V}I_{n,1}\big(1_{T_r}\cdot_{n+1}F\big)\,\Delta B\,(X)=\sum_{t\le r}\sum_{t_1<\cdots<t_n}F(t_1,\dots,t_n,t)(X_{t_1},\dots,X_{t_n},X_t)=I_{n+1}\big(\widetilde{1_{T_r}\cdot_{n+1}F}\big)(X).$$
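Theorem 7.4.16 Let $\varphi\in L^2_{\mathcal W\otimes\mathcal L^1}((\Gamma\otimes\nu)_L,\mathbb H)$. Then there exists a sequence $(F_n)_{n\in\mathbb N_0}$ of internal functions $F_n\colon T^{n+1}\to\mathbb F^{\otimes(n+1)}$ with the following five properties:
(a) $F_n$ is symmetric in the first $n$ variables, $F_n\in SL^2_{\neq}(\nu^{n+1},\mathbb F^{\otimes(n+1)})$ and ${}^\circ F_n\in L^2_{\mathcal L^{n+1}}((\nu^{n+1})_L,\mathbb H^{\otimes(n+1)})$.
(b) $I_{n,1}(F_n)\in SL^2(\Gamma\otimes\nu,\mathbb F)$ and ${}^\circ I_{n,1}(F_n)\in L^2_{\mathcal W\otimes\mathcal L^1}((\Gamma\otimes\nu)_L,\mathbb H)$.
(c) $I_{n+1}(\widetilde F_n)\in SL^2(\Gamma)$ and ${}^\circ I_{n+1}(\widetilde F_n)\in L^2_{\mathcal W}(\Gamma_L)$.
(d) $r\mapsto I_{n+1}\big(\widetilde{1_{T_r}\cdot_{n+1}F_n}\big)\in SL^2(\Gamma\otimes\nu)$ and $r\mapsto{}^\circ I_{n+1}\big(\widetilde{1_{T_r}\cdot_{n+1}F_n}\big)$ is in $L^2_{\mathcal W\otimes\mathcal L^1}((\Gamma\otimes\nu)_L)$.
(e) $\varphi=\sum_{n=0}^{\infty}{}^\circ I_{n,1}(F_n)$ converges in $L^2_{\mathcal W\otimes\mathcal L^1}((\Gamma\otimes\nu)_L,\mathbb H)$.
Uniqueness: If $\varphi=\sum_{n=0}^{\infty}{}^\circ I_{n,1}(K_n)$ converges in $L^2((\Gamma\otimes\nu)_L,\mathbb H)$ and $K_n\in SL^2(\nu^{n+1},\mathbb F^{\otimes(n+1)})$ is symmetric in the first $n$ variables, then $F_n\approx_{\mathbb F^{n+1}}K_n$ $(\nu^{n+1})_L$-a.e.

This decomposition also yields a corresponding decomposition of functionals in $L^2_{\mathcal W}(\Gamma_L\otimes\lambda,\mathbb H)$ or in $L^2(W\otimes\lambda,\mathbb H)$.

Proof Let $M$ be the set of all $\varphi\in L^2_{\mathcal W\otimes\mathcal L^1}((\Gamma\otimes\nu)_L,\mathbb H)$ such that there exists a sequence $(F_n)_{n\in\mathbb N_0}$ of internal functions $F_n\colon T^{n+1}_{\neq}\to\mathbb F^{\otimes(n+1)}$ with the properties (a),…,(e). We shall prove that $M=L^2_{\mathcal W\otimes\mathcal L^1}((\Gamma\otimes\nu)_L,\mathbb H)$. First assume that $\varphi:=1_{B\times C}\otimes a=1_B\otimes1_C\otimes a$ with $B\in\mathcal W$, $C\in\mathcal L^1$, $C\subseteq T_S$, where $S\in\mathbb N$, and $a\in\mathbb H$. Note that for all $X\in\mathbb F^T$, $t\in T$ and $x\in\mathbb H=\mathbb H'=\mathbb H^{\otimes1}$
$$\varphi(X,t)(x)=1_B(X)\cdot1_C(t)\cdot\langle a,x\rangle.$$
We will prove that $\varphi\in M$. According to Theorem 7.4.1, $1_B$ has the decomposition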
∞
n=0
◦
In (G n ) =
∞
In (◦ G n ) in L 2W ( L ) with ◦ I0 (G 0 ) = E(1 B ).
(+)
n=0
There exists an internal subset $A\subseteq T_S$ such that $\nu_L(A\,\Delta\,C)=0$. For $n\in\mathbb N_0$ we define $F_n\colon T^n\times T\to\mathbb F^{\otimes(n+1)}$, setting
$$F_n:=G_n\otimes1_A\otimes{}^*a\colon(t,s,y)\mapsto G_n(t)\cdot1_A(s)\cdot\langle{}^*a,y\rangle.$$
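If $t\notin T^{n+1}_{\neq}$, set $F_n(t):=0$. By Lemma 6.3.16 (b), $F_n\in SL^2_{\neq}(\nu^{n+1},\mathbb F^{\otimes(n+1)})$. Thus, $I_{n+1}(\widetilde F_n)\in SL^2(\Gamma)$ (Corollary 7.3.4) and $I_{n,1}(F_n)\in SL^2(\Gamma\otimes\nu,\mathbb F)$ (Theorem 7.3.6 (a)). Since ${}^\circ G_n$ is $\mathcal L^n$-measurable and $1_A$ is $\mathcal L^1$-measurable, (a) is true. Since $1_A\otimes a\in L^2_{\mathcal L^1}(\nu_L,\mathbb H)$, we obtain ${}^\circ I_{n,1}(F_n)\in L^2_{\mathcal W\otimes\mathcal L^1}((\Gamma\otimes\nu)_L,\mathbb H)$ and, by Corollary 7.3.15, ${}^\circ I_{n+1}(\widetilde F_n)\in L^2_{\mathcal W}(\Gamma_L)$. Thus, (b) and (c) are true. In a similar way we see that (d) is true. Because of (+), in $L^2_{\mathcal W\otimes\mathcal L^1}((\Gamma\otimes\nu)_L,\mathbb H)$
$$1_B\otimes1_C\otimes a=1_B\otimes1_A\otimes a=\Gamma_L(B)\cdot1_A\otimes a+\sum_{n=1}^{\infty}{}^\circ I_{n,1}(F_n),$$
where the series converges in $L^2_{\mathcal W\otimes\mathcal L^1}((\Gamma\otimes\nu)_L,\mathbb H)$;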
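thus (e) is true. This shows that $\varphi=1_B\otimes1_C\otimes a\in M$.

Since $M$ is a linear space, it remains for us to show that $M$ is complete: let $(\varphi_k)$ be a Cauchy sequence in $M$ with $\varphi_k=\sum_{n=0}^{\infty}{}^\circ I_{n,1}(F_{n,k})$ in $L^2_{\mathcal W\otimes\mathcal L^1}((\Gamma\otimes\nu)_L,\mathbb H)$ such that (a),…,(d) are true with $F_{n,k}$ instead of $F_n$. Then, by Proposition 7.3.7,
$$\begin{aligned}\|\varphi_k-\varphi_l\|^2_{(\Gamma\otimes\nu)_L}&=\Big\|\sum_{n=0}^{\infty}{}^\circ I_{n,1}(F_{n,k}-F_{n,l})\Big\|^2_{(\Gamma\otimes\nu)_L}:=\int_{\mathbb F^T\times T}\Big\|\sum_{n=0}^{\infty}{}^\circ I_{n,1}(F_{n,k}-F_{n,l})\Big\|^2_{\mathbb H}\,d(\Gamma\otimes\nu)_L\\&=\sum_{n=0}^{\infty}\big\|{}^\circ I_{n,1}(F_{n,k}-F_{n,l})\big\|^2_{(\Gamma\otimes\nu)_L}=\sum_{n=0}^{\infty}\int_{T^n_<\times T}\big\|{}^\circ F_{n,k}-{}^\circ F_{n,l}\big\|^2_{\mathbb H^{n+1}}\,d(\nu^{n+1})_L,\end{aligned}$$
where the second equality holds because of the orthogonality of the ${}^\circ I_{n,1}(F_{n,k}-F_{n,l})$, $n\in\mathbb N_0$ (see the standard proof of Proposition 5.6.5 in [36]).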
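Since $F_{n,k}$ is symmetric in the first $n$ variables, $({}^\circ F_{n,k})_{k\in\mathbb N}$ is a Cauchy sequence in $L^2_{\mathcal L^{n+1}}((\nu^{n+1})_L,\mathbb H^{\otimes(n+1)})$ converging to an $f_n\in L^2_{\mathcal L^{n+1}}((\nu^{n+1})_L,\mathbb H^{\otimes(n+1)})$. In the same way, $\big({}^\circ I_{n,1}(F_{n,k})\big)_{k\in\mathbb N}$ converges to a $g_n\in L^2_{\mathcal W\otimes\mathcal L^1}((\Gamma\otimes\nu)_L,\mathbb H)$, $\big({}^\circ I_{n+1}(\widetilde F_{n,k})\big)_{k\in\mathbb N}$ converges to an $h_n\in L^2_{\mathcal W}(\Gamma_L)$, and $\big({}^\circ I_{n+1}(\widetilde{1_{T_t}\cdot_{n+1}F_{n,k}})\big)_{k\in\mathbb N}$ converges to an $i_n\in L^2_{\mathcal W\otimes\mathcal L^1}((\Gamma\otimes\nu)_L)$. By Corollary 6.3.11, we may assume that $f_n={}^\circ F_n$, $g_n={}^\circ I_{n,1}(F_n)$, $h_n={}^\circ I_{n+1}(\widetilde F_n)$ and $i_n=\big(t\mapsto{}^\circ I_{n+1}(\widetilde{1_{T_t}\cdot_{n+1}F_n})\big)$, and that the internal functions behind ${}^\circ$ fulfil the required S-integrability conditions. Moreover, we may assume that $F_n$ is symmetric in the first $n$ components and that $F_n$ is defined on $T^{n+1}_{\neq}$. This proves that $M$ is complete. It follows that $M=L^2_{\mathcal W\otimes\mathcal L^1}((\Gamma\otimes\nu)_L,\mathbb H)$.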
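We will now prove the uniqueness. Using similar computations, we obtain
$$0=\Big\|\sum_{n=0}^{\infty}{}^\circ I_{n,1}(F_n-K_n)\Big\|^2_{(\Gamma\otimes\nu)_L}=\sum_{n=0}^{\infty}{}^\circ\!\int_{T^n_<\times T}\|F_n-K_n\|^2_{\mathbb F^{n+1}}\,d\nu^{n+1}.$$
It follows that $\int_{T^n_<\times T}\|F_n-K_n\|^2_{\mathbb F^{n+1}}\,d\nu^{n+1}\approx0$. Since $F_n$ and $K_n$ are symmetric in the first $n$ variables, $F_n\approx_{\mathbb F^{n+1}}K_n$ $(\nu^{n+1})_L$-a.e.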
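Now we are able to define the Skorokhod integral. Let $\varphi\in L^2_{\mathcal W\otimes\mathcal L^1}((\Gamma\otimes\nu)_L,\mathbb H)$ with $\varphi=\sum_{n=0}^{\infty}{}^\circ I_{n,1}(F_n)$, where $F_n\colon T^{n+1}_{\neq}\to\mathbb F^{\otimes(n+1)}$ fulfils the conditions (a), (b), (c), (d) in Theorem 7.4.16; in particular, ${}^\circ I_{n+1}(\widetilde F_n)\in L^2_{\mathcal W}(\Gamma_L)$. We define
$$\delta\varphi:=\sum_{n=0}^{\infty}{}^\circ\!\int^{V}I_{n,1}(F_n)\,\Delta B\colon X\mapsto\sum_{n=0}^{\infty}{}^\circ\sum_{t\in T}\big(I_n(F_n(\cdot,t))(X)\big)(X_t)$$
if this series converges in $L^2_{\mathcal W}(\Gamma_L)$. Note that
$$\sum_{t\in T}\big(I_n(F_n(\cdot,t))(X)\big)(X_t)=I_{n+1}(\widetilde F_n)(X).$$
Therefore,
$$\delta\varphi=\sum_{n=0}^{\infty}{}^\circ I_{n+1}(\widetilde F_n),$$
and, since ${}^\circ I_{n+1}(\widetilde F_n)$ is $\mathcal W$-measurable, $\delta\varphi$ exists iff $\sum_{n=0}^{\infty}E\big({}^\circ I_{n+1}(\widetilde F_n)\big)^2$ converges. The linear operator $\delta$ is called the Skorokhod integral. Since for each finite sum $\psi=\sum_{n=0}^{k}{}^\circ I_{n,1}(F_n)\in L^2_{\mathcal W\otimes\mathcal L^1}((\Gamma\otimes\nu)_L,\mathbb H)$ the Skorokhod integral $\delta\psi$ always exists, the domain of $\delta$ is a dense subspace of $L^2_{\mathcal W\otimes\mathcal L^1}((\Gamma\otimes\nu)_L,\mathbb H)$. A function $\varphi\in L^2_{\mathcal W\otimes\mathcal L^1}((\Gamma\otimes\nu)_L,\mathbb H)$ is called Skorokhod integrable (see [38]) if $\varphi$ belongs to the domain of $\delta$. Since the spaces $L^2_{\mathcal W\otimes\mathcal L^1}((\Gamma\otimes\nu)_L,\mathbb H)$ and $L^2(W\otimes\lambda,\mathbb H)$ can be identified, and also $L^2(W)$ and $L^2_{\mathcal W}(\Gamma_L)$, the Skorokhod integral is densely defined for functionals in $L^2(W\otimes\lambda,\mathbb H)$ and takes its values in $L^2(W)$. Recall that $W$ is the Wiener measure on the Fréchet space $C_{\mathbb B}$ of continuous functions from $[0,\infty[$ into $\mathbb B$.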
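The identity $\sum_{t\in T}(I_n(F_n(\cdot,t))(X))(X_t)=I_{n+1}(\widetilde F_n)(X)$ is pure bookkeeping, which a toy computation can confirm. The symmetrization used below, $\widetilde F(s_1,\dots,s_{n+1})=\sum_{i=1}^{n+1}F(s_1,\dots,s_{i-1},s_{i+1},\dots,s_{n+1},s_i)$, is an assumption about the definition recalled from Sect. 7.3.2, which is not reproduced here; the sketch also takes $\mathbb F=\mathbb R$, ordinary time points and a hypothetical random kernel that is symmetric in its first $n$ variables.

```python
import itertools
import random

def skorokhod_identity(n=2, H=6, seed=13):
    """Compare sum_t (I_n(F(., t))(X)) * X_t with I_{n+1}(F~)(X) in a toy one-dimensional model."""
    rng = random.Random(seed)
    # Kernel F(t_1, ..., t_n, t), symmetric in the first n variables: keyed by (increasing tuple, t).
    table = {}
    for t in range(H):
        others = [u for u in range(H) if u != t]
        for first in itertools.combinations(others, n):
            table[(first, t)] = rng.uniform(-1.0, 1.0)
    X = [rng.gauss(0.0, (1.0 / H) ** 0.5) for _ in range(H)]

    def prod(ts):
        p = 1.0
        for u in ts:
            p *= X[u]
        return p

    # Left side: internal Riemann-Stieltjes sum over t of I_n(F(., t))(X) times X_t.
    left = sum(value * prod(first) * X[t] for (first, t), value in table.items())
    # Right side: I_{n+1} of the assumed symmetrization F~ over increasing (n+1)-tuples.
    right = 0.0
    for s in itertools.combinations(range(H), n + 1):
        tilde = sum(table[(s[:i] + s[i + 1:], s[i])] for i in range(n + 1))
        right += tilde * prod(s)
    return left, right
```

Each pair (increasing $n$-tuple, extra time $t$) corresponds to exactly one increasing $(n+1)$-tuple together with the position of $t$ in it, so both sides sum the same terms.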
7.4.10 Product and Chain Rules for the Malliavin Derivative

The recipe for the computation of the kernels of the product of two functionals in $L^2_{\mathcal W}(\Gamma_L)$ is now used to prove product and chain rules for the derivative. We have seen that the computation of the kernels is simple and intuitive. It will be seen that it is also effective. Its application is rather technical, although, or better said because, it is quite constructive, even in the case of infinite-dimensional Brownian motion. In a joint paper together with Sam Sanders [37] we have called this technical procedure “local constructivity”, which means, roughly speaking: move from the standard world to a nonstandard model, in a very nonconstructive way. In this nonstandard world finite combinatorics is used to obtain the desired results. (Recall that the kernels of the chaos decomposition depend only on finitely many arguments, and finitely many of them can be used to approximate square integrable Lévy functionals as closely as we want.) Then, by taking standard parts, again in a very nonconstructive way, nice standard results can be obtained (recall Sect. 7.4.4 and see also Sects. 7.6.7 and 7.6.8 below).

We choose $\varphi,\psi\in L^2_{\mathcal W}(\Gamma_L)$ with liftings $\Phi=\sum_{n=0}^{M}I_n(F_n)$, $\Psi=\sum_{n=0}^{M}I_n(G_n)$, according to Theorem 7.4.6. Recall that $F_n,G_n\in SL^2(\nu^n,\mathbb F^{\otimes n})$ are symmetric. We use the notation in Sect. 7.4.4. Let $E$ be an orthonormal basis of $\mathbb F$. First assume that $\varphi$ and $\psi$ belong to finite chaos levels. Then $M\in\mathbb N$, and we can assume that $D\Phi$, $D\Psi$, $D(\Phi\cdot\Psi)\in SL^2(\Gamma\otimes\nu,\mathbb F)$ are liftings of $D\varphi$, $D\psi$, $D(\varphi\cdot\psi)$. In Sect. 7.4.4 we have seen that the kernels of the chaos decomposition of $\varphi\cdot\psi$ are standard parts of $K_m\colon T^m_<\to\mathbb F^{\otimes m}$, where for each $(a_1,\dots,a_m)\in E^m$
$$K_m(r_1,\dots,r_m)(a_1,\dots,a_m)=\sum_{k=0}^{m}\sum_{\sigma\colon k\uparrow m}\sum_{n=0}^{M}\sum_{t\in T^n_<}\sum_{e\in E^n}A\cdot B\cdot\frac{1}{H^n}$$
with
$$A=F_{n+k}(t,r_{\sigma_1},\dots,r_{\sigma_k},e,a_{\sigma_1},\dots,a_{\sigma_k}),\qquad B=G_{n+m-k}(t,r_{\bar\sigma_1},\dots,r_{\bar\sigma_{m-k}},e,a_{\bar\sigma_1},\dots,a_{\bar\sigma_{m-k}}).$$
Since the $F_n$ and $G_n$ are symmetric, the $K_m$ are symmetric on $T^m_{\neq}$. Note that for $s\in T$, $b\in E$ and $r_1<\dots<r_{m-1}$ the functions $K_m(\cdot,s,\cdot,b)$ with
$$K_m(r_1,\dots,r_{m-1},s)(a_1,\dots,a_{m-1},b)=H^{m-1}E\big(D(\Phi\cdot\Psi)_s(b)\cdot\langle X_{r_1},a_1\rangle\cdots\langle X_{r_{m-1}},a_{m-1}\rangle\big)$$
form the kernels of $D(\Phi\cdot\Psi)_s(b)$. Note that
$$K_m(r_1,\dots,r_{m-1},s,a_1,\dots,a_{m-1},b)=\big(K_m^{D\cdot}+K_m^{\cdot D}\big)(r,s,a,b)$$
with $K_m^{D\cdot}(r_1,\dots,r_{m-1},s,a_1,\dots,a_{m-1},b)=$
286
H. Osswald m−1
2
M
1 Fn+k+1 (t, rσ1 , . . . rσk , s, e, aσ1 , . . . aσk , b)· Hn n n
k=0 σ:k↑m−1 n=0 t∈T< e∈E
G n+m−1−k (t, rσ1 , . . . rσm−1−k , e, aσ 1 , . . . aσm−1−k ) and
K m·D (r1 , . . . rm−1 , s, a1 , . . . , am−1 , b) = m−1
2
M
1 F (t, rσ1 , . . . rσk , e, aσ1 , . . . aσk )· n n+k H n n
k=0 σ:k↑m−1 n=0 t∈T< e∈E
G n+m−k (t, rσ1 , . . . rσm−1−k , s, e, aσ 1 , . . . aσm−1−k , b). Note that K mD· (r1 , . . . rm−1 , s, a1 , . . . , am−1 , b) = H m−1 E Ds (b) · · X r1 , a1 · · · · · X rm−1 , am−1 and
K m·D (r1 , . . . rm−1 , s, a1 , . . . , am−1 , b) = H m−1 E · Ds (b) · X r1 , a1 · · · · · X rm−1 , am−1 .
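It follows that the standard parts of the $K_m^{D\cdot}(\cdot,s,\cdot,b)$ form the kernels of $D\varphi_s(b)\cdot\psi$ and the standard parts of the $K_m^{\cdot D}(\cdot,s,\cdot,b)$ form the kernels of $\varphi\cdot D\psi_s(b)$. We obtain:

Proposition 7.4.17 Suppose that $\varphi,\psi\in L^2_{\mathcal W}(\Gamma_L)$ belong to finite chaos levels. Then
$$D(\varphi\cdot\psi)_s=D\varphi_s\cdot\psi+\varphi\cdot D\psi_s\quad\text{in }L^2(\Gamma_L\otimes\lambda,H).$$

Now let $\varphi=\sum_{n=0}^{\infty}I_n({}^\circ F_n)$, $\psi=\sum_{n=0}^{\infty}I_n({}^\circ G_n)\in L^2_{\mathcal W}(\Gamma_L)$. We set $\varphi_m=\sum_{n=0}^{m}I_n({}^\circ F_n)$ and $\psi_m=\sum_{n=0}^{m}I_n({}^\circ G_n)$. Using Proposition 7.4.17, Theorem 7.4.12 and the equivalence of Loeb and Lebesgue measure on $\mathcal L_n$ we obtain:

Theorem 7.4.18 (Product Rule) Assume that $\big(\mathbb E_{\Gamma_L}(\varphi_m\cdot\psi_m)\big)_{m\in\mathbb N}$ converges in $\mathbb R$.
(A) Suppose that the sequences $(D\varphi_m\cdot\psi_m)_{m\in\mathbb N}$ and $(\varphi_m\cdot D\psi_m)_{m\in\mathbb N}$ converge in $L^2(\Gamma_L\otimes\lambda,H)$. Then $(D(\varphi_m\cdot\psi_m))_{m\in\mathbb N}$ converges in $L^2(\Gamma_L\otimes\lambda,H)$, $\varphi\cdot\psi$ is Malliavin differentiable and
$$(D(\varphi\cdot\psi))_s=(D\varphi)_s\cdot\psi+\varphi\cdot(D\psi)_s\quad\text{in }L^2(\Gamma_L\otimes\lambda,H).$$
(B) Suppose that $(D(\varphi_m\cdot\psi_m))_{m\in\mathbb N}$ converges in $L^2(\Gamma_L\otimes\lambda,H)$. Then $\varphi\cdot\psi$ is Malliavin differentiable and we have $\Gamma_L\otimes\lambda$-a.e.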
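$$(D(\varphi\cdot\psi))_s=(D\varphi)_s\cdot\psi+\varphi\cdot(D\psi)_s.$$
The equation $D(\varphi_m^k)_s=k\cdot\varphi_m^{k-1}\cdot D(\varphi_m)_s$ implies

Theorem 7.4.19 (Chain Rule) Fix $g:\mathbb R^n\to\mathbb R$ and Malliavin differentiable $\varphi_1,\dots,\varphi_n$. Assume that the partial derivatives of $g$ exist and that there are polynomials $q_j$ in $n$ variables with $\lim_j q_j=g$ and $\lim_j\partial_i q_j=\partial_i g$ for $i=1,\dots,n$.
(A) Fix $m\in\mathbb N$. Suppose that $\big(D\,q_j(\varphi_{1,m},\dots,\varphi_{n,m})\big)_{j\in\mathbb N}$ converges in $L^2(\Gamma_L\otimes\lambda,H)$ and $\big(\mathbb E_{\Gamma_L}q_j(\varphi_{1,m},\dots,\varphi_{n,m})\big)_{j\in\mathbb N}$ converges in $\mathbb R$. Then $g(\varphi_{1,m},\dots,\varphi_{n,m})$ is Malliavin differentiable and we have $\Gamma_L\otimes\lambda$-a.e.
$$D\big(g(\varphi_{1,m},\dots,\varphi_{n,m})\big)_s=\sum_{i=1}^{n}(\partial_i g)(\varphi_{1,m},\dots,\varphi_{n,m})\cdot(D\varphi_{i,m})_s.$$
(B) Assume that (A) is true for all $m\in\mathbb N$, and that $g$ and the $\partial_i g$ are continuous. Moreover, let $\big(D\,g(\varphi_{1,m},\dots,\varphi_{n,m})\big)_{m\in\mathbb N}$ and $\big(\mathbb E_{\Gamma_L}g(\varphi_{1,m},\dots,\varphi_{n,m})\big)_{m\in\mathbb N}$ converge in $L^2(\Gamma_L\otimes\lambda,H)$ and in $\mathbb R$, respectively. Then $g(\varphi_1,\dots,\varphi_n)$ is Malliavin differentiable and we have $\Gamma_L\otimes\lambda$-a.e.
$$\big(D(g(\varphi_1,\dots,\varphi_n))\big)_s=\sum_{i=1}^{n}(\partial_i g)(\varphi_1,\dots,\varphi_n)\cdot(D\varphi_i)_s.$$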
For a more detailed proof we refer to the article [35] or the book [36].

Problems
(1) Compute the kernels of $e^{I(f)-\frac12\int_T\|f\|_H^2\,d\nu_L}$ according to Example 7.4.8.
(2) Find a recipe for the computation of the kernels of the decomposition of $\varphi\in L^2_{\mathcal W\otimes\mathcal L_1}((\Gamma\otimes\nu)_L,H)$, according to Theorem 7.4.16.
(3) Let $\varphi=\sum_{n=0}^{\infty}{}^\circ I_{n,1}(F_n)$ be Skorokhod integrable. Choose an internal extension $(F_n)_{n\in{}^*\mathbb N_0}$ of $(F_n)_{n\in\mathbb N_0}$. Prove that there exists an unlimited $K\in{}^*\mathbb N$ such that for each unlimited $M\le K$, $M\in{}^*\mathbb N$,
$$\Phi:=\sum_{n=0}^{M}I_{n,1}(F_n)\in SL^2(\Gamma\otimes\nu,F)\quad\text{and}\quad\Phi\approx\varphi\text{ in }F,\ (\Gamma\otimes\nu)_L\text{-a.e.},$$
and
$$\delta\Phi:X\mapsto\sum_{s\in T}\Phi(X,s)(X_s)=\int\Phi\,dB\,(X)=\sum_{n=0}^{M}I_{n+1}(\tilde F_n)(X)\in SL^2(\Gamma)$$
and ${}^\circ\delta\Phi=\delta\varphi$ $\Gamma_L$-a.s. in $\mathbb R$. We may also assume that all the $F_n$ are symmetric in the first $n$ variables.
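As a purely finite-dimensional illustration of the product and chain rules (a standard toy model, not the nonstandard construction above), consider a single standard Gaussian $\xi=I(h)$ with $\|h\|=1$. In that setting the $n$-th chaos is spanned by $He_n(\xi)$ (probabilists' Hermite polynomials) and $D\,He_n(\xi)=n\,He_{n-1}(\xi)\,h$, so the derivative acts on chaos coefficients by a shift. The sketch below, whose function names are hypothetical, stores a functional of finite chaos level by its Hermite coefficients, multiplies such functionals exactly, and checks $D(\varphi\psi)=D\varphi\cdot\psi+\varphi\cdot D\psi$ and $D(\varphi^3)=3\varphi^2 D\varphi$ coefficientwise.

```python
from fractions import Fraction

# Toy assumption: xi = I(h), ||h|| = 1; phi = sum_n c[n] * He_n(xi) is stored as the list c.
# D phi = (coefficient list) * h; only the scalar coefficient list of D phi is stored.

def hermite_monomial_table(n_max):
    """Row n = monomial coefficients of He_n, from He_{n+1} = x He_n - n He_{n-1}."""
    table = [[Fraction(1)], [Fraction(0), Fraction(1)]]
    for n in range(1, n_max):
        prev, cur = table[n - 1], table[n]
        nxt = [Fraction(0)] + cur              # x * He_n
        for i, c in enumerate(prev):           # - n * He_{n-1}
            nxt[i] -= n * c
        table.append(nxt)
    return table[: n_max + 1]

def to_monomial(c, table):
    out = [Fraction(0)] * len(c)
    for n, cn in enumerate(c):
        for i, a in enumerate(table[n]):
            out[i] += cn * a
    return out

def to_hermite(p, table):
    """Back substitution; He_n is monic of degree n."""
    p = list(p)
    c = [Fraction(0)] * len(p)
    for n in range(len(p) - 1, -1, -1):
        c[n] = p[n]
        for i, a in enumerate(table[n]):
            p[i] -= c[n] * a
    return c

def multiply(a, b, table):
    pa, pb = to_monomial(a, table), to_monomial(b, table)
    prod = [Fraction(0)] * (len(pa) + len(pb) - 1)
    for i, x in enumerate(pa):
        for j, y in enumerate(pb):
            prod[i + j] += x * y
    return to_hermite(prod, table)

def derivative(c):
    """Chaos coefficients of D phi: He_n -> n He_{n-1}."""
    return [n * c[n] for n in range(1, len(c))] or [Fraction(0)]

def add(a, b):
    n = max(len(a), len(b))
    a = a + [Fraction(0)] * (n - len(a))
    b = b + [Fraction(0)] * (n - len(b))
    return [x + y for x, y in zip(a, b)]

def strip(c):
    while len(c) > 1 and c[-1] == 0:
        c = c[:-1]
    return c

table = hermite_monomial_table(12)
phi = [Fraction(1), Fraction(2), Fraction(-1, 3), Fraction(1, 2)]   # chaos levels 0..3
psi = [Fraction(0), Fraction(1), Fraction(5, 4)]                      # chaos levels 0..2

# Product rule: D(phi * psi) = D(phi) * psi + phi * D(psi)
lhs = derivative(multiply(phi, psi, table))
rhs = add(multiply(derivative(phi), psi, table), multiply(phi, derivative(psi), table))

# Chain rule for g(y) = y**3: D(phi**3) = 3 phi**2 * D(phi)
phi2 = multiply(phi, phi, table)
chain_lhs = derivative(multiply(phi2, phi, table))
chain_rhs = multiply([3 * c for c in phi2], derivative(phi), table)
print(strip(lhs) == strip(rhs), strip(chain_lhs) == strip(chain_rhs))   # True True
```

Exact rational arithmetic is used so that the identities hold with no rounding; in this one-dimensional toy both rules reduce to ordinary calculus for polynomials in $\xi$, which is what makes the check transparent.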
7.5 Stochastic Integration for Symmetric Poisson Processes

Stochastic integration will now be developed for one-dimensional symmetric Poisson processes. We will see that these processes are, from a certain point of view, slightly more subtle than one-dimensional Poisson processes or even one-dimensional Brownian motion. They can serve as a guide to stochastic integration for the many other Lévy processes treated in [36] or [34]. The nonstandard approach to a large class of Lévy processes in [36] provides a setting in which it is possible to orthogonalize the increments of a Lévy process, and not only the process itself as in the literature (see for example the work of Nualart and Schoutens [30]). Here we study the special case of symmetric Poisson processes, using the notation of the preceding sections. However, we replace $F$ with ${}^*\mathbb R$. Recall the definition of the internal Borel probability measure $\pi^1$ on ${}^*\mathbb R$ in Example (1) in Sect. 7.1 and let $\pi$ be the $H^2$-fold product of $\pi^1$ on ${}^*\mathbb R^T$. Note that now
$$B:{}^*\mathbb R^T\times T\to{}^*\mathbb R,\quad(X,t)\mapsto\sum_{s\in T_t}X_s$$
is an internal $(\mathcal B_t)_{t\in T}$-martingale for the measure $\pi$. The standard part $B_\pi:{}^*\mathbb R^T\times[0,\infty[\to\mathbb R$ of $B$ with respect to the Loeb measure $\pi_L$ over $\pi$ is a symmetric Poisson martingale. Recall from Corollary 6.4.11 that $B_\pi$, given for $\pi_L$-almost all $X$ by
$$B_\pi(X,r)={}^\circ\lim_{s\downarrow r}{}^\circ B(X,s)\quad\text{for all }r\in[0,\infty[,$$
is well defined $\pi_L$-a.s. and is a càdlàg process. Notice that for all $n\in\mathbb N$, $H\cdot\mathbb E_{\pi^1}x^{2n}=\beta$ and $\mathbb E_{\pi^1}x^{2n-1}=0$.
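A quick Monte Carlo sketch can make the internal random walk $B$ concrete. It replaces the unlimited $H$ by a finite $H$ and takes $\pi^1$ to be the three-point law $P(x=\pm1)=\beta/(2H)$, $P(x=0)=1-\beta/H$. This law is an assumption of the sketch, although it is forced by the moment conditions just stated, since $\mathbb E_{\pi^1}x^2(x^2-1)^2=0$. The sketch estimates the mean and second moment of $B(\cdot,t)$ and the average number of nonzero steps up to time $t$; one expects values near $0$, $\beta t$ and $\beta t$. Seed and sample sizes are arbitrary.

```python
import random

def simulate_walk(H, beta, t, rng):
    """Steps X_s, s = 1/H, 2/H, ..., t, i.i.d. with P(+1) = P(-1) = beta/(2H), P(0) = 1 - beta/H."""
    steps = []
    for _ in range(int(H * t)):
        u = rng.random()
        if u < beta / (2 * H):
            steps.append(-1)
        elif u < beta / H:
            steps.append(1)
        else:
            steps.append(0)
    return steps

rng = random.Random(12345)                        # fixed seed, arbitrary
H, beta, t, n_paths = 200, 2.0, 1.0, 5000         # finite stand-ins (assumption)
finals, jump_counts = [], []
for _ in range(n_paths):
    steps = simulate_walk(H, beta, t, rng)
    finals.append(sum(steps))                     # B(X, t) = sum_{s <= t} X_s
    jump_counts.append(sum(1 for x in steps if x != 0))

mean = sum(finals) / n_paths                          # expected near 0
second_moment = sum(b * b for b in finals) / n_paths  # expected near beta * t
mean_jumps = sum(jump_counts) / n_paths               # expected near beta * t
print(mean, second_moment, mean_jumps)
```

Every nonzero step has size exactly $\pm1$, which is why the standard part behaves like a compound Poisson process with symmetric unit jumps and jump rate $\beta$.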
7.5.1 Orthogonal Increments

In order to see how the symmetric Poisson processes are embedded into stochastic analysis for quite general Lévy processes, let us start with an arbitrary Borel probability measure $\mu^1$ on ${}^*\mathbb R$. We assume that $H\cdot\mathbb E_{\mu^1}x^n$ is limited for all $n\in\mathbb N$, which implies that $\mathbb E_\mu(B_t)^n$ is limited for all limited $t\in T$ and all $n\in\mathbb N$. We also assume that $H\cdot\mathbb E_{\mu^1}x^2\not\approx0$, in order to obtain Lévy processes different from $0$. Notice that $\pi^1$ is a special example. Recall that $\mu$ denotes the $H^2$-fold product of $\mu^1$ on ${}^*\mathbb R^T$. Starting from the sequence $(x^n)_{n\in\mathbb N_0}$ we use a slight modification of the Gram-Schmidt orthonormalization procedure to construct orthogonal polynomials, which serve as increments of the stochastic integral and also of the iterated integral.
Set $\dot p_0(x):=1$. The number $0$ is called an uncritical exponent. Define $\dot p_1(x):=x-\mathbb E_{\mu^1}x$. Since $H\cdot\mathbb E_{\mu^1}x^2\not\approx0$, we have $H\,\mathbb E_{\mu^1}\dot p_1^2\not\approx0$. The number $1$ is called an uncritical exponent. Assume that $\dot p_j$ is defined for $j\le n-1$. Let $0=u_0<\dots<u_l\le n-1$ be the uncritical exponents below $n$. Define
$$\dot p_n(x):=x^n-\sum_{i=0}^{l}\frac{\mathbb E_{\mu^1}\big(x^n\cdot\dot p_{u_i}\big)}{\mathbb E_{\mu^1}\dot p_{u_i}^2}\,\dot p_{u_i}(x).$$
Since $H\,\mathbb E_{\mu^1}(x^n\cdot\dot p_{u_i})$ is limited and $H\,\mathbb E_{\mu^1}\dot p_{u_i}^2\not\approx0$, we see that $\mathbb E_{\mu^1}(x^n\cdot\dot p_{u_i})/\mathbb E_{\mu^1}\dot p_{u_i}^2$ is limited. If $H\,\mathbb E_{\mu^1}\dot p_n^2\approx0$, then $n$ is called critical, otherwise $n$ is called uncritical. Set
$$N_\mu:=\{n\in\mathbb N\mid n\text{ is uncritical}\}.$$
Note that we do not put the uncritical number $0$ into $N_\mu$. We have $N_\mu=\{1\}$ in the case of Poisson processes and one-dimensional Brownian motion (see Sect. 6.1). For symmetric Poisson processes we have $N_\mu=\{1,2\}$. Note that in this case
$$\dot p_1(x)=x,\qquad\dot p_2(x)=x^2-\mathbb E_{\pi^1}x^2=x^2-\frac{\beta}{H}.$$
The reader can find more examples in [36]; in particular, there are examples with $N_\mu=\mathbb N$. Now we change the polynomials $\dot p_n$ slightly. Fix $n\in N_\mu$. Define
$$p_n:=\frac{1}{\sqrt{H\,\mathbb E_{\mu^1}\dot p_n^2}}\,\dot p_n.$$
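Note that
$$\mathbb E_{\mu^1}p_n=0=\mathbb E_\mu\,p_k(X_t)\,p_n(X_s)\ \text{ if }k\ne n\text{ or }s\ne t,\qquad\mathbb E_\mu p_n^2=\frac1H,\tag{+}$$
and that for $k\in\mathbb N$
$$H\,\mathbb E_{\mu^1}p_n^k\ \text{ is limited.}\tag{++}$$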
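The modified Gram-Schmidt procedure can be imitated with exact rational arithmetic if the unlimited $H$ is replaced by a large finite integer. This is only a heuristic sketch: the test "$H\,\mathbb E_{\mu^1}\dot p_n^2\approx0$" is replaced by a comparison with an ad hoc threshold `tol`, and both sample measures are assumptions. For $\pi^1$ we use the three-point law $P(x=\pm1)=\beta/(2H)$, $P(x=0)=1-\beta/H$. It satisfies $H\,\mathbb E x^{2n}=\beta$ and $\mathbb E x^{2n-1}=0$, and these conditions force it, because $\mathbb E x^2(x^2-1)^2=0$. For Anderson's walk we assume steps $\pm1/\sqrt H$ with probability $1/2$ each. The sketch recovers $N_\pi=\{1,2\}$, $\dot p_2(x)=x^2-\beta/H$, and $N=\{1\}$ for the coin-tossing walk.

```python
from fractions import Fraction

# Polynomials are coefficient lists [c0, c1, ...] meaning c0 + c1*x + c2*x**2 + ...
# A finite measure is a list of (atom, probability) pairs with Fraction entries.

def evaluate(poly, x):
    return sum(c * x**i for i, c in enumerate(poly))

def expect(measure, f):
    return sum(w * f(x) for x, w in measure)

def combine(p, q, a):
    """Return p - a*q coefficientwise, padding with zeros."""
    n = max(len(p), len(q))
    p = p + [Fraction(0)] * (n - len(p))
    q = q + [Fraction(0)] * (n - len(q))
    return [u - a * v for u, v in zip(p, q)]

def gram_schmidt(measure, H, n_max, tol=Fraction(1, 10**6)):
    """Modified Gram-Schmidt on 1, x, x^2, ...: project only onto uncritical p_dot_u.
    n >= 1 counts as uncritical if H * E[p_dot_n^2] > tol (finite stand-in for 'not ~ 0')."""
    p_dot = {0: [Fraction(1)]}
    uncritical = [0]
    for n in range(1, n_max + 1):
        pn = [Fraction(0)] * n + [Fraction(1)]            # the monomial x^n
        for u in uncritical:
            pu = p_dot[u]
            num = expect(measure, lambda x: x**n * evaluate(pu, x))
            den = expect(measure, lambda x: evaluate(pu, x) ** 2)
            pn = combine(pn, pu, num / den)
        p_dot[n] = pn
        if H * expect(measure, lambda x: evaluate(pn, x) ** 2) > tol:
            uncritical.append(n)
    return [n for n in uncritical if n > 0], p_dot

H, beta = 1000, Fraction(2)                            # finite stand-ins (assumption)
pi1 = [(Fraction(-1), beta / (2 * H)), (Fraction(0), 1 - beta / H), (Fraction(1), beta / (2 * H))]
N_pi, p_dot_pi = gram_schmidt(pi1, H, 6)

H2 = 10**6                                             # a perfect square, so 1/sqrt(H2) is rational
step = Fraction(1, 1000)
anderson = [(-step, Fraction(1, 2)), (step, Fraction(1, 2))]
N_a, _ = gram_schmidt(anderson, H2, 6)

print(N_pi, N_a)                                       # [1, 2] [1]
```

For both measures the critical $\dot p_n$ vanish identically on the finite support (for $\pi^1$, e.g., $\dot p_3(x)=x^3-x$), so the threshold only matters for measures where $H\,\mathbb E\dot p_n^2$ is small but nonzero.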
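For each $k\in N_\mu$
$$M^k:{}^*\mathbb R^T\times T\to{}^*\mathbb R,\quad(X,t)\mapsto\sum_{s\le t}p_k(X_s)\tag{+++}$$
is a $(\mathcal B_t)_{t\in T}$-martingale. Fix $r\in[0,\infty[$ and $t\in T$ with $t\approx r$. Then we obtain
$$\mathbb E_\mu\big(M^k(\cdot,t)\big)^2=\sum_{s\le t}\mathbb E_\mu p_k^2(X_s)=t\approx r.$$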
In Sect. 6.2.6 we have seen that $\frac1H$ plays the role of the Lebesgue unit volume. We will see that the preceding equality yields the continuity in measure of the integrals and makes them independent of $k\in N_\mu$.
7.5.2 From Internal Random Walks to the Standard Poisson Integral
The integral will be defined as the orthogonal series $\sum_{k\in N_\mu}\int f_k\,dp_k$ of stochastic integrals $\int f_k\,dp_k$. Recall that, in the cases of Brownian motion and Poisson processes, $N_\mu=\{1\}$, so there we have only one summand $\int f\,dp$ with $p=p_1$. In general, the integrand $f=(f_k)_{k\in N_\mu}$ is a square summable adapted process and the integral $\int f_k\,dp_k$ is the standard part of a suitable internal random walk. We start with the internal integral with respect to the measure $\pi$. Fix $k\in N_\pi=\{1,2\}$ here and a $(\mathcal B_{t^-})_{t\in T}$-adapted internal process $F:{}^*\mathbb R^T\times T\to{}^*\mathbb R$. Define
$$\int F\,dp_k(X,t):=\sum_{s\le t,\,s\in T}F(X,s)\cdot p_k(X_s).$$
By (+++), $p_k(X_t)=\Delta M^k_t(X)$. Recall that $M^k_0=0$ by definition. Using the rules for conditional expectation, the proof of the following lemma is straightforward and is left to the reader.

Lemma 7.5.1 Assume that, in addition, $F(\cdot,s)\in L^2(\pi)$ for all $s\in T$. Then for $k\in N_\pi$,
$$\mathbb E_\pi\Big(\int F\,dp_k(\cdot,t)\Big)^2=\mathbb E_\pi\sum_{s\le t}F^2(X,s)\,\frac1H.$$
Moreover, $\int F\,dp_k$ is a $(\mathcal B_t)_{t\in T}$-martingale.
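Lemma 7.5.1 is an exact identity about finite sums, so it already holds when $H$ is a small finite number and can be checked by enumerating all paths. The sketch assumes that $\pi^1$ puts mass $\beta/(2H)$ on $\pm1$ and $1-\beta/H$ on $0$ (the law forced by the moment conditions $H\,\mathbb E x^{2n}=\beta$, $\mathbb E x^{2n-1}=0$). It uses $p_1=x/\sqrt\beta$ and $p_2=(x^2-\beta/H)/\sqrt{\beta-\beta^2/H}$, which follow from $H\,\mathbb E\dot p_1^2=\beta$ and $H\,\mathbb E\dot p_2^2=\beta-\beta^2/H$. It also takes $H=3$, $\beta=1$, lets $t$ be the fourth grid point, and uses a hypothetical predictable integrand $F(X,s)=1+\sum_{u<s}X_u$. It compares both sides of the lemma and checks that the integrals against $p_1$ and $p_2$ are orthogonal, the internal counterpart of Theorem 7.5.3(b).

```python
import itertools
import math

# Finite stand-ins (assumptions): H = 3, beta = 1, three-point law for pi^1.
H, beta = 3, 1.0
ATOMS = [(-1.0, beta / (2 * H)), (0.0, 1 - beta / H), (1.0, beta / (2 * H))]

def p1(x):
    return x / math.sqrt(beta)                                   # p_dot_1 / sqrt(H E p_dot_1^2)

def p2(x):
    return (x * x - beta / H) / math.sqrt(beta - beta**2 / H)    # p_dot_2 / sqrt(H E p_dot_2^2)

def expectation(func, m):
    """E_pi[func(X_1, ..., X_m)] by summing over all 3**m paths of the first m coordinates."""
    total = 0.0
    for path in itertools.product(ATOMS, repeat=m):
        xs = [x for x, _ in path]
        weight = math.prod(w for _, w in path)
        total += weight * func(xs)
    return total

def F(xs, j):
    """Hypothetical predictable integrand: depends only on X at grid points before the j-th."""
    return 1.0 + sum(xs[:j])

def internal_integral(xs, p):
    """sum over s <= t of F(X, s) * p(X_s)."""
    return sum(F(xs, j) * p(xs[j]) for j in range(len(xs)))

m = 4                                                            # t = m / H
results = {}
for name, p in (("p1", p1), ("p2", p2)):
    lhs = expectation(lambda xs: internal_integral(xs, p) ** 2, m)
    rhs = expectation(lambda xs: sum(F(xs, j) ** 2 for j in range(m)) / H, m)
    results[name] = (lhs, rhs)                                   # equal up to rounding
cross = expectation(lambda xs: internal_integral(xs, p1) * internal_integral(xs, p2), m)
print(results, cross)                                            # cross is 0 up to rounding
```

The exactness comes from two facts used in the proof of the lemma: $F(\cdot,s)$ is independent of $X_s$ by predictability, and $\mathbb E p_k(X_s)=0$, $\mathbb E p_k^2(X_s)=1/H$, $\mathbb E p_1p_2(X_s)=0$.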
In order to use this internal integral to define the standard integral $\int f\,dp_k$ for $(b_r)_{r\in[0,\infty[}$-adapted integrands $f:{}^*\mathbb R^T\times[0,\infty[\to\mathbb R$ locally in $L^2(\pi_L\otimes\lambda)$, we take a $(\mathcal B_{t^-})_{t\in T}$-adapted lifting $F:{}^*\mathbb R^T\times T\to{}^*\mathbb R$ of $f$ locally in $SL^2(\pi\otimes\nu)$, according to Theorem 7.2.10, now with $\pi$ instead of $\Gamma$, $\mathbb R$ instead of $H$ and ${}^*\mathbb R$ instead of $F$. It follows that $\int F\,dp_k(\cdot,t)\in SL^1(\pi)$ for all limited $t\in T$. Then the $k$-th integral $\int f\,dp_k:{}^*\mathbb R^T\times[0,\infty[\to\mathbb R$ of $f$ is defined by setting, for each $r\in[0,\infty[$,
$$\int f\,dp_k(\cdot,r):={}^\circ\lim_{t\downarrow r,\,t\in T}{}^\circ\int F\,dp_k(\cdot,t)\quad\pi_L\text{-a.s.}$$
By Theorem 6.4.10 and Corollary 6.4.11, the right-hand side of the preceding equality exists and is a càdlàg $(b_r)_{r\in[0,\infty[}$-martingale. The following lemma, together with Theorem 6.4.10 and Corollary 6.4.11, is the key to Theorem 7.5.3, where the continuity in measure follows from Part (b).
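Lemma 7.5.2 Fix $n\in\mathbb N$ and a $(\mathcal B_{t^-})_{t\in T}$-adapted $F:{}^*\mathbb R^T\times T\to{}^*\mathbb R$ locally in $SL^2(\pi\otimes\nu)$. If $k\in N_\pi=\{1,2\}$, then
(a) $\max_{t\in T_n}\big|\int F\,dp_k(\cdot,t)\big|\in SL^2(\pi)$.
(b) Fix limited $t\approx t'\in T$. Then $\mathbb E_\pi\big(\int F\,dp_k(\cdot,t')-\int F\,dp_k(\cdot,t)\big)^2\approx0$, thus $\int F\,dp_k(\cdot,t')\approx\int F\,dp_k(\cdot,t)$ $\pi_L$-a.s.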
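Proof (a) By Theorem 6.4.7, it suffices to prove that the quadratic variation at $n$ is in $SL^1(\pi)$. Note that
$$\Big[\int F\,dp_k\Big]_n(X)=\sum_{t\in T_n}F_t^2(X)\,p_k^2(X_t).$$
Fix $m\in{}^*\mathbb N$. Define
$$(F\wedge m)(t):=\begin{cases}F(t)&\text{if }|F(t)|\le m,\\0&\text{otherwise.}\end{cases}$$
Using the limitedness of $H\,\mathbb E_{\pi^1}p_k^4$ and of $F\wedge m$ for $m\in\mathbb N$ and the following inequality, we see that $X\mapsto\sum_{t\in T_n}(F\wedge m)_t^2\,p_k^2(X_t)\in SL^1(\pi)$:
$$\mathbb E_\pi\Big(\sum_{t\in T_n}(F\wedge m)_t^2\,p_k^2(X_t)\Big)^2\le m^4\,\mathbb E_\pi\Big(\sum_{t\in T_n}p_k^2(X_t)\Big)^2\ \text{ is limited.}$$
It remains to prove that $\lim_{m\to\infty}{}^\circ\mathbb E_\pi\sum_{t\in T_n}\big(F_t^2-(F\wedge m)_t^2\big)p_k^2(X_t)=0$. Assume that this is not true. Then, by the Spillover Principle, there exist a standard $\varepsilon>0$ and an unlimited number $M\in{}^*\mathbb N$ such that
$$\varepsilon\le\mathbb E_\pi\sum_{t\in T_n}\big(F_t^2-(F\wedge M)_t^2\big)p_k^2(X_t)=\mathbb E_\pi\sum_{t\in T_n}\big(F_t^2-(F\wedge M)_t^2\big)\frac1H.$$
However, since $F$ and $F\wedge M$, restricted to ${}^*\mathbb R^T\times T_n$, belong to $SL^2(\pi\otimes\nu_n)$ and $F=F\wedge M$ $(\pi\otimes\nu_n)_L$-a.e., we have $\mathbb E_\pi\sum_{t\in T_n}\big(F_t^2-(F\wedge M)_t^2\big)\frac1H\approx0$. This contradiction proves that $\big[\int F\,dp_k\big]_n\in SL^1(\pi)$, thus $\max_{t\in T_n}\big|\int F\,dp_k(\cdot,t)\big|\in SL^2(\pi)$.
(b) Note that for $t<t'$
$$\mathbb E_\pi\Big(\int F\,dp_k(\cdot,t')-\int F\,dp_k(\cdot,t)\Big)^2=\mathbb E_\pi\Big(\sum_{t<s\le t'}F_s\,p_k(X_s)\Big)^2=\mathbb E_\pi\sum_{t<s\le t'}F_s^2\,p_k^2(X_s)=\mathbb E_\pi\sum_{t<s\le t'}F_s^2\,\frac1H\approx0,$$
since $F$ is locally in $SL^2(\pi\otimes\nu)$.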
Theorem 7.5.3 (a) The integral $\int f\,dp_k$ is well defined $\pi_L$-a.s., i.e., the definition does not depend on the chosen S-square integrable lifting of $f$, and is a càdlàg $(b_r)_{r\in[0,\infty[}$-martingale with $\mathbb E_{\pi_L}\sup_{s\in[0,n]}\big(\int f\,dp_k(\cdot,s)\big)^2<\infty$ for all $n\in\mathbb N$.
(b) Fix $(b_r)_{r\in[0,\infty[}$-adapted processes $f,g:{}^*\mathbb R^T\times[0,\infty[\to\mathbb R$ locally in $L^2(\pi_L\otimes\lambda)$ and $k,l\in N_\pi=\{1,2\}$. Then for all $r\in[0,\infty[$
$$\mathbb E_{\pi_L}\Big(\int f\,dp_k(\cdot,r)\int g\,dp_l(\cdot,r)\Big)=\begin{cases}\int_{{}^*\mathbb R^T\times[0,r]}f\cdot g\,d\pi_L\otimes\lambda,&\text{if }l=k,\\0,&\text{if }l\ne k.\end{cases}$$
(c) $\int\cdot\,dp_k$ is a continuous operator on the space of $(b_r)_{r\in[0,\infty[}$-adapted processes in $L^2(\pi_L\otimes\lambda)$. Moreover, $\int f\,dp_k$ is continuous in measure.
In order to obtain the integral independent of $k\in N_\pi$, we integrate processes $g:N_\pi\times{}^*\mathbb R^T\times[0,\infty[\to\mathbb R$ locally in $L^2(c\otimes\pi_L\otimes\lambda)$, i.e., $g\restriction N_\pi\times{}^*\mathbb R^T\times[0,n]\in L^2(c\otimes\pi_L\otimes\lambda)$ for all $n\in\mathbb N$, where $c$ is the counting measure on $N_\pi=\{1,2\}$. For more general Lévy processes it may happen that $g:\mathbb N\times{}^*\mathbb R^T\times[0,\infty[\to\mathbb R$; then $c$ is a nontrivial counting measure on $\mathbb N$ (see the end of Sect. 6.2.6). It is necessary to assume that each $g_k$, $k\in N_\pi$, is $(b_r)_{r\in[0,\infty[}$-adapted. Define
$$\int g\,dp:=\sum_{k\in N_\pi}\int g_k\,dp_k:{}^*\mathbb R^T\times[0,\infty[\to\mathbb R.$$
By Theorem 7.5.3, this integral is an orthogonal series of square integrable $(b_r)_{r\in[0,\infty[}$-martingales. The operator $\int\cdot\,dp$ is linear and continuous and $\int g\,dp$ is continuous in measure. This process $\int g\,dp$ is called the integral of $g$.

It is simpler to introduce the integrals not as processes, but as random variables: Fix a $(b_r)_{r\in[0,\infty[}$-adapted process $f:{}^*\mathbb R^T\times[0,\infty[\to\mathbb R$ in $L^2(\pi_L\otimes\lambda)$ and a $(\mathcal B_{t^-})_{t\in T}$-adapted lifting $F:{}^*\mathbb R^T\times T\to{}^*\mathbb R$ in $SL^2(\pi\otimes\nu)$. Define
$$\int f\,dp_k(X):={}^\circ\sum_{s\in T}F(X,s)\,p_k(X_s).$$
If $f\in L^2(c\otimes\pi_L\otimes\lambda)$ and $f_k$ is $(b_t)_{t\in[0,\infty[}$-adapted for all $k\in N_\pi=\{1,2\}$, set
$$\int f\,dp:=\sum_{k\in N_\pi}\int f_k\,dp_k.$$
7.5.3 Iterated Integrals

Let us start with iterated integrals with parameters as in Sect. 7.3. Fix an internal $F:T^{n+m}\to{}^*\mathbb R$ and $k:=(k_1,\dots,k_n)\in N_\pi^n$. Define $I_{k,m}(F):{}^*\mathbb R^T\times T^m\to{}^*\mathbb R$ by setting
$$I_{k,m}(F)(X,s):=\sum_{t\in T^n_<}F_{t,s}\cdot p_{k_1}(X_{t_1})\cdots p_{k_n}(X_{t_n}).$$
The proof of the following lemma is straightforward.

Lemma 7.5.4 For all $k\in N_\pi^n$, $k'\in N_\pi^{n'}$ and internal $F:T^{n+m}\to{}^*\mathbb R$, $G:T^{n'+m}\to{}^*\mathbb R$,
$$\int_{{}^*\mathbb R^T\times T^m}I_{k,m}(F)\cdot I_{k',m}(G)\,d\pi\otimes\nu^m=\begin{cases}\int_{T^n_<\times T^m}F\cdot G\,d\nu^{n+m},&\text{if }k=k',\\0,&\text{if }k\ne k'.\end{cases}$$
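Lemma 7.5.4 is again an exact statement about finite sums. The sketch below takes $m=0$ parameters, iterated integrals of order $n=2$ and kernels supported on the first four grid points. It uses the same finite stand-ins as before: $H=3$, $\beta=1$, and the three-point law $P(\pm1)=\beta/(2H)$, $P(0)=1-\beta/H$ for $\pi^1$ (an assumption forced by the stated moments). The kernels `F`, `G` are hypothetical. By exact enumeration of all paths it checks that equal orders $k=k'$ reproduce $\sum_{t_1<t_2}F\,G/H^2$, while different orders, including orders of different length, give orthogonal integrals.

```python
import itertools
import math

# Finite stand-ins (assumptions): H = 3, beta = 1, kernels supported on the first M points.
H, beta, M = 3, 1.0, 4
ATOMS = [(-1.0, beta / (2 * H)), (0.0, 1 - beta / H), (1.0, beta / (2 * H))]
P = {
    1: lambda x: x / math.sqrt(beta),                                 # p_1
    2: lambda x: (x * x - beta / H) / math.sqrt(beta - beta**2 / H),  # p_2
}

def expectation(func):
    """E_pi[func(X_1, ..., X_M)] by exact enumeration of all 3**M paths."""
    total = 0.0
    for path in itertools.product(ATOMS, repeat=M):
        xs = [x for x, _ in path]
        total += math.prod(w for _, w in path) * func(xs)
    return total

def iterated(kernel, k, xs):
    """I_k(F)(X) = sum over t_1 < ... < t_n of F(t) p_{k_1}(X_{t_1}) ... p_{k_n}(X_{t_n})."""
    total = 0.0
    for t in itertools.combinations(range(M), len(k)):
        term = kernel(t)
        for ki, ti in zip(k, t):
            term *= P[ki](xs[ti])
        total += term
    return total

def F(t):
    return 1.0 + t[0] + 2.0 * t[1]            # hypothetical kernel on ordered pairs

def G(t):
    return float((t[0] - t[1]) ** 2)          # hypothetical kernel on ordered pairs

def one(t):
    return 1.0

inner = sum(F(t) * G(t) for t in itertools.combinations(range(M), 2)) / H**2
same_order = expectation(lambda xs: iterated(F, (1, 2), xs) * iterated(G, (1, 2), xs))
swapped = expectation(lambda xs: iterated(F, (1, 2), xs) * iterated(G, (2, 1), xs))
other_length = expectation(lambda xs: iterated(F, (1, 2), xs) * iterated(one, (2,), xs))
print(inner, same_order, swapped, other_length)  # first two agree; last two vanish up to rounding
```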
Theorem 7.5.5 Fix $f\in L^2(\lambda^{n+m})$ and $k\in N_\pi^n$. Then $f$ has a lifting $F\in SL^2(\nu^{n+m})$ such that $I_{k,m}(F)\in SL^2(\pi\otimes\nu^m)$ and ${}^\circ I_{k,m}(F)$ exists in $L^2(\pi_L\otimes\lambda^m)$, which means that there exists a function $g\in L^2(\pi_L\otimes\lambda^m)$ such that $I_{k,m}(F)$ is a lifting of $g$. Recall that $f$ is denoted by ${}^\circ F$ and $g$ by ${}^\circ I_{k,m}(F)$. Notice that this result is equivalent to the statements obtained by replacing the Lebesgue measurable sets in $[0,\infty[$ with the corresponding sets in $\mathcal L_1$ and $\pi_L\otimes\lambda$ with $(\pi\otimes\nu^m)_L$.

Proof In order to save indices set $n=m=1$. Let $M$ be the set of all $f\in L^2(\lambda^2)$ such that the assertions of the theorem are true. Obviously, $M$ is a linear space. In order to prove that $M$ is complete, let $(f_n)$ be a Cauchy sequence in $M$ such that the $F_n$ are liftings of $f_n$ according to the theorem. Then, by Theorem 6.3.10,
$$\lim_{n,m\to\infty}{}^\circ\int_{T^2}(F_n-F_m)^2\,d\nu^2=0.$$
Since
$$\int_{{}^*\mathbb R^T\times T}\big(I_{k,1}(F_n-F_m)\big)^2\,d\pi\otimes\nu=\int_{T^2}(F_n-F_m)^2\,d\nu^2,$$
we obtain
$$\lim_{n,m\to\infty}{}^\circ\int_{{}^*\mathbb R^T\times T}\big(I_{k,1}(F_n-F_m)\big)^2\,d\pi\otimes\nu=0.$$
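Let $\lim_{n\to\infty}f_n=f\in L^2(\lambda^2)$ and $\lim_{n\to\infty}{}^\circ I_{k,1}(F_n)=g\in L^2(\pi_L\otimes\lambda)$, let $\bar F\in SL^2(\nu^2)$ be a lifting of $f$ and let $\bar G\in SL^2(\pi\otimes\nu)$ be a lifting of $g$ (see Theorem 6.3.10). Since
$$\lim_{n\to\infty}{}^\circ\int_{T^2}\big(F_n-\bar F\big)^2\,d\nu^2=0=\lim_{n\to\infty}{}^\circ\int_{{}^*\mathbb R^T\times T}\big(I_{k,1}(F_n)-\bar G\big)^2\,d\pi\otimes\nu,$$
we have for all standard $\varepsilon>0$
$${}^\circ\nu^2\Big(\big(F_i-\bar F\big)^2\ge\varepsilon\Big)\le{}^\circ\frac1\varepsilon\int_{T^2}\big(F_i-\bar F\big)^2\,d\nu^2\xrightarrow[i\to\infty]{}0.$$
In the same way, $\lim_{i\to\infty}{}^\circ(\pi\otimes\nu)\big(\big(I_{k,1}(F_i)-\bar G\big)^2\ge\varepsilon\big)=0$. By saturation techniques again, there exist a strictly monotone increasing function $h:\mathbb N\to\mathbb N$ and an unlimited $I\in{}^*\mathbb N$ such that for $F:=F_I$ and for all $j\in\mathbb N$,
$$\int_{T^2}\big(F-F_{h(j)}\big)^2\,d\nu^2<\frac1j,\qquad\int_{{}^*\mathbb R^T\times T}\big(I_{k,1}(F-F_{h(j)})\big)^2\,d\pi\otimes\nu<\frac1j,$$
$$\nu^2\Big(\big(F-\bar F\big)^2\ge\frac1j\Big)<\frac1j,\qquad\pi\otimes\nu\Big(\big(I_{k,1}(F)-\bar G\big)^2\ge\frac1j\Big)<\frac1j.$$
It follows that $F\in SL^2(\nu^2)$ is a lifting of $f$ and $I_{k,1}(F)\in SL^2(\pi\otimes\nu)$ is a lifting of $g$. This proves the completeness of $M$. Now assume that $A,B$ are bounded Lebesgue measurable sets in $[0,\infty[$ and let $\bar A,\bar B$ be approximations of $\mathrm{st}^{-1}[A],\mathrm{st}^{-1}[B]$, respectively. Then $1_{\bar A\times\bar B}$ is a lifting of $1_{A\times B}$ in $SL^2(\nu^2)$. We may assume that $\bar A,\bar B\subseteq T_\sigma$ for some $\sigma\in\mathbb N$. Since
$$\int_{{}^*\mathbb R^T\times T}\big(I_{k,1}(1_{\bar A\times\bar B})\big)^4\,d\pi\otimes\nu\le\sigma\cdot\mathbb E_\pi\Big(\sum_{t\in\bar A}p_k(X_t)\Big)^4\ \text{ is limited,}$$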
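because $H\cdot\mathbb E_{\pi^1}(p_k(x))^n$ is limited for all $n\in\mathbb N$, we see that $I_{k,1}(1_{\bar A\times\bar B})\in SL^2(\pi\otimes\nu)$ and that it is a lifting of the function $g\in L^2_{L_\pi(\mathcal B)\otimes\mathrm{Leb}}(\pi_L\otimes\lambda)$, defined by
$$g(X,r):={}^\circ\Big(\sum_{t\in\bar A}p_k(X_t)\Big)\cdot1_B(r).$$
Since $M$ is closed and contains all simple functions with measurable rectangles, $M=L^2(\lambda^{n+m})$ (see the standard Proposition 5.6.1 in [36]). Now define, for $f\in L^2(\lambda^{n+m})$, liftings $F\in SL^2(\nu^{n+m})$ of $f$ and $k\in N_\pi^n$,
$$I_{k,m}(f):=I_{k,m}({}^\circ F):={}^\circ I_{k,m}(F).$$
Note that $I_{k,m}(f)$ is well defined $\pi_L\otimes\lambda^m$-a.e. and belongs to $L^2(\pi_L\otimes\lambda^m)$. Using S-square integrable liftings of $f,g$, we obtain

Proposition 7.5.6 Fix $f={}^\circ F\in L^2(\lambda^{n+m})$, $g={}^\circ G\in L^2(\lambda^{n'+m})$, $k\in N_\pi^n$ and $k'\in N_\pi^{n'}$. Then
$$\int_{{}^*\mathbb R^T\times[0,\infty[^m}I_{k,m}(f)\cdot I_{k',m}(g)\,d\pi_L\otimes\lambda^m=\begin{cases}{}^\circ\int_{T^n_<\times T^m}F\cdot G\,d\nu^{n+m},&\text{if }k=k',\\0,&\text{if }k\ne k'.\end{cases}$$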
The function $I_{k,m}(f)$ with $k\in N_\pi^n$ is called the iterated integral of $f$ with $m$ parameters of order $k$. If $m=0$, then we write $I_k(f)$ instead of $I_{k,0}(f)$ and call $I_k(f)$ the iterated integral of $f$ of order $k$. If $k\in N_\pi$, then $I_{(k)}(f)$ is called the Wiener-Lévy integral of $f\in L^2(\lambda)$ of order $k$. Now we define martingales from the iterated integral of order $k$. Fix $n\in\mathbb N$, an internal $F:T$
$$I_k^M(F):{}^*\mathbb R^T\times T\to{}^*\mathbb R,\quad(X,t)\mapsto\sum_{s\in T^n_<,\,s_n\le t}F_s\cdot p_{k_1}(X_{s_1})\cdots p_{k_n}(X_{s_n}).$$
If $n=0$, set $I_\emptyset^M(F):=F$ for $F\in{}^*\mathbb R$. It should be mentioned that $I_k^M(F)(\cdot,t)$ is a smooth function in the sense of the nonstandard model. From the following lemma it follows that $I_k^M(F)$ can be converted to a martingale defined on the continuous timeline $[0,\infty[$ if $F$ is locally S-square integrable. The proof of the following lemma is straightforward.

Lemma 7.5.7 Fix internal $F:T$
$$\mathbb E_\pi\Big(I^M_{(\circ)}(F)(\cdot,t)\,I^M_{(\cdot)}(G)(\cdot,t)\Big)=\begin{cases}\sum_{s\in T^n_<,\,s_n\le t}F(s)\cdot G(s)\,\frac1{H^n},&\text{if }(\circ)=(\cdot),\\0,&\text{if }(\circ)\ne(\cdot).\end{cases}$$
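Theorem 7.5.8 Fix $f$ locally in $L^2(\lambda^n)$ and $k:=(k_1,\dots,k_n)\in N_\pi^n$.
(a) There exists a lifting $F$ locally in $SL^2(\nu^n)$ of $f$ such that the function
$$\mathfrak F:{}^*\mathbb R^T\times T\ni(X,s)\mapsto\mathbb E^{\mathcal B_{s^-}}\sum_{t\in T^{n-1}_<}F(t,s)\cdot p_{k_1}(X_{t_1})\cdots p_{k_{n-1}}(X_{t_{n-1}})$$
is locally in $SL^2(\pi\otimes\nu)$. Note that $\mathbb E^{\mathcal B_{s^-}}$ forces $p_l(X_t)=0$ for $t\ge s$.
(b) For all $\sigma\in\mathbb N$, $\max_{\rho\in T_\sigma}\big|I_k^M(F)(\cdot,\rho)\big|\in SL^2(\pi)$.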
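Proof (a) Fix $\sigma\in\mathbb N$. Let $M$ be the set of all $f\in L^2(\lambda^n\restriction[0,\sigma]^n)$ with a lifting $F\in SL^2(\nu_\sigma^n)$ such that $\mathfrak F\in SL^2(\pi\otimes\nu_\sigma)$. Of course, $M$ is linear. To prove that $M$ is complete, let $(f_k)_{k\in\mathbb N}$ be a Cauchy sequence in $M$ with limit $f$ in $L^2(\lambda^n\restriction[0,\sigma]^n)$. There exist liftings $F_k,F$ in $SL^2(\nu_\sigma^n)$ of $f_k,f$ such that $\mathfrak F_k\in SL^2(\pi\otimes\nu_\sigma)$ for each $F_k$. We obtain
$$\mathbb E\sum_{s\in T_\sigma}\big(\mathfrak F-\mathfrak F_k\big)^2(\cdot,s)\,\frac1H\le\mathbb E\sum_{s\in T_\sigma}\Big(\sum_{t\in T^{n-1}_{\sigma,<}}(F-F_k)(t,s)\,p_{k_1}(X_{t_1})\cdots p_{k_{n-1}}(X_{t_{n-1}})\Big)^2\frac1H\le\sum_{t\in T^n_\sigma}(F-F_k)^2(t)\,\frac1{H^n}\approx\int_{[0,\sigma]^n}(f-f_k)^2\,d\lambda^n\xrightarrow[k\to\infty]{}0.$$
It follows that ${}^\sigma\mathfrak F:=\mathfrak F\in SL^2(\pi\otimes\nu_\sigma)$. This proves that $M$ is complete. Now let $B_1,\dots,B_n$ be Lebesgue measurable subsets of $[0,\sigma]$ and let $A_1,\dots,A_n\subseteq T_\sigma$ be internal approximations of $\mathrm{st}^{-1}[B_1],\dots,\mathrm{st}^{-1}[B_n]$, respectively. Then $1_{A_1\times\dots\times A_n}$ is a lifting of $1_{B_1\times\dots\times B_n}$ in $SL^2(\nu^n)$. Note that the function $\mathfrak F$ formed with $1_{A_1\times\dots\times A_n}$ in place of $F$ belongs to $SL^2(\pi\otimes\nu)$. This finishes the proof that $M=L^2(\lambda^n\restriction[0,\sigma]^n)$. Let $F$ be a lifting of $f$, locally in $SL^2(\nu^n)$, such that ${}^\sigma\mathfrak F\in SL^2(\pi\otimes\nu_\sigma)$ for all $\sigma\in\mathbb N$. Note that each ${}^\sigma\mathfrak F$ is $(\mathcal B_{s^-})_{s\in T_\sigma}$-adapted. We may assume that for all $\sigma\in\mathbb N$, ${}^\sigma\mathfrak F(a)={}^{\sigma+1}\mathfrak F(a)$ for all $a\in{}^*\mathbb R^T\times T_\sigma$ and that ${}^\sigma\mathfrak F(a)=0$ if $a\notin{}^*\mathbb R^T\times T_\sigma$. Let $({}^\sigma\mathfrak F)_{\sigma\in{}^*\mathbb N}$ be an internal extension of $({}^\sigma\mathfrak F)_{\sigma\in\mathbb N}$. Then there exists an unlimited $N\in{}^*\mathbb N$ such that the preceding two equalities are true for all $\sigma\in{}^*\mathbb N$, $\sigma\le N$, and such that ${}^N\mathfrak F$ is $(\mathcal B_{s^-})_{s\in T_N}$-adapted. Set $\mathfrak F:={}^N\mathfrak F$. Then $\mathfrak F$ is locally in $SL^2(\pi\otimes\nu)$.
(b) Let us use the notation in (a). Then
$$I_k^M(F)(X,\sigma)=\sum_{t\in T^n_<,\,t_n\le\sigma}F(t)\cdot p_{k_1}(X_{t_1})\cdots p_{k_n}(X_{t_n})=\sum_{s\le\sigma}\Big(\mathbb E^{\mathcal B_{s^-}}\sum_{t\in T^{n-1}_<}F(t,s)\cdot p_{k_1}(X_{t_1})\cdots p_{k_{n-1}}(X_{t_{n-1}})\Big)p_{k_n}(X_s)=\int\big(s\mapsto\mathfrak F(\cdot,s)\big)\,dp_{k_n}(X,\sigma).$$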
By Lemma 7.5.2, $\max_{s\in T_\sigma}\big|I_k^M(F)(\cdot,s)\big|\in SL^2(\pi)$.

Fix $f:[0,\infty[^n\to\mathbb R$ locally in $L^2(\lambda^n)$ and a lifting $F$ locally in $SL^2(\nu^n)$ of $f$. Using Lemma 7.5.7 and Theorem 7.5.8, we can define the iterated integral process $i_k^M(f)$ of order $k=(k_1,\dots,k_n)$ by setting, for $r\in[0,\infty[$,
$$i_k^M(f)(\cdot,r):={}^\circ\lim_{t\downarrow r,\,t\in T}{}^\circ I_k^M(F)(\cdot,t)\quad\pi_L\text{-a.s.}$$
(See Theorem 6.4.10 and Corollary 6.4.11.) In analogy to Theorem 7.5.3 we have:

Theorem 7.5.9 Fix $f:[0,\infty[^n_\le\to\mathbb R$ locally in $L^2(\lambda^n)$ and $(\circ)\in N_\pi^n$. Then $i^M_{(\circ)}(f)$ is a $\pi_L$-a.s. well defined càdlàg $(b_t)_{t\in[0,\infty[}$-martingale and for each $r\in[0,\infty[$,
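$\sup_{s\le r}\big|i^M_{(\circ)}(f)(\cdot,s)\big|$ belongs to $L^2(\pi_L)$. Moreover, $i^M_{(\circ)}$ is a continuous operator on $L^2(\lambda^n)$ and $i^M_{(\circ)}(f)$ is continuous in measure. If $g:[0,\infty[^n_\le\to\mathbb R$ is locally in $L^2(\lambda^n)$ and $(\cdot)\in N_\pi^n$, then
$$\mathbb E_{\pi_L}\Big(i^M_{(\circ)}(f)(\cdot,r)\,i^M_{(\cdot)}(g)(\cdot,r)\Big)=\begin{cases}\int_{[0,r]^n_\le}f\cdot g\,d\lambda^n,&\text{if }(\circ)=(\cdot),\\0,&\text{if }(\circ)\ne(\cdot).\end{cases}$$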
7.5.4 Multiple Integrals

In order to obtain the integrals, with and without parameters, independent of $k\in N_\pi^n$, we define for functions $f:N_\pi^n\times T^{n+m}\to\mathbb R\in L^2_{\mathcal L_{n+m}}\big(c^n\otimes(\nu^{n+m})_L\big)$
$$I_{n,m}(f):=\sum_{k\in N_\pi^n}I_{k,m}(f_k),$$
and call it the multiple integral of $f$ with $m$ parameters. Obviously, $I_{n,m}(f)$ is also defined for $f:N_\pi^n\times[0,\infty[^{n+m}\to\mathbb R\in L^2(c^n\otimes\lambda^{n+m})$. For $m=0$ set $I_n(f):=I_{n,0}(f)$ and call $I_n(f)$ the multiple integral of $f$. In analogy to Theorem 7.3.6 we obtain:

Theorem 7.5.10 Fix $f\in L^2_{\mathcal L_{n+m}}\big(c^n\otimes(\nu^{n+m})_L\big)$. If $F\in SL^2(c^n\otimes\nu^{n+m})$ is a lifting of $f$, then $I_{n,m}(f)={}^\circ I_{n,m}(F)$ in $L^2_{\mathcal L_{n+m}}\big(c^n\otimes(\pi\otimes\nu^m)_L\big)$ and thus $c^n\otimes(\pi\otimes\nu^m)_L$-a.e., where
$$I_{n,m}(F):(X,s)\mapsto\sum_{k\in N_\pi^n}\sum_{t\in T^n_<}F(k,t,s)\prod_{i=1}^{n}p_{k_i}(X_{t_i})\in SL^2(\pi\otimes\nu^m).$$
We see that $I_{n,m}(f)$ is a real polynomial in several variables up to an infinitesimal error and a $(\pi\otimes\nu^m)_L$-nullset.

Remark 7.5.11 If $N_\mu$ is an infinite set for some internal measure $\mu$, and thus external, then we have to replace $\sum_{k\in N_\mu^n}$ in the preceding display by a sum $\sum_{k\in M_\mu^n}$ over an internal set $M_\mu^n$, where $M_\mu$ is obtained as follows: there exist an internal extension $(p_k)_{k\in{}^*N_\mu}$ of $(p_k)_{k\in N_\mu}$ and an unlimited $M\in{}^*\mathbb N$ such that (+) in Sect. 7.5.1 becomes true for all $n,k\in{}^*N_\mu\cap M$ and all $s,t\in T$. Then we define
$$M_\mu:={}^*N_\mu\cap M,$$
where we identify $M$ with the set $\{1,\dots,M\}$.
The process $i_n^M(f):{}^*\mathbb R^T\times[0,\infty[\to\mathbb R$, defined for $f$ locally in $L^2(c^n\otimes\lambda^n)$ with respect to $\lambda^n$ by setting
$$i_n^M(f):=\sum_{k\in N_\pi^n}i_k^M(f_k),$$
is called the multiple integral process of $f$. We obtain from Theorem 7.5.9:

Theorem 7.5.12 Fix $f:N_\pi^n\times[0,\infty[^n_\le\to\mathbb R$ locally in $L^2(c^n\otimes\lambda^n)$ with respect to $\lambda^n$. Then $i_n^M(f)$ is a $\pi_L$-a.s. well defined càdlàg $(b_r)_{r\in[0,\infty[}$-martingale and for each $r\in[0,\infty[$, $\sup_{s\le r}\big|i_n^M(f)(\cdot,s)\big|$ belongs to $L^2(\pi_L)$. Moreover, $i_n^M$ is a continuous operator and $i_n^M(f)$ is continuous in measure. If $g:N_\pi^{n'}\times[0,\infty[^{n'}_\le\to\mathbb R$ is locally in $L^2(c^{n'}\otimes\lambda^{n'})$, then
$$\mathbb E_{\pi_L}\Big(i_n^M(f)(\cdot,r)\,i_{n'}^M(g)(\cdot,r)\Big)=\begin{cases}\sum_{k\in N_\pi^n}\int_{[0,r]^n_\le}f_k\cdot g_k\,d\lambda^n,&\text{if }n=n',\\0,&\text{if }n\ne n'.\end{cases}$$
7.5.5 The σ-Algebra $\mathcal D$ generated by the Wiener-Lévy Integrals

Recall that the σ-algebra $\mathcal W$, generated by the infinite-dimensional Brownian motion, is also generated by the Wiener integrals $I(f)$ with $f\in L^2(\lambda,H)$, where $f$ is bounded and has compact support. Analogously, let $\mathcal D$ be the σ-subalgebra of the Loeb σ-algebra $L_\pi(\mathcal B)$ generated by the Wiener-Lévy integrals $I_{(k)}(f)$, with $k\in N_\pi=\{1,2\}$ here and $f\in L^2(\lambda)$, augmented by the $\pi_L$-nullsets. Since $I_{(k)}$ is a linear and continuous operator, we may assume that its integrands $f$ are simple functions. Following [34], we shall characterize square integrable Lévy random variables $\varphi\in L^2_{\mathcal D}(\pi_L)$ by a sequence $(f_n)_{n\in\mathbb N_0}$ of symmetric square summable deterministic functions $f_n:N_\pi^n\times[0,\infty[^n\to\mathbb R$. Fix an internal $F:N_\pi^n\times T^n\to{}^*\mathbb R$. This $F$ is called restricted by $m$ if $|F|\le m$ and if $F(k,t)=0$ for all $t\notin T_m^n$. For internal $F_1,\dots,F_n:N_\pi\times T\to{}^*\mathbb R$ we define the tensor product $F_1\otimes\dots\otimes F_n:N_\pi^n\times T^n\to{}^*\mathbb R$ by setting
$$(F_1\otimes\dots\otimes F_n)(k,t):=F_1(k_1,t_1)\cdots F_n(k_n,t_n).$$
It is clear how to define the notion “restricted” for $f:N_\pi^n\times[0,\infty[^n\to\mathbb R$ and the tensor product of functions $f_1,\dots,f_n:N_\pi\times[0,\infty[\to\mathbb R$. We have introduced the polynomials $p_k$ in order to be able to write any polynomial of a Wiener-Lévy integral as a linear combination of iterated integrals.

Theorem 7.5.13 Fix a linear combination $C$ of multiple integrals of the form $I_{(k)}(f)$ with restricted $f$ and fix $n\in\mathbb N$. Then the $n$th power $C^n$ of $C$ is, in $L^2(\pi_L)$, a linear
combination of iterated integrals with integrands $f_1\otimes\dots\otimes f_m$ for certain restricted $f_i\in L^2(\lambda)$.

Proof We use induction over $n$. In the induction step we have to show that $I_{(k_1,\dots,k_m)}(g)\cdot I_{(k)}(f)$ has the desired property if the first factor has the desired property. We may assume that the supports of $f$ and $g$ are contained in $[0,N]$ with $N\in\mathbb N$. Set $R:=T_N$. There exist limited liftings $F:R\to{}^*\mathbb R$ of $f$ and $G:R^m\to{}^*\mathbb R$ of $g$. We obtain for $\pi_L$-almost all $X$:
$$I_{(k_1,\dots,k_m)}(g)\cdot I_{(k)}(f)(X)\approx\sum_{t\in R^m_<}G(t)\prod_{i=1}^{m}p_{k_i}(X_{t_i})\cdot\sum_{t\in R}F(t)\,p_k(X_t)=A+\sum_{i=1}^{m}B_i,$$
where
$$A(X)=\sum_{i=1}^{m+1}\sum_{t\in R^{m+1}_<}G\big((t_j)_{j\ne i,\,j\le m+1}\big)F(t_i)\cdot\prod_{j<i}p_{k_j}(X_{t_j})\prod_{i\le j\le m}p_{k_j}(X_{t_{j+1}})\cdot p_k(X_{t_i})$$
and
$$B_i(X)=\sum_{t\in R^m_<}G(t)F(t_i)\cdot\prod_{1\le j\le m,\,j\ne i}p_{k_j}(X_{t_j})\cdot p_k(X_{t_i})\,p_{k_i}(X_{t_i}).$$
The standard part of $A$ has the desired property. Let us study $B_i$. Here $p_{k_i}\cdot p_k$ is a linear combination $\alpha_{k_i+k}\dot p_{k_i+k}+\dots+\alpha_0\dot p_0$ of the polynomials $\dot p_l$ such that all the $\alpha_l$ are limited and $H\alpha_0$ is limited. It follows that each summand of $B_i$ has the form
$$D_l=\alpha_l\sum_{t\in R^m_<}G(t)F(t_i)\cdot\prod_{j\ne i}p_{k_j}(X_{t_j})\,\dot p_l(X_{t_i}),$$
whose standard part has the desired property if $l\in N_\pi$. If $l\notin N_\pi$, $l\ne0$, then ${}^\circ D_l=0$ in $L^2(\pi_L)$, because $H\,\mathbb E_{\pi^1}\dot p_l^2\approx0$. It remains for us to study the case $l=0$. There exists a limited number $a_0$ with $a_0=H\cdot\alpha_0$. Then
$$D_0=a_0\sum_{t\in R^{m-1}_<}\Big(\sum_{t_{i-1}<r<t_i}G(t_1,\dots,t_{i-1},r,t_i,\dots,t_{m-1})F(r)\,\frac1H\Big)\cdot p_{k_1}(X_{t_1})\cdots p_{k_{i-1}}(X_{t_{i-1}})\cdot p_{k_{i+1}}(X_{t_i})\cdots p_{k_m}(X_{t_{m-1}}).$$
We may assume that $G=G_1\otimes\dots\otimes G_{i-1}\otimes J\otimes G_i\otimes\dots\otimes G_{m-1}$, where all the $G_j$ and also $J$ are 1-ary restricted functions defined on $R$. Thus, $D_0$ has the following form for suitable $l_j$:
$$D_0=a_0\sum_{t\in R^{m-1}_<}\Big(\sum_{t_{i-1}<r<t_i}J(r)F(r)\,\frac1H\Big)\prod_{j=1}^{m-1}G_j(t_j)\,p_{l_j}(X_{t_j})=\alpha-\beta$$
with
$$\alpha=a_0\sum_{t\in R^{m-1}_<}\Big(\sum_{r<t_i}J(r)F(r)\,\frac1H\Big)\prod_{j=1}^{m-1}G_j(t_j)\,p_{l_j}(X_{t_j})$$
and
$$\beta=a_0\sum_{t\in R^{m-1}_<}\Big(\sum_{r\le t_{i-1}}J(r)F(r)\,\frac1H\Big)\prod_{j=1}^{m-1}G_j(t_j)\,p_{l_j}(X_{t_j}).$$
Therefore, the standard part of $D_0$ has the desired property.

In order to prove that iterated integrals are $\mathcal D$-measurable, we use a slightly modified notion of multiple integral. Fix $f\in L^2(\lambda^n)$ and a lifting $F\in SL^2(\nu^n)$ of $f$. We define for $k=(k_1,\dots,k_n)\in N_\pi^n$
$$J_k(F):=\sum_{t\in T^n_{\ne}}F(t)\,p_{k_1}(X_{t_1})\cdots p_{k_n}(X_{t_n}),\qquad J_k(f):={}^\circ J_k(F).$$
The random variable $J_k(f)$ is well defined and belongs to $L^2(\pi_L)$. Moreover, $J_k$ is a continuous operator and $J_k(F)\in SL^2(\pi)$. If functions in $L^2(\lambda^m)$ are again identified with their equivalent functions in $L^2_{\mathcal L_m}((\nu^m)_L)$, we obtain

Theorem 7.5.14 Fix $k\in\{1,2\}^n$.
(a) If $g\in L^2(\lambda^n)$, then $I_k(g)\in L^2_{\mathcal D}(\pi_L)$.
(b) If $g\in L^2(\lambda^{n+m})$, then $I_{k,m}(g)\in L^2_{\mathcal D\otimes\mathcal L_m}((\pi\otimes\nu^m)_L)$. It follows that there is an $f\in L^2_{\mathcal D}(\pi_L\otimes\lambda^m)$ which is equivalent to $I_{k,m}(g)$. Let us identify $f$ and $I_{k,m}(g)$.

Proof (a) Set $f:=1_{[0,\infty[^n_\le}\cdot g$. It suffices to prove, by induction on $n$, that $J_{(k_1,\dots,k_n)}(f)$ is $\mathcal D$-measurable. Assume that the result is true for all $m<n$. Since $J_{(k_1,\dots,k_n)}$ is linear and continuous, it suffices to prove the result for functions $f$
which are tensor products $f_1\otimes\dots\otimes f_n$ of restricted functions $f_1,\dots,f_n\in L^2(\lambda)$. Let $F_1,\dots,F_n$ be restricted liftings of $f_1,\dots,f_n$. We obtain
$$J_{(k_1,\dots,k_{n-1})}(f_1\otimes\dots\otimes f_{n-1})\cdot J_{(k_n)}(f_n)=J_{(k_1,\dots,k_n)}(f_1\otimes\dots\otimes f_n)+B$$
with $B=\sum_{i=1}^{n-1}{}^\circ B_i$ and
$$B_i=\sum_{t\in T^{n-1}_{\ne}}\Big(\prod_{j=1,\,j\ne i}^{n-1}F_j(t_j)\,p_{k_j}(X_{t_j})\Big)F_i(t_i)F_n(t_i)\,p_{k_i}(X_{t_i})\,p_{k_n}(X_{t_i}).$$
According to the proof of Theorem 7.5.13, $B_i$ is a finite linear combination of functions of the form $J_{(\circ)}(g)$ with $(\circ)\in N_\pi^{n-1}$ and of standard parts of functions of the form
$$C_i=\sum_{t\in T^{n-1}_{\ne}}\Big(\prod_{j=1,\,j\ne i}^{n-1}F_j(t_j)\,p_{k_j}(X_{t_j})\Big)F_n(t_i)F_i(t_i)\,\frac1H=\sum_{t\in T^{n-2}_{\ne}}\Big(\sum_{s\ne t_1,\dots,t_{n-2}}F_i(s)F_n(s)\,\frac1H\Big)\prod_{j=1}^{i-1}F_j(t_j)\,p_{k_j}(X_{t_j})\prod_{j=i+1}^{n-1}F_j(t_{j-1})\,p_{k_j}(X_{t_{j-1}})$$
$$\approx\sum_{t\in T^{n-2}_{\ne}}\Big(\sum_{s\in T}F_i(s)F_n(s)\,\frac1H\Big)\prod_{j=1}^{i-1}F_j(t_j)\,p_{k_j}(X_{t_j})\prod_{j=i+1}^{n-1}F_j(t_{j-1})\,p_{k_j}(X_{t_{j-1}}).$$
By the induction hypothesis, $B_i$ and thus $B$ is $\mathcal D$-measurable. It follows that $J_{(k_1,\dots,k_n)}(f_1\otimes\dots\otimes f_n)$ is $\mathcal D$-measurable.
(b) Let $M$ be the set of all $g\in L^2(\lambda^{n+m})$ such that $I_{k,m}(g)$ is $\mathcal D\otimes\mathrm{Leb}_m$-measurable. Then $M$ is linear and complete. Since, by (a), $I_{k,m}(1_{A\times B})=I_k(1_A)\otimes1_B$ is $\mathcal D\otimes\mathrm{Leb}_m$-measurable for all Lebesgue measurable $A\subseteq[0,\infty[^n$ and $B\subseteq[0,\infty[^m$, we have $M=L^2(\lambda^{n+m})$.

Problems
(1) Assume that $H\cdot\mathbb E_{\mu^1}x^n$ is limited for all $n\in\mathbb N$. Prove that $\mathbb E_\mu B^n(\cdot,t)$ is limited for all limited $t\in T$ and all $n\in\mathbb N$.
(2) Prove Lemma 7.5.1.
(3) Let $l$ and $a$ be the probability measures for Loeb's Poisson process and Anderson's Brownian motion, respectively, defined in Sect. 6.1. Prove that $N_a=\{1\}=N_l$.
(4) Prove that $N_\pi=\{1,2\}$.
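In the proof of Theorem 7.5.13 the product $p_{k_i}\cdot p_k$ is expanded in the polynomials $\dot p_l$, with limited coefficients and a constant coefficient $\alpha_0$ of size $1/H$. For the symmetric Poisson case this expansion can be made explicit if $\pi^1$ is the three-point law on $\{-1,0,1\}$ with $P(\pm1)=\beta/(2H)$ (an assumption of the sketch, forced by the moment conditions $H\,\mathbb E x^{2n}=\beta$, $\mathbb E x^{2n-1}=0$). On this support $x^3=x$, so every product is a combination of $\dot p_0=1$, $\dot p_1=x$, $\dot p_2=x^2-\beta/H$, and the higher $\dot p_l$ vanish $\pi^1$-a.s. The sketch uses a finite $H$ as a stand-in. It computes $\alpha_l=\mathbb E(p_ap_b\dot p_l)/\mathbb E\dot p_l^2$, checks the reconstruction on the support, and shows that $H\alpha_0=\mathbb E(Hp_ap_b)$ equals $1$ for $a=b$ and $0$ otherwise.

```python
import math

# Finite stand-ins (assumption): H = 1000, beta = 2; three-point law for pi^1.
H, beta = 1000, 2.0
SUPPORT = [(-1.0, beta / (2 * H)), (0.0, 1 - beta / H), (1.0, beta / (2 * H))]

P_DOT = {0: lambda x: 1.0, 1: lambda x: x, 2: lambda x: x * x - beta / H}
NORM = {1: math.sqrt(beta), 2: math.sqrt(beta - beta**2 / H)}   # sqrt(H E[p_dot_k^2])

def p(k, x):
    """Normalized p_k = p_dot_k / sqrt(H E[p_dot_k^2])."""
    return P_DOT[k](x) / NORM[k]

def E(f):
    return sum(w * f(x) for x, w in SUPPORT)

def expand(a, b):
    """alpha_l = E[p_a p_b p_dot_l] / E[p_dot_l^2] for l = 0, 1, 2."""
    return {l: E(lambda x: p(a, x) * p(b, x) * P_DOT[l](x)) / E(lambda x: P_DOT[l](x) ** 2)
            for l in (0, 1, 2)}

def reconstructs(a, b, alpha):
    """Check p_a p_b = sum_l alpha_l p_dot_l at every atom of the support."""
    return all(math.isclose(p(a, x) * p(b, x), sum(c * P_DOT[l](x) for l, c in alpha.items()),
                            rel_tol=1e-9, abs_tol=1e-12)
               for x, _ in SUPPORT)

table = {(a, b): expand(a, b) for a in (1, 2) for b in (1, 2)}
for (a, b), alpha in table.items():
    print(a, b, {l: round(c, 6) for l, c in alpha.items()}, "H*alpha_0 =", round(H * alpha[0], 9))
```

For instance $p_1^2=x^2/\beta=\dot p_2/\beta+1/H$ on the support, so $\alpha_2=1/\beta$ is limited while $\alpha_0=1/H$; this is the case $l=0$ treated separately in the proof.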
7.6 Malliavin Calculus for Poisson Processes

In this final section on Stochastic Analysis we present some ideas for the Malliavin calculus for symmetric Poisson processes. For more general Lévy processes we refer to Chap. 23 in [36]. Again the so-called chaos decomposition of square integrable random variables under the Poisson measure will be applied. Recall that this means that square integrable random variables can be characterized by a sequence of deterministic functions. Recall the notation for symmetric Poisson processes in Sect. 7.1. The standard theory of Malliavin calculus for Lévy processes, with applications, can be found, for example, in the book of Di Nunno, Øksendal and Proske [13] and in the articles [10–12].
7.6.1 Chaos

Here is the first version of the Chaos Representation Theorem for Lévy processes, when we replace $\pi$ by suitable internal probability measures (see [36]).

Proposition 7.6.1 Fix $\varphi\in L^2_{\mathcal D}(\pi_L)$, i.e., $\varphi\in L^2(\pi_L)$ and $\varphi$ is $\mathcal D$-measurable. Fix a sequence $(\mathcal H_n)_{n\in\mathbb N_0}$ of sets $\mathcal H_n$ with the following closure properties.
(W 1) $\mathcal H_0=\mathbb R$.
(W 2) $\mathcal H_1=\{F:T\to{}^*\mathbb R\in SL^2(\nu)\mid{}^\circ F\text{ exists in }L^2(\lambda)=L^2_{\mathcal L_1}(\nu_L)\}$.
(W 3) $\mathcal H_n$ is a linear space over $\mathbb R$.
(W 4) $\mathcal H_n\subseteq\{F:T$
$$\sum_{n\in\mathbb N_0}\sum_{(k_1,\dots,k_n)\in N_\pi^n}I_{(k_1,\dots,k_n)}\big({}^\circ F_{(k_1,\dots,k_n)}\big)\quad\text{converges in }L^2(\pi_L).$$
Proof (a) Let $M$ be the set of all $\varphi\in L^2_{\mathcal D}(\pi_L)$ having such a decomposition. Then $M$ is a linear subspace of $L^2_{\mathcal D}(\pi_L)$. Condition (W 5), Theorem 7.5.3 and the pairwise orthogonality of the $I_k(f)$ tell us that $M$ is closed. In order to prove that $M=L^2_{\mathcal D}(\pi_L)$, fix $\varphi\in L^2_{\mathcal D}(\pi_L)$ with $\varphi\perp M$. It suffices to prove $\varphi=0$ in $L^2_{\mathcal D}(\pi_L)$. Fix a finite sum $C=\sum_{j=1}^{m}I_{(l_j)}(f_j)$ with restricted functions $f_j\in L^2(\lambda)$ and $l_j\in N_\pi$. By Theorem 7.5.13 and Condition (W 6) we have for all polynomials $p$: $p(C)$ is a linear combination of iterated integrals with kernels in $\bigcup_{n\in\mathbb N_0}{}^\circ\mathcal H_n$, thus $\varphi\perp p(C)$
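for all polynomials $p$. Let $\pi_L^+,\pi_L^-$ be the measures with density $\varphi^+$ (the positive part of $\varphi$) and $\varphi^-$ (the negative part of $\varphi$), respectively. We have to prove that $\pi_L^+=\pi_L^-$. Since $\mathcal D$ is generated by the $\pi_L$-integrals $I_{(k)}(f)$, it suffices to prove that
$$\int_{{}^*\mathbb R^T}e^{i\cdot\sum_{j=1}^{m}I_{(l_j)}(f_j)}\,d\pi_L^+=\int_{{}^*\mathbb R^T}e^{i\cdot\sum_{j=1}^{m}I_{(l_j)}(f_j)}\,d\pi_L^-.$$
Set $J:=\sum_{j=1}^{m}I_{(l_j)}(f_j)$. Then, since $\varphi\perp J^n$ for all $n$,
$$\int_{{}^*\mathbb R^T}e^{i\cdot J}\,d\pi_L^+=\sum_{n=0}^{\infty}\frac{i^n}{n!}\,\mathbb E_{\pi_L}\big(J^n\cdot\varphi^+\big)=\sum_{n=0}^{\infty}\frac{i^n}{n!}\,\mathbb E_{\pi_L}\big(J^n\cdot\varphi^-\big)=\int_{{}^*\mathbb R^T}e^{i\cdot J}\,d\pi_L^-.$$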
The proof of Part (a) is finished.

In order to obtain the Malliavin derivative and the Skorokhod integral, we use a slight modification of Proposition 7.6.1.

Theorem 7.6.2 Fix $\varphi\in L^2_{\mathcal D}(\pi_L)$ and a sequence $(\mathcal H_n)_{n\in\mathbb N_0}$ with the properties (W 1),…,(W 6) in Proposition 7.6.1. The assertion is:
(I) There exists a sequence $(F_n)_{n\in\mathbb N_0}$ of internal symmetric functions $F_n:N_\pi^n\times T^n_{\ne}\to{}^*\mathbb R$ with the following properties:
(a) $F_n\in SL^2(c^n\otimes\nu^n)$, and ${}^\circ F_n$ exists and belongs to $L^2(c^n\otimes\lambda^n)$.
$(k_1,\dots,k_n)\in N_\pi^n=\{1,2\}^n$, (b) 1T
$${}^\circ I_n(F_n)\quad\text{with}\quad I_n(F_n)=\sum_{k\in N_\pi^n}\sum_{t\in T^n_<}F_n(k,t)\prod_{i=1}^{n}p_{k_i}(X_{t_i})$$
(see Theorem 7.5.10).
(II) If $g_n:N_\pi^n\times[0,\infty[^n\to\mathbb R$ is symmetric and $\varphi=\sum_{n=0}^{\infty}I_n(g_n)$, then $g_n=f_n$ in $L^2(c^n\otimes\lambda^n)$.

Proof We have $\varphi=\sum_{n\in\mathbb N_0}\sum_{k\in N_\pi^n}I_k(f_n(k,\cdot))$, according to Proposition 7.6.1, where $f_n:N_\pi^n\times[0,\infty[^n_\le\to\mathbb R$ with $f_n(k,\cdot):=f_k={}^\circ F_k=:{}^\circ F_n(k,\cdot)$ for a certain $F_n(k,\cdot)\in\mathcal H_n$. Since $\sum_{k\in N_\pi^n}I_k(f_n(k,\cdot))$ converges in $L^2(\pi_L)$, we may assume that $F_n\in SL^2(c^n\otimes\nu^n)$. We convert $F_n$ to a symmetric function $F_n:N_\pi^n\times T^n_{\ne}\to{}^*\mathbb R$ by setting
$$F_n(k_1,\dots,k_n,t_1,\dots,t_n):=F_n(k_{\sigma_1},\dots,k_{\sigma_n},t_{\sigma_1},\dots,t_{\sigma_n})\quad\text{if }t_{\sigma_1}<\dots<t_{\sigma_n}.$$
This proves Part (I). Part (II) follows from the orthogonality of the multiple integrals.
H. Osswald
Let $\overleftrightarrow{F}$ denote the symmetric extension of $F \in H_n$; see Sect. 7.4.1. The following example of a sequence fulfilling the properties (W1),…,(W6) is the key to the Clark-Ocone formula. The proof is similar to the proof of the assertion in Example 7.4.5.

Example 7.6.3 Fix a sequence $(H_n)_{n\in\mathbb N_0}$ fulfilling (W1),…,(W6). Let $G_0 := \mathbb R$ and, for $n > 0$, let $G_n$ be the set of all $F \in H_n$ such that for all $k \in \mathbb N_\pi^{n-1}$ we have, in $L^2_{\mathcal D\otimes L}((\pi\otimes\nu)_L)$,
$${}^\circ I_{k,1}(F) = \Big(r \mapsto \mathbb E^{b_{{}^\circ r}}\,{}^\circ I_{k,1}(\overleftrightarrow{F})(\cdot,r)\Big).$$
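Then $(G_n)_{n\in\mathbb N_0}$ fulfills (W1),…,(W6). It follows that $\mathbb E^{b_{{}^\circ r}}\,{}^\circ I_{k,1}(\overleftrightarrow{F})(\cdot,r)$ is $\mathcal D$-measurable for $\nu_L$-almost all $r$. Moreover, $I_{k,1}(F)$ is a $(\mathcal B_{t^-})_{t\in T}$-adapted lifting in $SL^2(\pi\otimes\nu)$ of $r \mapsto \mathbb E^{b_r}\,{}^\circ I_{k,1}(\overleftrightarrow{F})(\cdot,r)$.

Proof Obviously, (W1),…,(W4) are true. To prove (W5), fix a Cauchy sequence $({}^\circ F^m)_{m\in\mathbb N}$ with $F^m \in G_n$. There exists an $F \in H_n$ with $\lim_{m\to\infty} {}^\circ F^m = {}^\circ F$ in $L^2((\nu^n)_L)$. Then
$$\lim_{m\to\infty}\int_{{}^*\mathbb R^T\times T}\big({}^\circ I_{k,1}(F^m) - {}^\circ I_{k,1}(F)\big)^2\, d(\pi\otimes\nu)_L = 0$$
and, by Jensen's inequality,
$$\lim_{m\to\infty}\int_{{}^*\mathbb R^T\times T}\Big(\mathbb E^{b_{{}^\circ r}}\big({}^\circ I_{k,1}(\overleftrightarrow{F^m})(\cdot,r) - {}^\circ I_{k,1}(\overleftrightarrow{F})(\cdot,r)\big)\Big)^2\, d(\pi\otimes\nu)_L(\cdot,r) = 0.$$
Together with ${}^\circ I_{k,1}(F^m) = \mathbb E^{b_{{}^\circ r}}\,{}^\circ I_{k,1}(\overleftrightarrow{F^m})(\cdot,r)$, this shows that the defining equality of $G_n$ also holds for $F$ in place of $F^m$, i.e., $F \in G_n$. To prove (W6), fix restricted $F_1,\dots,F_n \in G_1$ and set $F := 1_T\,\dots$
$$\mathbb E^{b_{{}^\circ r}}\,{}^\circ I_{k,1}(\overleftrightarrow{F})(\cdot,r) = \sum_{i=1}^n {}^\circ F_i(r)\cdot\mathbb E^{b_{{}^\circ r}}\,{}^\circ\beta_i(\cdot,r)$$
with $\beta_i(X,r) := \sum_{t_1<\cdots}$ …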
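The functions ${}^\circ F_n$ are called the kernels of $\varphi$. The following recipe (see [35]) computes the kernels of the chaos decomposition. It corresponds to Theorem 7.4.7, and the proof is similar to the proof there.

Theorem 7.6.4 Fix $\varphi \in L^2_{\mathcal D}(\pi_L)$ and a lifting $\Phi \in SL^2(\pi)$ of $\varphi$. Define for all $n \in {}^*\mathbb N_0$, $l \in \mathbb N_\pi^n$ and $s \in T^n_<$
$$\Phi_n(l,s) := H^n\,\mathbb E_\pi\big(\Phi\cdot p_{l_1}(X_{s_1})\cdots p_{l_n}(X_{s_n})\big).$$
Then
$$\varphi = \sum_{n\in\mathbb N_0} {}^\circ I_n(\Phi_n) = \sum_{n\in\mathbb N_0} I_n({}^\circ\Phi_n).$$

Proof Let $\varphi = \sum_{n\in\mathbb N_0} {}^\circ I_n(F_n)$ according to Theorem 7.6.2. As in the proof of Theorem 7.4.6, $\varphi$ has a lifting of the form $\Psi := \sum_{n\le M} I_n(F_n) \in SL^2(\pi)$, where $M \in {}^*\mathbb N$. Set
$$U := \Big\{\sum_{n\le M} I_n(G_n)\ \Big|\ G_n : \mathbb N_\pi^n \times T^n_< \to {}^*\mathbb R \text{ internal}\Big\}.$$
Since $U$ is an internally closed subspace of $L^2(\pi)$, we can write $\Phi = \Gamma + \sum_{n\le M} I_n(G_n)$ with $\Gamma \in U^\perp$ and $\sum_{n\le M} I_n(G_n) \in U$. Since $X \mapsto p_{l_1}(X_{s_1})\cdots p_{l_n}(X_{s_n}) \in U$ for $n \le M$, $l_1,\dots,l_n \in \mathbb N_\pi$ and $s_1 < \dots < s_n$, we obtain
$$\mathbb E_\pi\big(\Phi\cdot p_{l_1}(X_{s_1})\cdots p_{l_n}(X_{s_n})\big) = G_n(l_1,\dots,l_n,s_1,\dots,s_n)\cdot\frac{1}{H^n}.$$
This proves that $G_n = \Phi_n$ for $n \le M$. Now continue as in the proof of Theorem 7.4.7.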
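The recipe of Theorem 7.6.4 has an exact finite counterpart that can be checked by enumeration. The sketch below assumes a toy model that is not taken from the text: finitely many times, increments $X_t = \pm 1/\sqrt H$ with probability 1/2 each, a single index ($\mathbb N_\pi = \{1\}$) and $p_1(x) = x$, so that $\mathbb E(p_1(X_t)^2) = 1/H$. It computes $\Phi_n(s) = H^n\,\mathbb E(\Phi\cdot\prod_i p(X_{s_i}))$ over increasing index tuples $s$ and checks that the chaos sum reproduces $\Phi$. The functions `kernels` and `chaos_sum` are hypothetical names.

```python
import itertools
import math


def kernels(Phi, N, H):
    """Return {s: Phi_n(s)} for all increasing index tuples s of length n <= N.

    Toy model (assumption): X_t = eps_t / sqrt(H) with independent fair signs
    and p(x) = x, hence E p(X_t)^2 = 1/H.  Phi_n(s) = H^n * E(Phi * prod p(X_{s_i})).
    """
    paths = [[e / math.sqrt(H) for e in eps]
             for eps in itertools.product((-1, 1), repeat=N)]
    ker = {}
    for n in range(N + 1):
        for s in itertools.combinations(range(N), n):
            mean = sum(Phi(X) * math.prod(X[i] for i in s) for X in paths) / len(paths)
            ker[s] = H ** n * mean
    return ker


def chaos_sum(ker, X):
    """Evaluate sum_n sum_{s increasing} Phi_n(s) * prod_i p(X_{s_i})."""
    return sum(c * math.prod(X[i] for i in s) for s, c in ker.items())


N, H = 4, 4
Phi = lambda X: math.exp(sum(X)) + max(X) * X[0]
ker = kernels(Phi, N, H)
worst = max(abs(chaos_sum(ker, X) - Phi(X))
            for X in ([e / math.sqrt(H) for e in eps]
                      for eps in itertools.product((-1, 1), repeat=N)))
print(f"max reconstruction error: {worst:.2e}")
```

In this toy model the products $\prod_{i\in s}\varepsilon_i$ form an orthonormal basis of all functions of the signs, which is why the reconstruction is exact up to rounding.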
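7.6.2 Malliavin Derivative
Let $\varphi \in L^2_{\mathcal D}(\pi_L)$ with decomposition $\varphi = \sum_{n=0}^\infty I_n(f_n)$ according to Theorem 7.6.2. We may assume that $I_n(f_n) = \sum_{k\in\mathbb N_\pi^n} I_k(F_n(k,\cdot))$, with $1_T\,\dots$
$$\sum_{n=1}^\infty I_{n-1}(f_n(\cdot,l,\cdot,t))(X) = \sum_{n=1}^\infty\ \sum_{k\in\mathbb N_\pi^{n-1}} I_{k,1}(f_n(k,l,\cdot))(X,t),$$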
if this series converges in $L^2(c\otimes(\pi\otimes\nu)_L)$. The Malliavin derivative $D$ is defined on a dense subspace of $L^2_{\mathcal D}(\pi_L)$, because $D\varphi$ exists iff $\sum_{n=1}^\infty \sqrt n\, I_n(f_n)$ converges in $L^2(\pi_L)$. In that case $\varphi$ is called Malliavin differentiable. By Theorem 7.5.14, we can convert this derivative $D\varphi$ into an equivalent process in $L^2(c\otimes\pi_L\otimes\lambda)$ (see Corollary 6.3.7): there exists a $\mathcal D\otimes\mathrm{Leb}[0,\infty[$-measurable function $D^{st}\varphi : \mathbb N_\pi \times {}^*\mathbb R^T \times [0,\infty[ \to \mathbb R$ equivalent to $D\varphi$, i.e., $D^{st}\varphi(k,X,{}^\circ t) = D\varphi(k,X,t)$ for all $k \in \mathbb N_\pi$ and $(\pi\otimes\nu)_L$-almost all $(X,t) \in {}^*\mathbb R^T \times T$. The densely defined operator $D^{st}$ from $L^2_{\mathcal D}(\pi_L)$ to $L^2_{\mathcal D\otimes\mathrm{Leb}}(c\otimes\pi_L\otimes\lambda)$ is called the standard Malliavin derivative. Let us identify $D\varphi$ and $D^{st}\varphi$. Using the Spillover Principle, we obtain the following lifting result for Malliavin differentiable functions.

Theorem 7.6.5 Suppose that $\varphi = \sum_{n=0}^\infty {}^\circ I_n(F_n)$ in $L^2_{\mathcal D}(\pi_L)$, according to Theorem 7.6.2. Then $\varphi$ has a lifting $\Phi \in SL^2(\pi)$ of the form
$$\Phi := \sum_{n=0}^M I_n(F_n),$$
where $M$ is any unlimited number below some unlimited $M_\infty \in {}^*\mathbb N$. If $\varphi$ is Malliavin differentiable, we may assume that
$$D\Phi : (k,X,t) \mapsto \sum_{n=1}^M I_{n-1}(F_n(\cdot,k,\cdot,t))(X) \in SL^2(c\otimes\pi\otimes\nu)$$
and that $D\Phi$ is a lifting of $D\varphi$, $(c\otimes\pi\otimes\nu)_L$-a.e.
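In an assumed finite toy model (fair increments $X_t = \pm 1/\sqrt H$, $\mathbb N_\pi = \{1\}$, $p(x) = x$; not the hyperfinite Lévy model of the text), the lifting $D\Phi$ of Theorem 7.6.5 is obtained from the kernels by removing the factor $p(X_t)$ from every term that contains $t$. The sketch checks that, in this toy model only, the result coincides with the scaled difference quotient $\sqrt H\,(\Phi|_{X_t = 1/\sqrt H} - \Phi|_{X_t = -1/\sqrt H})/2$. All function names are hypothetical.

```python
import itertools
import math


def kernels(Phi, N, H):
    """Phi_n(s) = H^n E(Phi prod_i p(X_{s_i})) in the toy model X_t = eps_t/sqrt(H), p(x) = x."""
    paths = [[e / math.sqrt(H) for e in eps]
             for eps in itertools.product((-1, 1), repeat=N)]
    return {s: H ** n * sum(Phi(X) * math.prod(X[i] for i in s) for X in paths) / len(paths)
            for n in range(N + 1) for s in itertools.combinations(range(N), n)}


def chaos_derivative(ker, t, X):
    """D Phi(t, X) = sum_n I_{n-1}(Phi_n(., t))(X): drop p(X_t) from each term containing t."""
    return sum(c * math.prod(X[i] for i in s if i != t) for s, c in ker.items() if t in s)


def difference_derivative(Phi, t, X, H):
    """sqrt(H) * (Phi with X_t = +1/sqrt(H) minus Phi with X_t = -1/sqrt(H)) / 2."""
    up, down = list(X), list(X)
    up[t], down[t] = 1 / math.sqrt(H), -1 / math.sqrt(H)
    return math.sqrt(H) * (Phi(up) - Phi(down)) / 2


N, H = 4, 4
Phi = lambda X: math.exp(sum(X)) + max(X) * X[0]
ker = kernels(Phi, N, H)
paths = [[e / math.sqrt(H) for e in eps] for eps in itertools.product((-1, 1), repeat=N)]
worst = max(abs(chaos_derivative(ker, t, X) - difference_derivative(Phi, t, X, H))
            for X in paths for t in range(N))
print(f"max |chaos derivative - difference quotient| = {worst:.2e}")
```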
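7.6.3 Exchange of Derivative and Limit
The following commutation rule is similar to the commutation rule in Theorem 7.4.12 for the Malliavin derivative for Brownian motion:

Proposition 7.6.6 Suppose that $(\varphi^i)$ is a sequence of Malliavin differentiable functions such that $(D\varphi^i)$ converges in $L^2_{L\otimes\mathcal D}(c\otimes(\pi\otimes\nu)_L)$, and suppose that $(\mathbb E_{\pi_L}\varphi^i)$ converges in $\mathbb R$. Then $(\varphi^i)$ converges to a Malliavin differentiable function and
$$D\big(\lim_{i\to\infty}\varphi^i\big) = \lim_{i\to\infty} D\varphi^i \quad\text{in } L^2_{L\otimes\mathcal D}(c\otimes(\pi\otimes\nu)_L).$$

Proof Let $\varphi^i = \sum_{n=0}^\infty I_n(f_n^i)$. By assumption,
$$0 = \lim_{i,j\to\infty}\big\|D\varphi^i - D\varphi^j\big\|^2_{c\otimes(\pi\otimes\nu)_L} = \lim_{i,j\to\infty}\sum_{n=1}^\infty\ \sum_{k\in\mathbb N_\pi^n}\int_{T^{n-1}_<\times T}\big(f_n^i(k,\cdot) - f_n^j(k,\cdot)\big)^2\, d\nu_L^n .$$
Since the $f_n^i$ are symmetric, $(f_n^i)_{i\in\mathbb N}$ is a Cauchy sequence in $L^2_{L^n}(c^n\otimes\nu_L^n)$ for all $n \in \mathbb N$. Let $\lim_{i\to\infty} f_n^i = f_n$ in $L^2_{L^n}(c^n\otimes\nu_L^n)$. Then
$$\lim_{i\to\infty}\sum_{k\in\mathbb N_\pi^n}\int_{T^n}\big(f_n^i(k,\cdot) - f_n(k,\cdot)\big)^2\, d\nu_L^n = 0.$$
It follows that, in $L^2_{\mathcal D}(\pi_L)$ and $L^2_{L\otimes\mathcal D}(c\otimes(\pi\otimes\nu)_L)$ respectively,
$$\lim_{i\to\infty} I_n(f_n^i) = I_n(f_n) \quad\text{and}\quad \lim_{i\to\infty} D I_n(f_n^i) = D I_n(f_n).$$
Since, moreover, $(\varphi^i)$ and $(D\varphi^i)$ are Cauchy sequences and $I_n(f_n^i) \perp I_m(f_m^j)$, $D I_n(f_n^i) \perp D I_m(f_m^j)$ for $n \neq m$, the following limits exist:
$$\lim_{i\to\infty}\sum_{n=1}^\infty I_n(f_n^i) = \sum_{n=1}^\infty I_n(f_n) \quad\text{in } L^2_{\mathcal D}(\pi_L),$$
$$\lim_{i\to\infty} D\sum_{n=0}^\infty I_n(f_n^i) = \lim_{i\to\infty}\sum_{n=1}^\infty D I_n(f_n^i) = \sum_{n=1}^\infty D I_n(f_n) = D\sum_{n=0}^\infty I_n(f_n)$$
in $L^2_{L\otimes\mathcal D}(c\otimes(\pi\otimes\nu)_L)$. Define
$$\varphi := \sum_{n=1}^\infty I_n(f_n) + \lim_{i\to\infty}\mathbb E_{\pi_L}\varphi^i .$$
Then $\lim_{i\to\infty}\varphi^i = \varphi$ and $\lim_{i\to\infty} D\varphi^i = D\varphi$.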
7.6.4 The Clark-Ocone Formula
The Clark-Ocone formula is of great importance in mathematical finance, in particular for models driven by Lévy processes (see Aase et al. [1]). It is an Itô integral representation of Lévy functionals.

Theorem 7.6.7 Fix a Lévy functional $\varphi \in L^2_{\mathcal D}(\pi_L)$ with decomposition $\varphi = \mathbb E_{\pi_L}\varphi + \sum_{n=1}^\infty {}^\circ I_n(F_n)$, according to Theorem 7.6.2. By Example 7.6.3, we may assume that $1_T\,\dots$
$$\varphi = \mathbb E_{\pi_L}\varphi + \int (l,\cdot,r) \mapsto \mathbb E^{b_r}\sum_{n=1}^\infty I_{n-1}\big({}^\circ F_n(\cdot,l,\cdot,r)\big)\, dp .$$
If $\varphi$ is Malliavin differentiable, we obtain
$$\varphi(X) = \mathbb E_{\pi_L}\varphi + \int (l,\cdot,r) \mapsto \mathbb E^{b_r} D\varphi(l,\cdot,r)\, dp .$$
Since $\mathbb E^{b_r} I_{n-1}({}^\circ F_n(\cdot,l,\cdot,r))$ is $\mathcal D$-measurable, we may replace $b_r$ with $b_r \cap \mathcal D$.
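Proof Since $\int\cdot\, dp_l$ is a bounded linear operator, we see that
$$\begin{aligned}
\varphi - \mathbb E_{\pi_L}(\varphi) &= \sum_{n=1}^\infty\sum_{k\in\mathbb N_\pi^n} {}^\circ\!\!\sum_{t\in T^n_<} F_n(k,t)\, p_{k_1}(X_{t_1})\cdots p_{k_n}(X_{t_n}) \\
&= \sum_{l\in\mathbb N_\pi}\sum_{n=1}^\infty\sum_{k\in\mathbb N_\pi^{n-1}} {}^\circ\!\!\sum_{s\in T}\ \sum_{t\in T^{n-1}_{<s}} F_n(k,l,t,s)\, p_{k_1}(X_{t_1})\cdots p_{k_{n-1}}(X_{t_{n-1}})\, p_l(X_s) \\
&= \sum_{l\in\mathbb N_\pi}\sum_{n=1}^\infty\sum_{k\in\mathbb N_\pi^{n-1}} \int s \mapsto I_{k,1}\big(1_T\,\dots\big)\, dp_l \\
&= \sum_{l\in\mathbb N_\pi}\sum_{n=1}^\infty\sum_{k\in\mathbb N_\pi^{n-1}} \int r \mapsto \mathbb E^{b_r} I_{k,1}\big({}^\circ F_n(k,l,\circ)\big)(\cdot,r)\, dp_l \\
&= \sum_{l\in\mathbb N_\pi} \int r \mapsto \mathbb E^{b_r}\sum_{n=1}^\infty\sum_{k\in\mathbb N_\pi^{n-1}} I_{k,1}\big({}^\circ F_n(k,l,\circ)\big)(\cdot,r)\, dp_l \\
&= \sum_{l\in\mathbb N_\pi} \int r \mapsto \mathbb E^{b_r}\sum_{n=1}^\infty I_{n-1}\big({}^\circ F_n(\cdot,l,\cdot,r)\big)\, dp_l \\
&= \int (l,\cdot,r) \mapsto \mathbb E^{b_r}\sum_{n=1}^\infty I_{n-1}\big({}^\circ F_n(\cdot,l,\cdot,r)\big)\, dp \\
&= \int (l,\cdot,r) \mapsto \mathbb E^{b_r} D\varphi(l,\cdot,r)\, dp \qquad\text{in case } \varphi \text{ is differentiable.}
\end{aligned}$$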
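Theorem 7.6.7 has a classical discrete analogue that can be verified exactly. As a hedged sketch, again in an assumed toy model that is not the hyperfinite Lévy model of the text (fair increments $X_r = \pm 1/\sqrt H$, $p(x) = x$, and the scaled difference quotient as derivative), the snippet checks that $\Phi = \mathbb E\Phi + \sum_r \mathbb E(D_r\Phi \mid X_0,\dots,X_{r-1})\,p(X_r)$ on every path. Conditioning on the strict past plays the role of $\mathbb E^{b_r}$, and the finite sum over $r$ the role of $\int\dots dp$. The helper names are hypothetical.

```python
import itertools
import math


def derivative(Phi, X, r, H):
    """Toy derivative at time r: sqrt(H) * (Phi(X_r = +1/sqrt H) - Phi(X_r = -1/sqrt H)) / 2."""
    up, down = list(X), list(X)
    up[r], down[r] = 1 / math.sqrt(H), -1 / math.sqrt(H)
    return math.sqrt(H) * (Phi(up) - Phi(down)) / 2


def expect_given_past(G, X, r, N, H):
    """E(G | X_0, ..., X_{r-1}): average G over all values of X_r, ..., X_{N-1}."""
    tails = list(itertools.product((-1, 1), repeat=N - r))
    return sum(G(list(X[:r]) + [e / math.sqrt(H) for e in tail]) for tail in tails) / len(tails)


def clark_ocone(Phi, X, N, H):
    """E Phi + sum_r E(D_r Phi | strict past of r) * p(X_r), with p(x) = x."""
    total = expect_given_past(Phi, X, 0, N, H)
    for r in range(N):
        def D_r(Y, r=r):
            return derivative(Phi, Y, r, H)
        total += expect_given_past(D_r, X, r, N, H) * X[r]
    return total


N, H = 4, 4
Phi = lambda X: max(X) * sum(X) ** 2 + X[1] * X[3]
paths = [[e / math.sqrt(H) for e in eps] for eps in itertools.product((-1, 1), repeat=N)]
worst = max(abs(clark_ocone(Phi, X, N, H) - Phi(X)) for X in paths)
print(f"max |Clark-Ocone representation - Phi| = {worst:.2e}")
```

The identity holds exactly here because $\mathbb E(\Phi\mid X_0,\dots,X_r)$ is a martingale whose increments are $\mathbb E(D_r\Phi\mid X_0,\dots,X_{r-1})\,p(X_r)$.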
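7.6.5 The Skorokhod Integral
In order to define the Skorokhod integral, we need a suitable decomposition of functionals in $L^2_{\mathcal D\otimes\mathrm{Leb}}(c\otimes\pi_L\otimes\lambda)$. We also use the following common notation. Suppose that $F : \mathbb N_\pi^{n+1}\times T^{n+1}_{\neq} \to {}^*\mathbb R$ is internal and symmetric in the first $n$ components. Then $\widetilde F : \mathbb N_\pi^{n+1}\times T^{n+1}_{\neq} \to {}^*\mathbb R$ is the symmetric function derived from $F$ by setting, for $t = (t_1,\dots,t_{n+1})$ and $k = (k_1,\dots,k_{n+1})$,
$$\widetilde F(k,t) := \sum_{i=1}^{n+1} F_{(k_1,\dots,k_{i-1},k_{i+1},\dots,k_{n+1},k_i)}(t_1,\dots,t_{i-1},t_{i+1},\dots,t_{n+1},t_i).$$
Recall the definition $(1_{T_\beta}\cdot_{n+1} F)(k,t) := 1_{T_\beta}(t_{n+1})\cdot F(k,t)$ with $\beta \in T$. We may assume that the domain of $F(k,\cdot)$ is $T^{n+1}_{\neq}$. Then
$$\int I_{n,1}(1_{T_\beta}\cdot_{n+1} F)\, dp := \int_{1/H}^{\beta} I_{n,1}(F)\, dp := \sum_{l\in\mathbb N_\pi}\sum_{r\le\beta}\Big(\sum_{k\in\mathbb N_\pi^n}\sum_{t\in T^n_<} F(k,l,t,r)\, p_{k_1}(X_{t_1})\cdots p_{k_n}(X_{t_n})\Big)\, p_l(X_r) = I_{n+1}\big(\widetilde{1_{T_\beta}\cdot_{n+1} F}\big).$$

Theorem 7.6.8 Fix $\varphi \in L^2_{\mathcal D\otimes\mathrm{Leb}}(c\otimes\pi_L\otimes\lambda)$. Then there exists a sequence $(F_n)_{n\in\mathbb N_0}$ of internal functions $F_n : \mathbb N_\pi^{n+1}\times T^{n+1}_{\neq} \to {}^*\mathbb R$ such that ${}^\circ F_n$ exists in $L^2(c^{n+1}\otimes\nu_L^{n+1})$, with the following properties:
(a) $F_n \in SL^2(c^{n+1}\otimes\nu^{n+1})$ and $F_n$ is symmetric in the first $n$ arguments.
(b) $I_{n+1}({}^\circ\widetilde{F_n}) \in L^2_{\mathcal D}(\pi_L)$.
(c) $s \mapsto I_{n+1}\big(\widetilde{1_{T_s}\cdot_{n+1} F_n}\big) \in SL^2(\pi\otimes\nu)$ and $s \mapsto {}^\circ I_{n+1}\big(\widetilde{1_{T_s}\cdot_{n+1} F_n}\big)$ is in $L^2_{\mathcal D\otimes L}((\pi\otimes\nu)_L)$.
(d) $\varphi(m,X,{}^\circ s) = \sum_{n=0}^\infty {}^\circ I_n(F_n(\cdot,m,\cdot,s))(X)$ in $L^2_{\mathcal D\otimes\mathrm{Leb}}((c\otimes\pi\otimes\nu)_L)$.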
Proof Let $M$ be the set of all functions $\varphi$ in $L^2_{\mathcal D\otimes\mathrm{Leb}}(c\otimes\pi_L\otimes\lambda)$ having such a decomposition. Using Corollary 6.3.11, one can see that $M$ is a complete linear space; see the detailed proof of Theorem 7.5.5. Thus, it suffices to prove the result for $\varphi = 1_{\{l\}}\otimes 1_B\otimes 1_C$, where $l \in \mathbb N_\pi$, $B \in \mathcal D$, and $C \subseteq [0,\sigma]$ is Lebesgue measurable for some $\sigma \in \mathbb N$. Now $1_B$ has a decomposition $1_B = \sum_{n=0}^\infty {}^\circ I_n(G_n)$ according to Theorem 7.6.2. For $C$ there exists an internal $A \subseteq T$ such that $\nu_L\big(A \,\triangle\, \mathrm{st}^{-1}[C]\big) = 0$. Define for $k \in \mathbb N_\pi^{n+1}$ and $t \in T^{n+1}_{\neq}$
$$F_n(k,t) := 1_{\{l\}}(k_{n+1})\cdot G_n(k_1,\dots,k_n,t_1,\dots,t_n)\cdot 1_A(t_{n+1}).$$
Note that (a),…,(d) are true.

The proof of the following simpler result is similar to the proof of the preceding theorem and is left to the reader.

Theorem 7.6.9 Fix $\varphi \in L^2_{\mathcal D\otimes\mathrm{Leb}}(\pi_L\otimes\lambda)$. Then there exists a sequence $(F_n)_{n\in\mathbb N_0}$ of internal functions $F_n : \mathbb N_\pi^n\times T^{n+1}_{\neq} \to {}^*\mathbb R$ such that ${}^\circ F_n$ exists in $L^2(c^n\otimes\lambda^{n+1})$, with the following properties:
(a) $F_n \in SL^2(c^n\otimes\nu^{n+1})$ and $F_n$ is symmetric in the first $n$ arguments.
(b) $(X,s) \mapsto I_n(F_n(\cdot,s))(X) \in SL^2(\pi\otimes\nu)$, where
$$I_n(F_n(\cdot,s))(X) = \sum_{k\in\mathbb N_\pi^n}\ \sum_{t\in T^n_<} F_n(k,t,s)\prod_{i=1}^n p_{k_i}(X_{t_i}).$$
(c) $\varphi(X,{}^\circ s) = \sum_{n=0}^\infty {}^\circ I_n(F_n(\cdot,s))(X)$ in $L^2_{\mathcal D\otimes\mathrm{Leb}}((\pi\otimes\nu)_L)$.
Using Theorem 7.6.8, we are able to define the Skorokhod integral $\delta$ as a densely defined operator from $L^2_{\mathcal D\otimes\mathrm{Leb}}(c\otimes\pi_L\otimes\lambda)$ into $L^2_{\mathcal D}(\pi_L)$. Suppose that $\varphi$ has the decomposition according to Theorem 7.6.8. Set
$$\delta\varphi := \sum_{n=0}^\infty {}^\circ I_{n+1}(\widetilde{F_n})$$
for those $\varphi \in L^2_{\mathcal D\otimes\mathrm{Leb}}(c\otimes\pi_L\otimes\lambda)$ for which this series converges in $L^2_{\mathcal D}(\pi_L)$; in this case, $\varphi$ is called Skorokhod integrable. Note that the Skorokhod integral is an extension of the Itô integral; see Problem (3) below.
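The symmetrization $F \mapsto \widetilde F$ that enters the definition of $\delta$ can be sketched for finitely many arguments. The sketch below (hypothetical helper names; kernels are Python callables on tuples) sums over the $n+1$ choices of the distinguished last pair, without a normalizing factor, as in the displayed definition of $\widetilde F$, and checks symmetry by brute force.

```python
import itertools


def tilde(F):
    """Return F~ with F~(k, t) = sum_i F(k without k_i then k_i, t without t_i then t_i).

    F is assumed symmetric in its first n of n + 1 (k, t)-pairs; the last pair
    is the distinguished integration variable.  F~ is symmetric in all pairs.
    """
    def F_tilde(k, t):
        return sum(F(k[:i] + k[i + 1:] + (k[i],), t[:i] + t[i + 1:] + (t[i],))
                   for i in range(len(t)))
    return F_tilde


def is_symmetric(G, k, t):
    """Check invariance of G under all joint permutations of the pairs (k_i, t_i)."""
    ref = G(k, t)
    return all(G(tuple(k[i] for i in p), tuple(t[i] for i in p)) == ref
               for p in itertools.permutations(range(len(t))))


# Symmetric in the first two pairs (a sum over them), but not in all three.
F = lambda k, t: sum(kj * tj for kj, tj in zip(k[:-1], t[:-1])) * (k[-1] + 2 * t[-1])
k, t = (1, 2, 2), (1, 4, 9)
print(tilde(F)(k, t), is_symmetric(F, k, t), is_symmetric(tilde(F), k, t))
```

Integer arguments keep the brute-force comparison exact.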
7.6.6 Smooth Representations
Using chaos decompositions, we can find S-square integrable internally smooth liftings of square integrable random variables or processes. In analogy to Theorems 7.4.6 and 7.4.14 we have:

Theorem 7.6.10 Suppose that, according to Theorems 7.6.2 and 7.6.8, $\varphi \in L^2_{\mathcal D}(\pi_L)$ and $\psi \in L^2_{\mathcal D\otimes L}((c\otimes\pi\otimes\nu)_L)$ have expansions
$$\varphi = \sum_{n=0}^\infty {}^\circ I_n(F_n), \qquad \psi(k,X,t) = \sum_{n=0}^\infty {}^\circ I_n(G_n(\cdot,k,\cdot,t))(X).$$
Let $(F_n)_{n\in{}^*\mathbb N_0}$ and $(G_n)_{n\in{}^*\mathbb N_0}$ be internal extensions of $(F_n)_{n\in\mathbb N_0}$ and $(G_n)_{n\in\mathbb N_0}$, respectively, such that $F_n$ is symmetric and $G_n$ is symmetric in the first $n$ arguments. Then there exists an unlimited $K_\infty \in {}^*\mathbb N$ such that for all unlimited $K \in {}^*\mathbb N$ with $K \le K_\infty$:
(a) $\Phi_K := \sum_{n=0}^K I_n(F_n)$ belongs to $SL^2(\pi)$ and is a lifting of $\varphi$. If $\varphi$ is Malliavin differentiable, then we may choose $K_\infty$ such that
$$D\Phi_K : (k,X,t) \mapsto \sum_{n=1}^K I_{n-1}(F_n(\cdot,k,\cdot,t))(X) \in SL^2(c\otimes\pi\otimes\nu)$$
and $D\Phi_K$ is a lifting of $D\varphi$.
(b) $\Psi_K : (l,X,r) \mapsto \sum_{n=0}^K I_n(G_n(\cdot,l,\cdot,r))(X)$ belongs to $SL^2(c\otimes\pi\otimes\nu)$ and is a lifting of $\psi$. If $\psi$ is Skorokhod integrable, then we may choose $K_\infty$ such that
$$\delta(\Psi_K) : X \mapsto \sum_{n=0}^K I_{n+1}(\widetilde{G_n})(X) \in SL^2(\pi)$$
and $\delta(\Psi_K)$ is a lifting of $\delta\psi$.
(c) Moreover, for $\psi \in L^2_{\mathcal D\otimes L}((\pi\otimes\nu)_L)$ with a decomposition according to Theorem 7.6.9, $\psi$ has a lifting $\Psi \in SL^2(\pi\otimes\nu)$ of the form
$$\Psi(X,r) = \sum_{n=0}^K I_n(G_n(\cdot,r))(X).$$
Using the lifting results, it is straightforward to prove that $\delta$ is the adjoint operator of the Malliavin derivative in the following sense:

Theorem 7.6.11 Let $\varphi$ be Skorokhod integrable and $\psi$ be Malliavin differentiable. Then
$$\langle\varphi, D\psi\rangle_{c\otimes\pi_L\otimes\lambda} = \langle\delta\varphi, \psi\rangle_{\pi_L}.$$
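Theorem 7.6.11 can be tested in an assumed finite toy model (fair increments $\pm 1/\sqrt H$, one index, $p(x) = x$, and $\nu$ = counting measure divided by $H$); this is an illustration, not the hyperfinite construction of the text. The sketch builds $DF$ from the chaos kernels of $F$, builds $\delta u = \sum_n I_{n+1}(\widetilde{G_n})$ from the kernels $G_n(\cdot,t)$ of $u(t,\cdot)$, and compares both sides of the duality by exhaustive enumeration. Kernel terms of $u(t,\cdot)$ that involve $t$ itself do not enter $\delta u$, mirroring the domain $T^{n+1}_{\neq}$. All function names are hypothetical.

```python
import itertools
import math


def walsh_kernels(f, N, H):
    """f_n(s) = H^n E(f * prod_{i in s} p(X_i)) in the toy model X_i = eps_i/sqrt(H), p(x) = x."""
    paths = [[e / math.sqrt(H) for e in eps]
             for eps in itertools.product((-1, 1), repeat=N)]
    return {s: H ** n * sum(f(X) * math.prod(X[i] for i in s) for X in paths) / len(paths)
            for n in range(N + 1) for s in itertools.combinations(range(N), n)}


def malliavin(F, N, H):
    """D F(t, X): kernels over index sets containing t, with the factor p(X_t) removed."""
    ker = walsh_kernels(F, N, H)
    return lambda t, X: sum(c * math.prod(X[i] for i in s if i != t)
                            for s, c in ker.items() if t in s)


def skorokhod(u, N, H):
    """delta u = sum_n I_{n+1}(G~_n), where G_n(s, t) are the kernels of u(t, .)."""
    kers = [walsh_kernels(lambda X, t=t: u(t, X), N, H) for t in range(N)]

    def delta_u(X):
        total = 0.0
        for n in range(1, N + 1):
            for B in itertools.combinations(range(N), n):
                g_tilde = sum(kers[t][tuple(i for i in B if i != t)] for t in B)
                total += g_tilde * math.prod(X[i] for i in B)
        return total
    return delta_u


def duality_gap(u, F, N, H):
    """E(sum_t (1/H) u(t) D_t F) - E(delta(u) F)."""
    paths = [[e / math.sqrt(H) for e in eps]
             for eps in itertools.product((-1, 1), repeat=N)]
    DF, du = malliavin(F, N, H), skorokhod(u, N, H)
    lhs = sum(sum(u(t, X) * DF(t, X) for t in range(N)) / H for X in paths) / len(paths)
    rhs = sum(du(X) * F(X) for X in paths) / len(paths)
    return lhs - rhs


u = lambda t, X: (t + 1) * X[0] + math.sin(sum(X)) + X[t] ** 2
F = lambda X: math.exp(X[1]) * (1 + X[2])
print(f"duality gap: {duality_gap(u, F, N=3, H=3):.2e}")
```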
7.6.7 The Product Rule
Following [35], product and chain rules for symmetric processes will now be studied in a quite constructive way, using somewhat technical but elementary finite combinatorics. I recommend that the reader first study the main lemma (Lemma 7.6.13) in the case m = 3.
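Both rules are more complicated than the corresponding rules for Brownian motion, for the following reasons. In the case of symmetric Poisson processes, $\mathbb N_\pi$ is different from $\{1\}$, and the multiple integrals are, in general, only square integrable. It follows that, in contrast to Brownian motion, the product of two Lévy functionals, even in finite chaos levels, is not necessarily square integrable. The most challenging difference, however, is the fact that $H\,\mathbb E_\pi^1(p_i p_j p_k)$ may be different from 0. This is the case iff the number 2 appears exactly once or three times among $i, j, k$. In the case of Brownian motion, $H\,\mathbb E_a^1 p_1^3 = H\,\mathbb E_a^1 x^3 = 0$, where $a$ is defined in Sect. 6.1.

In order to overcome the difficulties caused by the loss of moments larger than 2 for the multiple integrals, we use restricted functions. Fix $\varphi = \sum_{n=0}^\infty I_n(f_n)$ according to Theorem 7.6.2. For each $m \in \mathbb N$ define
$$\varphi_m := \varphi\restriction m := \sum_{n\in\mathbb N_0} I_n(f_n\restriction m),$$
where $(f_n\restriction m)(k,r) := f_n(k,r)$ if $n \le m$, $|f_n(k,r)| \le m$ and $r \le m$; otherwise $(f_n\restriction m)(k,r) := 0$. If $\varphi = \varphi\restriction m$ or $f = f\restriction m$, then we say that $\varphi$, $f$, respectively, are restricted by $m$. Restricted $\varphi \in L^2_{\mathcal D}(\pi_L)$ are Malliavin differentiable. Fix $\varphi \in L^2_{\mathcal D}(\pi_L)$. Since $I_n$ is a continuous operator and the $(I_n(f_n))_{n\in\mathbb N}$ are pairwise orthogonal, we have $\lim_m \varphi_m = \varphi$ in $L^2(\pi_L)$, and, if $\varphi$ is Malliavin differentiable, then $\lim_m D\varphi_m = D\varphi$ in $L^2(c\otimes\pi_L\otimes\lambda)$.

The following terms are crucial. Set, for $l, \kappa, \bar\kappa \in \mathbb N_\pi$,
$$\sigma(\kappa,\bar\kappa,l) := H\,\mathbb E_\pi^1(p_\kappa\cdot p_{\bar\kappa}\cdot p_l) \quad\text{and}\quad \alpha(\kappa,\bar\kappa,l) := {}^\circ\sigma(\kappa,\bar\kappa,l).$$

Theorem 7.6.12 (Product Rule) Fix Malliavin differentiable $\varphi, \psi \in L^2_{\mathcal D}(\pi_L)$ such that $\big(\mathbb E_{\pi_L}(\varphi_m\cdot\psi_m)\big)_{m\in\mathbb N}$ converges in $\mathbb R$.
(A) Suppose that the sequences $(D\varphi_m\cdot\psi_m)_{m\in\mathbb N}$ and $(\varphi_m\cdot D\psi_m)_{m\in\mathbb N}$ converge in $L^2(c\otimes\pi_L\otimes\lambda)$. Then $(D(\varphi_m\cdot\psi_m))_{m\in\mathbb N}$ converges in $L^2(c\otimes\pi_L\otimes\lambda)$ iff
$$\Big((l,r,X) \mapsto \sum_{\kappa,\bar\kappa\in\mathbb N_\pi}\alpha(\kappa,\bar\kappa,l)\cdot D(\varphi_m)_{\kappa,r}(X)\cdot D(\psi_m)_{\bar\kappa,r}(X)\Big)_{m\in\mathbb N}$$
converges in $L^2(c\otimes\pi_L\otimes\lambda)$, in which case $\varphi\cdot\psi$ is Malliavin differentiable and
$$(D(\varphi\cdot\psi))_{(l,r)} = (D\varphi)_{(l,r)}\cdot\psi + \varphi\cdot(D\psi)_{(l,r)} + \sum_{\kappa,\bar\kappa\in\mathbb N_\pi}\alpha(\kappa,\bar\kappa,l)\cdot(D\varphi)_{\kappa,r}\cdot(D\psi)_{\bar\kappa,r}$$
in $L^2(c\otimes\pi_L\otimes\lambda)$.
(B) Suppose that $(D(\varphi_m\cdot\psi_m))_{m\in\mathbb N}$ converges in $L^2(c\otimes\pi_L\otimes\lambda)$. Then $\varphi\cdot\psi$ is Malliavin differentiable and for all $l \in \mathbb N_\pi$ we have, $\pi_L\otimes\lambda$-a.e.,
$$(D(\varphi\cdot\psi))_{l,r} = (D\varphi)_{l,r}\cdot\psi + \varphi\cdot(D\psi)_{l,r} + \sum_{\kappa,\bar\kappa\in\mathbb N_\pi}\alpha(\kappa,\bar\kappa,l)\cdot(D\varphi)_{\kappa,r}\cdot(D\psi)_{\bar\kappa,r}.$$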
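To see the dichotomy described above (and in Problem (2) below) in a concrete case, the sketch computes $\sigma(i,j,k) = H\cdot\mathbb E(p_i p_j p_k)$ for a toy three-point increment law. The law, the normalization and the name `sigma_table` are assumptions made for illustration: jumps $\pm 1$ with probability $1/(2H)$ each, $p_1(x) = x$, and $p_2(x) = c\,(x^2 - 1/H)$ with $c$ chosen so that $\mathbb E(p_2^2) = 1/H$. In this model the entries vanish exactly when the index 2 occurs an even number of times, and for large $H$ the other entries are close to 1, so $\alpha = {}^\circ\sigma$ takes only the values 0 and 1.

```python
import itertools
import math


def sigma_table(H):
    """sigma(i, j, k) = H * E(p_i p_j p_k) for i, j, k in {1, 2}.

    Toy increment law (assumption): X = +1 or -1 with probability 1/(2H)
    each and X = 0 otherwise.  p_1(x) = x and p_2(x) = c (x^2 - 1/H) are
    centred and satisfy E(p_i p_j) = delta_ij / H.
    """
    law = {1.0: 1 / (2 * H), -1.0: 1 / (2 * H), 0.0: 1 - 1 / H}
    c = 1 / math.sqrt(1 - 1 / H)
    p = {1: lambda x: x, 2: lambda x: c * (x * x - 1 / H)}

    def E(f):
        return sum(prob * f(x) for x, prob in law.items())

    return {idx: H * E(lambda x, idx=idx: p[idx[0]](x) * p[idx[1]](x) * p[idx[2]](x))
            for idx in itertools.product((1, 2), repeat=3)}


for idx, value in sorted(sigma_table(10**6).items()):
    print(idx, f"{value:+.6f}")
```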
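In the proof of this result we use the computation of the kernels (see Theorem 7.6.4) of the terms in the theorem, such as $\varphi_m\cdot\psi_m$, $D(\varphi_m\cdot\psi_m)$, …. We use the following notation. An internal function $\Phi : {}^*\mathbb R^T \to {}^*\mathbb R$ is called a polynomial restricted by $S \in {}^*\mathbb N$ if
$$\Phi(X) = \sum_{n\in{}^*\mathbb N_0}\ \sum_{k\in\mathbb N_\pi^n}\ \sum_{t\in T^n_<} F_n(k,t)\prod_{i=1}^n p_{k_i}(X_{t_i}),$$
where $F_n : \mathbb N_\pi^n\times T^n_{\neq} \to {}^*\mathbb R$ is internal and symmetric, and $\Phi$, and therefore also the $F_n$, are restricted by $S$. By Theorem 7.6.2, we can assume that each $\varphi : {}^*\mathbb R^T \to \mathbb R$ in $L^2_{\mathcal D}(\pi_L)$ has a polynomial lifting $\Phi : {}^*\mathbb R^T \to {}^*\mathbb R$ in $SL^2(\pi)$. If $\varphi$ is Malliavin differentiable, then we can, in addition, assume that
$$D\Phi : (l,r,X) \mapsto \sum_{n\in{}^*\mathbb N}\ \sum_{k\in\mathbb N_\pi^{n-1}}\ \sum_{t\in T^{n-1}_<} F_n(k,l,t,r)\prod_{i=1}^{n-1} p_{k_i}(X_{t_i})$$
belongs to $SL^2(c\otimes\nu\otimes\pi)$ and is a lifting of the Malliavin derivative of $\varphi$.

In analogy to the Brownian motion case, we define the following. Fix $m \in \mathbb N$, $\rho \in m\cup\{0\}$, a strictly increasing $\rho$-tuple $\beta_1 < \dots < \beta_\rho$ in $m$ and $i \in \{\rho,\dots,m\}$. Let $\tau : i-\rho \uparrow m\setminus\{\beta_1,\dots,\beta_\rho\}$ be a strictly increasing function from $i-\rho$ into $m\setminus\{\beta_1,\dots,\beta_\rho\}$. Then $\bar\tau$ denotes the complement of $\tau$, i.e., $\bar\tau$ is the uniquely determined strictly increasing function from $m-i$ onto $m\setminus\big(\mathrm{range}(\tau)\cup\{\beta_1,\dots,\beta_\rho\}\big)$. Note that $\bar\tau$ depends on $m$. Here is the key to the product rule and also to the chain rule:

Lemma 7.6.13 Suppose that $\varphi, \psi \in L^2_{\mathcal D}(\pi_L)$ are restricted by some standard $S \in \mathbb N$. Then we have in $L^2(c\otimes\pi_L\otimes\lambda)$:
$$(D(\varphi\cdot\psi))_{(l,r)} = (D\varphi)_{(l,r)}\cdot\psi + \varphi\cdot(D\psi)_{(l,r)} + \sum_{\kappa,\bar\kappa\in\mathbb N_\pi}\alpha(\kappa,\bar\kappa,l)\cdot(D\varphi)_{\kappa,r}\cdot(D\psi)_{\bar\kappa,r}.$$

Proof By Theorem 7.6.2, $\varphi$ and $\psi$ have polynomial liftings $\Phi$ and $\Psi$. Since $\varphi, \psi$ are restricted by $S$, we can assume that $\Phi$ and $\Psi$ are also restricted by $S$. Therefore $\Phi\cdot\Psi$ belongs to $SL^2(\pi)$ and is a lifting of $\varphi\cdot\psi \in L^2_{\mathcal D}(\pi_L)$. Moreover, $D(\Phi\cdot\Psi)$, $D\Phi\cdot\Psi$, $\Phi\cdot D\Psi \in SL^2(c\otimes\pi\otimes\nu)$ are liftings of $D(\varphi\cdot\psi)$, $D\varphi\cdot\psi$, $\varphi\cdot D\psi$, respectively, and $\sum_{\kappa,\bar\kappa\in\mathbb N_\pi}\sigma(\kappa,\bar\kappa,\cdot)\cdot D\Phi_\kappa\cdot D\Psi_{\bar\kappa} \in SL^2(c\otimes\pi\otimes\nu)$ is a lifting of $\sum_{\kappa,\bar\kappa\in\mathbb N_\pi}\alpha(\kappa,\bar\kappa,\cdot)\cdot D\varphi_\kappa\cdot D\psi_{\bar\kappa}$. By Theorem 7.6.4, the kernels of $\varphi\cdot\psi$ are the standard parts of the kernels $K_m^{\Phi\cdot\Psi}$ under $\Phi\cdot\Psi$, given by $K_0^{\Phi\cdot\Psi} = \mathbb E_\pi(\Phi\cdot\Psi)$ and, for $m \ge 1$,
$$K_m^{\Phi\cdot\Psi}(l,r) = H^m\,\mathbb E_\pi\big(\Phi\cdot\Psi\cdot p_{l_1}(X_{r_1})\cdots p_{l_m}(X_{r_m})\big)$$
with $l \in \mathbb N_\pi^m$, $r \in T^m_<$. Let $F_n$, $G_n$ be the kernels of $\Phi$, $\Psi$, respectively. Finite combinatorics tells us that $K_m^{\Phi\cdot\Psi}(l,r)$ is the finite sum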
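$$K_m^{\Phi\cdot\Psi}(l,r) = \sum_{\rho=0}^{m}\ \sum_{\kappa,\bar\kappa\in\mathbb N_\pi^\rho}\ \sum_{\beta\in m^\rho_<}\ \sum_{i=\rho}^{m}\ \sum_{\tau:\, i-\rho\uparrow m\setminus\{\beta_1,\dots,\beta_\rho\}}\ \sum_{n\in\mathbb N_0}\ \sum_{k\in\mathbb N_\pi^n}\ \sum_{t\in T^n_<}\frac{1}{H^n}\cdot\Theta$$
with
$$\begin{aligned}
\Theta ={}& F_{n+i}(k,\kappa,l_{\tau_1},\dots,l_{\tau_{i-\rho}},t,r_{\beta_1},\dots,r_{\beta_\rho},r_{\tau_1},\dots,r_{\tau_{i-\rho}})\\
&\cdot G_{n+m-i+\rho}(k,\bar\kappa,l_{\bar\tau_1},\dots,l_{\bar\tau_{m-i}},t,r_{\beta_1},\dots,r_{\beta_\rho},r_{\bar\tau_1},\dots,r_{\bar\tau_{m-i}})\\
&\cdot\sigma(\kappa_1,\bar\kappa_1,l_{\beta_1})\cdots\sigma(\kappa_\rho,\bar\kappa_\rho,l_{\beta_\rho}).
\end{aligned}$$
Note that
$$K_m^{\Phi\cdot\Psi}(l,r) = A + B + C(l_m,r_m)$$
with
$$A = \sum_{\rho=0}^{m}\ \sum_{\kappa,\bar\kappa\in\mathbb N_\pi^\rho}\ \sum_{\beta\in(m-1)^\rho_<}\ \sum_{i=\rho}^{m}\ \sum_{\substack{\tau:\, i-\rho\uparrow m\setminus\{\beta_1,\dots,\beta_\rho\}\\ \tau(i-\rho)=m}}\ \sum_{n\in\mathbb N_0}\ \sum_{k\in\mathbb N_\pi^n}\ \sum_{t\in T^n_<}\frac{1}{H^n}\cdot\Theta,$$
$$B = \sum_{\rho=0}^{m}\ \sum_{\kappa,\bar\kappa\in\mathbb N_\pi^\rho}\ \sum_{\beta\in(m-1)^\rho_<}\ \sum_{i=\rho}^{m}\ \sum_{\substack{\tau:\, i-\rho\uparrow m\setminus\{\beta_1,\dots,\beta_\rho\}\\ \bar\tau(m-i)=m}}\ \sum_{n\in\mathbb N_0}\ \sum_{k\in\mathbb N_\pi^n}\ \sum_{t\in T^n_<}\frac{1}{H^n}\cdot\Theta,$$
$$C(l_m,r_m) = \sum_{\rho=0}^{m}\ \sum_{\kappa,\bar\kappa\in\mathbb N_\pi^\rho}\ \sum_{\substack{\beta\in m^\rho_<\\ \beta_\rho=m}}\ \sum_{i=\rho}^{m}\ \sum_{\tau:\, i-\rho\uparrow m\setminus\{\beta_1,\dots,\beta_\rho\}}\ \sum_{n\in\mathbb N_0}\ \sum_{k\in\mathbb N_\pi^n}\ \sum_{t\in T^n_<}\frac{1}{H^n}\cdot\Theta.$$
In the same way, computing the kernel $K_{m-1}^{D_{l_m,r_m}\Phi\cdot\Psi}$ under $D_{l_m,r_m}\Phi\cdot\Psi$ for $l = (l_1,\dots,l_{m-1})$ and $r = (r_1,\dots,r_{m-1})$, we obtain
$$\begin{aligned}
K_{m-1}^{D_{l_m,r_m}\Phi\cdot\Psi}(l,r) &= H^{m-1}\,\mathbb E_\pi\big(D_{l_m,r_m}\Phi\cdot\Psi\cdot p_{l_1}(X_{r_1})\cdots p_{l_{m-1}}(X_{r_{m-1}})\big)\\
&= \sum_{\rho=0}^{m-1}\ \sum_{\kappa,\bar\kappa\in\mathbb N_\pi^\rho}\ \sum_{\beta\in(m-1)^\rho_<}\ \sum_{i=\rho}^{m-1}\ \sum_{\tau:\, i-\rho\uparrow(m-1)\setminus\{\beta_1,\dots,\beta_\rho\}}\ \sum_{n\in\mathbb N_0}\ \sum_{k\in\mathbb N_\pi^n}\ \sum_{t\in T^n_<}\frac{1}{H^n}\\
&\qquad\cdot F_{n+i+1}(k,\kappa,l_{\tau_1},\dots,l_{\tau_{i-\rho}},l_m,t,r_{\beta_1},\dots,r_{\beta_\rho},r_{\tau_1},\dots,r_{\tau_{i-\rho}},r_m)\\
&\qquad\cdot G_{n+m-1-i+\rho}(k,\bar\kappa,l_{\bar\tau_1},\dots,l_{\bar\tau_{m-1-i}},t,r_{\beta_1},\dots,r_{\beta_\rho},r_{\bar\tau_1},\dots,r_{\bar\tau_{m-1-i}})\\
&\qquad\cdot\sigma(\kappa_1,\bar\kappa_1,l_{\beta_1})\cdots\sigma(\kappa_\rho,\bar\kappa_\rho,l_{\beta_\rho}).
\end{aligned}$$
Note that $K_{m-1}^{D_{l_m,r_m}\Phi\cdot\Psi}(l,r) = A$ and
$$B = K_{m-1}^{\Phi\cdot D_{l_m,r_m}\Psi}(l,r), \qquad C(l_m,r_m) = \sum_{\eta,\bar\eta\in\mathbb N_\pi} K_{m-1}^{D_{\eta,r_m}\Phi\cdot D_{\bar\eta,r_m}\Psi}(l,r)\,\sigma(\eta,\bar\eta,l_m),$$
where $K_{m-1}^{\Phi\cdot D_{l_m,r_m}\Psi}$ and $K_{m-1}^{D_{\eta,r_m}\Phi\cdot D_{\bar\eta,r_m}\Psi}$ are the kernels under $\Phi\cdot D_{l_m,r_m}\Psi$ and under $D_{\eta,r_m}\Phi\cdot D_{\bar\eta,r_m}\Psi$, respectively. This proves that for all $l \in \mathbb N_\pi$ and $\nu_L$-almost all $r \in T$:
$$(D(\Phi\cdot\Psi))_{(l,r)} \approx (D\Phi)_{(l,r)}\cdot\Psi + \Phi\cdot(D\Psi)_{(l,r)} + \sum_{\kappa,\bar\kappa\in\mathbb N_\pi}\sigma(\kappa,\bar\kappa,l)\cdot D\Phi_{\kappa,r}\cdot D\Psi_{\bar\kappa,r}.$$
Taking standard parts, we obtain the desired result. The product rule now follows from Lemma 7.6.13 and Proposition 7.6.6.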
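When $\mathbb N_\pi = \{1\}$, the correction term in Lemma 7.6.13 reduces to $\alpha\cdot D\varphi\cdot D\psi$ with $\alpha = \alpha(1,1,1)$. Iterating this product rule on powers is the induction behind Lemma 7.6.14 below, and the resulting closed form can be checked numerically. In this sketch, $\varphi$ and $D\varphi$ are treated as plain numbers (their values at a fixed $(X, r)$); it is only an algebraic check of the induction under that assumption, and the function names are hypothetical.

```python
import math


def d_power_by_product_rule(phi, dphi, k, alpha):
    """D(phi^k), computed by iterating D(a*b) = Da*b + a*Db + alpha*Da*Db with b = phi."""
    a, da = 1.0, 0.0  # phi^0 and its derivative
    for _ in range(k):
        # The right-hand side uses the old (a, da) before both are updated.
        a, da = a * phi, da * phi + a * dphi + alpha * da * dphi
    return da


def d_power_closed_form(phi, dphi, k, alpha):
    """Right-hand side of Lemma 7.6.14 for g(x) = x^k."""
    if k == 0:
        return 0.0
    if alpha == 0:
        return k * phi ** (k - 1) * dphi
    return ((phi + alpha * dphi) ** k - phi ** k) / alpha


for alpha in (0.0, 0.5, 1.0):
    for k in range(6):
        lhs = d_power_by_product_rule(1.3, -0.7, k, alpha)
        rhs = d_power_closed_form(1.3, -0.7, k, alpha)
        assert math.isclose(lhs, rhs, rel_tol=1e-10, abs_tol=1e-10), (alpha, k)
print("product-rule iteration agrees with the closed form")
```

For $\alpha = 0$ the iteration is the ordinary Leibniz rule; for $\alpha \neq 0$ it reproduces the difference quotient, which is the form the chain rule takes below.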
7.6.8 The Chain Rule
Now assume that $\mathbb N_\mu = \{1\}$. This is true for one-dimensional Brownian motion and for Poisson processes. Then $\alpha = \alpha(1,1,1)$. In order to prove the chain rule, which is an extension of the product rule to the case $\mathbb N_\mu = \{1\}$, we need the following useful lemma.

Lemma 7.6.14 Suppose that $\mathbb N_\mu = \{1\}$. Fix $g : \mathbb R^n \to \mathbb R$ with $g(x_1,\dots,x_n) = x_1^{k_1}\cdots x_n^{k_n}$ and $\varphi_1,\dots,\varphi_n \in L^2_{\mathcal D}(\mu_L)$ restricted by some $S \in \mathbb N$. Then, as an identity in $L^2(\lambda\otimes\mu_L)$,
$$D(g(\varphi_1,\dots,\varphi_n)) = \begin{cases}\dfrac{1}{\alpha}\big(g(\varphi_1 + \alpha\cdot D\varphi_1,\dots,\varphi_n + \alpha\cdot D\varphi_n) - g(\varphi_1,\dots,\varphi_n)\big) & \text{if } \alpha \neq 0,\\[1ex] \sum_{i=1}^n (\partial_i g)(\varphi_1,\dots,\varphi_n)\cdot D\varphi_i & \text{if } \alpha = 0.\end{cases}$$

Proof By induction on $n$, using the product rule. In the case $n = 1$, apply induction on $k_1$.

Theorem 7.6.15 (Chain Rule) Suppose that $\mathbb N_\mu = \{1\}$. Fix $g : \mathbb R^n \to \mathbb R$ and Malliavin differentiable functions $\varphi_1,\dots,\varphi_n$. Assume that the partial derivatives of $g$ exist and that there are polynomials $q_j$ in $n$ variables with $\lim_j q_j = g$ and $\lim_j \partial_i q_j = \partial_i g$ for $i = 1,\dots,n$.
(A) Fix $m \in \mathbb N$. Suppose that $\big(D\,q_j(\varphi_{1,m},\dots,\varphi_{n,m})\big)_{j\in\mathbb N}$ converges in $L^2(\mu_L\otimes\lambda)$ and $\big(\mathbb E_{\mu_L} q_j(\varphi_{1,m},\dots,\varphi_{n,m})\big)_{j\in\mathbb N}$ converges in $\mathbb R$. Then $g(\varphi_{1,m},\dots,\varphi_{n,m})$ is Malliavin differentiable and we have, $\mu_L\otimes\lambda$-a.e.,
$$\big(D(g(\varphi_{1,m},\dots,\varphi_{n,m}))\big)_r = \frac{1}{\alpha}\Big(g\big(\varphi_{1,m} + \alpha\cdot(D\varphi_{1,m})_r,\dots,\varphi_{n,m} + \alpha\cdot(D\varphi_{n,m})_r\big) - g(\varphi_{1,m},\dots,\varphi_{n,m})\Big),$$
where this fraction is to be read as $\sum_{i=1}^n(\partial_i g)(\varphi_{1,m},\dots,\varphi_{n,m})\cdot(D\varphi_{i,m})_r$ if $\alpha = 0$.
(B) Assume that (A) is true for all $m \in \mathbb N$, and that $g$ and $\partial_i g$ are continuous. Moreover, let $\big(D\,g(\varphi_{1,m},\dots,\varphi_{n,m})\big)_{m\in\mathbb N}$ and $\big(\mathbb E_{\mu_L} g(\varphi_{1,m},\dots,\varphi_{n,m})\big)_{m\in\mathbb N}$ converge in $L^2(\mu_L\otimes\lambda)$ and in $\mathbb R$, respectively. Then $g(\varphi_1,\dots,\varphi_n)$ is Malliavin differentiable and we have, $\mu_L\otimes\lambda$-a.e.,
$$(D(g(\varphi_1,\dots,\varphi_n)))_r = \frac{g\big(\varphi_1 + \alpha\cdot(D\varphi_1)_r,\dots,\varphi_n + \alpha\cdot(D\varphi_n)_r\big) - g(\varphi_1,\dots,\varphi_n)}{\alpha},$$
where this fraction is to be read as $\sum_{i=1}^n(\partial_i g)(\varphi_1,\dots,\varphi_n)\cdot(D\varphi_i)_r$ if $\alpha = 0$.

In the work of G. Di Nunno, Th. Meyer-Brandis, B. Øksendal and F. Proske [12] on pure jump processes, for example in the case of Poisson processes, the product rule has the form $(D(\varphi\cdot\psi))_l = (D\varphi)_l\cdot\psi + \varphi\cdot(D\psi)_l + (D\varphi)_l\cdot(D\psi)_l$. For the chain rule, they prove a corresponding formula via the Wick product, similar to the formula above for $\alpha \neq 0$. Moreover, in that work and also in the work of J.L. Solé, F. Utzet and J. Vives [40], the set $\mathbb R\times{}^*\mathbb R^T\times[0,\infty[$ is the domain of the Malliavin derivative $D\varphi$ of a Lévy functional $\varphi$, where the measure on $\mathbb R$ depends on the Lévy process. In our approach, $D\varphi$ is defined on $\mathbb N_\mu\times{}^*\mathbb R^T\times[0,\infty[$, and the measure on $\mathbb N_\mu\times[0,\infty[$ is the product of the counting measure on $\mathbb N_\mu$ and Lebesgue measure on $[0,\infty[$. In their work, Nualart and Schoutens [30] use the power jump processes of a Lévy process to prove a chaos representation result for Lévy functionals. J.A. León, J.L. Solé, F. Utzet and J. Vives [23] use this approach to define the directional Malliavin derivative and the directional Skorokhod integral. We refer the reader to the book [36] for further study of a nonstandard approach to stochastic analysis for infinite-dimensional Brownian motion and a large class of Lévy processes. The first part of that book is standard. It contains a detailed proof of the famous Burkholder-Davis-Gundy inequalities and an introduction to abstract Wiener spaces on the Fréchet space $C_B$ of continuous functions from $[0,\infty[$ into any separable Banach space $B$. This space $C_B$ is used in order to have the notion of
“non-time-anticipating” as in the classical Wiener space $C_{\mathbb R}$. Moreover, following [33], Malliavin calculus is studied there on the space of convergent real sequences, endowed with the topology of pointwise convergence. This is an abstract Wiener Fréchet space over the Hilbert space $l^2$ of square summable sequences.

Problems
(1) Prove Theorem 7.6.5.
(2) Prove that $H\,\mathbb E_\pi(p_i p_j p_k) \neq 0$ iff 1 appears among $i, j, k$ exactly twice or never.
(3) Fix $\varphi \in L^2_{\mathcal D\otimes\mathrm{Leb}}(c\otimes\pi_L\otimes\lambda)$ such that each $\varphi_k$ is $(b_r\cap\mathcal D)_{r\in[0,\infty[}$-predictable. Prove that $\varphi$ is Skorokhod integrable and $\delta\varphi = \int\varphi\, dp$.
(4) Let $F : {}^*\mathbb R^T\times T \to {}^*\mathbb R$ be $(\mathcal B_{t^-})_{t\in T}$-adapted and let $F(\cdot,t)^2$ be $\pi$-integrable for all $t \in T$. Prove that $\int F\, dp_k$ is a $\pi$-square integrable $(\mathcal B_t)_{t\in T}$-martingale.
(5) Prove Theorem 7.5.9.
(6) Prove Theorem 7.5.10.
(7) Prove Theorem 7.4.14.
(8) Prove Theorem 7.5.3.
(9) Prove Lemma 7.5.4.
(10) Prove Lemma 7.5.7.
References
1. K. Aase, B. Øksendal, N. Privault, J. Ubøe, White noise generalizations of the Clark-Haussmann-Ocone theorem with applications to mathematical finance. Prépublications du Département de Mathématiques (Université de La Rochelle, 1999)
2. S. Albeverio, J.E. Fenstad, R. Høegh-Krohn, T. Lindstrøm, Nonstandard Methods in Stochastic Analysis and Mathematical Physics (Academic Press, Orlando, 1986)
3. R.M. Anderson, A nonstandard representation of Brownian motion and Itô integration. Isr. J. Math. 25, 15–46 (1976)
4. J. Berger, An Infinitesimal Approach to Stochastic Analysis on Abstract Wiener Spaces. Dissertation (Ludwig-Maximilians-Universität München, 2002)
5. V.I. Bogachev, Gaussian Measures. Mathematical Surveys and Monographs, vol. 62 (American Mathematical Society, Providence, 1998)
6. N. Bouleau, F. Hirsch, Dirichlet Forms and Analysis on Wiener Space. de Gruyter Studies in Mathematics (Walter de Gruyter, Berlin, 1991)
7. J.M.C. Clark, The representation of functionals of Brownian motion by stochastic integrals. Ann. Math. Stat. 41, 1282–1295 (1970); 42, 1778 (1971)
8. N. Cutland, S.-A. Ng, A nonstandard approach to the Malliavin calculus, in Advances in Analysis, Probability and Mathematical Physics: Contributions of Nonstandard Analysis, ed. by S. Albeverio, W.A.J. Luxemburg, M.P.H. Wolff (Kluwer Academic Publishers, Dordrecht, 1995), pp. 149–170
9. G. Di Nunno, Th. Meyer-Brandis, B. Øksendal, F. Proske, Malliavin calculus and anticipative Itô formulae for Lévy processes. Infin. Dimens. Anal. Quantum Probab. Relat. Top. (2005)
10. G. Di Nunno, B. Øksendal, F. Proske, White noise analysis for Lévy processes. J. Funct. Anal. 206(1), 109–148 (2004)
11. G. Di Nunno, On orthogonal polynomials and the Malliavin derivative for Lévy random stochastic measures. Stoch. Stoch. Rep. 76, 517–548 (2004)
12. G. Di Nunno, Th. Meyer-Brandis, B. Øksendal, F. Proske, Malliavin calculus and anticipative Itô formulae for Lévy processes. Infin. Dimens. Anal. Quantum Probab. Relat. Top. 8, 235–258 (2005)
13. G. Di Nunno, B. Øksendal, F. Proske, Malliavin Calculus for Lévy Processes with Applications to Finance (Springer, Berlin, 2009)
14. J.L. Doob, Stochastic Processes (Wiley, New York, 1953)
15. T. Duncan, P. Varaiya, On the solutions of a stochastic control system. SIAM J. Control 13(5), 1077–1092 (1975)
16. T. Duncan, Fréchet valued martingales and stochastic integrals. Stochastics 1, 269–284 (1976)
17. L. Gross, Measurable functions on Hilbert space. Trans. Am. Math. Soc. 105, 372–390 (1962)
18. H. Heuser, Lehrbuch der Analysis, Teil 2 (Teubner Verlag, Stuttgart, 1990)
19. D.L. Hoover, E.A. Perkins, Nonstandard construction of the stochastic integral and applications to stochastic differential equations I and II. Trans. Am. Math. Soc. 275, 1–58 (1983)
20. I. Karatzas, S.E. Shreve, Brownian Motion and Stochastic Calculus (Springer, Berlin, 1988)
21. H.J. Keisler, An infinitesimal approach to stochastic analysis. Mem. Am. Math. Soc. 48 (1984)
22. H.H. Kuo, Gaussian Measures on Banach Spaces. Lecture Notes in Mathematics, vol. 463 (1975)
23. J.A. León, J.L. Solé, F. Utzet, J. Vives, On Lévy processes, Malliavin calculus and market models with jumps. Finance Stoch. 6(2), 197–225 (2002)
24. T. Lindstrøm, Hyperfinite stochastic integration I, II, III, and addendum. Math. Scand. 46, 265–333 (1980)
25. T. Lindstrøm, Hyperfinite Lévy processes. Stochastics 76(6), 517–548 (2004)
26. P.A. Loeb, Conversion from nonstandard to standard measure spaces and applications in probability theory. Trans. Am. Math. Soc. 211, 122–131 (1975)
27. P. Malliavin, Stochastic calculus of variations and hypoelliptic operators, in Proceedings of the International Symposium on Stochastic Differential Equations, Kyoto 1976 (1978), pp. 195–263
28. P.R. Masani, Norbert Wiener. Vita Mathematica, vol. 5 (Birkhäuser Verlag, Basel)
29. D. Nualart, The Malliavin Calculus and Related Topics (Springer, Berlin, 1995)
30. D. Nualart, W. Schoutens, Chaotic and predictable representations for Lévy processes. Stoch. Process. Appl. 90, 109–122 (2000)
31. D. Ocone, Malliavin calculus and stochastic integral representation of diffusion processes. Stochastics 12, 161–185 (1984)
32. H. Osswald, On the Clark-Ocone formula for the abstract Wiener space. Adv. Math. 176, 38–52 (2003)
33. H. Osswald, Malliavin calculus on product measures of ℝ^ℕ based on chaos. Stochastics 77(6), 501–514 (2005)
34. H. Osswald, A smooth approach to Malliavin calculus for Lévy processes. J. Theor. Probab. 22, 441–473 (2009)
35. H. Osswald, Computation of the kernels of Lévy functionals and applications. Ill. J. Math. 55(3), 815–833 (2011)
36. H. Osswald, Malliavin Calculus for Lévy Processes and Infinite-Dimensional Brownian Motion. Cambridge Tracts in Mathematics, vol. 191 (Cambridge University Press, Cambridge, 2012)
37. H. Osswald, S. Sanders, Local constructivity in nonstandard analysis, in preparation
38. A.V. Skorokhod, On a generalization of a stochastic integral. Theory Probab. Appl. 20, 219–233 (1975)
39. O.G. Smolyanov, H.v. Weizsäcker, Smooth probability measures and associated differential operators. To appear in Inf. Dim. Anal. Quantum Probab.
40. J.L. Solé, F. Utzet, J. Vives, Canonical Lévy process and Malliavin calculus. Stoch. Process. Appl. 117, 165–187 (2007)
41. A.S. Üstünel, M. Zakai, Transformations of Wiener measure under anticipative flows. Probab. Theory Relat. Fields 93, 91–136 (1992)
42. A.S. Üstünel, M. Zakai, Embedding the abstract Wiener space in a probability space. J. Funct. Anal. 171, 124–138 (2000)
43. A.S. Üstünel, M. Zakai, Transforms of Measure on a Wiener Space. Springer Monographs in Mathematics (Springer, Berlin, 2000)
44. N. Wiener, The homogeneous chaos. Am. J. Math. 60, 879–936 (1938)
45. M. Zakai, The Malliavin calculus. Acta Appl. Math. 3 (1985)
Chapter 8
New Understanding of Stochastic Independence
Yeneng Sun
8.1 The General Context
The aim of this chapter is to illustrate how some special properties of nonstandard constructions based on conventional mathematics can be combined with ordinary mathematical methods to reveal completely new mathematical phenomena about stochastic independence in a systematic way. The particular nonstandard construction to be exploited here is a special type of standard probability space, the Loeb space, introduced by Peter Loeb in [24]. The reader is referred to Chap. 6 of this book for this basic external object in nonstandard analysis.

As noted by Keisler in [20], most previous applications of Loeb spaces fit the following pattern. First, lift the original classical problem to a hyperfinite setting. Second, make some hyperfinite computations. Third, take standard parts of everything in sight to obtain the desired classical result (called pushing-down). Many theorems have been proved or strengthened using this idea; see, for example, Chap. 6 of this book and the books [1, 5, 30, 31]. Recently, Keisler and Fajardo (see [11, 21]) extracted the essential features of such a nonstandard proof in developing a theory of neocompact sets, to give a deeper understanding of the following question: Why can nonstandard methods be used effectively to go from a sequence of approximations to the existence of a “limit” of such a sequence? Note that the general transfer principle in Chap. 2 says that hyperfinite computations are equivalent to approximate computations in a standard, large but finite setting. Thus, the essential point in the hyperfinite approach is still what we shall call “from approximate to limit”.

On the other hand, the approach to be taken here may be summarized as “from limit to limit”. We simply work with the special limiting entities arising from nonstandard constructions themselves throughout a typical proof, by applying conventional mathematical techniques. In particular, we combine the power of existing measure-theoretic methods with some special measure-theoretic properties of Loeb product spaces. The exact results obtained this way are new from two points of view. First, they are new as measure-theoretic results, which not only have no known classical analogues but also cannot be obtained in the classical measure-theoretic framework. Second, the exact results can be translated into some new approximate results in the discrete setting through the routine procedures of lifting, pushing-down and transfer. A natural question regarding the approach “from limit to limit” is whether the proof of an exact result can also be routinely translated into a proof of the corresponding approximate result. The answer is in general “no”, because the proof of the exact result usually uses some standard continuum mathematics that has no natural discrete analogue, whence backward transfer in the direction “from limit to approximate” breaks down. This means that the relevant approximate results, though stated in the standard universe, may be difficult to prove directly in the discrete framework (or an entirely different proof may be needed). This explains in part why the approach “from limit to limit” can be useful for the discovery of completely new phenomena. From a logical point of view, the use of external sets does give additional proof-theoretic power to nonstandard analysis (see the work of Henson and Keisler in [16]). Since essentially every step of a proof in the approach “from limit to limit” involves external sets, the results exploit the strength of nonstandard analysis in a stronger sense. Note that the earlier nonstandard approach “from approximate to limit” often leads to special limiting objects that are expressible in conventional terms, although they may not be available in a traditional mathematical framework. Thus the approach “from limit to limit” may provide unique opportunities for mathematicians to apply some conventional mathematical operations to those special objects to make strikingly new conceptual connections in mathematics and its applications.

Y. Sun, Department of Mathematics, National University of Singapore, 10 Lower Kent Ridge Road, Singapore 119076, Republic of Singapore
© Springer Science+Business Media Dordrecht 2015
P.A. Loeb and M.P.H. Wolff (eds.), Nonstandard Analysis for the Working Mathematician, DOI 10.1007/978-94-017-7327-0_8
8.2 The Specific Problems

Stochastic independence (or simply independence) has long been the central concept in probability theory. A great many results have been obtained with the help of this notion or its variations. Do we, however, really know most of its fundamental connections with other basic probabilistic concepts and properties? The temptation is to give an immediate positive answer. Yet if one opens an elementary probability textbook, one may observe that even the notion of independence is defined in different ways, such as pairwise independence and various versions of independence involving several random variables. The condition of independence also implies a number of multiplicative properties of random variables, such as those involving generating functions, characteristic functions, maxima of random variables, etc. Note that for a finite collection of random variables, these independence notions and multiplicative properties are all very different. Thus, these concepts seem to be inherently distinct.
8 New Understanding of Stochastic Independence
However, as demonstrated by the subject of probability theory itself, quite new properties arise whenever mass phenomena are studied. One might ask: if a large number of random variables are considered, how much difference remains among the different notions of independence and among the multiplicative properties? In this chapter, we shall adopt the nonstandard approach “from limit to limit” to establish, among other results, a number of fundamental connections between these basic probabilistic concepts. Of course, the concept of independence has already been intensively studied in the discrete setting. The most natural example is the law of large numbers for a sequence of independent random variables. However, even in this simple case, independence is only known as a sufficient condition for the validity of the law; nothing has been said about a possible converse. On the other hand, since continuum methods are often more powerful for the qualitative analysis of various problems, it is natural to ask why the concept of independence has not been studied extensively in the setting of a continuum of random variables. The difficulty with such a continuum approach is the so-called measurability problem. In particular, Doob pointed out (see [9], p. 67) that if the random variables of a continuous-parameter process are independent and have a common distribution (not concentrated at a single point), then the process is not jointly measurable and does not even have a measurable standard modification with respect to the relevant product measure. More generally, Proposition 8.3.3 shows that independence and joint measurability with respect to the usual measure-theoretic product are never compatible with each other except in trivial cases. Thus, to study independence in a continuum setting, one has to go beyond the usual measure-theoretic framework and first solve this incompatibility problem.
The essential idea used here is to adopt the approach “from limit to limit” by applying various simple measure-theoretic and probabilistic operations to a limiting object in nonstandard analysis, the Loeb product space. Here a Loeb product space is the Loeb space of the internal product of two internal probability spaces (see Sect. 6.2.4). In particular, the Loeb product space is shown to be rich enough to solve the incompatibility problem for joint measurability and independence (see Proposition 8.4.1). An important property of the Loeb product space is the so-called Keisler’s Fubini theorem for Loeb measures (see Sect. 6.3.6). This property and the richness of Loeb product spaces are essential for carrying out the relevant measure-theoretic and probabilistic operations, which are not applicable in the traditional framework because of the incompatibility of joint measurability and independence. The rest of the chapter is organized as follows. Section 8.3 discusses the measurability difficulty with a continuum of independent random variables, and Sect. 8.4 provides a resolution using Loeb product spaces. In Sect. 8.5, some exact versions of the law of large numbers are presented that provide a rigorous foundation for the exact cancellation of idiosyncratic risks underlying many economic models. Section 8.6 includes several results on the converse law of large numbers, i.e., characterizations of uncorrelatedness or independence in terms of the validity of the law for large subsystems. In this way, the pervasive scientific phenomenon of individual-specific randomness can be characterized by the almost versions of uncorrelatedness
or independence. The almost equivalence of pairwise and mutual independence is shown in Sect. 8.7. Section 8.8 establishes the duality of independence with another basic probabilistic concept, exchangeability. Section 8.9 unifies various seemingly unrelated multiplicative properties of random variables. In Sect. 8.10, we illustrate how the routine procedure of lifting, pushing-down and transfer can be used to obtain asymptotic results from results on Loeb spaces: if two concepts are related in an exact way in the ideal setting, then they are asymptotically related in the discrete setting. In the final Sect. 8.11, we present some (necessarily incomplete) notes on related work not discussed in the main body of this chapter. Except for Proposition 8.3.1, all the results presented in this chapter are taken from [32–34].
8.3 Difficulties in the Classical Framework

Let $\Omega = \mathbb{R}^{[0,1]}$ be the space of real-valued functions on $T = [0,1]$. Let $\lambda$ be Lebesgue measure on $T$, and let $P$ be a product measure on $\Omega$ constructed from a non-Dirac probability distribution on $\mathbb{R}$. The existence of such a continuum product measure is guaranteed by the Kolmogorov existence theorem (see [4], p. 230). For $t \in T$, let $f_t$ be the $t$th coordinate function, i.e., $f_t(\omega) = \omega(t)$, where $\omega \in \Omega$ is a function on $T$. Then $f$ is a real-valued process with a continuum of independent and identically distributed (iid) random variables. For a fixed $\omega$, the function $f(\cdot,\omega)$ on $T$ is denoted by $f_\omega$ and is called a sample function. Since the classical law of large numbers holds for a sequence of iid random variables, one may hope that an exact version of the law of large numbers is valid for $f$, so that the sample and theoretical distributions can be claimed to be equal. However, Doob [8] (see Theorem 2.2, p. 113) already showed that the sample functions need not be Lebesgue measurable. For the convenience of the reader, this result of Doob is reproduced as a proposition below.

Proposition 8.3.1 Let $h$ be any real-valued function on $[0,1]$, and let $M_h = \{\omega \in \Omega : f_\omega(t) = h(t) \text{ for all but countably many } t \in T\}$. Then $M_h$ has $P$-outer measure one.

Proof Let $B$ be any measurable set in the continuum product $\sigma$-algebra on $\Omega$. Then $B$ is determined by a countable index set $C \subseteq T$ in the sense that for any $\alpha$ and $\beta$ in $\Omega$, if $\alpha(t) = \beta(t)$ for all $t \in C$, then $\alpha \in B$ if and only if $\beta \in B$. Now suppose that $B$ contains $M_h$. Take any $\omega \in \Omega$. Define $\omega'$ so that it agrees with $\omega$ on $C$ and with $h$ on $T \setminus C$. Since $\omega'(t) = h(t)$ for all but countably many $t \in T$, we have $\omega' \in M_h$, and thus $\omega' \in B$. On the other hand, since $\omega$ and $\omega'$ agree on $C$, the property of $C$ implies that $\omega \in B$ as well. This means that $B = \Omega$, and hence the outer measure $P^*(M_h)$ of $M_h$ is one.
Remark 8.3.2 As noted in [8], if one takes $h$ to be non-Lebesgue measurable, then so is every function in $M_h$. Thus, the set of non-Lebesgue measurable sample functions has outer measure one. On the other hand, if one wants the sample means to be some number $a$ almost surely, then one can take $h$ to be the constant function with value $a$. Since $P^*(M_h) = 1$, one can trivially extend $P$ to a measure $\bar P$ with $\bar P(M_h) = 1$, and then claim that almost all sample means are equal to $a$. However, the arbitrary choice of $a$ simply makes the statement meaningless. Note that the claim is actually based on an even more absurd underlying statement, namely, that almost all sample functions coincide with an arbitrarily chosen constant $a$ except at countably many points. As commented in [18], this appears to be a weak straw to clutch (for those who use such statements to claim the complete removal of individual risks, i.e., the validity of the exact law of large numbers). Note also that the lack of joint measurability of the process $f$ itself can never be resolved by extending the measures on the sample space since, as noted by Doob, such a process does not even have a measurable standard modification with respect to the relevant product measure (see [9], p. 67).

More generally, the following proposition shows that no matter what measure spaces are taken as the parameter and sample spaces of a process, independence and joint measurability with respect to the usual measure-theoretic product are never compatible with each other except in the trivial case. This also means that even if one chooses a very large $\sigma$-algebra on the parameter space of a nontrivial independent process so that all the sample functions are measurable, the process itself is still not jointly measurable.

Proposition 8.3.3 Let $(I,\mathcal I,\mu)$ and $(X,\mathcal X,\nu)$ be any two probability spaces with a complete product probability space $(I\times X,\mathcal I\otimes\mathcal X,\mu\otimes\nu)$, and let $f$ be a function from $I\times X$ to a separable metric space. If $f$ is jointly measurable on the product probability space, and for $\mu\otimes\mu$-almost all $(i_1,i_2)\in I\times I$ the random variables $f_{i_1}$ and $f_{i_2}$ are independent (this condition is called almost sure pairwise independence), then, for $\mu$-almost all $i\in I$, $f_i$ is a constant random variable, where $f_i$ is the function on $X$ defined by $f(i,\cdot)$.

Proof We give a proof for the case that $f$ is real-valued and bounded. The general case can then be reduced to this one by composing $f$ with a Borel measurable injection of the separable metric space into $[0,1]$; the composition is bounded and still satisfies the condition of almost sure pairwise independence. Let $A$ be any measurable set in $\mathcal I$. By the Fubini theorem, it is easy to establish the following identities:
$$\begin{aligned}
&\int_X\left(\int_A \big(f(i,x) - Ef_i\big)\,d\mu(i)\right)^2 d\nu(x)\\
&\quad = \int_X\int_A \big(f(i_1,x) - Ef_{i_1}\big)\,d\mu(i_1)\int_A \big(f(i_2,x) - Ef_{i_2}\big)\,d\mu(i_2)\,d\nu(x)\\
&\quad = \int_{A\times A}\int_X \big(f(i_1,x) - Ef_{i_1}\big)\big(f(i_2,x) - Ef_{i_2}\big)\,d\nu(x)\,d\mu\otimes\mu(i_1,i_2),
\end{aligned}$$
which is zero by the condition of almost sure pairwise independence, because the inner integral over $X$ is the covariance of $f_{i_1}$ and $f_{i_2}$. Hence, for $\nu$-almost all $x\in X$, $\int_A (f(i,x) - Ef_i)\,d\mu(i) = 0$. Thus, for any measurable set $B$ in $\mathcal X$, $\int_{A\times B}(f(i,x) - Ef_i)\,d\mu\otimes\nu = 0$. This means that the signed measure $\tau$ defined on $(I\times X,\mathcal I\otimes\mathcal X)$ by integrating $f(i,x) - Ef_i$ over sets in $\mathcal I\otimes\mathcal X$ agrees on rectangles with the zero measure. The product algebra $\mathcal I\otimes\mathcal X$ is generated by the rectangles $A\times B$, and the collection of rectangles is closed under finite intersections, i.e., it is a $\pi$-system. By applying Dynkin’s $\pi$-$\lambda$ theorem (see [7], p. 44 and [10], p. 404), we conclude that $\tau$ is the zero measure. Thus, both $f(i,x) - Ef_i$ and $0$ are Radon-Nikodym derivatives of the same measure. By the uniqueness of Radon-Nikodym derivatives, $f(i,x) = Ef_i$ for $\mu\otimes\nu$-almost all $(i,x)\in I\times X$. Therefore, for $\mu$-almost all $i\in I$, $f_i$ is the constant random variable $Ef_i$.

Remark 8.3.4 If $\mu$ is atomless and the process $f$ has mutually independent random variables, then the condition of almost sure pairwise independence is obviously satisfied. The above result is still valid when $\mu$ has an atom $A$; one can simply observe that the almost sure pairwise independence condition implies the essential constancy of the random variables $f_i$ for almost all $i\in A$.
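The first identity in this proof is nothing more than an interchange of integrals, and its content is easiest to see in a purely finite setting. The following Python sketch is only an illustration under stated assumptions, not part of the original argument: it replaces $I$ by a finite index set with the uniform counting measure and $X$ by a finite probability space, and checks with exact rational arithmetic that the mean square of the averaged deviation equals the average of all pairwise covariances. The helper names and the two joint laws used (iid Bernoulli coordinates, and coordinates that all copy one fair coin) are illustrative choices.

```python
from fractions import Fraction
from itertools import product


def iid_bernoulli(p, n):
    """Joint law of n independent Bernoulli(p) coordinates: outcome -> probability."""
    return {w: p ** sum(w) * (1 - p) ** (n - sum(w)) for w in product((0, 1), repeat=n)}


def common_coin(n):
    """Joint law in which all n coordinates copy a single fair coin (fully correlated)."""
    return {(0,) * n: Fraction(1, 2), (1,) * n: Fraction(1, 2)}


def coordinate_means(dist, n):
    """E f_t for each index t."""
    return [sum(pr * w[t] for w, pr in dist.items()) for t in range(n)]


def mean_square_of_average(dist, n):
    """Integral over Omega of (average over T of (f(t, w) - E f_t)) ** 2."""
    m = coordinate_means(dist, n)
    return sum(pr * (sum(w[t] - m[t] for t in range(n)) / n) ** 2 for w, pr in dist.items())


def average_covariance(dist, n):
    """Average over T x T of Cov(f_t1, f_t2), each covariance computed from dist."""
    m = coordinate_means(dist, n)
    total = sum(
        pr * (w[t1] - m[t1]) * (w[t2] - m[t2])
        for t1 in range(n) for t2 in range(n) for w, pr in dist.items()
    )
    return total / n ** 2


n, p = 4, Fraction(1, 3)
for name, dist in [("iid", iid_bernoulli(p, n)), ("common coin", common_coin(n))]:
    print(name, mean_square_of_average(dist, n), average_covariance(dist, n))
# iid 1/18 1/18
# common coin 1/4 1/4
```

In the iid case only the diagonal $t_1 = t_2$ contributes, with weight $1/n$, which is why the common value $p(1-p)/n$ is small but not zero; in an atomless setting the diagonal is negligible, so almost sure pairwise independence makes the integral vanish.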
8.4 The Resolution

Proposition 8.3.3 shows the general incompatibility of independence and joint measurability. Thus, in order to study independence in a continuum setting, one has to go beyond the usual measure-theoretic framework by resolving this incompatibility problem. The solution is achieved in the following universality result. It shows, in particular, that the Loeb product $\sigma$-algebra $L_{\lambda\otimes P}(\mathcal T\otimes\mathcal A)$ is rich enough to support almost surely pairwise independent processes with any variety of distributions whenever the associated marginal Loeb measures $\lambda_L$ and $P_L$ are atomless. Here $(T,\mathcal T,\lambda)$ and $(\Omega,\mathcal A,P)$ are internal probability spaces, and $(T\times\Omega,\mathcal T\otimes\mathcal A,\lambda\otimes P)$ is the relevant internal product space. Their associated Loeb spaces are, respectively, $(T,L_\lambda(\mathcal T),\lambda_L)$, $(\Omega,L_P(\mathcal A),P_L)$, and $(T\times\Omega,L_{\lambda\otimes P}(\mathcal T\otimes\mathcal A),(\lambda\otimes P)_L)$ (see Chap. 6). It is shown in [23] that the Loeb product space $(T\times\Omega,L_{\lambda\otimes P}(\mathcal T\otimes\mathcal A),(\lambda\otimes P)_L)$ is uniquely determined by its factor Loeb spaces.

Proposition 8.4.1 Let $X$ be a complete, separable and metrizable topological space, and let $M(X)$ be the space of Borel probability measures on $X$, endowed with the topology of weak convergence of measures (see [4]). Let $\mu$ be any Borel probability measure on $M(X)$. If both $\lambda_L$ and $P_L$ are atomless, then there is a process $f$ from $(T\times\Omega,L_{\lambda\otimes P}(\mathcal T\otimes\mathcal A),(\lambda\otimes P)_L)$ to $X$ such that the random variables $f_t$ are almost surely pairwise independent (i.e., for $(\lambda\otimes\lambda)_L$-almost all $(t_1,t_2)\in T\times T$, $f_{t_1}$ and $f_{t_2}$ are independent), and the distribution on $M(X)$ induced by the function $t\mapsto P_L f_t^{-1}$ from $T$ to $M(X)$ is the given measure $\mu$. Here, for any given $t\in T$, $P_L f_t^{-1}$ is the distribution on $X$ induced by the random variable $f_t$.
The proof of this result is rather involved. The difficulty comes from the arbitrary choice of the sample space and the variety of distributions taken by the random variables. The interested reader can find the full proof in Theorem 6.2 of [33].

Remark 8.4.2 It is trivial to obtain independent processes on some Loeb product spaces. For example, take a sequence of iid random variables $\{\alpha_n\}_{n\in\mathbb N}$ with a given distribution $\nu$ on the space $X$. Transfer the sequence to $\{{}^*\alpha_n\}_{n\in{}^*\mathbb N}$, and let $\beta_n$ be the standard part of ${}^*\alpha_n$. Then, for any hyperinteger $\gamma$, $\{\beta_n\}_{1\le n\le\gamma}$ is an iid process on a relevant Loeb product space with common distribution $\nu$ (see the problem below). In addition, almost all sample means for this process are the same constant, as can be seen by simply transferring the classical law of large numbers. See [1, 19, 27, 31] for some uses of such transfers. However, besides being mathematically the same as its classical counterpart, the transfer of the classical law to the hyperfinite setting has the drawback that the classical independence condition becomes $*$-independence, which is not the usual notion of independence.

Problem 8.4.3 Let $\varphi$ be a random variable taking values in a complete, separable and metrizable topological space, and let ${}^*\varphi$ be its transfer. Show that $\varphi$ and the standard part ${}^\circ({}^*\varphi)$ have the same distribution.

Remark 8.4.4 Since a nontrivial $L_{\lambda\otimes P}(\mathcal T\otimes\mathcal A)$-measurable process $f$ with almost surely pairwise independent random variables can indeed be constructed in the atomless case, Proposition 8.3.3 implies that $f$ cannot be measurable with respect to the usual complete product space $(T\times\Omega,L_\lambda(\mathcal T)\otimes L_P(\mathcal A),\lambda_L\otimes P_L)$. Hence the Loeb product algebra $L_{\lambda\otimes P}(\mathcal T\otimes\mathcal A)$ is always strictly bigger than the usual product algebra $L_\lambda(\mathcal T)\otimes L_P(\mathcal A)$ in the atomless case.
A special example of such a strict inclusion, due to Hoover, can be found in [1], where $T$ is a hyperfinite set equipped with its internal power set and both $\lambda$ and $P$ are internal counting probability measures. The following proposition provides a complete characterization of the relationship between the two types of product spaces for Loeb spaces. The proof is left to the interested reader (see also Proposition 6.6 in [33]).

Proposition 8.4.5 The usual product $\sigma$-algebra $L_\lambda(\mathcal T)\otimes L_P(\mathcal A)$ is strictly contained in the Loeb product algebra $L_{\lambda\otimes P}(\mathcal T\otimes\mathcal A)$ if and only if neither of the Loeb measures $\lambda_L$ and $P_L$ is purely atomic.
8.5 Exact Law of Large Numbers

For simplicity, all processes on the Loeb product space $(T\times\Omega,L_{\lambda\otimes P}(\mathcal T\otimes\mathcal A),(\lambda\otimes P)_L)$ are assumed to be measurable with respect to the Loeb product algebra $L_{\lambda\otimes P}(\mathcal T\otimes\mathcal A)$.

Definition 8.5.1 A real-valued process $f$ on the Loeb product space is said to satisfy the law of large numbers if for almost all sample realizations $\omega\in\Omega$, the mean $\int_T f_\omega\,d\lambda_L$ of the sample function $f_\omega$ over the parameter space $T$ is one and the same constant.
Remark 8.5.2 When $f$ satisfies the law, the relevant constant must equal the mean of $f$ viewed as a random variable on the joint space of parameters and samples. This means that for $P_L$-almost all $\omega\in\Omega$, $\int_T f_\omega(t)\,d\lambda_L(t) = \int_{T\times\Omega} f\,d(\lambda\otimes P)_L$.

The following theorem shows that, by using the Fubini property with respect to the larger Loeb product algebra, it is extremely simple to obtain a version of the exact law of large numbers in terms of sample means. In fact, a similar result holds even in the complex case (see Problem 8.5.4). Some technical integrability conditions are omitted in the proof; they can be found in the proof given in [33].

Theorem 8.5.3 Let $f$ be a real-valued square integrable process on the Loeb product space. If the random variables $f_t$ are almost surely uncorrelated, then $f$ satisfies the law of large numbers, where almost sure uncorrelatedness means that for $(\lambda\otimes\lambda)_L$-almost all $(t_1,t_2)\in T\times T$, $f_{t_1}$ and $f_{t_2}$ are uncorrelated.

Proof By Keisler’s Fubini theorem (see Sect. 6.3.6), it is easy to obtain the following identities, as in the proof of Proposition 8.3.3:
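$$\begin{aligned}
&\int_{T\times T}\int_\Omega \big(f_{t_1} - Ef_{t_1}\big)\big(f_{t_2} - Ef_{t_2}\big)\,dP_L\,d(\lambda\otimes\lambda)_L\\
&\quad = \int_\Omega\int_{T\times T} \big(f(t_1,\omega) - Ef_{t_1}\big)\big(f(t_2,\omega) - Ef_{t_2}\big)\,d(\lambda\otimes\lambda)_L\,dP_L\\
&\quad = \int_\Omega\int_T \big(f(t_1,\omega) - Ef_{t_1}\big)\,d\lambda_L(t_1)\int_T \big(f(t_2,\omega) - Ef_{t_2}\big)\,d\lambda_L(t_2)\,dP_L\\
&\quad = \int_\Omega\left(\int_T \big(f_\omega(t) - Ef_t\big)\,d\lambda_L\right)^2 dP_L,
\end{aligned}$$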
which is zero by the assumption of almost sure uncorrelatedness. Hence, for $P_L$-almost all $\omega\in\Omega$, $\int_T (f_\omega(t) - Ef_t)\,d\lambda_L = 0$, which implies that $Ef_\omega = \int_T Ef_t\,d\lambda_L = Ef$, where $Ef_\omega$ denotes the mean of $f_\omega$ over $T$ and the last equality is again Keisler’s Fubini theorem.

Problem 8.5.4 Let $f$ be a complex-valued square integrable process on the Loeb product space. If the random variables $f_t$ are almost surely uncorrelated, i.e., for $(\lambda\otimes\lambda)_L$-almost all $(t_1,t_2)\in T\times T$, $E f_{t_1}\overline{f_{t_2}} = Ef_{t_1}\overline{Ef_{t_2}}$, then for $P_L$-almost all $\omega\in\Omega$, $\int_T f_\omega(t)\,d\lambda_L(t) = \int_{T\times\Omega} f\,d(\lambda\otimes P)_L$.

Next, let $\mathcal U$ denote the usual product $\sigma$-algebra $L_\lambda(\mathcal T)\otimes L_P(\mathcal A)$. For an integrable real-valued process $f$ on the Loeb product space, let $E(f|\mathcal U)$ denote the conditional expectation of $f$ with respect to $\mathcal U$. Note that a construction involving both a product $\sigma$-algebra and a significant extension of it (such as $\mathcal U$ here and the Loeb product algebra $L_{\lambda\otimes P}(\mathcal T\otimes\mathcal A)$) is special in the sense that it has no natural counterpart in standard mathematical practice or in nonstandard mathematics using only internal entities. The following theorem shows that the operation of conditional expectation with respect to $\mathcal U$ is also related to the exact law of large numbers.
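A finite simulation can suggest, though never prove, what Theorem 8.5.3 asserts exactly. The Python sketch below assumes a grid of $n$ midpoints $t_k = (k + 0.5)/n$ as a stand-in for a hyperfinite $T$ with counting measure, and a hypothetical process whose $f_t$ are drawn independently (hence uncorrelated) from an exponential law with mean $1 + t$, so the random variables are not identically distributed. The theorem then predicts that every sample function has mean $\int_0^1 (1+t)\,dt = 1.5$; the seed and grid size are arbitrary choices.

```python
import random


def sample_mean(n, rng):
    """Average of one sample function f_omega over the grid t_k = (k + 0.5) / n.

    Hypothetical example process: f_t is exponential with mean 1 + t, drawn
    independently across t, so the f_t are uncorrelated but not identically distributed.
    """
    total = 0.0
    for k in range(n):
        t = (k + 0.5) / n
        total += rng.expovariate(1.0 / (1.0 + t))  # rate 1/(1+t) gives mean 1+t
    return total / n


def predicted_mean(n):
    """Grid average of E f_t = 1 + t, standing in for the integral of f over T x Omega."""
    return sum(1.0 + (k + 0.5) / n for k in range(n)) / n


rng = random.Random(2015)
n = 100_000
means = [sample_mean(n, rng) for _ in range(5)]  # five independent draws of omega
print(round(predicted_mean(n), 6), [round(m, 3) for m in means])
```

Each simulated sample mean lands within a few thousandths of 1.5; in the Loeb setting this scatter disappears and the equality holds for $P_L$-almost every $\omega$.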
Theorem 8.5.5 Let $f$ be a real-valued integrable process on the Loeb product space. If the conditional expectation $E(f|\mathcal U)$ of $f$ with respect to $\mathcal U$ is essentially a function of $t$, then $f$ satisfies the law of large numbers.

Proof By the assumption, $E(f|\mathcal U) = h$ for some integrable function $h$ on $T$. It follows from the definition of the conditional expectation that
$$\int_{T\times B} f\,d(\lambda\otimes P)_L = \int_{T\times B} E(f|\mathcal U)\,d(\lambda\otimes P)_L = \int_{T\times B} h\,d(\lambda\otimes P)_L$$
for any measurable set $B$ in $L_P(\mathcal A)$. By Keisler’s Fubini theorem,
$$\int_B\int_T f_\omega(t)\,d\lambda_L\,dP_L = \int_B\int_T h(t)\,d\lambda_L\,dP_L.$$
Thus the signed measure $\nu$ on $(\Omega,L_P(\mathcal A))$ defined by $\nu(B) = \int_B\int_T (f_\omega(t) - h(t))\,d\lambda_L\,dP_L$ is the zero measure. By the uniqueness of the Radon-Nikodym derivative, $\int_T (f_\omega - h)\,d\lambda_L = 0$ for $P_L$-almost all $\omega\in\Omega$, and hence the sample average $Ef_\omega$ is the constant $\int_T h(t)\,d\lambda_L$.

The previous two theorems are concerned with the exact law of large numbers in terms of means. We now consider the distributional analogues.

Definition 8.5.6 Let $f$ be a process from the Loeb product space to a separable metric space $X$. We say that $f$ satisfies the law of large numbers in distribution (or simply the law in distribution) if for $P_L$-almost all $\omega\in\Omega$, the empirical distribution $\lambda_L f_\omega^{-1}$ of the sample function $f_\omega$ is the same as the distribution $(\lambda\otimes P)_L f^{-1}$ of $f$ as a random variable on the Loeb product space.

Theorem 8.5.7 shows that almost sure pairwise independence is a sufficient condition for the validity of the law in distribution. Note that both Theorems 8.5.3 and 8.5.7 are already much stronger than the type of exact law of large numbers for iid processes postulated in the economic literature (see, for example, [3, 18]); in particular, no identical-distribution assumption is made.

Theorem 8.5.7 Let $f$ be a process from the Loeb product space to a separable metric space $X$. If the random variables $f_t$ are almost surely pairwise independent, i.e., for $(\lambda\otimes\lambda)_L$-almost all $(t_1,t_2)\in T\times T$, $f_{t_1}$ and $f_{t_2}$ are independent, then $f$ satisfies the law in distribution.

Proof Fix a countable open base $\{O_n\}_{n=1}^\infty$ for $X$ that is closed (i.e., stable) under finite intersections. By the independence assumption, the random variables $(\chi_{f^{-1}(O_n)})_t$ of the process $\chi_{f^{-1}(O_n)}$ are almost surely uncorrelated, where $\chi_A$ denotes the indicator function of a set $A$. By Theorem 8.5.3, there is a $P_L$-null set $N_n$ such that for any $\omega\notin N_n$, $\lambda_L(f_\omega^{-1}(O_n)) = (\lambda\otimes P)_L(f^{-1}(O_n))$. Let $N$ be the union of all the $N_n$. Then $N$ is still a $P_L$-null set.
For each $\omega\in\Omega$, let $\mu_\omega$ be the distribution of the sample function $f_\omega$, and let $\mu$ denote the distribution of $f$ as a random variable. Then, for any $\omega\notin N$,
$$\mu_\omega(O_n) = \lambda_L(f_\omega^{-1}(O_n)) = (\lambda\otimes P)_L(f^{-1}(O_n)) = \mu(O_n)\quad\text{for all } n\ge 1.$$
Since the class of the sets $O_n$ generates the Borel algebra and is closed under finite intersections (i.e., it is a $\pi$-system), it follows from the uniqueness of measures (see [7], p. 45) that $\mu_\omega = \mu$ for any $\omega\notin N$.
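The distributional law can be illustrated in the same finite way. In this hypothetical Python sketch, again on a midpoint grid standing in for $T$, each $f_t$ is drawn independently from a Binomial$(2, t)$ law, which is far from uniform for most $t$. Because $\int_0^1 (1-t)^2\,dt = \int_0^1 2t(1-t)\,dt = \int_0^1 t^2\,dt = 1/3$, the distribution $\mu$ of $f$ on the product space is uniform on $\{0,1,2\}$, and Theorem 8.5.7 says each sample distribution $\mu_\omega$ should match it; here the singletons $\{0\},\{1\},\{2\}$ play the role of the countable base $\{O_n\}$. The process, seed and grid size are assumptions made for illustration.

```python
import random
from collections import Counter


def empirical_distribution(n, rng):
    """Empirical distribution of one sample function f_omega on the grid t_k = (k + 0.5) / n.

    Hypothetical process: f_t ~ Binomial(2, t), drawn independently across t.
    """
    counts = Counter()
    for k in range(n):
        t = (k + 0.5) / n
        value = (rng.random() < t) + (rng.random() < t)  # number of successes, in {0, 1, 2}
        counts[value] += 1
    return {v: counts[v] / n for v in (0, 1, 2)}


def distribution_of_f(n):
    """Law of f on T x Omega: the grid average of the Binomial(2, t) probability vectors."""
    probs = {0: 0.0, 1: 0.0, 2: 0.0}
    for k in range(n):
        t = (k + 0.5) / n
        probs[0] += (1 - t) ** 2 / n
        probs[1] += 2 * t * (1 - t) / n
        probs[2] += t ** 2 / n
    return probs


rng = random.Random(7)
n = 60_000
target = distribution_of_f(n)
samples = [empirical_distribution(n, rng) for _ in range(3)]
print({v: round(p, 4) for v, p in target.items()})  # {0: 0.3333, 1: 0.3333, 2: 0.3333}
for mu_omega in samples:
    print({v: round(p, 3) for v, p in mu_omega.items()})
```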
8.6 Converse Law of Large Numbers

The study of some random systems reveals that certain phenomena admit macroscopic stability: stability occurs not only for a whole system but also for macroscopically large subsystems. For example, a gambler cannot change the expectation of his return by timing his bets to particular subsequences. This property has been captured by the aphorism “No betting system can beat the house”. In economic models, one also considers the case where individual agents face idiosyncratic risks, i.e., risks of gains or losses that are non-negligible at the individual level but can be exactly predicted in the aggregate at the macroscopic level (see [3, 33] for references). In the following definition, we formalize and generalize this intuitive notion of macroscopic stability.

Definition 8.6.1 Let $f$ be a real-valued process on the Loeb product space. We say that $f$ satisfies the coalitional law of large numbers if for any set $A\in L_\lambda(\mathcal T)$ with $\lambda_L(A) > 0$, the process $f^A$ on the reduced Loeb product space $(A\times\Omega,L_{\lambda\otimes P}(\mathcal T\otimes\mathcal A)^A,(\lambda\otimes P)_L^A)$ still satisfies the law of large numbers. Here $f^A$ and $L_{\lambda\otimes P}(\mathcal T\otimes\mathcal A)^A$ are, respectively, the restrictions of $f$ and $L_{\lambda\otimes P}(\mathcal T\otimes\mathcal A)$ to $A\times\Omega$, and $(\lambda\otimes P)_L^A$ denotes $(\lambda\otimes P)_L|_{A\times\Omega}/\lambda_L(A)$, the probability measure on $(A\times\Omega,L_{\lambda\otimes P}(\mathcal T\otimes\mathcal A)^A)$ obtained by rescaling $(\lambda\otimes P)_L$.

It is clear that if a real-valued process $f$ satisfies the sufficient condition of Theorem 8.5.3 or of Theorem 8.5.5, then so does a sub-process such as $f^A$ above. Hence Theorems 8.5.3 and 8.5.5 can be strengthened to claim that $f$ satisfies not only the law of large numbers but also the coalitional law. The next theorem provides a converse to this strengthened version of Theorem 8.5.5. It follows that for a real-valued integrable process $f$, the coalitional law holds if and only if the conditional expectation $E(f|\mathcal U)$ is essentially a function of $t$.
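The coalitional law can also be probed numerically. This Python sketch is hypothetical and finite: on a midpoint grid standing in for $T$, one process has independent idiosyncratic noise, $f(t,\omega) = t + e_t(\omega)$, and a second adds a single common shock $z(\omega)$ to every $f_t$, so that $\mathrm{Cov}(f_{t_1},f_{t_2}) = \mathrm{Var}\,z = 1/3$ for all $t_1\ne t_2$. For the first process, the rescaled mean over each coalition $A$ should be close to the average of $Ef_t = t$ over $A$ for every $\omega$; for the second, it is off by roughly $z(\omega)$, which varies with $\omega$, so the coalitional law fails, in line with Theorem 8.6.3 below. The coalitions, noise laws and seed are illustrative choices.

```python
import random


def sample_function(n, rng, common_shock):
    """Grid points t_k = (k + 0.5) / n and one sample function f_omega on them.

    Hypothetical processes with E f_t = t:
      idiosyncratic:  f(t, w) = t + e_t(w),         e_t iid uniform on [-1, 1]
      common shock:   f(t, w) = t + e_t(w) + z(w),  one z ~ uniform[-1, 1] per omega
    Returns (ts, values, z), with z = 0.0 in the idiosyncratic case.
    """
    z = rng.uniform(-1.0, 1.0) if common_shock else 0.0
    ts = [(k + 0.5) / n for k in range(n)]
    return ts, [t + rng.uniform(-1.0, 1.0) + z for t in ts], z


def coalition_mean(ts, values, lo, hi):
    """Rescaled mean of the sample function over the coalition A = [lo, hi)."""
    picked = [v for t, v in zip(ts, values) if lo <= t < hi]
    return sum(picked) / len(picked)


rng = random.Random(11)
coalitions = {(0.0, 0.25): 0.125, (0.5, 1.0): 0.75}  # mean of E f_t = t over each A
results = []  # (common_shock, z, observed coalition mean minus predicted mean)
for shock in (False, True):
    for _ in range(3):
        ts, values, z = sample_function(50_000, rng, shock)
        for (lo, hi), predicted in coalitions.items():
            results.append((shock, z, coalition_mean(ts, values, lo, hi) - predicted))

for shock, z, deviation in results:
    print(f"common shock={shock!s:5}  z={z:+.3f}  deviation={deviation:+.3f}")
```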
Theorem 8.6.2 Let $f$ be a real-valued integrable process on the Loeb product space. If $f$ satisfies the coalitional law, then $E(f|\mathcal U) = Ef_t$.

Proof For clarity, let $h$ denote the mean function $t\mapsto Ef_t$ on $T$. Pick any $A\in L_\lambda(\mathcal T)$ and $B\in L_P(\mathcal A)$. By the assumption (together with Remark 8.5.2 applied to $f^A$ and Keisler’s Fubini theorem), we have, for $P_L$-almost all $\omega\in\Omega$,
$$\int_A f_\omega\,d\lambda_L = \int_A h(t)\,d\lambda_L,$$
whence
$$\int_B\int_A f_\omega\,d\lambda_L\,dP_L = \int_A\int_B f\,dP_L\,d\lambda_L = \int_B\int_A h(t)\,d\lambda_L\,dP_L.$$
Thus, since $A\times B\in\mathcal U$,
$$\int_{A\times B} E(f|\mathcal U)\,d(\lambda\otimes P)_L = \int_{A\times B} h\,d(\lambda\otimes P)_L.$$
The signed measures defined on $(T\times\Omega,L_\lambda(\mathcal T)\otimes L_P(\mathcal A))$ by integrating $E(f|\mathcal U)$ and $h$, respectively, over sets in $L_\lambda(\mathcal T)\otimes L_P(\mathcal A)$ agree on rectangles. Since the collection of rectangles generates $\mathcal U = L_\lambda(\mathcal T)\otimes L_P(\mathcal A)$ and is closed under finite intersections, and is thus a $\pi$-system, Dynkin’s $\pi$-$\lambda$ theorem (see [7], p. 44 and [10], p. 404) implies that the two signed measures are equal on $\mathcal U$. Thus, both $E(f|\mathcal U)$ and $h$ are Radon-Nikodym derivatives of the same measure. By the uniqueness of Radon-Nikodym derivatives, we have $E(f|\mathcal U) = h$.

The following theorem is a converse of Theorem 8.5.3; it shows that almost sure uncorrelatedness of the random variables $f_t$ is the weakest condition needed to ensure that $f$ satisfies the coalitional law.

Theorem 8.6.3 Let $f$ be a real-valued square integrable process on the Loeb product space. If $f$ satisfies the coalitional law, then the random variables $f_t$ are almost surely uncorrelated.

Proof By Theorem 8.6.2, $E((f - Ef_t)|\mathcal U) = 0$. Let $g$ be another square integrable process.

Claim: For $(\lambda\otimes\lambda)_L$-almost all $(t',t)\in T\times T$, $g_{t'}$ is uncorrelated with $f_t$.

Fix $t'\in T$ such that $g_{t'}$ is square integrable. Since $g_{t'}$ depends on $\omega$ alone, it is $\mathcal U$-measurable, so $E((g_{t'} - Eg_{t'})(f - Ef_t)|\mathcal U) = (g_{t'} - Eg_{t'})E((f - Ef_t)|\mathcal U) = 0$. By interchanging the roles of $t$ and $\omega$ in Theorem 8.5.5, we obtain $\int_\Omega (g_{t'} - Eg_{t'})(f_t - Ef_t)\,dP_L = 0$ for $\lambda_L$-almost all $t\in T$. By Keisler’s Fubini theorem, $g_{t'}$ is square integrable for $\lambda_L$-almost all $t'\in T$. Hence, for $\lambda_L$-almost all $t'\in T$, $g_{t'}$ is uncorrelated with $f_t$ for $\lambda_L$-almost all $t\in T$. The claim thus follows. The result in the theorem now follows from the claim by taking $g = f$.

The following definition is an analogue of Definition 8.6.1 for the version of the law of large numbers in terms of empirical distributions.

Definition 8.6.4 Let $f$ be a process from the Loeb product space to a separable metric space $X$.
We say that $f$ satisfies the coalitional law of large numbers in distribution (or simply the coalitional law in distribution) if for any set $A\in L_\lambda(\mathcal T)$ with $\lambda_L(A) > 0$, the process $f^A$ from $(A\times\Omega,L_{\lambda\otimes P}(\mathcal T\otimes\mathcal A)^A,(\lambda\otimes P)_L^A)$ to $X$ satisfies the law in distribution.
Theorem 8.6.5 presents a converse of Theorem 8.5.7; it characterizes macroscopic stability (in terms of empirical distributions) by almost sure pairwise independence. No discrete analogues of the converse law of large numbers are known in the previous probabilistic literature.

Theorem 8.6.5 Let $f$ be a process from the Loeb product space to a separable metric space $X$. If $f$ satisfies the coalitional law in distribution, then the random variables $f_t$ are almost surely pairwise independent.

Proof Fix a countable open base $\mathcal B$ for $X$ that is closed under finite intersections. Pick any $O_1, O_2\in\mathcal B$. Then the process $\chi_{O_2}(f)$ satisfies the coalitional law, since the mean of its sample function over any coalition $A$ is the empirical measure of $O_2$ under $f^A_\omega$. The claim in the proof of Theorem 8.6.3, applied with $g = \chi_{O_1}(f)$, shows that for $(\lambda\otimes\lambda)_L$-almost all $(t_1,t_2)\in T\times T$,
$$\int_\Omega (\chi_{O_1}(f))_{t_1}\cdot(\chi_{O_2}(f))_{t_2}\,dP_L = \int_\Omega (\chi_{O_1}(f))_{t_1}\,dP_L\int_\Omega (\chi_{O_2}(f))_{t_2}\,dP_L,$$
and hence, since $\mathcal B$ is countable, there exists a $(\lambda\otimes\lambda)_L$-null set $N$ such that for all $(t_1,t_2)\notin N$,
$$P_L\big(f_{t_1}^{-1}(O_1)\cap f_{t_2}^{-1}(O_2)\big) = P_L\big(f_{t_1}^{-1}(O_1)\big)\cdot P_L\big(f_{t_2}^{-1}(O_2)\big)$$
holds for all $O_1, O_2\in\mathcal B$. Thus, the joint distribution $P_L(f_{t_1},f_{t_2})^{-1}$ agrees with the product of its marginals on the $\pi$-system $\{O_1\times O_2 : O_1,O_2\in\mathcal B\}$, which generates the Borel $\sigma$-algebra of $X\times X$. Hence, $P_L(f_{t_1},f_{t_2})^{-1} = P_L f_{t_1}^{-1}\otimes P_L f_{t_2}^{-1}$ by a result on the uniqueness of measures (see [7], p. 45). This means that $f_{t_1}$ and $f_{t_2}$ are independent for all $(t_1,t_2)\notin N$.

Remark 8.6.6 The intuitive notion of idiosyncratic uncertainty involves randomness at the individual level, but not at the macroscopic level. Theorems 8.6.3 and 8.6.5 simply characterize idiosyncratic uncertainty by the almost sure versions of uncorrelatedness or pairwise independence. This characterization can be interpreted in different contexts, including economic or financial models with individual risks and physical models involving the random motion of molecules in an equilibrium gas.
8.7 Almost Equivalence of Pairwise and Mutual Independence

A classical example of Bernstein shows that there are three events that are pairwise independent but not mutually independent (see [12], p. 126). Similarly, pairwise independence is strictly weaker than mutual independence for a finite collection of random variables (see [12], p. 220). In this section, we show that pairwise independence is, in fact, almost equivalent to mutual independence in an ideal setting. To prove this equivalence result, we need two lemmas. The first shows that uncorrelatedness implies the constancy of joint moment functions for the relevant sample functions; the case $m = 1$ of this lemma is Theorem 8.5.3. The second shows that if the second joint moment function of the sample functions is essentially constant for each process, then the processes are uncorrelated in $m$-tuples. Stronger forms of the two lemmas, with more detailed proofs, can be found in Propositions 4.2 and 4.3 of [34]. Note that the $f^i$ in Lemmas 8.7.1 and 8.7.2 is a process in its own right, not the $i$th power of a process $f$.

Lemma 8.7.1 Let $m$ be a positive integer. For each $i = 1,2,\dots,m$, let $f^i$ be a real-valued process on the Loeb product space with a finite $m$th moment (with a finite second moment when $m = 1$), i.e., $\int_{T\times\Omega}|f^i(t,\omega)|^m\,d(\lambda\otimes P)_L < \infty$. If for each $i$ with $1\le i\le m$ the random variables $f^i_t$ are almost surely uncorrelated, then for $(P^m)_L$-almost all $(\omega_1,\omega_2,\dots,\omega_m)\in\Omega^m$,
$$E f^1_{\omega_1}f^2_{\omega_2}\cdots f^m_{\omega_m} = \int_T Ef^1_t\,Ef^2_t\cdots Ef^m_t\,d\lambda_L,$$
where the left-hand side is the mean $\int_T f^1_{\omega_1}(t)f^2_{\omega_2}(t)\cdots f^m_{\omega_m}(t)\,d\lambda_L(t)$ over $T$, and $(P^m)_L$ is the Loeb measure corresponding to the internal product measure $P^m$.
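Proof Denote $\int_T Ef^1_t\,Ef^2_t\cdots Ef^m_t\,d\lambda_L$ by $c$. Keisler’s Fubini theorem implies that
$$\begin{aligned}
&\int_{\Omega^m}\left(E f^1_{\omega_1}f^2_{\omega_2}\cdots f^m_{\omega_m} - c\right)^2 d(P^m)_L(\omega_1,\dots,\omega_m)\\
&\quad = c^2 - 2c\int_{\Omega^m}\int_T f^1_{\omega_1}(t)f^2_{\omega_2}(t)\cdots f^m_{\omega_m}(t)\,d\lambda_L(t)\,d(P^m)_L\\
&\qquad + \int_{\Omega^m}\int_T\prod_{i=1}^m f^i_{\omega_i}(t_1)\,d\lambda_L(t_1)\int_T\prod_{i=1}^m f^i_{\omega_i}(t_2)\,d\lambda_L(t_2)\,d(P^m)_L\\
&\quad = c^2 - 2c\int_T\prod_{i=1}^m\int_\Omega f^i_t(\omega_i)\,dP_L(\omega_i)\,d\lambda_L\\
&\qquad + \int_{(t_1,t_2)\in T\times T}\prod_{i=1}^m\int_\Omega f^i_{t_1}(\omega_i)f^i_{t_2}(\omega_i)\,dP_L(\omega_i)\,d(\lambda\otimes\lambda)_L\\
&\quad = c^2 - 2c^2 + c^2 = 0,
\end{aligned}$$
where the last term equals $c^2$ because $\int_\Omega f^i_{t_1}f^i_{t_2}\,dP_L = Ef^i_{t_1}Ef^i_{t_2}$ for $(\lambda\otimes\lambda)_L$-almost all $(t_1,t_2)$. The rest is obvious.

Lemma 8.7.2 Let $m\ge 2$ be an integer. For each $i = 1,2,\dots,m$, let $f^i$ be a real-valued process on the Loeb product space with a finite $m$th moment. If the joint moment function $E f^i_{\omega_1}f^i_{\omega_2}$ of the sample functions $f^i_\omega$ is essentially constant on $\Omega\times\Omega$ for each $i$, then for $(\lambda^m)_L$-almost all $(t_1,t_2,\dots,t_m)\in T^m$, the random variables $f^1_{t_1},f^2_{t_2},\dots,f^m_{t_m}$ are uncorrelated in $m$-tuple, i.e., $E f^1_{t_1}f^2_{t_2}\cdots f^m_{t_m} = Ef^1_{t_1}Ef^2_{t_2}\cdots Ef^m_{t_m}$.

Proof For each $1\le i\le m$, since the joint moment function $E f^i_{\omega_1}f^i_{\omega_2}$ is essentially constant, it is easy to check (by integrating over $\Omega\times\Omega$ and applying Keisler’s Fubini theorem) that $E f^i_{\omega_1}f^i_{\omega_2} = \int_{t\in T}(Ef^i_t)^2\,d\lambda_L$ for $(P\otimes P)_L$-almost all $(\omega_1,\omega_2)\in\Omega\times\Omega$. Denote $\int_{t\in T}(Ef^i_t)^2\,d\lambda_L$ by $c_i$. By Keisler’s Fubini theorem, we have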
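$$\begin{aligned}
&\int_{T^m}\left(E f^1_{t_1}f^2_{t_2}\cdots f^m_{t_m} - Ef^1_{t_1}Ef^2_{t_2}\cdots Ef^m_{t_m}\right)^2 d(\lambda^m)_L\\
&\quad = \int_{T^m}\int_\Omega\prod_{i=1}^m f^i_{t_i}(\omega_1)\,dP_L\int_\Omega\prod_{i=1}^m f^i_{t_i}(\omega_2)\,dP_L\,d(\lambda^m)_L\\
&\qquad - 2\int_{T^m} Ef^1_{t_1}Ef^2_{t_2}\cdots Ef^m_{t_m}\,E f^1_{t_1}f^2_{t_2}\cdots f^m_{t_m}\,d(\lambda^m)_L + \int_{T^m}\left(Ef^1_{t_1}Ef^2_{t_2}\cdots Ef^m_{t_m}\right)^2 d(\lambda^m)_L\\
&\quad = \int_{\Omega\times\Omega}\prod_{i=1}^m\int_T f^i(t_i,\omega_1)f^i(t_i,\omega_2)\,d\lambda_L(t_i)\,d(P\otimes P)_L\\
&\qquad - 2\int_{\Omega^m}\int_{\omega\in\Omega}\prod_{i=1}^m\int_T f^i(t_i,\omega)f^i(t_i,\omega_i)\,d\lambda_L(t_i)\,dP_L(\omega)\,d(P^m)_L + \prod_{i=1}^m\int_{t_i\in T}\left(Ef^i_{t_i}\right)^2 d\lambda_L(t_i)\\
&\quad = \prod_{i=1}^m c_i - 2\prod_{i=1}^m c_i + \prod_{i=1}^m c_i = 0.
\end{aligned}$$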
The rest is clear. To establish the validity of the above integral identities, we also need some technical integrability conditions; these are formulated as a problem below. Their proof is based on the Tonelli theorem for Loeb measures (see [17], p. 204).

Problem 8.7.3 Let $m\ge 2$ be an integer. For each $i = 1,2,\dots,m$, let $f^i$ be a real-valued process on the Loeb product space with a finite $m$th moment. Then
(1) the function $G$ on $T^m\times\Omega^m$ defined by
$$G(t_1,\dots,t_m,\omega_1,\dots,\omega_m) = \prod_{i=1}^m\prod_{j=1}^m f^i(t_i,\omega_j)$$
is $(\lambda^m\otimes P^m)_L$-integrable;
(2) for $(\lambda^m)_L$-almost all $(t_1,\dots,t_m)\in T^m$, $f^1_{t_1}f^2_{t_2}\cdots f^m_{t_m}$ is integrable over $(\Omega,L_P(\mathcal A),P_L)$;
(3) the function $E f^1_{t_1}f^2_{t_2}\cdots f^m_{t_m}$ on $T^m$ has a finite $m$th moment with respect to the measure $(\lambda^m)_L$.

Theorem 8.7.4 Let $f$ be a process from the Loeb product space to a separable metric space $X$. The random variables $f_t$ are almost surely pairwise independent if and only if they are almost mutually independent, i.e., for any $n\ge 2$, $f_{t_1},f_{t_2},\dots,f_{t_n}$ are mutually independent for $(\lambda^n)_L$-almost all $(t_1,t_2,\dots,t_n)\in T^n$.

Proof Fix a countable open base $\{O_k\}_{k=1}^\infty$ for $X$ that is closed under finite intersections. Take positive integers $\ell_1,\ell_2,\dots,\ell_n$. Assume that the random variables $f_t$ are almost surely pairwise independent. Then, for each $i$, the process $\chi_{O_{\ell_i}}(f)$ has almost surely uncorrelated random variables. By Lemma 8.7.1 (with $m = 2$), the joint moment function $E\chi_{O_{\ell_i}}(f_{\omega_1})\chi_{O_{\ell_i}}(f_{\omega_2})$ is essentially
8 New Understanding of Stochastic Independence
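constant. By Lemma 8.7.2, there exists a $(\lambda^n)_L$-null set $N_{\ell_1 \ell_2 \ldots \ell_n}$ such that for all $(t_1, t_2, \ldots, t_n) \notin N_{\ell_1 \ell_2 \ldots \ell_n}$,
$$P_L\Big( \bigcap_{i=1}^n f_{t_i}^{-1}(O_{\ell_i}) \Big) = \prod_{i=1}^n P_L\big( f_{t_i}^{-1}(O_{\ell_i}) \big).$$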
Let $N$ be the union of all the null sets $N_{\ell_1 \ell_2 \ldots \ell_n}$. Since there are only countably many such sets, $N$ is still a $(\lambda^n)_L$-null set, and for all $(t_1, t_2, \ldots, t_n) \notin N$, the above identity holds for all $(\ell_1, \ell_2, \ldots, \ell_n)$. By the Extension Theorem in [26] (p. 237), we obtain, for all $(t_1, t_2, \ldots, t_n) \notin N$, that $f_{t_1}, f_{t_2}, \ldots, f_{t_n}$ are mutually independent. The other direction is clear.

Remark 8.7.5 Note that pairwise independence is almost the weakest version of independence, while mutual independence is almost the strongest. Since their almost versions are shown to be equivalent in Theorem 8.7.4, we may simply refer to the almost versions as almost independence.

Remark 8.7.6 The notion of weak dependence is often presented as follows: if any random variable in a given collection (a sequence or a triangular array) of random variables is approximately independent, in some sense, of most other random variables in the collection, then this collection is said to be weakly dependent. In our idealized setting, this notion simply means that any random variable is independent of the others outside a negligible set; this is precisely the notion of almost sure pairwise independence. It is easy to check that the transferred versions of the pervasive mixing conditions (see Sect. 20 of [4]) do lead to almost sure pairwise independence. This also means that these mixing conditions are indeed stronger than the asymptotic version of almost sure pairwise independence. Now consider a large population modeled by a hyperfinite process with weakly dependent random variables in the idealized sense. Then, by randomly drawing a sequence of random variables from the underlying hyperfinite population, one can certainly expect to obtain a pairwise independent sequence, since the underlying population is almost so. However, it is rather surprising that the resulting sequence is, in fact, mutually independent (see Proposition 8.7.8).
That is, sequential random draws turn a version of weak dependence into mutual independence. To give a rigorous formulation of this result, we need a suitable $\sigma$-algebra on the countable product $T^\infty$ together with a measure. We use $\mathcal{T}_m$ to denote the collection of all subsets of $T^\infty$ having the form $A_m \times T^\infty$, where $A_m$ is some internal set in the internal product algebra $\mathcal{T}^m$. Define a set function $L(\lambda^m)$ on $\mathcal{T}_m$ by letting $L(\lambda^m)(A_m \times T^\infty) = (\lambda^m)_L(A_m)$. Let $\mathcal{T}_\infty$ be the union of all the $\mathcal{T}_m$, and let $L(\lambda^\infty)$ be the set function on $\mathcal{T}_\infty$ such that $L(\lambda^\infty)(C) = L(\lambda^m)(C)$ if $C \in \mathcal{T}_m$. By Keisler's Fubini theorem, $L(\lambda^\infty)$ is a well-defined, finitely additive measure on the algebra $\mathcal{T}_\infty$.

Problem 8.7.7 The finitely additive measure space $(T^\infty, \mathcal{T}_\infty, L(\lambda^\infty))$ can be extended to a countably additive complete measure space $(T^\infty, L(\mathcal{T}_\infty), L(\lambda^\infty))$.

We are now ready to present Proposition 8.7.8.
Proposition 8.7.8 Let $f$ be a process from the Loeb product space to a separable metric space $X$. If the random variables $f_t$ are almost surely pairwise independent, then for $L(\lambda^\infty)$-almost all $(t_1, t_2, \ldots, t_n, \ldots) \in T^\infty$, the random variables in the sequence $\{f_{t_n}\}_{n=1}^\infty$ are mutually independent.

Proof For each $n \ge 2$, let $A_n$ be the collection of all the $(t_1, t_2, \ldots, t_n) \in T^n$ such that $f_{t_1}, f_{t_2}, \ldots, f_{t_n}$ are mutually independent. Then $(\lambda^n)_L(A_n) = 1$ by Theorem 8.7.4. Let $C_n = A_n \times T^\infty$ and $C = \bigcap_{n=2}^\infty C_n$. Since one can find an internal set whose symmetric difference with $A_n$ is $(\lambda^n)_L$-null, we have $C_n \in L(\mathcal{T}_\infty)$ with $L(\lambda^\infty)(C_n) = 1$. Hence, $C \in L(\mathcal{T}_\infty)$ and $L(\lambda^\infty)(C) = 1$. For any $(t_1, t_2, \ldots, t_n, \ldots) \in C$, every finite subcollection of $\{f_{t_n}\}_{n=1}^\infty$ is contained in some initial segment $f_{t_1}, \ldots, f_{t_n}$, which is mutually independent; hence the random variables in the sequence are mutually independent.
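A finite, hypothetical example (ours, not taken from the text) shows the kind of weak dependence that Remark 8.7.6 has in mind, and why random draws as in Proposition 8.7.8 rarely hit dependent variables. Consider the moving-average array $X_t = e_t + e_{t+1}$, $t = 0, \ldots, n-1$, built from i.i.d. shocks $e_j$ of variance 1. Two variables share a shock only when $|s - t| \le 1$, so two indices drawn uniformly and independently give dependent variables with probability $(3n - 2)/n^2$, which tends to 0. The sketch below counts the dependent pairs exactly; the names `covariance` and `dependent_fraction` are our own.

```python
from fractions import Fraction


def covariance(s: int, t: int) -> int:
    """Cov(X_s, X_t) for the hypothetical array X_t = e_t + e_{t+1}
    with i.i.d. shocks e_j of variance 1: the number of shared shocks."""
    return len({s, s + 1} & {t, t + 1})


def dependent_fraction(n: int) -> Fraction:
    """Fraction of ordered pairs (s, t) in {0, ..., n-1}^2 with X_s, X_t dependent.

    Pairs with |s - t| >= 2 are built from disjoint shocks, hence independent;
    pairs with |s - t| <= 1 have positive covariance, hence are dependent.
    """
    bad = sum(1 for s in range(n) for t in range(n) if covariance(s, t) > 0)
    return Fraction(bad, n * n)


if __name__ == "__main__":
    for n in (10, 100, 400):
        # The exact value is (3n - 2) / n^2, which tends to 0 as n grows.
        print(n, dependent_fraction(n), float(dependent_fraction(n)))
```

Each variable is dependent on at most three members of the array (itself included), so in the idealized hyperfinite limit the dependent pairs form a null set.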
8.8 Duality of Independence and Exchangeability

Another basic concept in probability theory is exchangeability (see [6]). The next theorem shows that the notions of independence and exchangeability are dual to each other, in the sense that almost independence (almost exchangeability) of the random variables in a process is equivalent to almost exchangeability (almost mutual independence) of the sample functions of the process. The theorem can also be viewed as a higher-order law of large numbers: it asserts not only the essential constancy of the distributions of the sample functions but also the essential constancy of the $n$-variate joint distributions of the sample functions.

Theorem 8.8.1 Let $f$ be a process from the Loeb product space to a separable metric space $X$. Then the following are equivalent:
(1) the random variables $f_t$ are almost independent;
(2) the sample functions $f_\omega$ are almost exchangeable, i.e., for any positive integer $n$, there is a distribution $\nu_n$ on $X^n$ such that the joint distribution of $f_{\omega_1}, f_{\omega_2}, \ldots, f_{\omega_n}$ is $\nu_n$ for almost all $(\omega_1, \omega_2, \ldots, \omega_n) \in \Omega^n$, where $\nu_n(B) = \int_T \big(P_L f_t^{-1}\big)^{\otimes n}(B) \, d\lambda_L(t)$ for any Borel set $B$ in $X^n$.

Proof Let $\{O_k\}_{k=1}^\infty$ be a countable open base of $X$ that is closed under the formation of finite intersections. Fix a positive integer $n$. We assume that (1) holds. Then the random variables $f_t$ are almost surely pairwise independent. Let $(\ell_1, \ell_2, \ldots, \ell_n)$ be an $n$-tuple of positive integers. Then, for each $i$ with $1 \le i \le n$, the random variables in the process $\chi_{O_{\ell_i}}(f)$ are almost surely uncorrelated. By Lemma 8.7.1, there is a $(P^n)_L$-null set $N_{\ell_1 \ell_2 \ldots \ell_n}$ such that for all $(\omega_1, \omega_2, \ldots, \omega_n) \notin N_{\ell_1 \ell_2 \ldots \ell_n}$, the joint moment of the sample functions $\chi_{O_{\ell_1}}(f_{\omega_1}), \chi_{O_{\ell_2}}(f_{\omega_2}), \ldots, \chi_{O_{\ell_n}}(f_{\omega_n})$ is
$$\int_T E\chi_{O_{\ell_1}}(f_t) \, E\chi_{O_{\ell_2}}(f_t) \cdots E\chi_{O_{\ell_n}}(f_t) \, d\lambda_L(t).$$
Let $N$ be the union of all the null sets $N_{\ell_1 \ell_2 \ldots \ell_n}$. Then $N$ is still a $(P^n)_L$-null set, and for all $(\omega_1, \omega_2, \ldots, \omega_n) \notin N$,
$$\lambda_L\big( f_{\omega_1}^{-1}(O_{\ell_1}) \cap \cdots \cap f_{\omega_n}^{-1}(O_{\ell_n}) \big) = \int_T \prod_{i=1}^n P_L\big( f_t^{-1}(O_{\ell_i}) \big) \, d\lambda_L(t)$$
holds for all $n$-tuples $(\ell_1, \ell_2, \ldots, \ell_n)$ of positive integers. Define a distribution $\nu_n$ on $X^n$ by letting $\nu_n(B) = \int_T \big(P_L f_t^{-1}\big)^{\otimes n}(B) \, d\lambda_L(t)$ for any Borel set $B$ in $X^n$. The previous identity shows that for all $(\omega_1, \omega_2, \ldots, \omega_n) \notin N$, the joint distribution of $f_{\omega_1}, f_{\omega_2}, \ldots, f_{\omega_n}$ and $\nu_n$ agree on all the sets in the $\pi$-system
$$\{O_{\ell_1} \times O_{\ell_2} \times \cdots \times O_{\ell_n} : 1 \le \ell_1, \ell_2, \ldots, \ell_n < \infty\}$$
that generates the Borel $\sigma$-algebra of $X^n$, and hence the two are equal by Dynkin's $\pi$-$\lambda$ Theorem (see [7], p. 44 and [10], p. 404). Thus, (2) is proven.

Next, assume (2) is true. By taking $n = 2$, we know that for any $k \ge 1$, the joint moment function $E\big(\chi_{O_k}(f_{\omega_1}) \chi_{O_k}(f_{\omega_2})\big)$ of the sample functions in the process $\chi_{O_k}(f)$ is essentially constant. Let $(\ell_1, \ell_2)$ be a pair of positive integers. By Lemma 8.7.2, for $(\lambda \otimes \lambda)_L$-almost all $(t_1, t_2) \in T \times T$, the random variables $\chi_{O_{\ell_1}}(f_{t_1})$ and $\chi_{O_{\ell_2}}(f_{t_2})$ are uncorrelated. Thus, there is a $(\lambda \otimes \lambda)_L$-null set $N_{\ell_1 \ell_2}$ such that for all $(t_1, t_2) \notin N_{\ell_1 \ell_2}$, the random variables $\chi_{O_{\ell_1}}(f_{t_1})$ and $\chi_{O_{\ell_2}}(f_{t_2})$ are uncorrelated. Let $N$ be the union of all the null sets $N_{\ell_1 \ell_2}$. Then $N$ is still a $(\lambda \otimes \lambda)_L$-null set, and for all $(t_1, t_2) \notin N$,
$$P_L\big( f_{t_1}^{-1}(O_{\ell_1}) \cap f_{t_2}^{-1}(O_{\ell_2}) \big) = P_L\big( f_{t_1}^{-1}(O_{\ell_1}) \big) \, P_L\big( f_{t_2}^{-1}(O_{\ell_2}) \big)$$
holds for all pairs $(\ell_1, \ell_2)$ of positive integers. Since the collection $\{O_k\}_{k=1}^\infty$ generates the Borel $\sigma$-algebra of $X$ and is closed under the formation of finite intersections, the Extension Theorem in [26] (see p. 237) implies that for all $(t_1, t_2) \notin N$, $f_{t_1}$ and $f_{t_2}$ are independent. By Remark 8.7.5, (1) follows.

Problem 8.8.2 Formulate and prove the almost equivalence of the pairwise and multiple versions of exchangeability (see [34]).

Remark 8.8.3 If the almost independent process $f$ is in addition assumed to be almost identically distributed (such an $f$ is said to be almost iid), then the formula in Theorem 8.8.1 implies that the sample functions are also almost iid.
For the general case, let $\mathcal{C}$ be the $\sigma$-algebra generated by the function $t \mapsto P_L f_t^{-1}$ from $T$ to the space $M(X)$ of distributions on $X$. Then, for any $A \in \mathcal{C}$ with $\lambda_L(A) > 0$, the restricted process $f^A$ on the reduced Loeb product space still has almost independent random variables. Thus, for any Borel set $B$ in $X^n$, the equality
$$\lambda^A_L\big( (f^A_{\omega_1}, \ldots, f^A_{\omega_n})^{-1}(B) \big) = \int_A \big(P_L f_t^{-1}\big)^{\otimes n}(B) \, d\lambda^A_L(t)$$
holds for $(P^n)_L$-almost all $(\omega_1, \ldots, \omega_n) \in \Omega^n$. This means that for $(P^n)_L$-almost all $(\omega_1, \ldots, \omega_n) \in \Omega^n$, the product distribution $\big(P_L f_t^{-1}\big)^{\otimes n}$ is simply a regular conditional distribution for $(f_{\omega_1}, \ldots, f_{\omega_n})$ given $\mathcal{C}$. Hence, for $(P^n)_L$-almost all $(\omega_1, \ldots, \omega_n) \in \Omega^n$, $f_{\omega_1}, \ldots, f_{\omega_n}$ are iid conditioned on $\mathcal{C}$. This is an analogue of the classical de Finetti theorem (see [6]) in the setting of almost exchangeability.

Remark 8.8.4 Since many applied probabilistic models involve not only uncertainty and a large number of entities but also evolution over time, one is naturally led to use a hyperfinite set to index a large collection of stochastic processes with time and sample parameters. As illustrated in Sect. 8 of [33], many results in this chapter can be routinely extended to such "hyperprocesses". For example, Theorems 8.5.7, 8.6.5 and 8.7.4 can be restated for general hyperprocesses with continuous or discrete time parameters. Note that the various notions of independence for hyperprocesses should be defined in terms of the finite-dimensional distributions of the stochastic processes. In particular, such theorems provide a rigorous foundation for the widely used claim (see [33] for some references) that the finite-dimensional distributions of almost all empirical processes do not depend on particular sample realizations.
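The duality in Theorem 8.8.1 and the conditional iid structure in Remark 8.8.3 can be seen in a finite simulation. The following sketch is a hypothetical example, not part of the text. The random variables $f_t$, $t = 0, \ldots, n-1$, are independent Bernoulli variables whose success probabilities alternate between 0.2 and 0.8. Two independent sample points $\omega_1, \omega_2$ yield two sample functions. Over $t$, the empirical frequency of $\{f_{\omega_1}(t) = f_{\omega_2}(t) = 1\}$ should then be close to $\nu_2(\{(1,1)\}) = (0.2^2 + 0.8^2)/2 = 0.34$, and not to the product $0.5^2 = 0.25$ of the marginal frequencies. So the sample functions look exchangeable but not independent. On the even indices the distribution of $f_t$ is constant, and there the two sample functions behave like independent Bernoulli(0.2) sequences. The function name `pair_frequencies` and the seed are our own choices.

```python
import random


def pair_frequencies(n: int, seed: int = 1):
    """Simulate two sample functions of a hypothetical finite process.

    The random variables f_t (t = 0, ..., n-1) are independent Bernoulli(p_t)
    with p_t = 0.2 for even t and 0.8 for odd t. Because the f_t are
    independent, the sample functions at two independent sample points
    are two independent draws of the whole family.

    Returns (joint, marginal) over all t and over the even t, where joint is
    the fraction of t with f(t, w1) = f(t, w2) = 1 and marginal is the
    fraction of t with f(t, w1) = 1.
    """
    rng = random.Random(seed)
    p = [0.2 if t % 2 == 0 else 0.8 for t in range(n)]
    f1 = [1 if rng.random() < pt else 0 for pt in p]  # sample function at w1
    f2 = [1 if rng.random() < pt else 0 for pt in p]  # sample function at w2

    def frequencies(indices):
        indices = list(indices)
        joint = sum(f1[t] * f2[t] for t in indices) / len(indices)
        marginal = sum(f1[t] for t in indices) / len(indices)
        return joint, marginal

    return frequencies(range(n)), frequencies(range(0, n, 2))


if __name__ == "__main__":
    (joint, marginal), (joint_even, marginal_even) = pair_frequencies(20000)
    # joint should be near 0.34 while marginal**2 is near 0.25.
    print(joint, marginal ** 2)
    # On the even indices (a single distribution type) both are near 0.04.
    print(joint_even, marginal_even ** 2)
```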
8.9 Grand Unification of Multiplicative Properties

Theorem 8.9.2 in this section unifies various seemingly unrelated multiplicative properties of random variables, such as those involving generating functions, characteristic functions, and maxima of random variables. To state the theorem, we need the notion of separating classes for collections of distributions. It is clear from the definition given below that a class is separating for a fixed collection of distributions if and only if it has a countable subclass that is separating for the collection of distributions.

Definition 8.9.1 Let $\mathcal{D}$ be a collection of Borel probability measures on a separable metric space $X$, and let $\mathcal{E}$ be a class of real- or complex-valued Borel functions on $X$ such that for any given pair $(\varphi, \mu) \in \mathcal{E} \times \mathcal{D}$, $\varphi$ is $\mu$-integrable. The class $\mathcal{E}$ is said to be separating for $\mathcal{D}$ if there is a sequence $\{\varphi_n\}_{n=1}^\infty$ of functions in $\mathcal{E}$ that distinguishes the members of $\mathcal{D}$. That is, $\mu$ and $\nu$ in $\mathcal{D}$ are equal if $\int_X \varphi_n \, d\mu = \int_X \varphi_n \, d\nu$ for all $n \ge 1$.

Theorem 8.9.2 Let $\mathcal{E}$ be a class of real- or complex-valued Borel functions on a separable metric space $X$. Let $\mathcal{D}$ be a collection of Borel probability measures on $X$, and let $f$ be a process from the Loeb product space to $X$. Assume that $\mathcal{D}$ contains all the distributions induced by $f^A$ and $f^A_\omega$, $A \in \mathcal{T}$ with $\lambda_L(A) > 0$, where $f^A$ is the restriction of $f$ to $A \times \Omega$ with the rescaled Loeb product measure. Let $m$ be a positive integer greater than or equal to two. Assume that $\mathcal{E}$ is separating for $\mathcal{D}$ and $\varphi(f)$ is a process with a finite $m$th moment for each $\varphi \in \mathcal{E}$. Then the following are equivalent:
(1) the random variables $f_t$ are almost independent;
(2) for almost all $(t_1, t_2) \in T \times T$ and for all $\varphi \in \mathcal{E}$, $E\big(\varphi(f_{t_1}) \overline{\varphi(f_{t_2})}\big) = E\varphi(f_{t_1}) \, E\overline{\varphi(f_{t_2})}$;
(3) given any integer $n$ with $2 \le n \le m$, if the functions in $\mathcal{E}$ are all real-valued, then for almost all $(t_1, t_2, \ldots, t_n) \in T^n$, the equality $E\big(\varphi(f_{t_1}) \cdots \varphi(f_{t_n})\big) = E\varphi(f_{t_1}) \cdots E\varphi(f_{t_n})$ holds for all $\varphi \in \mathcal{E}$;
(4) given any integer $n$ with $2 \le n \le m$, if the functions in $\mathcal{E}$ are complex-valued, then for $(\lambda^n)_L$-almost all $(t_1, t_2, \ldots, t_n) \in T^n$,
$$E\big(\varphi^{i_1}(f_{t_1}) \varphi^{i_2}(f_{t_2}) \cdots \varphi^{i_n}(f_{t_n})\big) = E\varphi^{i_1}(f_{t_1}) \, E\varphi^{i_2}(f_{t_2}) \cdots E\varphi^{i_n}(f_{t_n})$$
holds for all $\varphi \in \mathcal{E}$, where $i_1, i_2, \ldots, i_n \in \{0, 1\}$, $\varphi^0 = \varphi$, and $\varphi^1 = \overline{\varphi}$, the complex conjugate of $\varphi$.

Proof For $(1) \Rightarrow (2)$, simply note that for each $\varphi \in \mathcal{E}$, the complex-valued process $\varphi(f)$ is square integrable with almost surely pairwise independent random variables. Hence, the random variables $\varphi(f_t)$ are almost surely uncorrelated in the complex sense, i.e., (2) holds.

We now show $(2) \Rightarrow (1)$. Take $\varphi \in \mathcal{E}$. Then the random variables in the complex-valued process $\varphi(f)$ are almost surely uncorrelated. It follows from Problem 8.5.4 that $E\varphi(f_\omega) = E\varphi(f)$ for $P_L$-almost all $\omega \in \Omega$. Let $\mu_\omega$ be the distribution on $X$ induced by the sample function $f_\omega$ on $T$, and let $\mu$ be the distribution induced by $f$ viewed as a random variable on $T \times \Omega$. Then $\int_X \varphi \, d\mu_\omega = \int_X \varphi \, d\mu$ for $P_L$-almost all $\omega \in \Omega$. Since $\mathcal{E}$ has a countable subclass $\mathcal{E}_0$ that is separating for $\mathcal{D}$, we can find a $P_L$-null subset $N$ of $\Omega$ such that for any fixed $\omega \notin N$, $\int_X \varphi \, d\mu_\omega = \int_X \varphi \, d\mu$ for all $\varphi \in \mathcal{E}_0$. Since $\mu_\omega$ and $\mu$ are in $\mathcal{D}$, the fact that $\mathcal{E}_0$ is separating for $\mathcal{D}$ implies that $\mu_\omega = \mu$ for all $\omega \notin N$. Therefore, $f$ satisfies the law in distribution. The same argument can be used to show that $f$ satisfies the coalitional law in distribution. Using Theorem 8.6.5 and Remark 8.7.5, we obtain $(2) \Rightarrow (1)$.

It is obvious that (3) or (4) implies (2). If (1) holds, then $f_{t_1}, f_{t_2}, \ldots, f_{t_n}$ are mutually independent for $(\lambda^n)_L$-almost all $(t_1, t_2, \ldots, t_n) \in T^n$, which implies (3) and (4).
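Definition 8.9.1 can be made concrete for the generating-function class $\{z^x : z \in (-1, 1)\}$ that appears in Remark 8.9.3. Assume, for illustration only, that the distributions are supported on $\{0, 1, \ldots, K\}$. Then the generating function $z \mapsto \sum_k p_k z^k$ is a polynomial of degree at most $K$. Its values at any $K+1$ distinct points of $(-1, 1)$ therefore already determine $(p_0, \ldots, p_K)$, so in this special case a finite subclass is separating. The sketch below recovers the probabilities from such values by exact Gauss-Jordan elimination on the Vandermonde system. The helper names are hypothetical.

```python
from fractions import Fraction


def pgf(probs, z):
    """Probability generating function E[z^X] = sum_k P(X = k) z^k."""
    return sum(p * z ** k for k, p in enumerate(probs))


def recover_from_pgf(points, values):
    """Solve sum_k p_k * z_i**k = values[i] for p_0, ..., p_K.

    points = [z_0, ..., z_K] must be distinct, so the Vandermonde matrix is
    invertible and the solution is unique. Arithmetic is exact (Fraction).
    """
    n = len(points)
    rows = [[Fraction(z) ** k for k in range(n)] + [Fraction(v)]
            for z, v in zip(points, values)]
    for col in range(n):
        # Move a row with a nonzero entry in this column into pivot position.
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        # Eliminate this column from every other row (Gauss-Jordan).
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col] / rows[col][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [rows[i][n] / rows[i][i] for i in range(n)]


if __name__ == "__main__":
    probs = [Fraction(1, 6), Fraction(1, 3), Fraction(1, 2)]  # hypothetical law on {0, 1, 2}
    points = [Fraction(-1, 2), Fraction(0), Fraction(1, 2)]
    values = [pgf(probs, z) for z in points]
    print(recover_from_pgf(points, values))  # recovers 1/6, 1/3, 1/2
```

For distributions on all of $\mathbb{N}$, the same idea relies on the generating function being analytic on $(-1, 1)$, so that a sequence of points with a limit point inside the interval is separating.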
Remark 8.9.3 To see the above general unification of multiplicative properties in more concrete situations, we note that both the complex exponentials $e^{iux}$ and the indicator functions of the intervals $(-\infty, u]$ form separating classes for all distributions on $\mathbb{R}$, and the class of functions $\{z^x : z \in (-1, 1)\}$ is separating for distributions of random variables taking values in the natural numbers. Applying Theorem 8.9.2 to these separating classes leads to the general equivalence of independence and the presence of multiplicative properties involving characteristic functions, maxima of random variables, and generating functions (for precise statements of these specific multiplicative properties, see Sect. 7.4 in [33]). In particular, consider a large
collection of real-valued random variables; if, for essentially every pair of random variables in the collection, the distribution function of the maximum of the pair is the product of the individual distribution functions of the two random variables, then, for essentially every tuple of n random variables chosen from the original collection, the characteristic function of the sum of these n random variables is the product of the individual characteristic functions of the random variables in the tuple. This shows that some notions, though apparently having no relation at all in the finite case, are in fact essentially equivalent in the ideal setting.
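The last point of Remark 8.9.3 can be checked on a small finite example of our own construction (not from the text). The joint law below lives on $\{0,1,2\}^2$ and has uniform marginals. It satisfies $P(\max(X, Y) \le u) = P(X \le u) \, P(Y \le u)$ for every $u$, yet $X$ and $Y$ are dependent. Moreover, the characteristic function of $X + Y$ at $u = \pi$ equals $-1/3$, whereas the product of the individual characteristic functions there is $1/9$. So, for a single pair of random variables, the max-multiplicative property does not imply the characteristic-function property. The equivalence in Remark 8.9.3 concerns essentially all pairs and tuples of a large collection.

```python
from fractions import Fraction

# Hypothetical joint law of (X, Y) on {0, 1, 2}^2 in units of 1/18
# (row index = value of X, column index = value of Y).
JOINT = [[Fraction(c, 18) for c in row] for row in ([2, 3, 1], [3, 0, 3], [1, 3, 2])]
VALUES = range(3)


def p_x(i):
    return sum(JOINT[i][j] for j in VALUES)


def p_y(j):
    return sum(JOINT[i][j] for i in VALUES)


def cdf_max(u):
    """P(max(X, Y) <= u) = P(X <= u, Y <= u)."""
    return sum(JOINT[i][j] for i in range(u + 1) for j in range(u + 1))


def cdf(marginal, u):
    return sum(marginal(k) for k in range(u + 1))


def charfn_at_pi(marginal):
    """E exp(i*pi*X) = E (-1)^X, which is real for integer-valued X."""
    return sum(marginal(k) * (-1) ** k for k in VALUES)


def charfn_of_sum_at_pi():
    """E exp(i*pi*(X + Y)) = E (-1)^(X + Y)."""
    return sum(JOINT[i][j] * (-1) ** (i + j) for i in VALUES for j in VALUES)


if __name__ == "__main__":
    for u in VALUES:
        print(u, cdf_max(u), cdf(p_x, u) * cdf(p_y, u))  # equal for every u
    print(JOINT[1][1], p_x(1) * p_y(1))  # 0 versus 1/9: X and Y are dependent
    print(charfn_of_sum_at_pi(), charfn_at_pi(p_x) * charfn_at_pi(p_y))  # -1/3 versus 1/9
```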
8.10 Discrete Interpretations

In this section, we consider the discrete interpretations of some results in the ideal setting. Since the procedure for this type of translation usually involves the so-called lifting, pushing-down and transfer operations, and is by now standard, there is no point in attempting to transfer most results of earlier sections to the large finite setting. Here, we only illustrate the general possibility by presenting two asymptotic results in Propositions 8.10.1 and 8.10.3. We also point out that when an exact result in the measure-theoretic setting is reinterpreted for the discrete case, much of the mathematical elegance may be lost in the process of translation (although the scientific meaning is still retained).

We shall now fix some notation for the asymptotic case. Let $(\Omega, \mathcal{A}, P)$ be a fixed probability space that will be used as the common sample space of the random variables to be considered. For each $n \ge 1$, let $x^n_1, x^n_2, \ldots, x^n_n$ be random variables from the sample space to a separable metric space $X$, and let $(T_n, \mathcal{T}_n, \lambda_n)$ be the finite probability space with $T_n = \{1, 2, \ldots, n\}$ and $\lambda_n$ the uniform probability measure defined on the power set $\mathcal{T}_n$ of $T_n$. Integration on $(T_n, \mathcal{T}_n, \lambda_n)$ is then just the arithmetic average. Define a process $f_n$ on $T_n \times \Omega$ by letting $f_n(t, \omega) = x^n_t(\omega)$. Such a sequence of processes $\{f_n\}_{n=1}^\infty$ is usually called a triangular array of random variables. For a positive integer $m$ and a separable metric space $X$, $\rho_m$ denotes a Prohorov distance on the space of distributions on $X^m$ (see [4] for the definition of the Prohorov distance). The following proposition shows that for a triangular array of random variables, asymptotic pairwise independence implies its asymptotic multiple versions; the other implication is clear and omitted here. As noted earlier in Remark 8.7.6, the asymptotic pairwise independence as presented here is simply a version of the usual notion of weak dependence.
In particular, it covers the type of mixing conditions discussed in [4].

Proposition 8.10.1 For any $t_{n1}, t_{n2}, \ldots, t_{nm} \in T_n$, let $\mu_{t_{ni}}$ be the distribution of the random variable $f_n(t_{ni}, \cdot)$, and let $\mu_{t_{n1} t_{n2} \ldots t_{nm}}$ be the joint distribution of the random variables $f_n(t_{n1}, \cdot), f_n(t_{n2}, \cdot), \ldots, f_n(t_{nm}, \cdot)$. Assume that the collection of distributions
induced by all the $f_n$ on $X$ (viewed as random variables on $T_n \times \Omega$) is tight. For any $\varepsilon > 0$ and $n \ge 1$, set
$$T^m_n(\varepsilon) := \Big\{(t_{n1}, t_{n2}, \ldots, t_{nm}) \in (T_n)^m : \rho_m\Big(\mu_{t_{n1} t_{n2} \ldots t_{nm}}, \bigotimes_{i=1}^m \mu_{t_{ni}}\Big) \le \varepsilon\Big\}.$$
If $\lim_{n\to\infty} (\lambda_n \otimes \lambda_n)(T^2_n(\delta)) = 1$ for any $\delta > 0$, then $\lim_{n\to\infty} (\lambda_n)^m(T^m_n(\varepsilon)) = 1$ for any $\varepsilon > 0$.

Proof We transfer the sequence to the nonstandard universe to obtain a sequence $\{f_n\}_{n \in {}^*\mathbb{N}}$ of internal processes on the associated sequence $\{(T_n \times {}^*\Omega, \mathcal{T}_n \otimes {}^*\mathcal{A}, \lambda_n \otimes {}^*P) : n \in {}^*\mathbb{N}\}$ of internal probability spaces. From the tightness assumption on the processes $f_n$, it follows that for each $n \in {}^*\mathbb{N}_\infty$, the standard part of $f_n(t_n, \omega)$ exists for almost all $(t_n, \omega) \in T_n \times {}^*\Omega$, whence for almost all $t_n \in T_n$, $f_n(t_n, \cdot)$ has a standard part. Next, we fix $n \in {}^*\mathbb{N}_\infty$, and for simplicity we omit the subindex $n$ in the rest of this paragraph. From the Spillover Principle, it follows that $(\lambda \otimes \lambda)(T^2(h)) \simeq 1$ for some positive infinitesimal $h$. Thus, for $(\lambda \otimes \lambda)_L$-almost all $(t^1, t^2) \in T \times T$,
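$$\rho_2\big(\mu_{t^1 t^2}, \mu_{t^1} \otimes \mu_{t^2}\big) \le h.$$
It is easy to see that $\mu_{t^1 t^2} \simeq P_L\big({}^\circ f_{t^1}, {}^\circ f_{t^2}\big)^{-1}$ and $\mu_{t^1} \otimes \mu_{t^2} \simeq P_L\big({}^\circ f_{t^1}\big)^{-1} \otimes P_L\big({}^\circ f_{t^2}\big)^{-1}$, and hence
$$P_L\big({}^\circ f_{t^1}, {}^\circ f_{t^2}\big)^{-1} = P_L\big({}^\circ f_{t^1}\big)^{-1} \otimes P_L\big({}^\circ f_{t^2}\big)^{-1}.$$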
Therefore, the random variables ${}^\circ f_t$ are almost surely pairwise independent, and by Theorem 8.7.4, they are also almost surely independent in $m$-tuples. Now, the topology of weak convergence of distributions on $X^m$ restricted to the product measures is simply the $m$-fold product topology of the topology of weak convergence of distributions on $X$ (see Theorem 3.2 in [4], p. 21). Therefore, for almost all $m$-tuples $(t^1, t^2, \ldots, t^m)$, we have $\rho_m\big(\mu_{t^1 t^2 \ldots t^m}, \bigotimes_{i=1}^m \mu_{t^i}\big) \simeq 0$.

Now we resume indicating the subindex $n$. Fix $\varepsilon \in \mathbb{R}^+$. The previous paragraph shows that $(\lambda_n)^m(T^m_n(\varepsilon)) \simeq 1$ for any $n \in {}^*\mathbb{N}_\infty$ and for the positive standard real number $\varepsilon$. Hence $\lim_{n\to\infty} (\lambda_n)^m(T^m_n(\varepsilon)) = 1$.

Remark 8.10.2 If we start from three random variables that are pairwise independent but not mutually independent, then, by taking independent replicas of the three random variables, we can obtain a sequence in which pairs of random variables are independent but some triples are not mutually independent. This is the usual way of constructing a sequence of pairwise independent but not mutually independent random variables (see, for example, [12], p. 220). Note that most triples in such a sequence are still mutually independent. Proposition 8.10.1 says that this is approximately the general case, in the sense that even if one works with a sequence with approximate pairwise independence, "almost all" triples are still approximately mutually independent. That is, one cannot expect the pairs in a sequence to be independent but the triples to be "highly" non-independent.
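The replica construction in Remark 8.10.2 can be checked by exact enumeration for a few replicas. The text does not fix the three variables. As an assumed standard choice, take independent fair signs $X, Y$ and $Z = XY$, which are pairwise but not mutually independent. With $k$ independent replicas there are $3k$ variables and every pair is independent. Among the $\binom{3k}{3}$ triples, exactly the $k$ replica triples fail to be mutually independent, a fraction $k/\binom{3k}{3}$ that tends to 0. The function names below are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product


def replica_family(k):
    """Uniform sample space {-1, 1}^(2k) and 3k variables (X_r, Y_r, Z_r = X_r * Y_r),
    r = 0, ..., k-1: k independent replicas of a pairwise independent triple."""
    outcomes = list(product((-1, 1), repeat=2 * k))
    variables = []
    for r in range(k):
        variables.append(lambda w, r=r: w[2 * r])
        variables.append(lambda w, r=r: w[2 * r + 1])
        variables.append(lambda w, r=r: w[2 * r] * w[2 * r + 1])
    return outcomes, variables


def mutually_independent(outcomes, family):
    """Exact test that the +-1 valued variables in `family` are mutually
    independent under the uniform measure on `outcomes`."""
    weight = Fraction(1, len(outcomes))

    def prob(event):
        return sum((weight for w in outcomes if event(w)), Fraction(0))

    for signs in product((-1, 1), repeat=len(family)):
        joint = prob(lambda w: all(v(w) == s for v, s in zip(family, signs)))
        marginals = Fraction(1)
        for v, s in zip(family, signs):
            marginals *= prob(lambda w: v(w) == s)
        if joint != marginals:
            return False
    return True


def count_dependent(k, size):
    """Return (number of dependent subfamilies, number of subfamilies) of the given size."""
    outcomes, variables = replica_family(k)
    families = list(combinations(variables, size))
    bad = sum(1 for fam in families if not mutually_independent(outcomes, fam))
    return bad, len(families)


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(k, count_dependent(k, 2), count_dependent(k, 3))
```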
Next, for concreteness, we assume that the triangular array of random variables $\{f_n\}_{n=1}^\infty$ is real-valued, and we use the Lévy metric on distribution functions. Let $\mathcal{F}$ be the space of distribution functions of real-valued random variables. As usual, we assume that distribution functions are right continuous. For $F, F' \in \mathcal{F}$, let $d(F, F')$ be the infimum of all those $h > 0$ for which $F(x - h) - h \le F'(x) \le F(x + h) + h$ for all $x \in \mathbb{R}$. Then $d$ defines a metric, and the space $(\mathcal{F}, d)$ is called Lévy's space (see [26], p. 228). We shall refer to $d$ as the Lévy metric. Note that convergence in $(\mathcal{F}, d)$ is equivalent to convergence in distribution. For distribution functions on $\mathbb{R}^2$, we can define a similar distance $d_2$ such that $d_2(F, F')$ is the infimum of all those positive numbers $h$ for which $F(x - h, y - h) - h \le F'(x, y) \le F(x + h, y + h) + h$ for all $(x, y) \in \mathbb{R}^2$, where $F$ and $F'$ are distribution functions on $\mathbb{R}^2$. For a given distribution function $F(x, y)$ on $\mathbb{R}^2$ with marginal distributions $F_1(x)$ and $F_2(y)$, we can use $d_2$ to define a number $\rho_2(F)$ that measures the degree of independence of random variables with joint distribution $F$ by setting $\rho_2(F) := d_2\big(F(x, y), F_1(x) \cdot F_2(y)\big)$.

The following proposition, which translates simple versions of Theorems 8.5.7 and 8.6.5 to the discrete setting, shows that for a triangular array of random variables, asymptotic pairwise independence is a necessary and sufficient condition for an asymptotic version of the law of large numbers to hold. The proof, which adopts the routine procedures of lifting, pushing-down and transfer, is omitted.

Proposition 8.10.3 Given $n \in \mathbb{N}$, let $A_n$ be a subset of $T_n$, and let $F^{A_n}$ be the distribution function of $f_n^{A_n}$. For any $t_1, t_2 \in T_n$ and any $\omega \in \Omega$, let $F_{t_1}$, $F_{t_2}$ and $F_{t_1 t_2}$ be, respectively, the distribution functions of the random variables $f_n(t_1, \cdot)$, $f_n(t_2, \cdot)$ and $(f_n(t_1, \cdot), f_n(t_2, \cdot))$ on $(\Omega, P)$, and let $F^{A_n}_\omega$ be the distribution function of the random variable $f_n^{A_n}(\cdot, \omega)$ on $(A_n, \lambda_n^{A_n})$.
For each $\varepsilon > 0$, set
$$T_n(\varepsilon) := \{(t_1, t_2) \in T_n \times T_n : \rho_2(F_{t_1 t_2}) > \varepsilon\}, \qquad D^{A_n}(\varepsilon) := \{\omega \in \Omega : d(F^{A_n}_\omega, F^{A_n}) > \varepsilon\}.$$
Assume that $\lim_{m\to\infty} \sup_{1 \le n < \infty} (\lambda_n \times P)(|f_n| > m) = 0$. Then the following are equivalent:
(1) there is $\eta \in (0, 1)$ such that, for any sets $A_n \subseteq T_n$ with $\lambda_n(A_n) > 1 - \eta$, $\lim_{n\to\infty} P(D^{A_n}(\delta)) = 0$ for any $\delta > 0$;
(2) $\lim_{n\to\infty} (\lambda_n \times \lambda_n)(T_n(\varepsilon)) = 0$ for each $\varepsilon > 0$.

Remark 8.10.4 Since the limiting behaviors of triangular arrays of random variables can be captured by processes on Loeb product spaces, the study of such processes can be viewed as a way of studying general triangular arrays or sequences of random variables through the systematic application of existing measure-theoretic techniques in a new context. The corresponding approximate results in the large finite case do reveal new relationships among the various notions of independence themselves, and between other fundamental probabilistic notions, in the asymptotic sense.
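The Lévy metric $d$ defined before Proposition 8.10.3 can be approximated numerically. The sketch below is our own illustration, not a method from the text. The defining inequalities remain true when $h$ increases, and $h = 1$ always works for distribution functions, so $d(F, F')$ can be located by bisection on $h \in [0, 1]$. The inequalities are only checked on a finite grid of $x$-values, however, so the result is an approximation whose accuracy depends on the assumed grid. For point masses at $0$ and at $a \in (0, 1]$, the Lévy distance is $a$. The helper names are hypothetical.

```python
import math


def levy_distance(F, G, lo, hi, grid_points=4001, tol=1e-6):
    """Approximate the Levy distance between distribution functions F and G.

    The inequalities F(x - h) - h <= G(x) <= F(x + h) + h are checked only at
    grid_points equally spaced x in [lo, hi] (an approximation). They are
    monotone in h and always hold for h = 1, so we bisect on h in [0, 1].
    """
    xs = [lo + (hi - lo) * i / (grid_points - 1) for i in range(grid_points)]

    def admissible(h):
        return all(F(x - h) - h <= G(x) <= F(x + h) + h for x in xs)

    low, high = 0.0, 1.0
    while high - low > tol:
        mid = (low + high) / 2
        if admissible(mid):
            high = mid
        else:
            low = mid
    return high


def point_mass(a):
    """Right-continuous distribution function of the point mass at a."""
    return lambda x: 1.0 if x >= a else 0.0


def normal_cdf(mu, sigma=1.0):
    """Distribution function of the normal law N(mu, sigma^2)."""
    return lambda x: 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


if __name__ == "__main__":
    print(levy_distance(point_mass(0.0), point_mass(0.3), -1.0, 2.0))  # close to 0.3
    print(levy_distance(normal_cdf(0.0), normal_cdf(0.1), -8.0, 8.0))
```

The distance $d_2$ on $\mathbb{R}^2$, and hence $\rho_2(F)$, could be approximated in the same way on a two-dimensional grid, at a correspondingly higher cost.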
8.11 Notes

The framework of Loeb product spaces has played a central role in the discovery of the results presented in this chapter. Once the results are discovered, they can be generalized to various more general settings. The following notes provide a brief summary of those generalizations.

The exact law of large numbers and its converse, as presented in Sects. 8.5 and 8.6, are proved via the Fubini property of the Loeb product spaces. Based on that idea, a Fubini extension is formally introduced in [35] as a probability space that extends the usual product probability space and retains the Fubini property. The exact law of large numbers and its converse are shown to hold in this more general framework in Theorem 2.8 of [35]. It is also indicated in a preliminary draft [36] that such a framework of Fubini extension is necessary for showing the validity of the exact law of large numbers and its converse (also announced on p. 533 of [38]).

Theorems 8.5.3, 8.6.2 and 8.6.3 on the exact law of large numbers and its converse for real-valued processes are generalized in [25] to vector-valued processes. As noted in Remark 8.8.4, the exact law of large numbers and its converse are also considered in Sect. 5.4 and Sect. 8 of [33] for a large collection of stochastic processes. Similar results are presented in Sect. 2.4 of [35] in the framework of Fubini extension. It is shown in [2] that for a large collection of independent martingales, the martingale property is preserved on the empirical processes.

Proposition 8.3.3 indicates that an arbitrarily given process $f$ with independent random variables is never jointly measurable with respect to the usual measure-theoretic product except in the trivial case. It is shown in [13] that such a process $f$ can always be made measurable in a one-way Fubini extension, though the measurability of $f$ in a two-way Fubini extension as in [35] may not be possible in general.
As noted in [15], macroeconomic risks are the common random shocks that influence a significant portion of the population, while reality suggests that these are supplemented by risks at the individual level that influence a negligible portion of the population. This can be formalized by a process with a continuum of random variables that are conditionally independent given the macro-level shocks. Based on the framework of Fubini extension, the conditional exact law of large numbers and its converse are shown in [29]. Based on iterated completions of the product probability spaces, the results on the almost equivalence of pairwise and mutual independence in Sect. 8.7 are generalized in [14] to the setting of conditional independence.

Proposition 8.4.1 shows the richness of atomless Loeb product spaces as Fubini extensions. Since the external cardinality of a hyperfinite set, in an ultrapower construction based on $\mathbb{N}$ (see the earlier Chap. 2 by Loeb), is the cardinality of the continuum, there is a measure-preserving bijection between a hyperfinite Loeb counting probability space and the unit interval $I = [0, 1]$ with a probability measure $\mu$. This leads to another construction of a rich Fubini extension with the unit interval $[0, 1]$ as the index space and an extended continuum product probability space as the sample
space (Proposition 5.6 of [35]). However, it is shown in [22] that this new measure μ on I cannot be an extension of the Lebesgue measure. On the other hand, it is shown in [37] that a rich Fubini extension can be constructed in a different way so that the index space is an extended Lebesgue unit interval ([28] contains a construction with a more general index space).
References
1. S. Albeverio, J.E. Fenstad, R. Høegh-Krohn, T. Lindstrøm, Nonstandard Methods in Stochastic Analysis and Mathematical Physics (Academic Press, Orlando, 1986)
2. S. Albeverio, Y.N. Sun, J.-L. Wu, Martingale property of empirical processes. Trans. Am. Math. Soc. 359, 517–527 (2007)
3. R.M. Anderson, Non-standard analysis with applications to economics, in Handbook of Mathematical Economics IV, ed. by W. Hildenbrand, H. Sonnenschein (North-Holland, New York, 1991)
4. P. Billingsley, Convergence of Probability Measures (Wiley, New York, 1968)
5. M. Capiński, N.J. Cutland, Nonstandard Methods for Stochastic Fluid Mechanics (World Scientific Publishing Co., River Edge, 1995)
6. Y.H. Chow, H. Teicher, Probability Theory: Independence, Interchangeability and Martingales (Springer, New York, 1978)
7. D.J. Cohn, Measure Theory (Birkhäuser, Boston, 1980)
8. J.L. Doob, Stochastic processes depending on a continuous parameter. Trans. Am. Math. Soc. 42, 107–140 (1937)
9. J.L. Doob, Stochastic Processes (Wiley, New York, 1953)
10. R. Durrett, Probability: Theory and Examples (Wadsworth, Belmont, 1991)
11. S. Fajardo, H.J. Keisler, Existence theorems in probability theory. Adv. Math. 120, 191–257 (1996)
12. W. Feller, An Introduction to Probability Theory and Its Applications, 3rd edn. (Wiley, New York, 1968)
13. P.J. Hammond, Y.N. Sun, Joint measurability and the one-way Fubini property for a continuum of independent random variables. Proc. Am. Math. Soc. 134, 737–747 (2006)
14. P.J. Hammond, Y.N. Sun, The essential equivalence of pairwise and mutual conditional independence. Probab. Theory Relat. Fields 135, 415–427 (2006)
15. P.J. Hammond, Y.N. Sun, Monte Carlo simulation of macroeconomic risk with a continuum of agents: the general case. Econ. Theory 36, 303–325 (2008)
16. C.W. Henson, H.J. Keisler, On the strength of nonstandard analysis. J. Symb. Log. 51, 377–386 (1986)
17. A.E. Hurd, P.A. Loeb, An Introduction to Nonstandard Real Analysis (Academic Press, Orlando, 1985)
18. K.L. Judd, The law of large numbers with a continuum of IID random variables. J. Econ. Theory 35, 19–25 (1985)
19. H.J. Keisler, Hyperfinite model theory, in Logic Colloquium 76, ed. by R.O. Gandy, J.M.E. Hyland (North-Holland, Amsterdam, 1977)
20. H.J. Keisler, Infinitesimals in probability theory, in Nonstandard Analysis and Its Applications, ed. by N.J. Cutland (Cambridge University Press, Cambridge, 1988)
21. H.J. Keisler, Rich and saturated adapted spaces. Adv. Math. 120, 242–288 (1997)
22. H.J. Keisler, Y.N. Sun, Loeb measures and Borel algebras, in Reuniting the Antipodes: Constructive and Nonstandard Views of the Continuum, ed. by U. Berger, H. Osswald, P. Schuster (Kluwer Academic Publishers, Dordrecht, 2001), pp. 111–117
23. H.J. Keisler, Y.N. Sun, A metric on probabilities, and products of Loeb spaces. J. Lond. Math. Soc. 69, 258–272 (2004)
24. P.A. Loeb, Conversion from nonstandard to standard measure spaces and applications in probability theory. Trans. Am. Math. Soc. 211, 113–122 (1975)
25. P.A. Loeb, H. Osswald, Y.N. Sun, Z.X. Zhang, Uncorrelatedness and orthogonality for vector-valued processes. Trans. Am. Math. Soc. 356, 3209–3225 (2004)
26. M. Loève, Probability Theory I, 4th edn. (Springer, New York, 1977)
27. E. Nelson, Radically Elementary Probability Theory (Princeton University Press, Princeton, 1987)
28. K. Podczeck, On existence of rich Fubini extensions. Econ. Theory 45, 1–22 (2010)
29. L. Qiao, Y.N. Sun, Z.X. Zhang, Conditional exact law of large numbers and asymmetric information economies with aggregate uncertainty. Econ. Theory (published online, 2014)
30. S. Rashid, Economies with Many Agents (The Johns Hopkins University Press, Baltimore, 1987)
31. K.D. Stroyan, J.M. Bayod, Foundations of Infinitesimal Stochastic Analysis (North-Holland, Amsterdam, 1986)
32. Y.N. Sun, Hyperfinite law of large numbers. Bull. Symb. Log. 2, 189–198 (1996)
33. Y.N. Sun, A theory of hyperfinite processes: the complete removal of individual uncertainty via exact LLN. J. Math. Econ. 29, 419–503 (1998)
34. Y.N. Sun, The almost equivalence of pairwise and mutual independence and the duality with exchangeability. Probab. Theory Relat. Fields 112, 425–456 (1998)
35. Y.N. Sun, The exact law of large numbers via Fubini extension and characterization of insurable risks. J. Econ. Theory 126, 31–69 (2006)
36. Y.N. Sun, On the characterization of individual risks and Fubini extension, Mimeo (2007)
37. Y.N. Sun, Y.C. Zhang, Individual risk and Lebesgue extension without aggregate uncertainty. J. Econ. Theory 144, 432–443 (2009)
38. J. Wang, Y.C. Zhang, Purification, saturation and the exact law of large numbers. Econ. Theory 50, 527–545 (2012)
Part V
Economics and Nonstandard Analysis
Chapter 9
Nonstandard Analysis in Mathematical Economics
Yeneng Sun
9.1 Introduction

Measure-theoretic or probabilistic methods have played a very important role in most areas of modern economics. The dual aims of this chapter are to present some special measure-theoretic properties based on nonstandard constructions and then to illustrate how these properties can be applied to problems in game theory, general equilibrium theory and finance. Although the mathematical methods presented here are expressed in conventional mathematical formulations, they are not available in the traditional measure-theoretic framework, in the sense that they fail on general probability spaces and, in particular, on the unit Lebesgue interval. Explicit examples will be constructed to show that the positive results derived in economics are not valid when traditional measure-theoretic models are used. Thus the measure-theoretic framework adopted in this chapter is crucial not only for the relevant mathematical methods, but also for the economic results obtained with those methods.

A brief discussion of each topic covered in the main body of this chapter is given below. Its purpose is to motivate the general reader who may have no background in economics, and also to reveal some underlying coherence among the variety of problems studied here.

When an economic agent is required to choose optimal actions under certain constraints, it is often the case that the best actions are not unique. Thus, one is naturally led to work with mappings whose values are nonempty sets. Such mappings are often called correspondences, multifunctions or random sets. In the literature, one usually considers correspondences with measure-theoretic or topological structures. The study of such correspondences and their selections has wide applications in mathematical economics and in other areas; see [7, 43, 66] for some references.

Y. Sun, Department of Mathematics, National University of Singapore, 10 Lower Kent Ridge Road, Singapore 119076, Republic of Singapore
© Springer Science+Business Media Dordrecht 2015. P.A. Loeb and M.P.H. Wolff (eds.), Nonstandard Analysis for the Working Mathematician, DOI 10.1007/978-94-017-7327-0_9
Let $F$ be a correspondence from a probability space $(\Omega, \mathcal{A}, \mu)$ to a Polish space $X$ (i.e., a complete separable metric space, or just a topological space whose topology can be induced by such a metric). A natural construction arising from $F$ is the set $D_F$ consisting of the distributions induced by the measurable selections of $F$. When the underlying probability space $(\Omega, \mathcal{A}, \mu)$ is a general probability space, and in particular the unit Lebesgue interval, $D_F$ may lack a number of important regularity properties such as compactness, convexity and preservation of upper semicontinuity (see Example 9.2.2 in the next section). This irregularity prevents the distributions of correspondences from being used in applications where these regularity properties are essential. The purpose of Sect. 9.2 in this chapter is to show that the desired regularity properties do hold if one uses as the underlying probability space a special type of standard probability space arising from nonstandard analysis, namely the Loeb space introduced in [71]. When the target space $X$ is endowed with a linear structure, one can also consider the set of integrals of the integrable selections of $F$. It is noted in Sect. 9.2 that the relevant integration theory of correspondences follows trivially from the distribution theory when $X$ is $\mathbb{R}^n$ or when $X$ is a compact set in an infinite-dimensional space.

Game theory was systematically introduced into economics by von Neumann and Morgenstern in [113]. In 1950, John Nash [78] greatly generalized von Neumann's equilibrium theory of two-person zero-sum games to the general setting of n-person static games with arbitrary payoffs. The equilibrium concept developed in [78], which is called Nash equilibrium in the economic literature, requires each player to maximize his own payoff given the other players' strategies.
The proof of the existence of such an equilibrium in randomized (mixed) strategies was a straightforward application of Kakutani's fixed-point theorem for correspondences. This general methodology can also be routinely applied to show the existence of mixed-strategy equilibria for other classes of games (for games with general action spaces, see [34]; for dynamic games and games with incomplete information, see pp. 124 and 152 in [33]). In general, equilibria in non-randomized (pure) strategies may not exist at all (see, for example, [33], p. 29).

In a mixed-strategy Nash equilibrium, each player is only required to choose a probability distribution over his action space rather than an action itself. Such an equilibrium is often criticized for its lack of sufficient behavioral evidence in the real world. However, Dvoretzky, Wald and Wolfowitz did provide a general blanket result in their joint work in 1950–51 guaranteeing the existence of equivalent pure strategies (called purification) for statistical decision problems and games that are based on atomless measures and finite action spaces (see [23], Theorem 4 in [24] and Theorem 2.1 in [25]). This result is a generalization of Lyapunov's theorem on the range of a finite-dimensional vector measure. As illustrated in Sect. 3 of [25], one can obtain pure strategies with the same distributions and the same level of payoffs as any given mixed strategies. Many results on pure-strategy Nash equilibria for atomless games with finite actions (for example, [77, 86, 94]) can simply be viewed as interpretations of this general purification result in various situations.

Since Lyapunov's theorem fails in infinite-dimensional spaces, the general procedure of Dvoretzky, Wald and Wolfowitz for eliminating randomization only works for finite action spaces. However, in the particular case of statistical decision problems, the purification result for a finite decision space can indeed be extended to the case when the decision space is compact (see Sect. 4 in [25]). This leaves the status of the problem of pure-strategy equilibria for particular forms of atomless games with infinite actions completely uncertain. Opposing speculations exist in the literature. For example, Milgrom and Weber ([77], p. 630) claimed that pure-strategy equilibria may not exist for atomless static games with general compact action spaces and with incomplete, diffuse and disparate information, while Fudenberg and Tirole made the opposite claim in [32] (p. 236). Both claims could be somewhat justified by the results in [25].

The purpose of Sects. 9.3 and 9.4 of the present chapter is to provide some definite answers (both negative and positive) to the problem of pure-strategy equilibria for atomless static games with infinite actions. We first consider the negative results. Explicit examples are constructed to show that for any given uncountable compact metric action space, there are atomless games based on the unit Lebesgue interval that have no pure-strategy equilibria at all (see Sects. 9.3.2 and 9.4.1). This also implies the failure of purification for such static games in the Lebesgue setting (see Remarks 9.3.6 and 9.4.4). Since uncountable action spaces arise naturally in economics (for example, the n-dimensional price simplex as in [16]), it is important to know how far one can proceed without the restriction to Lebesgue measure. For this purpose, one does need to go beyond the general purification procedure pioneered in [23–25]. Based on the distribution theory of correspondences on Loeb spaces in Sect. 9.2, we do obtain some positive results.
In particular, if atomlessness is formulated on the basis of Loeb spaces, then a general existence theory for pure-strategy equilibria in atomless games can be developed directly through a routine fixed-point argument (see Sects. 9.3.1 and 9.4.3). Note that atomless measures are usually used to model the negligibility of individual agents among a large number of economic agents, or the diffuseness of information (see, for example, [8, 23]). Section 9.3 focuses on games with many players whose player space is an atomless probability space, while Sect. 9.4 considers the case of diffuse information.

As noted earlier, correspondences arise naturally when economic agents face non-unique best actions. If uncertainty is also explicitly introduced into an economic model with many participating agents, then the optimal choices under uncertainty for the economy as a whole will give a set-valued process, i.e., a set-valued function on the product of the space of agent names and the space of sample realizations. If uncertainty only appears on the individual level, then some associated set-valued process will have almost independent correspondences. One may hope to have a sort of law of large numbers for such a set-valued process. However, Proposition 8.3.3 in Chap. 8 shows that joint measurability and independence are never compatible with each other except in some trivial cases. One thus needs a suitable framework for the study of set-valued processes with independence. As shown in Chap. 8, the larger measure-theoretic framework based on the Loeb space of the internal product of two internal probability spaces (the Loeb product space) is rich enough to resolve the incompatibility of joint measurability and independence. The same framework is also used to study independent set-valued processes in Sect. 9.5 below.

As an analogue of the characterization of the validity of a stronger version of the law of large numbers (called the coalitional law or macroscopic stability of sample functions) for a point-valued process in Chap. 8, Theorem 9.5.7 shows that versions of almost independence are necessary and sufficient for macroscopic stability of the distributions of sample correspondences in a set-valued process. Note that the almost independence condition on a set-valued process may not be preserved by its selections; widespread correlations may exist in some selections. Theorem 9.5.8 shows that such widespread correlations can be removed via redistributions.

As in the case of the classical law of large numbers for a sequence of real-valued random variables, there is a sizable literature on the law of large numbers and its applications in the setting of a sequence of correspondences (see [6] and the references in [107]). These sequential laws have mostly focused on the iid case in terms of means. Different versions of these laws may need to be treated in different ways. In comparison, our Theorem 9.5.7, in its distributional form, allows a very simple and uniform treatment for all kinds of integrals; it requires neither compactness nor boundedness assumptions, nor any distributional restrictions such as identical distributions (see [107]). The iid condition on correspondences, as commonly used in the previous literature on sequential laws, is a very strong condition. On the other hand, the necessity of almost independence for the validity of macroscopic stability shows that this condition is essentially the weakest possible one for deriving the law of large numbers. This necessity result and the result on removing widespread correlations in Theorem 9.5.8 have no known sequential analogues in the previous literature.
Note that the almost independence condition can be viewed as a version of the usual condition of weak dependence (see Remark 8.7.6 in Chap. 8 or p. 437 in [106]). Its asymptotic analogue is much weaker than the pairwise independence condition used in [27] in the sequential setting.

General equilibrium theory is the study of the market interaction of a large number of economic agents with partially conflicting interests. A basic problem is the existence of an equilibrium price system under which each agent chooses a best action available to him and yet demand and supply are balanced in every commodity market. The classical solution to this problem was summarized in Debreu's elegant monograph [16]. The problem of stochastic consistency in a general equilibrium model was considered by Hildenbrand in [42], where independent shocks are allowed in individual agents' endowments and preferences. Based on the classical law of large numbers, the existence of approximate deterministic equilibrium price systems was shown there for a sequence of random economies under strong assumptions on agents' preferences (including strict convexity and completeness). In comparison with the literature on measure-theoretic economies as summarized in [43], this treatment is hardly satisfactory. When the commodity space itself is non-convex, the method used in [42] is no longer applicable, since one cannot introduce strictly convex preferences in this case. Of course, it was not possible to adopt the usual measure-theoretic framework for a meaningful study of this stochastic consistency problem, due to the incompatibility of joint measurability and independence.
Based on the exact versions of the law of large numbers developed in Chap. 8 and in Sect. 9.5 of the present chapter, Sect. 9.6 below illustrates how risks or uncertainty on the individual level can be effectively handled in a general equilibrium model. The main result of that section, Theorem 9.6.1, roughly says that from a macroscopic point of view, the random economies behave exactly like a deterministic economy in terms of prices, mean excess demand correspondences and Walras distributions; moreover, uncertainty completely disappears. One advantage of the approach used here is that results on deterministic measure-theoretic economies can be applied directly to random economies. Similar results for random economies with non-convex commodity spaces can also be established based on their deterministic analogues (see Theorems 4 and 5 in [107]). In contrast to the discrete approach in [42], where only the existence of approximate deterministic equilibrium price systems was shown, the exact approach adopted here allows one to show much more in Theorem 9.6.1. In particular, it is shown that in almost all states of nature, the random economies have exactly the same mean excess demand correspondence and the same nonempty set of deterministic equilibrium prices. The competitive equilibria corresponding to a common deterministic price system are also essentially the same in terms of distributions. More interestingly, each Walras distribution can be achieved by a “global” competitive equilibrium resulting from stochastically independent actions of the individual agents (in spite of possible widespread correlations due to multiple choices); this also provides competitive equilibria with a common distribution across different states of nature. Moreover, the preference relations considered are very general within the usual framework of measure-theoretic economies; in particular, they can be non-convex and non-complete.
Processes on hyperfinite Loeb product spaces form a “correct” class for the stochastic modeling of a large number of economic entities moving in an uncertain environment. The reason behind this claim is that the general transfer principle in Chap. 2 guarantees that such hyperfinite processes capture the asymptotic properties of large finite collections of random variables. Chapter 8 and Sects. 9.5 and 9.6 provide a systematic study of individual-specific randomness in terms of almost independent (uncorrelated) hyperfinite processes. In Sect. 9.7 below, risks or uncertainty not necessarily coming from the individual level are analyzed via an understanding of the probabilistic structure of general hyperfinite processes. In particular, Theorem 9.7.1 provides a structural result for a real-valued, square-integrable process $f$ on the Loeb product space. Its main features are the endogenous derivation of factors through the Karhunen–Loève biorthogonal expansion and the complete removal of residual risks through diversification. Theorem 9.7.5 shows that although the collection of all random events in a large market may be very large, the events summarizing the randomness in all the non-negligible coalitions can be generated by just one random variable. Moreover, the individual economic entities move independently, conditioned on this single random source.

These results on general risk analysis can also be applied to particular financial models to discover some new phenomena. First, by considering a large number of assets in terms of a hyperfinite process, Theorem 9.7.1 says that naive diversification is indeed useful, since asset-specific risks exist in general and can be removed through this procedure. Second, it is pointed out that the factor risks can be decomposed into two further types of risks. This means that there are, in fact, three types of risks in a large asset market, instead of the two types studied in two standard and competing models of asset pricing, namely, the Capital Asset Pricing Model (CAPM) of Sharpe and Lintner in [70, 95] and the Arbitrage Pricing Theory (APT) of Ross in [93]. Section 9.7.2 presents one result in the hyperfinite asset pricing theory developed in [58, 63]. It is shown in Theorem 9.7.7 that the absence of arbitrage opportunities is not only sufficient but, in contrast to the literature, also necessary for the validity of the APT pricing formula.

Economists and geneticists, among others, have implicitly or explicitly assumed the exact law of large numbers for independent random matching in a continuum population, by which we mean an atomless measure space of agents. This result is relied upon in large literatures within general equilibrium theory, game theory, monetary theory, labor economics, illiquid financial markets and biology; see [18, 19] for some of the references. Section 9.8 considers the very special case of independent universal random matching, where agents are matched uniformly and independently. Nonstandard analysis provides a very convenient tool for constructing such a random matching.

For earlier applications of nonstandard analysis in economics, the reader is referred to the chapter [1] by Anderson, the monograph [88] by Rashid on general equilibrium theory, and the chapter [67] by Kopp on option pricing. The reader can also find extensive references in [1, 2, 67, 88]. Since good surveys of earlier work are already available, we only present some recent applications of nonstandard methods to economics here. Some brief, non-exhaustive notes on work not discussed in the main body of this chapter are also given in Sect. 9.9. The results presented in Sects. 9.2, 9.5, 9.6 and 9.7.1 are taken from the author's papers [102–107]. The rest of the results in this chapter (Sects. 9.3, 9.4, and 9.7) are joint work of the author with M.A. Khan, L. Qiao, K.P. Rath, S. Yamashige, or Z. Zhang [51, 60, 63, 84, 91]. Detailed references to these results can be found in Sect. 9.9.

Even after economic models with infinitely many traders had been popularized among mathematical economists for some time, most economists were still skeptical of results on such “large” economies. One could wonder whether such results have any relevance at all to real markets, which have a large but finite number of traders. Thus, an active program in mathematical economics was to derive approximate results for large but finite economies from results on limit economies. Nonstandard analysis provided a very convenient tool for this approach “from limit to approximate” through the routine procedures of lifting, pushing-down and transfer (see [1, 88]; for detailed expositions in the game-theoretic setting, see [59, 60]). Any results proved for general measure-theoretic economies are also valid on Loeb spaces, and they are thus equivalent to some internal results via the procedures of lifting and pushing-down. The internal results can be transferred back to some standard large finite results by applying the transfer principle as stated in Chap. 2. Thus, the anxiety over results on general measure-theoretic models can be alleviated to a large extent by noting that the existence of corresponding “large finite” results is guaranteed by nonstandard analysis.
We claim, on the following grounds, that the nonstandard technique for the approach “from limit to approximate” is better than the earlier one using weak convergence of distributions as in [43]. First, the nonstandard procedure is guaranteed by an underlying meta-mathematical principle, and is thus routine for different problems, while the one using weak convergence is not. Second, the nonstandard technique may lead to better approximate results in some situations, since the relevant sets of internal functions carry more information than the compact sets of some induced distributions (see [67] for an example on option pricing).

The other approach in which nonstandard methods have been very fruitful is “from approximate to limit”; see earlier chapters and their references for applications in mathematics. The routine procedures of lifting, pushing-down and transfer, together with principles like spillover and saturation, can be used again to translate standard or nonstandard approximate results into exact results on Loeb spaces. This is again better than the use of classical weak convergence arguments, for the same reasons as above. Many theorems have been proved or strengthened by this nonstandard method; Proposition 9.2.4 in Sect. 9.2 below is also proved this way. In Sect. 9.4.2 of this chapter, a sequence of large finite games is constructed to show how some approximate pure-strategy equilibria can be captured by an exact equilibrium of a game on a Loeb space, while at the same time their limit in the classical setting disappears. This explicitly shows that the Loeb measure framework does provide a better limiting model for some game-theoretic phenomena. The general nonstandard procedure of deriving nonstandard results from large finite results and vice versa is called asymptotic implementation in [59, 60, 62]. Now the question is how to obtain approximate or exact results without using asymptotic implementation.
One may do everything in the standard universe by using large finite computations through ε-approximations. One can also use nonstandard internal arguments based on infinitesimal approximations. In a very rough sense, the standard approach approximates a limiting object from below within an ε-distance, while the nonstandard one approximates it from above within an infinitesimal distance. Infinitesimal arguments may often provide more intuition, even to those who do not use them. One example is the work of Brown and Loeb [11], which made explicit the infinitesimal motivation behind Aumann's standard measure-theoretic work on the value equivalence theorem for exchange economies. Unfortunately, there is a significant barrier to the widespread adoption of infinitesimal arguments in economics, because they rely on a nonstandard methodology that is unfamiliar to most people ([1], p. 2147). A related barrier is that relevant standard theorems often need to be reformulated in a nonstandard way in order to be applicable in an infinitesimal argument.

In Chap. 8, one more approach for using nonstandard analysis, called “from limit to limit”, is emphasized. One simply works with the special limiting entities arising from nonstandard constructions themselves throughout a typical proof, applying conventional mathematical techniques. In particular, only the special measure-theoretic properties of Loeb spaces are used there. In this chapter, all the nonstandard results are expressed in conventional measure-theoretic terms, and all their proofs, except that of Proposition 9.2.4, are based on applications of standard measure-theoretic methods to the Loeb measure framework. Thus the heart of this chapter is also based on the approach “from limit to limit”. One advantage of this approach is that the barriers mentioned in the above paragraph no longer exist, since no infinitesimals or internal arguments are used. Moreover, since the special measure-theoretic properties (in both mathematics and economics) of Loeb spaces proved here fail for general probability spaces, and in particular for the unit Lebesgue interval, we are armed with some new measure-theoretic and probabilistic methods that can do what the traditional methods cannot do in economics and related mathematics.
9.2 Distribution and Integration of Correspondences

9.2.1 Distribution of Correspondences

A correspondence is a mapping whose values are nonempty sets. Let $G$ be a correspondence from a probability space $(\Omega, \mathcal{A}, \mu)$ to a Polish space $X$. A measurable mapping $g : \Omega \to X$ is called a selection of $G$ if $g(t) \in G(t)$ for $\mu$-almost all $t \in \Omega$. The correspondence $G$ is said to be measurable if its graph $\{(t, x) \in \Omega \times X : x \in G(t)\}$ belongs to the product σ-algebra $\mathcal{A} \otimes \mathcal{B}(X)$, where $\mathcal{B}(X)$ denotes the Borel σ-algebra on $X$. It is well known that every measurable correspondence has a selection (see [7] or [43]). The correspondence $G$ is said to be closed (compact) valued if $G(t)$ is a closed (compact) subset of $X$ for all $t \in \Omega$. If $G$ is closed valued, then the measurability of $G$ as a correspondence is equivalent to the measurability of $G^{-1}(O)$ for every open set $O$ in $X$ (see [7] or [13]). Here, for a set $B$ in $X$, $G^{-1}(B) = \{t \in \Omega : G(t) \cap B \neq \emptyset\}$.

Let $d$ be a metric on $X$. For a point $x \in X$ and a nonempty subset $B$ of $X$, the distance $d(x, B)$ from the point $x$ to the set $B$ is $\inf\{d(x, y) : y \in B\}$. For nonempty subsets $A$ and $B$ of $X$, the Hausdorff distance $\rho(A, B)$ between $A$ and $B$ is defined by

$$\rho(A, B) = \max\Big\{\sup_{x \in A} d(x, B),\ \sup_{y \in B} d(y, A)\Big\}.$$

Let $\mathcal{F}_X$ be the hyperspace of nonempty closed subsets of $X$. When $X$ is compact, the metric space $(\mathcal{F}_X, \rho)$ is also compact. Moreover, for a compact valued correspondence $G$ from $(\Omega, \mathcal{A}, \mu)$ to $X$, the measurability of $G$ as a correspondence is equivalent to the measurability of $G$ as a mapping into the hyperspace $(\mathcal{F}_X, \rho)$ (see [17, 43]). For a measurable mapping $g$ from $(\Omega, \mathcal{A}, \mu)$ to $X$, we use $\mu g^{-1}$ to denote the Borel probability measure on $X$ induced by $g$; it is often called the distribution of $g$. Let $M(X)$ be the space of Borel probability measures on $X$ endowed with the topology of weak convergence of measures (see [10]). Note that this topology on $M(X)$ can also be induced by the Prohorov metric on $M(X)$, which is defined by setting
$$\delta(\nu_1, \nu_2) = \inf\{\varepsilon > 0 : \nu_1(E) \le \nu_2(B(E, \varepsilon)) + \varepsilon \ \text{for all Borel sets } E \subseteq X\}.$$

Here, for any $\varepsilon > 0$, $B(E, \varepsilon) = \{x \in X : \exists y \in E,\ d(x, y) < \varepsilon\}$ (see [83], p. 75). Note that the definition of the Prohorov metric typically involves two inequalities instead of the one used here (see [10]); however, the two definitions are equivalent (see [83]).

Definition 9.2.1 For a correspondence $G$ from $(\Omega, \mathcal{A}, \mu)$ to $X$, the distribution of $G$ is defined to be $D_G = \{\mu g^{-1} : g \text{ is a selection of } G\}$.

If the correspondence $G$ is measurable, then standard results on the existence of selections guarantee that $D_G$ is nonempty. Here we are concerned with some regularity properties of $D_G$, such as convexity and compactness. The following canonical example shows, however, that $D_G$ may be neither convex nor compact, even in the simple case when $X$ is the closed interval $[-1, 1]$ and the probability space $(\Omega, \mathcal{A}, \mu)$ is the unit interval $[0, 1]$ endowed with the Lebesgue measure structure.

Example 9.2.2 Let $F$ be a correspondence from the unit interval $[0, 1]$ with Lebesgue measure $\tau$ to the interval $[-1, 1]$ such that $F(t) = \{t, -t\}$ for all $t \in [0, 1]$. Let

$$f_1(t) = \begin{cases} t, & t \in [0, 1/2], \\ -t, & t \in (1/2, 1], \end{cases}$$

and $f_2(t) = -f_1(t)$. Since $[-1, 1]$ is compact, $M([-1, 1])$ with the topology of weak convergence of measures is also compact [10]. Let $\tau^*$ be the measure $(1/2)\tau f_1^{-1} + (1/2)\tau f_2^{-1}$. Then $\tau^*$ is the uniform probability measure on $[-1, 1]$, i.e., it has density $1/2$ with respect to Lebesgue measure on $[-1, 1]$.

We note that no selection of $F$ has $\tau^*$ as its distribution, i.e., $\tau^* \notin D_F$. To see this, suppose there were such a selection $f$. Let $E = \{t \in [0, 1] : f(t) \in [0, 1]\}$. Since $f(t) \in \{t, -t\}$, we have $f^{-1}(E) = E$ up to a null set, so $\tau^*(E) = \tau(f^{-1}(E)) = \tau(E)$. Since the restriction of $\tau^*$ to $[0, 1]$ is $(1/2)\tau$, we also have $\tau^*(E) = (1/2)\tau(E)$; hence $\tau(E) = 0$. Therefore,

$$1/2 = \tau^*([-1, 0]) = \tau f^{-1}([-1, 0]) = \tau(\{t \in [0, 1] : t \notin E\}) = 1,$$

which is a contradiction. Moreover, it can be checked that $\tau^*$ is a limit point of $D_F$. Since $\tau^*$ is a convex combination of $\tau f_1^{-1}$ and $\tau f_2^{-1}$, both of which belong to $D_F$, the set $D_F$ is not convex; since $\tau^*$ is a limit point of $D_F$ that does not belong to $D_F$, the set $D_F$ is not closed and hence not compact.

Example 9.2.2 shows that one cannot obtain some desired regularity properties for distributions of correspondences on a general probability space. The point to be made below is that if one restricts attention to correspondences on a special type of probability space, the Loeb space, then one can obtain important regularity properties for the set $D_F$, including convexity, compactness, and preservation of upper semicontinuity. Note that these three properties are often needed for the application of a fixed-point argument (see the next section). The general theory for the distribution of correspondences on Loeb spaces was developed by the author in [103], where the target space $X$ is a Polish space. For simplicity, however, we shall assume in the rest of this subsection that $X$ is a compact metric space.
We shall now work with correspondences defined on a Loeb probability space $(T, L_\lambda(T), \lambda_L)$. The reader is referred to Chap. 6 for some basic properties of Loeb spaces. The following is Kuratowski's notion of the topological upper limit of a sequence of sets in a topological space.

Definition 9.2.3 Let $Y$ be a topological space. If $A_1, A_2, \ldots$ are subsets of $Y$, then $\limsup_{n\to\infty} A_n$ denotes the set of all $y \in Y$ such that every neighborhood of $y$ intersects infinitely many of the sets $A_n$.

The following proposition shows that topological limits are preserved under the distribution of correspondences when the underlying probability space is a Loeb space. By taking each of the correspondences $F_n$ to be the correspondence $F$ in Example 9.2.2, we can observe that this property is not valid for correspondences on the Lebesgue unit interval.

Proposition 9.2.4 Let $\{F_n\}_{n=1}^\infty$ be a sequence of measurable correspondences from the Loeb space $(T, L_\lambda(T), \lambda_L)$ to a compact metric space $X$ with metric $d$. If the correspondence $F$ is defined by setting $F(t) = \limsup_{n\to\infty} F_n(t)$ for each $t \in T$, then $\limsup_{n\to\infty} D_{F_n} \subseteq D_F$.

Proof Let $\{f_n\}_{n=1}^\infty$ be a sequence of functions such that, for each $n \ge 1$, $f_n$ is a selection of $F_n$. We need to show that if a subsequence of $\{\lambda_L f_n^{-1}\}_{n=1}^\infty$ converges weakly to some Borel probability measure $\nu$ on $X$, then $\nu \in D_F$. Without loss of generality, we assume that the whole sequence $\{\lambda_L f_n^{-1}\}_{n=1}^\infty$ converges weakly to $\nu$.

We first show that $\lim_{n\to\infty} d(f_n(t), F(t)) = 0$ for $\lambda_L$-almost all $t \in T$, namely for all $t$ with $f_n(t) \in F_n(t)$ for every $n$. Suppose not; then there are such a $t \in T$, a $\delta_0 > 0$, and a subsequence $\{f_{n_k}(t)\}_{k=1}^\infty$ of $\{f_n(t)\}_{n=1}^\infty$ such that $d(f_{n_k}(t), F(t)) > \delta_0$ for all $k$. By the compactness of $X$, the sequence $\{f_{n_k}(t)\}_{k=1}^\infty$ has a limit point $x$ in $X$, which implies that $d(x, F(t)) \ge \delta_0$. By the definition of $F(t)$, we know that $x \in F(t)$, which contradicts $d(x, F(t)) \ge \delta_0$. Hence, the sequence $\{d(f_n(t), F(t))\}_{n=1}^\infty$ converges to zero almost surely (and thus also in measure).
Next, for each $n \ge 1$, let $g_n$ be an internal lifting of $f_n$ (see Theorem 6.2.5 in Chap. 6). The sequence $\{g_n\}_{n \in \mathbb{N}}$ of internal functions can be extended to an internal sequence $\{g_n\}_{n \in {}^*\mathbb{N}}$ of functions from $T$ to ${}^*X$. Since the compact valued correspondence $F$ also induces a Loeb measurable mapping from $(T, L_\lambda(T), \lambda_L)$ to the hyperspace $(\mathcal{F}_X, \rho)$, $F$ has an internal lifting $G$, which is a mapping from $T$ to ${}^*\mathcal{F}_X$. Since ${}^\circ({}^*d(g_n(t), G(t))) = d(f_n(t), F(t))$ for $\lambda_L$-almost all $t$, the sequence $\{{}^\circ({}^*d(g_n(t), G(t)))\}_{n=1}^\infty$ also converges to zero in measure. Hence, for each positive integer $k$, there is a positive integer $N_k$ such that for any $n > N_k$,

$$\lambda(\{t \in T : {}^*d(g_n(t), G(t)) < 1/k\}) > 1 - 1/k. \qquad (9.1)$$

By the Spillover Principle (see Theorem 2.8.12 in Chap. 2), for each positive integer $k$ there is an $N_k' \in {}^*\mathbb{N}_\infty$ such that Eq. 9.1 still holds for any $n \in {}^*\mathbb{N}$ with $N_k < n \le N_k'$. By $\aleph_1$-saturation, choose $H \in {}^*\mathbb{N}_\infty$ such that $H \le N_k'$ for all $k$. Then for any $n \in {}^*\mathbb{N}_\infty$ with $n \le H$, we have

$$\lambda_L(\{t \in T : {}^*d(g_n(t), G(t)) \approx 0\}) = 1. \qquad (9.2)$$
Take a dense sequence $\{\varphi_m\}_{m=1}^\infty$ in the space $C(X)$ of continuous functions on $X$ with the supremum norm. This sequence is a separating class for all distributions on $X$ (see Definition 8.9.1 in Chap. 8 and also Theorem 6.1 on p. 40 in [80]). Since $\lambda_L f_n^{-1}$ converges weakly to $\nu$, we can find for each pair $(m, k)$ of positive integers a natural number $N_{mk}$ such that for any natural number $n \ge N_{mk}$,
$$\left| \int_T {}^*\varphi_m \circ g_n \, d\lambda - \int_X \varphi_m \, d\nu \right| < 1/k. \tag{9.3}$$
By the Spillover Principle again, there is a hyperinteger $M_{mk}$ in ${}^*\mathbb{N}_\infty$ such that Eq. 9.3 still holds for any $n \in {}^*\mathbb{N}$ with $N_{mk} \le n \le M_{mk}$. It follows from $\aleph_1$-saturation that there is an $\eta$ in ${}^*\mathbb{N}_\infty$ such that $\eta \le H$ and $\eta \le M_{mk}$ for all natural numbers $m, k \ge 1$. Hence, for every positive integer $m$, we have
$$\int_T {}^*\varphi_m \circ g_\eta \, d\lambda \approx \int_X \varphi_m \, d\nu. \tag{9.4}$$
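Let $f = {}^\circ g_\eta$, that is, $f(t)$ is the standard part of $g_\eta(t)$ in the compact space $X$. Since each $\varphi_m$ is bounded and continuous, ${}^\circ\big(\int_T {}^*\varphi_m \circ g_\eta\, d\lambda\big) = \int_T \varphi_m \circ f\, d\lambda_L$. So Eq. 9.4 implies that $\int_X \varphi_m\, d(\lambda_L f^{-1}) = \int_X \varphi_m\, d\nu$ for all $m \ge 1$. Since the sequence $\{\varphi_m\}_{m=1}^\infty$ is a separating class for all distributions on $X$, we have $\lambda_L f^{-1} = \nu$. By Eq. 9.2, we know that $f$ is also a selection of $F$, and the proof is complete.

Based on Proposition 9.2.4, we can prove three essential properties of the distribution of correspondences on Loeb spaces, namely, convexity, compactness, and preservation of upper semicontinuity.

Theorem 9.2.5 Let $F$ be a compact valued correspondence from $(T, L_\lambda(T), \lambda_L)$ to a compact metric space $X$. Then $D_F$ is compact in the space $\mathcal{M}(X)$ of probability distributions on $X$.

Proof Choose a sequence $\{\nu_n\}_{n=1}^\infty$ from $D_F$, and assume that it converges weakly to some probability distribution $\nu$ on $X$. Choose a sequence $\{f_n\}_{n=1}^\infty$ of selections of $F$ such that $\lambda_L f_n^{-1} = \nu_n$ for each $n \ge 1$. Let $G$ be a new correspondence from $T$ to $X$ such that for each $t \in T$, $G(t)$ is the closure of the set $\{f_n(t) : n \ge 1\}$ in $X$. Then $G$ is a closed valued measurable correspondence. Set $F_n = G$ for each $n \ge 1$. Then $\limsup_{n\to\infty} F_n(t) = G(t)$ and each $\nu_n \in D_G$, so Proposition 9.2.4 gives $\nu \in D_G$. Since $F$ is compact (hence closed) valued and $f_n(t) \in F(t)$ for almost all $t$, we have $G(t) \subseteq F(t)$ for almost all $t \in T$. Thus $\nu \in D_F$, so $D_F$ is closed in $\mathcal{M}(X)$. Hence the compactness of $\mathcal{M}(X)$ implies that $D_F$ is compact.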
To establish convexity, a continuous version of the well-known marriage lemma is needed (see [43], p. 74 or [90]).

Lemma 9.2.6 Let $(\Omega, \Sigma, \mu)$ be an atomless measure space. Let $\{S_i\}_{i=1}^m$ be a finite sequence in $\Sigma$ and $\{\alpha_i\}_{i=1}^m$ a finite sequence of nonnegative real numbers. Assume that $\mu(\bigcup_{i\in I} S_i) \ge \sum_{i\in I} \alpha_i$ for any $I \subseteq \{1, 2, \ldots, m\}$, and $\mu(\bigcup_{i=1}^m S_i) = \sum_{i=1}^m \alpha_i$. Then there exist disjoint sets $\{T_i\}_{i=1}^m$ such that $T_i \subseteq S_i$ and $\mu(T_i) = \alpha_i$ for all $i = 1, 2, \ldots, m$.
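Suppose only finitely many sets $S_1, \ldots, S_m$ are involved. Then the atomless space matters in Lemma 9.2.6 only through the measures of the finitely many "cells", the atoms of the Boolean algebra generated by the $S_i$. Atomlessness is what allows each cell to be split in any proportions. Under this reduction, which is our own illustration rather than the proof in [43] or [90], finding the sets $T_i$ becomes a transportation problem. The Hall-type hypothesis of the lemma is then exactly the max-flow/min-cut condition for that problem.

The sketch below makes some assumptions. Each cell is given as a pair (measure, set of indices $i$ with cell $\subseteq S_i$), and indices are 0-based. It uses the Edmonds-Karp algorithm as the flow solver. The names `max_flow`, `split_cells` and `hall_condition` are hypothetical.

```python
from collections import deque
from fractions import Fraction
from itertools import combinations


def max_flow(edges, source, sink):
    """Edmonds-Karp maximum flow with exact Fraction capacities.

    Returns the flow value and the final residual graph (dict of dicts)."""
    res = {}
    for u, v, c in edges:
        res.setdefault(u, {}).setdefault(v, Fraction(0))
        res.setdefault(v, {}).setdefault(u, Fraction(0))
        res[u][v] += Fraction(c)
    value = Fraction(0)
    while True:
        parent = {source: None}             # breadth-first search for an augmenting path
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in res[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return value, res
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(res[u][v] for u, v in path)
        for u, v in path:
            res[u][v] -= push
            res[v][u] += push
        value += push


def split_cells(cells, alphas):
    """cells: list of (measure, set of indices i such that the cell lies in S_i).

    Returns {(cell, i): part of the cell assigned to T_i}, or None if impossible."""
    alphas = [Fraction(a) for a in alphas]
    big = sum(Fraction(mu) for mu, _ in cells) + sum(alphas) + 1   # acts as +infinity
    edges = [("src", ("S", i), a) for i, a in enumerate(alphas)]
    for c, (mu, idx) in enumerate(cells):
        edges.append((("C", c), "sink", mu))
        edges += [(("S", i), ("C", c), big) for i in idx]
    value, res = max_flow(edges, "src", "sink")
    if value != sum(alphas):
        return None
    # flow sent along S_i -> C_c equals the residual capacity left on C_c -> S_i
    return {(c, i): res[("C", c)][("S", i)]
            for c, (_, idx) in enumerate(cells) for i in idx
            if res[("C", c)][("S", i)] > 0}


def hall_condition(cells, alphas):
    """mu(union of S_i over I) >= sum of alpha_i over I, for every nonempty I."""
    for r in range(1, len(alphas) + 1):
        for I in combinations(range(len(alphas)), r):
            covered = sum(Fraction(mu) for mu, idx in cells if idx & set(I))
            if covered < sum(Fraction(alphas[i]) for i in I):
                return False
    return True
```

For example, take $\Omega = [0, 1]$ with $S_1 = [0, 1/2]$, $S_2 = [1/4, 1]$ and $S_3 = [0, 1]$. The cells are then $[0, 1/4)$, $[1/4, 1/2]$ and $(1/2, 1]$. The choice $\alpha = (1/2, 1/4, 1/4)$ passes the Hall-type test, and `split_cells` returns the proportions in which each cell is shared among the $T_i$. The choice $\alpha = (3/4, 1/8, 1/8)$ fails the test, because $\mu(S_1) = 1/2$.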
360
Y. Sun
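Theorem 9.2.7 Let $F$ be a correspondence from $(T, L_\lambda(T), \lambda_L)$ to a compact metric space $X$. If the Loeb space $(T, L_\lambda(T), \lambda_L)$ is atomless, then $D_F$ is convex.

Proof Pick $\nu_1$ and $\nu_2$ from $D_F$ and $\alpha \in [0, 1]$. Then there are selections $f_1$ and $f_2$ of $F$ such that $\lambda_L f_1^{-1} = \nu_1$ and $\lambda_L f_2^{-1} = \nu_2$. Let $\nu = \alpha\nu_1 + (1-\alpha)\nu_2$, and let $G$ be the correspondence such that $G(t) = \{f_1(t), f_2(t)\}$ for all $t \in T$. Since $X$ is compact, we can find, for each $k \in \mathbb{N}$, a partition $\{A_i^k\}_{i=1}^{m_k}$ of $X$ into nonempty Borel subsets with $d(A_i^k) < 1/k$, where $d(A_i^k)$ is the diameter of the set $A_i^k$. Let $T_i^k = f_1^{-1}(A_i^k) \cup f_2^{-1}(A_i^k)$, and let $D$ be any subset of $\{1, \ldots, m_k\}$. Then for each $\ell = 1, 2$,
$$\lambda_L\Big(f_1^{-1}\big(\bigcup_{j\in D} A_j^k\big) \cup f_2^{-1}\big(\bigcup_{j\in D} A_j^k\big)\Big) \ge \lambda_L f_\ell^{-1}\big(\bigcup_{j\in D} A_j^k\big) = \nu_\ell\big(\bigcup_{j\in D} A_j^k\big),$$
and hence, since $\nu$ is a convex combination of $\nu_1$ and $\nu_2$,
$$\lambda_L\Big(\bigcup_{j\in D} T_j^k\Big) \ge \nu\Big(\bigcup_{j\in D} A_j^k\Big) = \sum_{j\in D} \nu(A_j^k).$$
Moreover, taking $D = \{1, \ldots, m_k\}$ gives $\lambda_L(\bigcup_j T_j^k) = 1 = \sum_j \nu(A_j^k)$. Since $\lambda_L$ is atomless, the continuous version of the marriage lemma stated in Lemma 9.2.6 applies. It yields a partition $\{S_i^k\}_{i=1}^{m_k}$ of $T$ with $S_i^k \subseteq T_i^k$ and $\lambda_L(S_i^k) = \nu(A_i^k)$ for each $i$. Define a Loeb measurable function $g_k : T \to X$ by
$$g_k(t) = \begin{cases} f_1(t), & t \in S_i^k \cap f_1^{-1}(A_i^k), \\ f_2(t), & t \in S_i^k \cap \big(f_2^{-1}(A_i^k) \setminus f_1^{-1}(A_i^k)\big). \end{cases}$$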
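It is clear that $g_k$ is a selection of $G$. Let $\nu_k$ be the distribution $\lambda_L g_k^{-1}$ of $g_k$. Given an arbitrary Borel set $E$ in $X$, set $J = \{i : E \cap A_i^k \ne \emptyset,\ 1 \le i \le m_k\}$. We then have
$$\begin{aligned} \nu(E) &= \nu\Big(\bigcup_{i\in J} E \cap A_i^k\Big) \le \sum_{i\in J} \nu(A_i^k) = \sum_{i\in J} \lambda_L(S_i^k) = \lambda_L\Big(\bigcup_{i\in J} g_k^{-1}(A_i^k)\Big) \\ &= \lambda_L g_k^{-1}\Big(\bigcup_{i\in J} A_i^k\Big) \le \lambda_L g_k^{-1}\big(B(E, 1/k)\big). \end{aligned}$$
Here we used two facts. First, $g_k^{-1}(A_i^k) = S_i^k$ up to a null set, because $g_k$ maps $S_i^k$ into $A_i^k$. Second, $\bigcup_{i\in J} A_i^k \subseteq B(E, 1/k)$, since each $A_i^k$ with $i \in J$ meets $E$ and has diameter less than $1/k$. It follows that the Prohorov distance $\delta(\nu, \nu_k)$ is at most $1/k$. Thus $\{\nu_k\}_{k=1}^\infty$ converges weakly to $\nu$ on $X$. Since $\nu_k \in D_G$ and $G$ is compact valued, Theorem 9.2.5 implies that $\nu \in D_G$, whence $\nu \in D_F$. The proof is complete.

Definition 9.2.8 A correspondence $G$ from a topological space $Y$ to another topological space $Z$ is said to be upper semicontinuous at $y_0 \in Y$ if for any open set $U$ that contains $G(y_0)$, there exists a neighborhood $V$ of $y_0$ such that $y \in V$ implies that $G(y) \subseteq U$.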
Theorem 9.2.9 Let $Y$ be a metric space and $F$ a compact valued correspondence from $T \times Y$ to a compact metric space $X$, where $(T, L_\lambda(T), \lambda_L)$ is a Loeb probability space. Assume that for each fixed $y \in Y$, $F(\cdot, y)$ (denoted by $F_y$) is a measurable correspondence from $T$ to $X$. If for each fixed $t \in T$, $F(t, \cdot)$ is upper semicontinuous on the metric space $Y$, then the correspondence $y \mapsto D_{F_y}$ from $Y$ to $\mathcal{M}(X)$ is upper semicontinuous.

Proof Note that a compact valued correspondence from a metrizable space to a compact metrizable space is upper semicontinuous if and only if it has a closed graph (see Proposition 1.4.8 on p. 42 of [7]). By Theorem 9.2.5, each $D_{F_y}$ is compact. So it suffices to show the following: if $y_n \to y$ in $Y$ and $\nu_n \in D_{F_{y_n}}$ converge weakly to $\nu$, then $\nu \in D_{F_y}$. The upper semicontinuity of $F(t, \cdot)$ implies that $\limsup_{n\to\infty} F(t, y_n) \subseteq F(t, y)$ for every $t \in T$. Therefore this result on upper semicontinuity follows from Proposition 9.2.4 applied to $F_n = F_{y_n}$.

Remark 9.2.10 The correspondence $F$ is not assumed to be measurable in Theorem 9.2.5. For the convexity result in Theorem 9.2.7, $F$ is not required to be measurable or closed valued. Also, the Loeb space in Theorems 9.2.5 and 9.2.9 is not assumed to be atomless. When $X$ is a Polish space, the results on compactness and convexity in Theorems 9.2.5 and 9.2.7 still hold without any additional condition. The upper semicontinuity result in Theorem 9.2.9 is valid provided that there is a compact valued correspondence $G$ such that $F(t, y) \subseteq G(t)$ for all $t$ and $y$ (see [103]).

In game theory and other areas, the general existence of equilibria often depends on the convexity of strategy spaces. To convexify the action spaces, one considers strategies involving probability distributions on the action spaces (called mixed strategies). In contrast, strategies involving only the original action spaces are called pure strategies. The following proposition roughly says the following. Given a mixed strategy $G$, one can find a pure strategy $f$ with the same distribution such that $f(t)$ always lies in the support of $G(t)$; that is, $f$ is a selection of the correspondence $t \mapsto \operatorname{supp} G(t)$. When a mixed strategy is optimal, so are almost all points in the support of $G(t)$.
This means that the pure strategy $f$ is also optimal. Thus, we have an analog of the general purification result in [23–25] in the setting of a general Polish space.

Proposition 9.2.11 Assume that the Loeb probability space $(T, L_\lambda(T), \lambda_L)$ is atomless and that $G$ is a measurable mapping from $(T, L_\lambda(T))$ to the space $\mathcal{M}(X)$ of probability measures on a Polish space $X$. Then there is a measurable mapping $f$ from $(T, L_\lambda(T))$ to $X$ such that
(1) for each $t \in T$, $f(t) \in \operatorname{supp} G(t)$, where $\operatorname{supp} G(t)$ is the support of the probability measure $G(t)$ on $X$;
(2) for every Borel set $B$ in $X$, $\lambda_L(f^{-1}(B)) = \int_T G(t)(B)\, d\lambda_L(t)$.
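Consider the simplest situation: $T$ is replaced by $[0, 1)$ with Lebesgue measure, and $G$ is a step function whose values have finite support. In that case the two properties in Proposition 9.2.11 can be obtained by hand. Cut each step interval into pieces whose lengths are proportional to the weights of $G(t)$, and let $f$ take the corresponding action on each piece.

The sketch below illustrates properties (1) and (2) only in this elementary case, which needs nothing beyond atomlessness. It is not the proof of the proposition. The helper names `purify`, `pushforward` and `average_of_G` are hypothetical. For a mixed strategy that moves with $t$, such as $G(t) = \frac12\delta_t + \frac12\delta_{-t}$ on $[0, 1]$, the analogous statement fails on the Lebesgue interval. A purification would be a selection of $t \mapsto \{t, -t\}$ with the uniform distribution on $[-1, 1]$, which Example 9.2.2 rules out (see the first step of the proof of Proposition 9.3.5).

```python
from fractions import Fraction


def purify(blocks):
    """Purify a step mixed strategy on [0, 1) with Lebesgue measure.

    blocks: list of (length, {action: probability}) for consecutive intervals
    of [0, 1). Returns pieces (start, end, action, block_index), meaning that
    f(t) = action for t in [start, end)."""
    pieces, start = [], Fraction(0)
    for b, (length, dist) in enumerate(blocks):
        for action, p in sorted(dist.items()):
            if p > 0:                       # only actions in supp G(t) are used
                end = start + Fraction(length) * Fraction(p)
                pieces.append((start, end, action, b))
                start = end
    return pieces


def pushforward(pieces):
    """Distribution of f: Lebesgue measure of {t : f(t) = action}."""
    dist = {}
    for start, end, action, _ in pieces:
        dist[action] = dist.get(action, Fraction(0)) + (end - start)
    return dist


def average_of_G(blocks):
    """The measure B -> integral of G(t)(B) dt, recorded on its atoms."""
    dist = {}
    for length, d in blocks:
        for action, p in d.items():
            if p > 0:
                dist[action] = dist.get(action, Fraction(0)) + Fraction(length) * Fraction(p)
    return dist
```

Property (2) corresponds to `pushforward(purify(blocks)) == average_of_G(blocks)`. Property (1) holds because each piece stays inside its own block and uses only actions of positive weight.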
9.2.2 Integration of Correspondences
The theory of integration of correspondences has been widely used in mathematical economics and other areas (see, for example, [43, 66]). Here we point out that the integration theory for correspondences taking values in a finite dimensional space follows trivially from the relevant distribution theory. Similar claims can be made for the infinite dimensional case when the correspondences take values in some compact subset of an infinite dimensional space (in the norm, weak, or weak* topology). Thus the integration theory has independent mathematical interest only on those underlying measure spaces where the distribution theory fails. This applies both to correspondences in the finite dimensional setting and to correspondences taking values in some compact set in an infinite dimensional space. We substantiate these claims in the following remarks.

Remark 9.2.12 Let $F$ be a correspondence from an atomless probability space $(\Omega, \Sigma, \mu)$ to $\mathbb{R}^n$. Assume that the results on compactness, convexity and upper semicontinuity for the distribution of correspondences from $(\Omega, \Sigma, \mu)$ to a Polish space ($\mathbb{R}^n$ in particular) hold as in Remark 9.2.10. For example, one may take $(\Omega, \Sigma, \mu)$ to be an atomless Loeb space as in Sect. 9.2.1. Let $I_n = \{\nu \in \mathcal{M}(\mathbb{R}^n) : \text{the norm function } \|x\| \text{ is } \nu\text{-integrable}\}$. Define a mapping $L$ from $I_n$ to $\mathbb{R}^n$ by $L(\nu) = \int_{x\in\mathbb{R}^n} x\, d\nu(x)$. Then the integral of the correspondence $F$,
$$\int F\, d\mu = \left\{\int f\, d\mu : f \text{ is an integrable selection of } F\right\},$$
can simply be rewritten as $\{L(\nu) : \nu \in I_n \cap D_F\}$, because $\int f\, d\mu = L(\mu f^{-1})$ for every integrable selection $f$. Since $I_n$ and $D_F$ are convex sets and $L$ is a linear mapping, $\int F\, d\mu$ is convex.

Remark 9.2.13 Suppose that $F$ is, in addition, compact valued and integrably bounded. Then $D_F \subseteq I_n$ and $\int F\, d\mu = L(D_F)$. Here the integrable boundedness condition means that there is a nonnegative integrable function $g$ such that $F(t)$ is a subset of the ball with radius $g(t)$. Note that the integrable boundedness condition implies that the selections of $F$ are uniformly integrable on $(\Omega, \Sigma, \mu)$. By Theorem 5.4 in Chap. 1 of [10], it is clear that $L$ is continuous on $D_F$. Hence, the results on compactness and upper semicontinuity for the integration of correspondences follow trivially.

Remark 9.2.14 Let $K$ be a fixed weakly compact set in a separable Banach space $X$. Then $K$, with the weak topology, is metrizable (see [22], p. 434). As in Remark 9.2.12, consider the mapping $L$ from the space $\mathcal{M}(K)$ of Borel probability measures on $K$ to $\overline{\mathrm{co}}(K)$, the closed convex hull of $K$, defined by $L(\nu) = \int_{x\in K} x\, d\nu(x)$. This mapping is linear and continuous; here the integral is interpreted as the Bochner integral. Note that $\mathcal{M}(K)$ is endowed with the topology of weak convergence of measures, and $K$ has the subspace topology induced by the weak topology on $X$. Similarly, $\int F\, d\mu = L(D_F)$. Hence the results on convexity, compactness and upper semicontinuity for the integration of correspondences follow trivially from their distributional analogues. When $K$ is norm compact, the same type of results for the norm topology can be derived in a similar way. One can also obtain the same results immediately in a more general situation. It suffices that $K$ is compact and metrizable in some topological vector space and that one uses a suitable notion of integral for which the linear mapping $L$ is still continuous (this should be the usual case). For example, one can consider the Gelfand integral for correspondences taking values in the dual of a separable Banach space (see [104]).

Remark 9.2.15 Consider correspondences taking values in an infinite dimensional Banach space that are not dominated by a fixed compact set (in the weak or weak* topology). For these, the procedure used in Remark 9.2.14 will not work. The point is that neither the weak nor the weak* topology on an infinite dimensional Banach space is metrizable. Thus the distribution theory for correspondences taking values in a Polish space cannot be applied directly. However, [104] does show that the desired exact results still hold in this general case. A general Fatou Lemma is established in [73] for a sequence of Gelfand integrable functions from a vector Loeb space to the dual of a separable Banach space or, under a weaker assumption on the sequence, to a Banach lattice. A survey of integration of correspondences from a general probability space to a Banach space can be found in [115].
9.3 Nash Equilibria in Games with Many Players
In this section, we consider games with many players. We can either use an atomless measure space $(\Omega, \Sigma, \mu)$ to model the ideal situation of a continuum of players, or consider a sequence of finite games with the number of players going to infinity. We show how easy it is to obtain the existence of equilibria when proper measure spaces of player names are chosen. On the other hand, if the usual unit Lebesgue interval is used, an explicit example of a game with no equilibrium can be constructed. The discrete version of this example is also used to explain existence and nonexistence phenomena.

We shall now give a formal definition of a game based on a measure space. Let $A$ be a compact metric space, and let $U_A$ be the space of real-valued continuous functions on $A \times \mathcal{M}(A)$, endowed with its sup-norm topology. A game $\mathcal{G}$ is a measurable function from $(\Omega, \Sigma, \mu)$ to $U_A$. Thus, a game simply associates with each player $t \in \Omega$ a payoff function $\mathcal{G}(t)(a, \nu)$. This payoff depends on the player's own action $a$ and on the distribution $\nu$ of actions by all players. For simplicity, we also use $\mathcal{G}_t$ to denote $\mathcal{G}(t)$.

Definition 9.3.1 A Nash equilibrium of a game $\mathcal{G}$ is a measurable function $g$ from $\Omega$ to $A$ such that for all $t \in \Omega$, $\mathcal{G}_t(g(t), \mu g^{-1}) \ge \mathcal{G}_t(a, \mu g^{-1})$ for all $a \in A$.

Thus, if $g$ is a Nash equilibrium, then the distribution of actions by all the players is $\mu g^{-1}$, and every player chooses an optimal action $g(t)$ given this societal distribution. Note that here we only consider pure-strategy Nash equilibria.
9.3.1 General Existence of Nash Equilibria in the Loeb Setting
Fixed-point theorems for correspondences have provided the standard tool for showing the existence of economic equilibria in many areas of economics ever since the work of John Nash (and also David Gale) in [78]. Here we demonstrate how easy it is to obtain the existence of Nash equilibria for games with general action spaces once the distribution theory for correspondences, as developed in Sect. 9.2, is available. We simply adopt the standard procedure for showing the existence of equilibria by verifying the conditions of compactness, convexity and upper semicontinuity in this particular context. In this subsection, we let an atomless hyperfinite Loeb space $(T, L_\lambda(T), \lambda_L)$ be the space of player names.

Theorem 9.3.2 Let the game $\mathcal{G}$ be a measurable function from an atomless hyperfinite Loeb space $(T, L_\lambda(T), \lambda_L)$ to the space $U_A$ of payoffs, where the action space $A$ is a compact metric space. Then there exists a Nash equilibrium for the game $\mathcal{G}$.

Proof Define a correspondence $F$ from $T \times \mathcal{M}(A)$ to $A$ as follows. For any $(t, \nu) \in T \times \mathcal{M}(A)$, let $F(t, \nu)$ be the set of all elements $a \in A$ that maximize the function $\mathcal{G}_t(\cdot, \nu)$ on $A$; that is, $F(t, \nu) = \operatorname{Arg\,Max}_{a\in A} \mathcal{G}_t(a, \nu)$. Since $\mathcal{G}_t(\cdot, \nu)$ is continuous on the compact space $A$, $F$ is nonempty and compact valued. For each $t \in T$, since $\mathcal{G}_t$ is continuous on $A \times \mathcal{M}(A)$, it is easy to see that the correspondence $F_t = F(t, \cdot)$ from $\mathcal{M}(A)$ to $A$ has a closed graph; i.e., it is upper semicontinuous on $\mathcal{M}(A)$. Furthermore, for each $\nu \in \mathcal{M}(A)$, the function $(t, a) \mapsto \mathcal{G}_t(a, \nu)$ is measurable in $t$ and continuous in $a$. Hence the correspondence $F_\nu = F(\cdot, \nu)$ from $T$ to $A$ is measurable (see Theorems III.14 and III.39 in [13]). Hence, there exists a selection of $F_\nu$, i.e., $D_{F_\nu} \ne \emptyset$. Thus $G(\nu) = D_{F_\nu}$ defines a correspondence $G$ from $\mathcal{M}(A)$ to $\mathcal{M}(A)$. Since the hyperfinite Loeb space is assumed to be atomless, Theorem 9.2.7 implies that $G$ is convex valued. By Theorems 9.2.5 and 9.2.9, $G$ is also compact valued and upper semicontinuous.
Therefore, we can apply the Fan-Glicksberg fixed-point theorem [28, 34] to the correspondence $G$ on the compact convex set $\mathcal{M}(A)$. It gives some $\nu'$ with $\nu' \in G(\nu')$, that is, a selection $f$ of $F_{\nu'}$ such that $\lambda_L f^{-1} = \nu'$. By modifying the values of $f$ on a null set, we can require that $f(t) \in F_{\nu'}(t)$ for every $t$. It is now clear that $f$ is a Nash equilibrium.

Remark 9.3.3 In contrast to the use of distributions, integrals are also used as societal responses in the economic literature. As argued in Remarks 9.2.12–9.2.14, it is also straightforward to obtain the existence of Nash equilibria in an integral setting from the distributional form in Theorem 9.3.2. Here we give some details. Let the action set $A$ be a weakly compact subset of a separable Banach space. Let $\overline{\mathrm{co}}(A)$ be the closed convex hull of $A$. The space $U_A^w$ of player payoff functions is the space of weakly continuous real-valued functions on $A \times \overline{\mathrm{co}}(A)$, endowed with the sup-norm topology. Let $\mathcal{G}$ be a measurable mapping from $(T, L_\lambda(T), \lambda_L)$ to $U_A^w$. Define a new mapping $\mathcal{G}'$ from $T$ to $U_A$ by $\mathcal{G}'_t(a, \nu) = \mathcal{G}_t\big(a, \int_{x\in A} x\, d\nu(x)\big)$, where the integral is interpreted as the Bochner integral. Since the operator $L(\nu) = \int_{x\in A} x\, d\nu(x)$ is continuous, $\mathcal{G}'$ indeed defines a measurable mapping from $T$ to $U_A$. Hence Theorem 9.3.2 implies the existence of a Nash equilibrium $f$ for the game $\mathcal{G}'$. It is obvious that $f$ is also a Nash equilibrium for the game $\mathcal{G}$.
9.3.2 Nonexistence of Nash Equilibria in the Lebesgue Setting
Example 9.2.2 shows that the distribution theory for correspondences presented in Sect. 9.2.1 fails when the underlying Loeb probability space is replaced by the unit Lebesgue interval. This means that the procedure used in the proof of Theorem 9.3.2 certainly cannot be used to establish similar results in the Lebesgue setting. However, this does not rule out the possibility that the existence of Nash equilibria in the Lebesgue setting may be obtained by other methods. The purpose of this subsection is to show that for any uncountable compact metric space $A$, one can always find a game with action space $A$ in the Lebesgue setting that has no Nash equilibrium.

Example 9.3.4 For each $\epsilon \in (0, 1]$, define a periodic function $g(\cdot, \epsilon)$ on $\mathbb{R}$ with period $2\epsilon$ such that
$$g(a, \epsilon) = \begin{cases} a/2 & \text{for } 0 \le a \le \epsilon/2, \\ (\epsilon - a)/2 & \text{for } \epsilon/2 \le a \le \epsilon, \\ -g(a - \epsilon, \epsilon) & \text{for } \epsilon \le a \le 2\epsilon. \end{cases}$$
Note that $g(\cdot, \epsilon)$ is also an odd function, i.e., $g(x, \epsilon) = -g(-x, \epsilon)$ for $x < 0$. When $\epsilon = 0$, we simply let $g(x, \epsilon) \equiv 0$. It is easy to check that $g(\cdot, \cdot)$ is jointly continuous on $[-1, 1] \times [0, 1]$.

Now consider a game $\mathcal{G}^1$ in which the space of player names is the unit interval $T = [0, 1]$ with Lebesgue measure $\tau$, and the action set $A$ is the interval $[-1, 1]$. Let the payoff function $\mathcal{G}^1_t$ of any player $t \in [0, 1]$ be given by $\mathcal{G}^1_t(a, \nu) = h(a, \nu) - |t - |a||$, where $h(a, \nu) = g(a, \beta\delta(\nu, \tau^*))$. Here, $\beta$ is a number in the open interval $(0, 1)$, and $\delta(\nu, \tau^*)$ is the distance between $\nu$ and the uniform probability measure $\tau^*$ on $[-1, 1]$ under the Prohorov metric. The Prohorov metric is defined here from the natural metric $|x - y|$ on the underlying space $[-1, 1]$. It is thus clear that $\delta(\nu, \tau^*) \le 1$. Note that $\mathcal{G}^1$ is not only measurable but also continuous from $T$ into $U_A$.

Proposition 9.3.5 The game $\mathcal{G}^1$ has no Nash equilibrium.

Proof Suppose there is a Nash equilibrium $f$ for the game $\mathcal{G}^1$. Let $\nu_0$ be $\tau f^{-1}$, the distribution on $[-1, 1]$ induced by $f$.
If $\nu_0$ is the uniform distribution $\tau^*$ on $[-1, 1]$, then $\delta(\nu_0, \tau^*) = 0$. Thus, for $a \in [-1, 1]$, $h(a, \nu_0) = 0$, and hence $\mathcal{G}^1_t(a, \nu_0) = -|t - |a||$. This means that the best response for player $t$ is to choose $-t$ or $t$. Therefore, the equilibrium $f$ must be a selection of the correspondence $F$ in Example 9.2.2 whose distribution is $\tau^*$. By Example 9.2.2, no such selection exists, which is a contradiction.
It follows that we must have $0 < \delta(\nu_0, \tau^*) \le 1$. Denote $\beta\delta(\nu_0, \tau^*)$ by $\epsilon_0$. Consider the case $t \in ((k-1)\epsilon_0, k\epsilon_0)$ for an odd positive integer $k$. This means that $g(t, \epsilon_0) > 0$. Note that the payoff for player $t$ is $\mathcal{G}^1_t(a, \nu_0) = h(a, \nu_0) - |t - |a|| = g(a, \epsilon_0) - |t - |a||$. Since $g(\cdot, \epsilon)$ is Lipschitz continuous with modulus $1/2$, we have for each $a$ in $[0, 1] \setminus \{t\}$,
$$\mathcal{G}^1_t(a, \nu_0) - \mathcal{G}^1_t(t, \nu_0) = g(a, \epsilon_0) - g(t, \epsilon_0) - |t - |a|| \le -|t - a|/2 < 0. \tag{9.5}$$
Next take $a \in [-1, 0)$. Suppose first that $g(a, \epsilon_0) \le 0$. We know that $-g(t, \epsilon_0) < 0$ and $-|t - |a|| \le 0$, so $g(a, \epsilon_0) - g(t, \epsilon_0) - |t - |a|| < 0$. If instead $g(a, \epsilon_0) > 0$, then $g(|a|, \epsilon_0) = -g(a, \epsilon_0) < 0$. Thus there is a number $c$ between $t$ and $|a|$ such that $g(c, \epsilon_0) = 0$, whence
$$g(a, \epsilon_0) - g(t, \epsilon_0) = -g(|a|, \epsilon_0) - g(t, \epsilon_0) = \big(g(c, \epsilon_0) - g(|a|, \epsilon_0)\big) + \big(g(c, \epsilon_0) - g(t, \epsilon_0)\big) \le \frac{||a| - c|}{2} + \frac{|c - t|}{2} < |t - |a||.$$
Therefore, we can conclude that $\mathcal{G}^1_t(a, \nu_0) < \mathcal{G}^1_t(t, \nu_0)$ for any $a \in [-1, 1]$ with $a \ne t$. This means that the unique optimal action for player $t$ is $t$. Since $f$ is a Nash equilibrium, $f(t)$ must be an optimal action for player $t$. Hence, $f(t) = t$. A similar argument shows that $f(t) = -t$ when $t \in ((k-1)\epsilon_0, k\epsilon_0)$ for an even positive integer $k$. Thus $f(t) = (-1)^{k-1} t$ for $t \in ((k-1)\epsilon_0, k\epsilon_0)$, that is, for all $t \ne m\epsilon_0$, $m \in \mathbb{N}$. It is clear that the support $S$ of $\nu_0$ consists of intervals of length $\epsilon_0$ that alternate between the two sides of the origin, moving outwards in both directions. On the support, $\nu_0$ is the same as Lebesgue measure.

We shall show that the Prohorov distance $\delta(\nu_0, \tau^*)$ is at most $\epsilon_0$. In particular, we check that for any Borel set $E$ in $[-1, 1]$, $\nu_0(E) \le \tau^*(B(E, \varepsilon)) + \varepsilon$ for any $\varepsilon > \epsilon_0$. Without loss of generality, we may assume that $E$ is a Borel subset of $S$ that does not contain any endpoints of the subintervals in $S$. List the subintervals in $S$ as $S_1, S_2, \ldots, S_m$ in increasing order, with $S_1$ or $S_m$ possibly of length less than $\epsilon_0$. Let $E_i = E \cap S_i$.
Then $E_i + \epsilon_0$ is a subset of the open subinterval of length $\epsilon_0$ on the right of $S_i$ for $1 \le i \le m - 1$ (note that $S_m$ may not be followed by a subinterval of length $\epsilon_0$). It is clear that the sets $E_i$ and $E_i + \epsilon_0$ for $1 \le i \le m - 1$ are pairwise disjoint, and also that their union is a subset of $B(E, \varepsilon)$. Since $\tau(E_m) \le \epsilon_0$ and $\tau^*(E_i) = \tau^*(E_i + \epsilon_0) = \tau(E_i)/2$, we have
$$\nu_0(E) = \sum_{i=1}^{m} \nu_0(E_i) = \sum_{i=1}^{m-1} \tau(E_i) + \tau(E_m) \le \sum_{i=1}^{m-1} \big(\tau^*(E_i) + \tau^*(E_i + \epsilon_0)\big) + \epsilon_0 \le \tau^*(B(E, \varepsilon)) + \varepsilon,$$
whence $\delta(\nu_0, \tau^*) \le \epsilon_0$.
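The key estimate $\nu_0(E) \le \tau^*(B(E, \varepsilon)) + \varepsilon$ for $\varepsilon > \epsilon_0$ can be spot-checked numerically when $E$ is a finite union of intervals. The sketch below is our own illustration, not part of the argument. It builds the support $S$ of $\nu_0$ from the formula $f(t) = (-1)^{k-1} t$ on $((k-1)\epsilon_0, k\epsilon_0)$ and measures sets exactly with fractions. The helper names are hypothetical. Random tests of this kind can only lend support to the inequality; they do not prove it.

```python
import random
from fractions import Fraction


def normalize(intervals):
    """Merge closed intervals (a, b) into a sorted list of disjoint ones."""
    out = []
    for a, b in sorted(intervals):
        if out and a <= out[-1][1]:
            out[-1] = (out[-1][0], max(out[-1][1], b))
        else:
            out.append((a, b))
    return out


def length(intervals):
    return sum(b - a for a, b in normalize(intervals))


def intersect(p, q):
    pieces = [(max(a, c), min(b, d)) for a, b in normalize(p) for c, d in normalize(q)]
    return normalize([(lo, hi) for lo, hi in pieces if lo < hi])


def support(eps0):
    """Support of nu0 = tau f^{-1}, where f(t) = (-1)^(k-1) t on ((k-1)eps0, k eps0)."""
    S, k = [], 1
    while (k - 1) * eps0 < 1:
        lo, hi = (k - 1) * eps0, min(k * eps0, Fraction(1))
        S.append((lo, hi) if k % 2 == 1 else (-hi, -lo))
        k += 1
    return normalize(S)


def nu0(E, eps0):
    """nu0 agrees with Lebesgue measure on its support."""
    return length(intersect(E, support(eps0)))


def tau_star_of_ball(E, eps):
    """tau*(B(E, eps)) for the uniform probability tau* on [-1, 1]."""
    ball = [(a - eps, b + eps) for a, b in E]
    return length(intersect(ball, [(Fraction(-1), Fraction(1))])) / 2


def random_union(rng, pieces=3):
    """A random union of `pieces` closed subintervals of [-1, 1]."""
    pts = sorted(Fraction(rng.randint(-1000, 1000), 1000) for _ in range(2 * pieces))
    return [(pts[2 * i], pts[2 * i + 1]) for i in range(pieces)]
```

Typical use fixes some $\epsilon_0$ and some $\varepsilon$ slightly larger than $\epsilon_0$. It then draws `E = random_union(random.Random(0))` and compares `nu0(E, eps0)` with `tau_star_of_ball(E, eps) + eps`.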
Finally, recall that by definition $\epsilon_0 = \beta\delta(\nu_0, \tau^*)$, and so $\delta(\nu_0, \tau^*) \le \beta\delta(\nu_0, \tau^*)$. Since $\delta(\nu_0, \tau^*) > 0$, this implies that $\beta \ge 1$. This contradicts the original choice of $\beta$ in the open interval $(0, 1)$. Therefore the game $\mathcal{G}^1$ has no Nash equilibrium.

Remark 9.3.6 Define a mixed strategy $\phi$ for the game $\mathcal{G}^1$ by letting $\phi(t)$ be the discrete probability measure that gives equal weight $1/2$ to $t$ and $-t$. Let $\nu_1$ be its distribution, defined by $\nu_1(B) = \int_{t\in[0,1]} \phi(t)(B)\, d\tau$ for each Borel set $B$ in $[-1, 1]$. Then $\nu_1$ is the uniform distribution $\tau^*$. Hence $h(\cdot, \nu_1) \equiv 0$, and it is easy to see that for all $t \in [0, 1]$,
$$\int_{a\in[-1,1]} \mathcal{G}^1_t(a, \nu_1)\, d\phi(t)(a) \ge \int_{a\in[-1,1]} \mathcal{G}^1_t(a, \nu_1)\, d\nu(a)$$
for all $\nu \in \mathcal{M}([-1, 1])$. Such a mixed strategy $\phi$ is a mixed-strategy Nash equilibrium in the usual sense. Thus the game $\mathcal{G}^1$ has a mixed-strategy Nash equilibrium that cannot be purified.

The real number structure on $[-1, 1]$ plays a central role in the construction of the game $\mathcal{G}^1$ in Example 9.3.4 and in the computations in the proof of Proposition 9.3.5. One may wonder whether similar nonexistence examples can always be found in more general cases. Note that there are very strong connections between an interval and a general uncountable compact metric space. They have the same structure not only in terms of Borel measurability (a well-known fact), but also in terms of weak convergence and almost sure convergence, as shown in [101]. This suggests that similar nonexistence results should hold in the general case. In fact, as shown in the following theorem, the best possible nonexistence result in this context can be obtained easily.

Theorem 9.3.7 Let $B$ be any given uncountable compact metric space. Then there is a game $\mathcal{G}^B$ with action space $B$ that has no Nash equilibrium and whose space of player names is the unit interval $T = [0, 1]$ with Lebesgue measure $\tau$.

Proof Since $B$ is uncountable and compact, there is a continuous mapping $\phi$ from $B$ onto $[-1, 1]$ (see, for example, [91]). For each $t \in T$, define the payoff $\mathcal{G}^B_t$ for player $t$ to be the real-valued function on $B \times \mathcal{M}(B)$ given by $\mathcal{G}^B_t(b, \nu) = \mathcal{G}^1_t(\phi(b), \nu\phi^{-1})$, where $\mathcal{G}^1$ is the game in Example 9.3.4. By the continuity of $\phi$, it is clear that $\mathcal{G}^B$ is continuous in all three variables $t$, $b$, $\nu$. Thus $\mathcal{G}^B$ also defines a game, i.e., a measurable mapping from $T$ to $U_B$. Suppose that $g$ is a Nash equilibrium for the game $\mathcal{G}^B$. Then, for each $t \in T$, $\mathcal{G}^B_t(g(t), \tau g^{-1}) \ge \mathcal{G}^B_t(b, \tau g^{-1})$ for all $b \in B$. This means that $\mathcal{G}^1_t(\phi(g(t)), (\tau g^{-1})\phi^{-1}) \ge \mathcal{G}^1_t(\phi(b), (\tau g^{-1})\phi^{-1})$ for all $b \in B$.
Letting $f = \phi \circ g$, we have $\tau f^{-1} = (\tau g^{-1})\phi^{-1}$, and hence $\mathcal{G}^1_t(f(t), \tau f^{-1}) \ge \mathcal{G}^1_t(\phi(b), \tau f^{-1})$ for all $b \in B$. Since $\phi$ is onto, we can conclude that $f$ is a Nash equilibrium for the game $\mathcal{G}^1$, which contradicts the nonexistence result in Proposition 9.3.5.

Remark 9.3.8 Remark 9.3.3 considers games with societal responses in integral form in a Banach space. It shows that the existence of Nash equilibria for such games follows from that in distributional form. Thus, an example showing the nonexistence of a Nash equilibrium in the integral setting automatically provides a nonexistence result for a game in the distributional setting. Suppose the unit Lebesgue interval is used as the space of player names and some particular averages are used as societal responses. Then it should be more difficult to find a nonexistence example. Indeed, the payoff functions in Example 9.3.4 can no longer be used for this purpose when the Bochner integral is used. However, [50] does provide a more intricate example showing the nonexistence of a Nash equilibrium for a game with a Lebesgue space of player names and a compact action space in a Hilbert space. On the other hand, if the societal responses are formalized as Gelfand integrals, then Example 9.3.4 can still be reinterpreted to establish the nonexistence result in this new setting (see [50]). When the action space is taken from an arbitrary Banach space, nonexistence results are also established in [100].
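The best-response step in the proof of Proposition 9.3.5 can be illustrated numerically. The payoff $a \mapsto g(a, \epsilon_0) - |t - |a||$ is piecewise linear. Its kinks lie at multiples of $\epsilon_0/2$, at $0$, and at $\pm t$. Its maximum over $[-1, 1]$ is therefore attained among those points and the endpoints, and evaluating them with exact fractions gives the maximum value.

The sketch below makes some assumptions: $\epsilon_0 \in (0, 1)$, and the signal $t \in (0, 1]$ is not a multiple of $\epsilon_0$. The helpers `g`, `payoff` and `best_responses` are hypothetical names. The sketch is a sanity check of the claim that the unique optimal action is $(-1)^{k-1} t$ for $t \in ((k-1)\epsilon_0, k\epsilon_0)$; it is not part of the proof.

```python
from fractions import Fraction


def g(a, eps):
    """g(a, eps) of Example 9.3.4: odd, 2*eps-periodic, slopes +-1/2."""
    if eps == 0:
        return Fraction(0)
    r = a % (2 * eps)                    # reduce to one period [0, 2*eps)
    if r <= eps / 2:
        return r / 2
    if r <= eps:
        return (eps - r) / 2
    return -g(r - eps, eps)              # third case: -g(a - eps, eps)


def payoff(t, a, eps0):
    """G^1_t(a, nu0) = g(a, eps0) - |t - |a||, where eps0 = beta * delta(nu0, tau*)."""
    return g(a, eps0) - abs(t - abs(a))


def best_responses(t, eps0):
    """Maximizers among the endpoints and kinks of the piecewise-linear payoff."""
    candidates = {Fraction(-1), Fraction(0), Fraction(1), t, -t}
    k = 0
    while k * eps0 / 2 <= 1:
        candidates |= {k * eps0 / 2, -k * eps0 / 2}
        k += 1
    values = {a: payoff(t, a, eps0) for a in candidates if -1 <= a <= 1}
    top = max(values.values())
    return sorted(a for a, v in values.items() if v == top)
```

For instance, with $\epsilon_0 = 1/3$ and $t = 1/2$ (so $k = 2$), `best_responses` should return only $-1/2$, in line with $f(t) = -t$ for even $k$.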
9.4 Nash Equilibria in Finite Games with Incomplete Information
9.4.1 Nonexistence of Nash Equilibria for Games with Information
Suppose the unit Lebesgue interval is used as the space of player names and the action space is any uncountable compact metric space. Example 9.3.4 and Theorem 9.3.7 in Sect. 9.3.2 show that there are then games with many players that have no Nash equilibria. We shall now consider similar nonexistence results in the setting of a two-player game with incomplete information.

Example 9.4.1 Consider a two-player game $\mathcal{G}^2$ with incomplete information. The players have identical action sets $A_1 = A_2 = [-1, 1]$. They also have identical private information (or signal, or type) spaces, given by the unit interval $T_1 = T_2 = [0, 1]$. The space of joint information is the unit square $[0, 1] \times [0, 1]$ endowed with the product Lebesgue measure, and thus the two players receive independent signals. The marginal measures on $T_1$ and $T_2$ are equal to Lebesgue measure, which has no atoms, so the information is diffuse.
We shall now specify the payoff function for each player in the game $\mathcal{G}^2$. Each player's payoff function depends on their own action, the action of the other player, and the signal received. We shall assume that $u_1 : A_1 \times A_2 \times T_1 \to \mathbb{R}$ and $u_2 : A_1 \times A_2 \times T_2 \to \mathbb{R}$ are such that
$$u_1(a_1, a_2, t) = -|t - |a_1|| + (t - a_1)\, z(t, a_2),$$
$$u_2(a_1, a_2, t) = -|t - |a_2|| - (t - a_2)\, z(t, a_1),$$
where the function $z : [0, 1] \times [-1, 1] \to \mathbb{R}$ is defined as follows. For each $t \in [0, 1/2]$,
$$z(t, a) = \begin{cases} a & \text{for } 0 \le a \le t, \\ t & \text{for } t < a \le 1, \\ -z(t, -a) & \text{for } a < 0, \end{cases}$$
and for any $t \in (1/2, 1]$, $z(t, \cdot) = z(1/2, \cdot)$. That is to say, for all indices $t$ in $[1/2, 1]$, the functions $z(t, \cdot)$ are identical. It is obvious that for a given type $t$, both $u_1(\cdot, \cdot, t)$ and $u_2(\cdot, \cdot, t)$ are continuous on $A_1 \times A_2$.

Definition 9.4.2 A player's pure strategy in the game $\mathcal{G}^2$ is simply a measurable function from $[0, 1]$ to $[-1, 1]$, i.e., a specification of one action for each signal. A Nash equilibrium of the game is a pair of measurable functions $f_i^*$ from $T_i$ to $A_i$ such that for any measurable functions $f_i : T_i \to A_i$, $i = 1, 2$,
$$\int_{T_1}\int_{T_2} u_1(f_1^*(t_1), f_2^*(t_2), t_1)\, d\tau\, d\tau \ge \int_{T_1}\int_{T_2} u_1(f_1(t_1), f_2^*(t_2), t_1)\, d\tau\, d\tau,$$
$$\int_{T_2}\int_{T_1} u_2(f_1^*(t_1), f_2^*(t_2), t_2)\, d\tau\, d\tau \ge \int_{T_2}\int_{T_1} u_2(f_1^*(t_1), f_2(t_2), t_2)\, d\tau\, d\tau.$$
Thus, suppose the players in the game $\mathcal{G}^2$ were to play a Nash equilibrium. Then each player, given the strategy of the other player, would choose their optimal plan so as to maximize their expected utility. However, it can be shown that this game has no Nash equilibrium at all, as stated in the following proposition.

Proposition 9.4.3 The game $\mathcal{G}^2$ has no Nash equilibrium.

The proof of this nonexistence result is much more difficult than the one for games with many players in Proposition 9.3.5; the interested reader can find the details in [51]. As in Theorem 9.3.7, we can also extend Example 9.4.1 to games of incomplete information with any uncountable compact metric action space. This shows again that the properties of the interval are not germane to the existence of a pure-strategy Nash equilibrium.
Remark 9.4.4 Define mixed strategies by letting $\phi_1 = \phi_2 = \phi$ in the game $\mathcal{G}^2$, where $\phi$ is the same as in Remark 9.3.6. Then the pair $(\phi_1, \phi_2)$ is a mixed-strategy equilibrium of the game $\mathcal{G}^2$ that cannot be purified.
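The reason the pair in Remark 9.4.4 is an equilibrium comes from the oddness of $z(t, \cdot)$. Suppose the opponent plays the mixed action $\phi(t_2) = \frac12\delta_{t_2} + \frac12\delta_{-t_2}$. Then the term $(t_1 - a_1) z(t_1, a_2)$ averages to zero, so player 1's expected payoff reduces to $-|t_1 - |a_1||$. This is maximized exactly at $a_1 = \pm t_1$, the two actions used by $\phi(t_1)$, and the same holds symmetrically for player 2.

The exact-arithmetic sketch below checks this reduction on a grid. The helper names are hypothetical. It says nothing about the claim that the equilibrium cannot be purified, which rests on Proposition 9.4.3.

```python
from fractions import Fraction


def z(t, a):
    """z(t, a) of Example 9.4.1 for t in [0, 1], a in [-1, 1]."""
    t = min(t, Fraction(1, 2))           # z(t, .) = z(1/2, .) for t > 1/2
    if a < 0:
        return -z(t, -a)                 # odd in a
    return a if a <= t else t


def u1(a1, a2, t):
    return -abs(t - abs(a1)) + (t - a1) * z(t, a2)


def u2(a1, a2, t):
    return -abs(t - abs(a2)) - (t - a2) * z(t, a1)


def expected_u1(a1, t1, t2):
    """Player 1's payoff averaged over phi(t2) = (delta_{t2} + delta_{-t2}) / 2."""
    return (u1(a1, t2, t1) + u1(a1, -t2, t1)) / 2


def expected_u2(a2, t2, t1):
    """Player 2's payoff averaged over phi(t1) = (delta_{t1} + delta_{-t1}) / 2."""
    return (u2(t1, a2, t2) + u2(-t1, a2, t2)) / 2
```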
9.4.2 Approximate Nash Equilibria for Large Finite Games and Idealizations
We have shown the nonexistence of Nash equilibria for games based on the Lebesgue interval. What is not as clear is why models based on Lebesgue measure fail at isolating this class of game-theoretic phenomena. In order to illustrate the reasons for this dissonance, we take a discrete version of the game $\mathcal{G}^2$ in Example 9.4.1. This gives a sequence of finite games with an increasing number of sample points. We show how the limit of the existing approximate Nash equilibria disappears in the limit game in the Lebesgue setting.

Example 9.4.5 Fix $n \in \mathbb{N}$ with $n \ge 4$, and consider a two-person game $\mathcal{G}^2_n$ with the following finite information spaces (type spaces) and action spaces. The information spaces are $T^n = T_1^n = T_2^n = \{1/n, 2/n, \ldots, n/n\}$, endowed with the counting probability measures $\tau^n = \tau_1^n = \tau_2^n$ (each point has mass $1/n$). The action spaces are $A_1 = A_2 = [-1, 1]$. The probability measure on the joint information space $T_1^n \times T_2^n$ is the product measure $\tau_1^n \otimes \tau_2^n$. This means that the two players have independent information. The payoffs are specified by $u_1 : A_1 \times A_2 \times T_1^n \to \mathbb{R}$ and $u_2 : A_1 \times A_2 \times T_2^n \to \mathbb{R}$, and are such that
$$u_1(a_1, a_2, t) = -|t - |a_1|| + (t - a_1)\, z(t, a_2),$$
$$u_2(a_1, a_2, t) = -|t - |a_2|| - (t - a_2)\, z(t, a_1),$$
where $z : [0, 1] \times [-1, 1] \to \mathbb{R}$ is the same function as in Example 9.4.1.

Proposition 9.4.6 Let $f_1^n : T_1^n \to A_1$ and $f_2^n : T_2^n \to A_2$ be the functions such that $f_1^n(j/n) = f_2^n(j/n) = (-1)^j (j/n)$ for $j = 1, 2, \ldots, n$. Then the pair $(f_1^n, f_2^n)$ forms an approximate pure-strategy Nash equilibrium for the discrete game $\mathcal{G}^2_n$, in the sense that the maximal expected payoffs are achieved up to a deviation of at most $2/n$. In particular, for any $g_1 : T_1^n \to A_1$ and for any $g_2 : T_2^n \to A_2$, we have
$$D_1 = \int_{T_1^n}\int_{T_2^n} \big[u_1(f_1^n(t_1), f_2^n(t_2), t_1) - u_1(g_1(t_1), f_2^n(t_2), t_1)\big]\, d\tau_2^n\, d\tau_1^n > -\frac{2}{n},$$
$$D_2 = \int_{T_2^n}\int_{T_1^n} \big[u_2(f_1^n(t_1), f_2^n(t_2), t_2) - u_2(f_1^n(t_1), g_2(t_2), t_2)\big]\, d\tau_1^n\, d\tau_2^n > -\frac{2}{n}.$$
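Proof We consider $D_1$ first. Since $|f_1^n(t_1)| = t_1$, we have
$$\begin{aligned} D_1 &= \int_{T_1^n}\int_{T_2^n} \big[u_1(f_1^n(t_1), f_2^n(t_2), t_1) - u_1(g_1(t_1), f_2^n(t_2), t_1)\big]\, d\tau_2^n\, d\tau_1^n \\ &= \int_{T_1^n}\int_{T_2^n} \Big(-|t_1 - |f_1^n(t_1)|| + (t_1 - f_1^n(t_1))\, z(t_1, f_2^n(t_2)) \\ &\qquad\qquad + |t_1 - |g_1(t_1)|| - (t_1 - g_1(t_1))\, z(t_1, f_2^n(t_2))\Big)\, d\tau_2^n\, d\tau_1^n \\ &= \int_{T_1^n} |t_1 - |g_1(t_1)||\, d\tau_1^n + \int_{T_1^n} (g_1(t_1) - f_1^n(t_1)) \int_{T_2^n} z(t_1, f_2^n(t_2))\, d\tau_2^n\, d\tau_1^n \\ &\ge \int_{T_1^n} (g_1(t_1) - f_1^n(t_1)) \int_{T_2^n} z(t_1, f_2^n(t_2))\, d\tau_2^n\, d\tau_1^n. \end{aligned}$$
Since $g_1(t_1)$ and $f_1^n(t_1)$ take values in the interval $[-1, 1]$, it is obvious that $|g_1(t_1) - f_1^n(t_1)| \le 2$. Hence
$$D_1 \ge -2\int_{T_1^n}\left|\int_{T_2^n} z(t_1, f_2^n(t_2))\, d\tau_2^n\right| d\tau_1^n = -2\cdot\frac{1}{n}\sum_{t_1\in T_1^n}\left|\frac{1}{n}\sum_{t_2\in T_2^n} z(t_1, f_2^n(t_2))\right|.$$
Let $H(t_1) = \sum_{t_2\in T_2^n} z(t_1, f_2^n(t_2))$. From the definition of $z(t, a)$, it follows that for $t_1 \in T_1^n \cap [0, 1/2]$,
$$H(t_1) = \sum_{1/n \le t_2 \le t_1} (-1)^{nt_2} \cdot t_2 + \sum_{t_1 < t_2 \le 1} (-1)^{nt_2} \cdot t_1.$$
By a simple induction argument, we obtain
$$\sum_{i=1}^{k} (-1)^i i = (-1)^k \left[\frac{k+1}{2}\right] \quad\text{and}\quad \sum_{i=1}^{k} (-1)^i = \frac{(-1)^k - 1}{2},$$
where the integer part of a real number $x$ is denoted by $[x]$. Thus, by letting $nt_2 = i$,
$$H(t_1) = \frac{1}{n}\sum_{i=1}^{nt_1} (-1)^i i + t_1 \sum_{i=nt_1+1}^{n} (-1)^i = \frac{(-1)^{nt_1}}{n}\left(\left[\frac{nt_1 + 1}{2}\right] + nt_1\,\frac{(-1)^{n-nt_1} - 1}{2}\right)$$
for $t_1 \in T_1^n \cap [0, 1/2]$.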
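Next, for $t_1 \in T^n_1 \cap (1/2, 1]$, it follows from the definition of $z(t, a)$ that $z(t_1, f^n_2(t_2)) = z(1/2, f^n_2(t_2))$ and
$$H(t_1) = \sum_{1/n \le t_2 \le 1/2} (-1)^{nt_2} \cdot t_2 + \sum_{1/2 < t_2 \le 1} (-1)^{nt_2} \cdot \frac{1}{2} = \frac{(-1)^{[n/2]}}{n} \left( \Big[\frac{[n/2] + 1}{2}\Big] + \frac{n}{2} \cdot \frac{(-1)^{n - [n/2]} - 1}{2} \right).$$
Since $|((-1)^m - 1)/2| \le 1$ for every integer $m$, we have $|H(j/n)| \le \frac{1}{n}\big([\frac{j+1}{2}] + j\big)$ for $1 \le j \le [n/2]$. Moreover, $[n/2] \le n/2$, and $T^n_1 \cap (1/2, 1]$ has $n - [n/2] \le (n+1)/2$ elements. Therefore, we have
$$\sum_{t_1 \in T^n_1} |H(t_1)| = \sum_{t_1 \in T^n_1 \cap [0, 1/2]} |H(t_1)| + \sum_{t_1 \in T^n_1 \cap (1/2, 1]} |H(t_1)| \le \frac{1}{n} \sum_{j=1}^{[n/2]} \left( \frac{j+1}{2} + j \right) + \Big(n - \Big[\frac{n}{2}\Big]\Big) \frac{1}{n} \left( \frac{[n/2] + 1}{2} + \frac{n}{2} \right)$$
$$\le \frac{1}{n} \left( \frac{3}{4} \cdot \frac{n}{2} \Big(\frac{n}{2} + 1\Big) + \frac{n}{4} \right) + \frac{n+1}{2} \left( \frac{3}{4} + \frac{1}{2n} \right) = \frac{9}{16}\, n + \frac{5}{4} + \frac{1}{4n} < n,$$
where the last inequality holds because $n \ge 4$.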
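The closed form for $t_1 \in T^n_1 \cap (1/2, 1]$ and the final estimate on $\sum_{t_1} |H(t_1)|$ can be checked in the same way. In the sketch below (exact arithmetic, illustrative names), $H$ is again computed from the displayed sums; for $t_1 > 1/2$ the cutoff $t_1$ is simply replaced by $1/2$, as in the display above. A finite check of this kind only supports the estimate; it does not prove it.

```python
from fractions import Fraction

def H(n, j):
    """H(j/n) from the displayed sums; the cutoff t1 = j/n is capped at 1/2."""
    c = min(Fraction(j, n), Fraction(1, 2))
    # terms with t2 <= c contribute (-1)^i * t2, the others (-1)^i * c
    return sum((-1) ** i * min(Fraction(i, n), c) for i in range(1, n + 1))

def H_upper_closed(n):
    """Closed form of H(t1) for t1 in (1/2, 1], with m = [n/2]."""
    m = n // 2
    return Fraction((-1) ** m, n) * ((m + 1) // 2 + Fraction(n, 2) * Fraction((-1) ** (n - m) - 1, 2))

def abs_sum(n):
    """Sum over t1 in T^n_1 of |H(t1)|."""
    return sum(abs(H(n, j)) for j in range(1, n + 1))

def bound(n):
    """The estimate 9n/16 + 5/4 + 1/(4n) derived above."""
    return Fraction(9, 16) * n + Fraction(5, 4) + Fraction(1, 4 * n)

for n in range(4, 41):
    # j > n/2 exactly when t1 = j/n lies in (1/2, 1]
    assert all(H(n, j) == H_upper_closed(n) for j in range(n // 2 + 1, n + 1))
    assert abs_sum(n) <= bound(n) < n
print("closed form and bound confirmed for 4 <= n <= 40")
```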
It follows that $D_1 \ge -(2/n^2) \sum_{t_1 \in T^n_1} |H(t_1)| > -(2/n^2) \cdot n = -(2/n)$. Similarly, we can prove that $D_2 > -(2/n)$.
Remark 9.4.7 By transferring the result in Proposition 9.4.6 to the nonstandard universe, we obtain, for any $\eta \in {}^*\mathbb{N}_\infty$, an internal game $G^\eta_2$ together with an internal equilibrium $(f^\eta_1, f^\eta_2)$, approximate up to the infinitesimal $2/\eta$. By the definition of the payoff functions in Example 9.4.5, the internal game $G^\eta_2$ is an internal lifting of the game $G_2$ based on the unit Lebesgue interval in terms of their payoff functions. Note that an internal function $F$ on $T^\eta$ is said to be an internal lifting of a Lebesgue measurable function $g$ on the unit Lebesgue interval if $F(t) \simeq g({}^\circ t)$ for $\tau^\eta_L$-almost all $t \in T^\eta$. Since the equilibrium $(f^\eta_1, f^\eta_2)$ of the hyperfinite game $G^\eta_2$ involves functions in which players with “infinitesimally close” signals essentially do not have “infinitesimally close” actions, it cannot be reduced to an equilibrium of the corresponding game $G_2$ in the Lebesgue setting. In fact, the nonexistence result in Proposition 9.4.3 implies that every other equilibrium of the hyperfinite game has the same problem. This means that any approximate equilibria of the corresponding large finite games must be highly oscillatory. On the other hand, we can obtain a game ${}^\circ G^\eta_2$ with a Loeb information space and its equilibrium $({}^\circ f^\eta_1, {}^\circ f^\eta_2)$ by “rounding off” the infinitesimals. In this Loeb game, the
individual signal space is $T^\eta$ with the Loeb measure $\tau^\eta_L$, and the payoffs are defined as follows: for each $t \in T^\eta$,
$$u_1(a_1, a_2, t) = -\big|{}^\circ t - |a_1|\big| + ({}^\circ t - a_1)\, z({}^\circ t, a_2), \qquad u_2(a_1, a_2, t) = -\big|{}^\circ t - |a_2|\big| - ({}^\circ t - a_2)\, z({}^\circ t, a_1),$$
where the function $z$ is exactly the same as in Example 9.4.1.
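The oscillation described in Remark 9.4.7 is already visible at finite $n$. Neighbouring signals $j/n$ and $(j+1)/n$ differ by $1/n$, yet the actions $f^n_1(j/n) = (-1)^j (j/n)$ assigned to them differ by $(2j+1)/n$, which is close to 2 for large $j$. The short sketch below (illustrative function names) computes the largest such jump, $(2n-1)/n = 2 - 1/n$. For an infinite $\eta$ the signal gap $1/\eta$ is infinitesimal while the action gap is not.

```python
from fractions import Fraction

def f_n(n, j):
    """The strategy f^n_1 = f^n_2 of Proposition 9.4.6 evaluated at the signal j/n."""
    return Fraction((-1) ** j * j, n)

def max_jump(n):
    """Largest action gap |f((j+1)/n) - f(j/n)| between neighbouring signals."""
    return max(abs(f_n(n, j + 1) - f_n(n, j)) for j in range(1, n))

for n in (4, 10, 100, 1000):
    # the gap at j is (2j + 1)/n, largest at j = n - 1, i.e. 2 - 1/n
    print(n, float(max_jump(n)))
```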
9.4.3 General Existence of Nash Equilibria for Games with Information
In Sect. 9.4.2, it is shown how the approximate equilibria of large finite games can disappear in a limit game when the unit Lebesgue interval is used to model diffuse information. It is also pointed out that no such problem arises if the Loeb space is used in the same context. The purpose of this subsection is to present an existence result on Nash equilibria for general games with incomplete information. For this purpose, we first give a formal definition of such games.
Definition 9.4.8 A game with incomplete information consists of a finite set $I = \{1, \dots, \ell\}$ of players, each of whom is endowed with a compact metric action space $A_i$, an information space captured by two measurable spaces $(Z_i, \mathcal{Z}_i)$ and $(X_i, \mathcal{X}_i)$, and a utility function $u_i : A \times X_i \to \mathbb{R}$, together with a probability measure $\mu$ on $(\Omega, \mathcal{F})$. Here, $A$ denotes the product space $\prod_{i \in I} A_i$ and $(\Omega, \mathcal{F})$ the measurable space $\big(\prod_{i \in I} (Z_i \times X_i), \bigotimes_{i \in I} \mathcal{Z}_i \otimes \mathcal{X}_i\big)$, the product space being equipped with the product σ-algebra. In summary form, a game with incomplete information is given by
$$\Gamma = \big(I, (u_i, A_i, ((Z_i, \mathcal{Z}_i), (X_i, \mathcal{X}_i)))_{i \in I}, \mu\big),$$
with each object described as above. For any point $\omega = (z_1, x_1, \dots, z_\ell, x_\ell) \in \Omega$ and any $i = 1, \dots, \ell$, let $(\zeta_i, \chi_i)$ be the coordinate projections, i.e., $\zeta_i(\omega) = z_i$ and $\chi_i(\omega) = x_i$. The strategy space $S_i$ for player $i$ is the space of measurable functions from $Z_i$ to $A_i$. For any $g = (g_1, \dots, g_\ell) \in \prod_{i=1}^{\ell} S_i$, we denote the resulting expected payoff to the $i$th player by
$$U_i(g) = \int_{\omega \in \Omega} u_i[g_1(\zeta_1(\omega)), \dots, g_\ell(\zeta_\ell(\omega)), \chi_i(\omega)]\, \mu(d\omega).$$
Furthermore, we use $g_{-i}$ to denote the $(\ell - 1)$-vector obtained from $g$ by deleting its $i$th component, and $(g'_i, g_{-i})$ to denote the $\ell$-vector $g$ with $g'_i$ as its $i$th component.
Definition 9.4.9 An $\ell$-tuple of functions $g^* = (g^*_1, \dots, g^*_\ell) \in \prod_{i=1}^{\ell} S_i$ is called a Nash equilibrium of the game with incomplete information $\Gamma$ if for each player $i$, $U_i(g^*) \ge U_i(g_i, g^*_{-i})$ for all $g_i \in S_i$.
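To make Definitions 9.4.8 and 9.4.9 concrete, here is a minimal Python sketch for the case in which every $Z_i$, $X_i$ and $A_i$ is finite. In that case integrals become finite sums and every function $Z_i \to A_i$ is a strategy. None of the hyperfinite structure is involved. Players are indexed from 0, and each $\omega$ is encoded as a tuple $((z_1, x_1), \dots, (z_\ell, x_\ell))$. The names `expected_payoff` and `is_nash`, as well as the coordination payoff in the example, are hypothetical. Because the definition involves only pure strategies, checking all pure deviations suffices.

```python
import itertools

def expected_payoff(i, g, mu, u):
    """U_i(g) = sum over omega of u_i(g_1(z_1), ..., g_l(z_l), x_i) * mu(omega)."""
    total = 0.0
    for omega, prob in mu.items():
        actions = tuple(g[j][omega[j][0]] for j in range(len(g)))  # g_j(zeta_j(omega))
        total += prob * u[i](actions, omega[i][1])                # x_i = chi_i(omega)
    return total

def is_nash(g, mu, u, Z, A, tol=1e-12):
    """Definition 9.4.9: U_i(g) >= U_i(g_i', g_{-i}) for every i and every g_i': Z_i -> A_i."""
    for i in range(len(g)):
        base = expected_payoff(i, g, mu, u)
        for choice in itertools.product(A[i], repeat=len(Z[i])):
            deviation = list(g)
            deviation[i] = dict(zip(Z[i], choice))
            if expected_payoff(i, deviation, mu, u) > base + tol:
                return False
    return True

def match(actions, x):
    """Hypothetical coordination payoff: 1 if both players choose the same action."""
    return 1.0 if actions[0] == actions[1] else 0.0

Z = [[0, 1], [0, 1]]   # private signals z_i
A = [[0, 1], [0, 1]]   # finite action sets
u = [match, match]
# independent uniform signals; the payoff-relevant parts x_i are trivial (None)
mu = {((z1, None), (z2, None)): 0.25 for z1 in Z[0] for z2 in Z[1]}

constant = [{0: 0, 1: 0}, {0: 0, 1: 0}]
follow_signal = [{0: 0, 1: 1}, {0: 0, 1: 0}]
print(is_nash(constant, mu, u, Z, A), is_nash(follow_signal, mu, u, Z, A))
```

In the second profile, player 0 earns $1/2$ by following the signal but could earn 1 by always playing 0, so the profile is not an equilibrium.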
Theorem 9.4.10 Suppose that for every player $i$,
(a) $(Z_i, \mathcal{Z}_i, \mu_i)$, with $\mu_i$ being the marginal of $\mu$ on $(Z_i, \mathcal{Z}_i)$, is an atomless hyperfinite Loeb space,
(b) the random variables $\{\zeta_j : j \ne i\}$ together with the random variable $\xi_i \equiv (\zeta_i, \chi_i)$ form a mutually independent set,
(c) $u_i(\cdot, \chi_i(\omega))$ is a continuous function on $A$ for $\mu$-almost all $\omega \in \Omega$, and there is a real-valued integrable function $h_i$ on $(\Omega, \mathcal{F}, \mu)$ such that for $\mu$-almost all $\omega \in \Omega$, $|u_i(a, \chi_i(\omega))| \le h_i(\omega)$ for all $a \in A$.
Then there exists a Nash equilibrium of the game with incomplete information $\Gamma$.
Proof Focus on player $i$, and note that, by condition (c), the family $\{u_i(a, \chi_i(\cdot)) : a \in A\}$ is uniformly $\mu$-integrable on $\Omega$. Hence, we can appeal to Theorem 2.1 in [26] to assert the existence of a function $V_i$ from $A \times Z_i$ to $\mathbb{R}$ such that $V_i(a, \zeta_i(\omega))$ is the regular conditional expectation of $u_i(a, \chi_i(\omega))$ with respect to the sub-σ-algebra of $\mathcal{F}$ generated by $\zeta_i$. Thus for any measurable set $W \subseteq Z_i$, we have
$$\int_{\{\omega \in \Omega :\, \zeta_i(\omega) \in W\}} u_i(a, \chi_i(\omega))\, d\mu(\omega) = \int_{z_i \in W} V_i(a, z_i)\, d\mu\zeta_i^{-1}(z_i).$$
Furthermore, by Theorem 2.2 in [26], we know that for $\mu\zeta_i^{-1}$-almost all $z_i \in Z_i$, $V_i(\cdot, z_i)$ is continuous on $A$. Without loss of generality, we can assume that $V_i(\cdot, z_i)$ is continuous on $A$ for all $z_i \in Z_i$. Consider the real-valued mapping $G_i$ on $Z_i \times A_i \times \prod_{j=1}^{\ell} \mathcal{M}(A_j)$ given by
$$(z_i, a_i, \nu_1, \dots, \nu_\ell) \mapsto G_i(z_i, a_i, \nu_1, \dots, \nu_\ell) = \int_{a_{-i} \in A_{-i}} V_i(a, z_i)\, d\nu_{-i},$$
where $a = (a_i, a_{-i})$, $A_{-i} = \prod_{j \ne i} A_j$ and $\nu_{-i} = \bigotimes_{j \ne i} \nu_j$. Then, for any fixed $z_i \in Z_i$, $G_i$ is a continuous function from $A_i \times \prod_{j=1}^{\ell} \mathcal{M}(A_j)$ to $\mathbb{R}$, and for any given $(a_i, \nu_1, \dots, \nu_\ell) \in A_i \times \prod_{j=1}^{\ell} \mathcal{M}(A_j)$, $G_i$ is a measurable function from $Z_i$ to $\mathbb{R}$. Now we construct the mapping, in general set-valued, from $Z_i \times \prod_{j=1}^{\ell} \mathcal{M}(A_j)$ into $A_i$ given by
$$(z_i, \nu_1, \dots, \nu_\ell) \mapsto F^i(z_i, \nu_1, \dots, \nu_\ell) = \operatorname{Arg\,Max}_{a_i \in A_i} G_i(z_i, a_i, \nu_1, \dots, \nu_\ell);$$
i.e., $F^i(z_i, \nu_1, \dots, \nu_\ell)$ is the set of all elements $a_i \in A_i$ that maximize the function $G_i(z_i, \cdot, \nu_1, \dots, \nu_\ell)$ on $A_i$. Then $F^i$ is compact valued. For each $z_i \in Z_i$, since $G_i(z_i, \cdot)$ is continuous on $A_i \times \prod_{j=1}^{\ell} \mathcal{M}(A_j)$, it is easy to see that the correspondence $F^i_{z_i} = F^i(z_i, \cdot)$ from the compact metric space $\prod_{j=1}^{\ell} \mathcal{M}(A_j)$ to $A_i$ has a closed graph, i.e., it is upper semicontinuous. Furthermore, for any fixed $(\nu_1, \dots, \nu_\ell) \in \prod_{j=1}^{\ell} \mathcal{M}(A_j)$, since $G_i$ is measurable on $Z_i$ and continuous on $A_i$, the correspondence $F^i_{(\nu_1, \dots, \nu_\ell)}(\cdot) = F^i(\cdot, \nu_1, \dots, \nu_\ell)$ from $Z_i$ to $A_i$ is
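measurable (see Theorems III.14 and III.39 in [13]). Hence there exists a selection for $F^i_{(\nu_1, \dots, \nu_\ell)}$; i.e., $D_{F^i_{(\nu_1, \dots, \nu_\ell)}} \neq \emptyset$.
Since the relevant Loeb space is assumed to be atomless, Theorem 9.2.7 implies that $D_{F^i_{(\nu_1, \dots, \nu_\ell)}}$ is convex, so the correspondence $(\nu_1, \dots, \nu_\ell) \mapsto D_{F^i_{(\nu_1, \dots, \nu_\ell)}}$ is convex valued. By Theorems 9.2.5 and 9.2.9, it is also a compact valued and upper semicontinuous correspondence from $\prod_{j=1}^{\ell} \mathcal{M}(A_j)$ into $\mathcal{M}(A_i)$. Next, define a correspondence $\varphi$ from $\prod_{i=1}^{\ell} \mathcal{M}(A_i)$ to itself by
$$\varphi(\nu_1, \dots, \nu_\ell) = \prod_{i=1}^{\ell} D_{F^i_{(\nu_1, \dots, \nu_\ell)}}.$$
It is easy to see that $\varphi$ is an upper semicontinuous, closed and convex valued correspondence from a nonempty convex compact subset of a locally convex space into itself. Hence, we can apply the Fan–Glicksberg fixed-point theorem [28, 34] to assert the existence of $(\nu^*_1, \dots, \nu^*_\ell)$ such that $(\nu^*_1, \dots, \nu^*_\ell) \in \varphi(\nu^*_1, \dots, \nu^*_\ell)$. Since for any player $i$, $\nu^*_i \in D_{F^i_{(\nu^*_1, \dots, \nu^*_\ell)}}$, there exists a selection $g^*_i$ of $F^i_{(\nu^*_1, \dots, \nu^*_\ell)}$ with $\mu_i (g^*_i)^{-1} = \nu^*_i$. It is easy to check that $(g^*_1, \dots, g^*_\ell)$ is a Nash equilibrium in pure strategies for the game $\Gamma$. Indeed, by condition (b), for any $g_i \in S_i$ we have $U_i(g_i, g^*_{-i}) = \int_{Z_i} G_i(z_i, g_i(z_i), \nu^*_1, \dots, \nu^*_\ell)\, d\mu_i(z_i)$, and $g^*_i(z_i)$ maximizes the integrand for $\mu_i$-almost all $z_i \in Z_i$.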
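The objects $G_i$ and $F^i$ in the proof above are easy to compute when the action sets are finite. In that case $\mathcal{M}(A_j)$ is the simplex of probability vectors on $A_j$, and $G_i$ is a finite expectation under the product measure $\nu_{-i}$. The sketch below returns the arg max set $F^i(z_i, \nu_1, \dots, \nu_\ell)$. The function names are hypothetical, and $V_i$ is an arbitrary illustrative payoff rather than one obtained from [26]. Ties show why $F^i$ is in general set-valued, which is why the proof works with selections and with the distribution sets $D_{F^i}$.

```python
import itertools

def G(i, z_i, a_i, nus, V):
    """G_i(z_i, a_i, nu_1..nu_l): expectation of V_i((a_i, a_{-i}), z_i) under prod_{j != i} nu_j."""
    others = [j for j in range(len(nus)) if j != i]
    total = 0.0
    for combo in itertools.product(*(list(nus[j].items()) for j in others)):
        a = [None] * len(nus)
        a[i] = a_i
        weight = 1.0
        for j, (a_j, p_j) in zip(others, combo):
            a[j] = a_j
            weight *= p_j  # product measure nu_{-i}
        total += weight * V(tuple(a), z_i)
    return total

def best_responses(i, z_i, nus, V, tol=1e-12):
    """F^i(z_i, nu_1..nu_l) = ArgMax of G_i over the finite action set A_i (the keys of nus[i])."""
    values = {a_i: G(i, z_i, a_i, nus, V) for a_i in nus[i]}
    best = max(values.values())
    return sorted(a_i for a_i, v in values.items() if v >= best - tol)

def V(a, z):
    """Hypothetical V_0: reward for matching the opponent plus a type-dependent bonus for action 1."""
    return (1.0 if a[0] == a[1] else 0.0) + 0.3 * z * a[0]

nus = [{0: 0.5, 1: 0.5}, {0: 0.6, 1: 0.4}]  # nu_0 (not used by G_0) and nu_1
print(best_responses(0, 0, nus, V), best_responses(0, 1, nus, V))
print(best_responses(0, 0, [{0: 0.5, 1: 0.5}, {0: 0.5, 1: 0.5}], V))  # a tie: both actions
```

As in the proof, $G_i$ does not depend on player $i$'s own $\nu_i$; it is kept only to fix the action set.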
9.5 Exact Law of Large Numbers and Independent Set-Valued Processes
In Chap. 8 of this book, the satisfiability of some versions of the law of large numbers for a process is characterized by almost independence. The purpose of this section is to consider the set-valued case. The results presented here will be used in the next section to obtain competitive equilibria for random economies. Since the target space in the next section is the commodity space $\mathbb{R}^n$, we need to consider correspondences that take values in a non-compact space. As noted in Sect. 9.2.1, it is well known that for a closed valued correspondence $G$ from a probability space $(\Omega, \Sigma, \mu)$ to a compact metric space $X$, the measurability of $G$ as a correspondence is equivalent to the measurability of $G$ as a mapping into the hyperspace $(\mathcal{F}_X, \rho)$ of closed sets with the Hausdorff distance $\rho$. Here we consider the general case in which $X$ is a complete separable metrizable topological space, i.e., a Polish space. Unless otherwise noted, we shall now use $(X, d)$ to denote a Polish space with a totally bounded metric $d$ on $X$. Then the space $\mathcal{F}_X$ of nonempty closed subsets of $X$, endowed with the Hausdorff distance $\rho$ derived from $d$, is still a Polish space. For each open set $O$ in $X$, let $E_O = \{A \in \mathcal{F}_X : A \cap O \neq \emptyset\}$. For a closed valued
correspondence $G$ from a probability space $(\Omega, \Sigma, \mu)$ to $X$, let $\bar G$ denote the mapping from $(\Omega, \Sigma, \mu)$ to the hyperspace $\mathcal{F}_X$ of nonempty closed sets defined by $\bar G(\omega) = G(\omega)$; that is, the value of $\bar G$ at each point is the nonempty closed set obtained by evaluating the correspondence at that point. The following lemma, which is Proposition 2.3 in [103], characterizes measurable correspondences as well as the Borel σ-algebra $\mathcal{B}(\mathcal{F}_X)$ of the Polish space $\mathcal{F}_X$.
Lemma 9.5.1 The Borel σ-algebra $\mathcal{B}(\mathcal{F}_X)$ is generated by the collection of all the sets $E_O$ for open sets $O$ in $X$. Let $G$ be a closed valued correspondence from a probability space $(\Omega, \Sigma, \mu)$ to $(X, d)$. Then $G$ is a measurable correspondence if and only if $\bar G$ is a measurable mapping from $(\Omega, \Sigma, \mu)$ to $(\mathcal{F}_X, \mathcal{B}(\mathcal{F}_X))$.
Definition 9.5.2 Let $G_1, G_2, \dots, G_n$ be $n$ closed valued measurable correspondences from a probability space $(\Omega, \Sigma, \mu)$ to a Polish space $X$. They are said to be independent if for any open sets $O_1, O_2, \dots, O_n$ in $X$, the events $G_1^{-1}(O_1), G_2^{-1}(O_2), \dots, G_n^{-1}(O_n)$ are independent.
The following lemma shows that the notions of independence for correspondences and for their associated point-valued mappings into the hyperspace $\mathcal{F}_X$ are the same.
Lemma 9.5.3 Let $F$ and $G$ be closed valued measurable correspondences from a probability space $(\Omega, \Sigma, \mu)$ to a Polish space $X$ with a totally bounded metric $d$. Then $F$ and $G$ are independent as correspondences if and only if the random variables $\bar F$ and $\bar G$ from $(\Omega, \Sigma, \mu)$ to $\mathcal{F}_X$ are independent.
Proof If the random variables $\bar F$ and $\bar G$ are independent, then for any open sets $O_1, O_2$ in $X$, the events $\bar F^{-1}(E_{O_1})$ and $\bar G^{-1}(E_{O_2})$ are independent. Since $\bar F^{-1}(E_{O}) = F^{-1}(O) = \{\omega : F(\omega) \cap O \neq \emptyset\}$, and similarly for $G$, the events $F^{-1}(O_1)$ and $G^{-1}(O_2)$ are independent. This means that the correspondences $F$ and $G$ are independent. Conversely, if $F$ and $G$ are independent as correspondences, then for any open sets $O_1, O_2$ in $X$, $\bar F^{-1}(E_{O_1})$ and $\bar G^{-1}(E_{O_2})$ are independent events, and so are the complements $\bar F^{-1}(\mathcal{F}_X \setminus E_{O_1})$ and $\bar G^{-1}(\mathcal{F}_X \setminus E_{O_2})$.
By Lemma 9.5.1, the collection $\{\mathcal{F}_X \setminus E_O : O \text{ is an open set in } X\}$ also generates the Borel σ-algebra $\mathcal{B}(\mathcal{F}_X)$. It is easy to check that $\bigcap_{i=1}^{n} (\mathcal{F}_X \setminus E_{O_i}) = \mathcal{F}_X \setminus E_{\bigcup_{i=1}^{n} O_i}$. This also means that the collection $\{\mathcal{F}_X \setminus E_O : O \text{ is an open set in } X\}$ is closed under finite intersections; i.e., it is a π-system. Thus the independence of $\bar F$ and $\bar G$ follows from the usual extension theorem ([75], p. 237).
The following lemma characterizes the equality of distributions of correspondences on Loeb spaces. This exact characterization is very useful for applications.
Lemma 9.5.4 Let $F$ and $G$ be closed valued measurable correspondences from atomless Loeb probability spaces $(\Omega_1, L_{P^1}(\mathcal{A}_1), P^1_L)$ and $(\Omega_2, L_{P^2}(\mathcal{A}_2), P^2_L)$, respectively, to a Polish space $X$. Then the following are equivalent:
(1) For every open set $O$ in $X$, $P^1_L(F^{-1}(O)) = P^2_L(G^{-1}(O))$;
(2) $D_F = D_G$;
(3) $\bar F$ and $\bar G$ have the same distributions on the hyperspace $\mathcal{F}_X$.
Proof The proof of (1) ⇒ (2) is quite involved; the interested reader is referred to Proposition 3.5 in [103]. To show (2) ⇒ (1), we need the following claim: for any given open set $O$ in $X$, $P^1_L(F^{-1}(O)) = \sup\{\mu(O) : \mu \in D_F\}$. Denote the Loeb measurable set $F^{-1}(O)$ by $A$, and let $F_O$ be the correspondence from the measurable space $(A, L_{P^1}(\mathcal{A}_1) \cap A)$ into $O$ defined by $F_O(\omega_1) = F(\omega_1) \cap O$ for $\omega_1 \in A$. Then the graph of $F_O$ is the intersection of the graph of $F$ with $A \times O$, which is measurable in the product σ-algebra $(L_{P^1}(\mathcal{A}_1) \cap A) \otimes \mathcal{B}(O)$. Hence, there exists a selection $f_1$ of $F_O$. Take any selection $f_2$ of $F$. Let $f(\omega_1) = f_1(\omega_1)$ if $\omega_1 \in A$, and $f(\omega_1) = f_2(\omega_1)$ if $\omega_1 \notin A$. Then $f$ is a selection of $F$ with $P^1_L(f^{-1}(O)) \ge P^1_L(A)$, whence $P^1_L(F^{-1}(O)) \le \sup\{\mu(O) : \mu \in D_F\}$. Since the other direction of the inequality is obvious, the claim is proven. Similarly, $P^2_L(G^{-1}(O)) = \sup\{\mu(O) : \mu \in D_G\}$. Therefore, (2) ⇒ (1) is now obvious. By evaluating the distributions of $\bar F$ and $\bar G$ on sets of the form $E_O$ in $\mathcal{F}_X$, we obtain (3) ⇒ (1). If (1) holds, then the distributions of $\bar F$ and $\bar G$ agree on sets of the form $E_O$ in $\mathcal{F}_X$, and hence also on sets of the form $\mathcal{F}_X \setminus E_O$. As noted in the proof of Lemma 9.5.3, this latter collection is a π-system generating $\mathcal{B}(\mathcal{F}_X)$. Since two probability measures that agree on a π-system generating the σ-algebra must be the same ([14], p. 45), the two distributions must be the same. Thus (3) holds.
Let $(T \times \Omega, L_{\lambda \otimes P}(\mathcal{T} \otimes \mathcal{A}), (\lambda \otimes P)_L)$ be the Loeb product space, which is the Loeb space of the internal product probability space $(T \times \Omega, \mathcal{T} \otimes \mathcal{A}, \lambda \otimes P)$. For basic properties of Loeb product spaces and their applications, see Chaps. 6 and 8. A measurable function $f$ from the Loeb product space to a Polish space $X$ is called a process. As in Chap. 8, for each $t \in T$ and $\omega \in \Omega$, $f_t$ denotes the function $f(t, \cdot)$ on $\Omega$, and $f_\omega$ denotes the function $f(\cdot, \omega)$ on $T$. Note that the measurability (in an almost sure sense) of $f_t$ and $f_\omega$ is guaranteed by Keisler's Fubini theorem for Loeb product spaces. The functions $f_t$ are usually called the random variables of the process $f$, while the $f_\omega$ form the sample functions of the process. A closed valued measurable correspondence $F$ from the Loeb product space $(T \times \Omega, L_{\lambda \otimes P}(\mathcal{T} \otimes \mathcal{A}), (\lambda \otimes P)_L)$ to $X$ is called a set-valued process. Similarly, for each $\omega \in \Omega$, let $F_\omega$ be the correspondence defined by $F_\omega(t) = F(t, \omega)$, which is called a sample correspondence of the set-valued process $F$. For each $t \in T$, let $F_t$ be the correspondence defined by $F_t(\omega) = F(t, \omega)$, which is simply called a correspondence in $F$. A process $f$ is said to be a selection of a set-valued process $F$ if $f(t, \omega) \in F(t, \omega)$ for $(\lambda \otimes P)_L$-almost all $(t, \omega) \in T \times \Omega$. When $F$ is regarded as a correspondence on the Loeb product space, $D_F$ denotes the set of distributions of the selections of $F$ viewed as random variables on the Loeb product space. For $\omega \in \Omega$, $D_{F_\omega}$ is defined analogously, with $F_\omega$ regarded as a correspondence on $(T, L_\lambda(\mathcal{T}), \lambda_L)$.
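The claim in the proof of Lemma 9.5.4, $P^1_L(F^{-1}(O)) = \sup\{\mu(O) : \mu \in D_F\}$, does not use atomlessness. It can therefore be illustrated on a finite sample space with the uniform measure, where, for a finite $X$ with the discrete topology, any subset may play the role of $O$. The Python sketch below (illustrative names and data) compares the mass of $F^{-1}(O) = \{\omega : F(\omega) \cap O \neq \emptyset\}$ with a brute-force maximum over all selections. It also computes the mass achieved by the selection built in the proof, which picks a point of $O$ whenever possible.

```python
import itertools
from fractions import Fraction

def hit_mass(F, O):
    """P(F^{-1}(O)) for a correspondence given as a list of finite sets, uniform weights."""
    return Fraction(sum(1 for values in F if values & O), len(F))

def best_selection_mass(F, O):
    """Maximum over all selections f of P(f in O), by brute force."""
    best = Fraction(0)
    for f in itertools.product(*(sorted(values) for values in F)):
        best = max(best, Fraction(sum(1 for x in f if x in O), len(F)))
    return best

def proof_selection(F, O):
    """Selection from the proof: a point of F(w) in O if there is one, otherwise any point."""
    return [min(values & O) if values & O else min(values) for values in F]

F = [{1, 2}, {3}, {2, 4}, {5}]  # F(omega) at four equally likely sample points
O = {2, 3}
f = proof_selection(F, O)
print(hit_mass(F, O), best_selection_mass(F, O), Fraction(sum(x in O for x in f), len(F)))
```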
Definition 9.5.5 Let $F$ be a set-valued process from $(T \times \Omega, L_{\lambda \otimes P}(\mathcal{T} \otimes \mathcal{A}), (\lambda \otimes P)_L)$ to a Polish space $X$. If for $(\lambda \otimes \lambda)_L$-almost all $(t_1, t_2) \in T \times T$ the correspondences $F_{t_1}$ and $F_{t_2}$ are independent, we say that the correspondences $F_t$ are almost surely pairwise independent. If for any $n \ge 2$ and for $(\lambda^n)_L$-almost all $(t_1, t_2, \dots, t_n) \in T^n$ the correspondences $F_{t_1}, F_{t_2}, \dots, F_{t_n}$ are independent, we say that the correspondences $F_t$ are almost mutually independent.
Definition 9.5.6 Let $F$ be a set-valued process from $(T \times \Omega, L_{\lambda \otimes P}(\mathcal{T} \otimes \mathcal{A}), (\lambda \otimes P)_L)$ to a Polish space $X$. We say that $F$ satisfies the coalitional law of large numbers (or simply the coalitional law) if for any measurable set $A \in L_\lambda(\mathcal{T})$ with $\lambda_L(A) > 0$, the set-valued process $F^A$ from $(A \times \Omega, L_{\lambda \otimes P}(\mathcal{T} \otimes \mathcal{A})^A, (\lambda \otimes P)^A_L)$ to $X$ satisfies the law of large numbers. That is, for $P_L$-almost all $\omega \in \Omega$, $D_{F^A_\omega} = D_{F^A}$, where $F^A$ and $L_{\lambda \otimes P}(\mathcal{T} \otimes \mathcal{A})^A$ are respectively the restrictions of $F$ and $L_{\lambda \otimes P}(\mathcal{T} \otimes \mathcal{A})$ to $A \times \Omega$. Here, $(\lambda \otimes P)^A_L$ denotes $(\lambda \otimes P)_L|_{A \times \Omega} / \lambda_L(A)$, a probability measure on $(A \times \Omega, L_{\lambda \otimes P}(\mathcal{T} \otimes \mathcal{A})^A)$ rescaled from $(\lambda \otimes P)_L$.
Note that pairwise independence is nearly the weakest version of independence, while mutual independence is nearly the strongest. Parts (2) and (3) of the following theorem show that their "almost" versions are equivalent for correspondences. Such an equivalence in the setting of random variables was first shown by the author in [106] (see Theorem 8.7.4 in Chap. 8). Based on this equivalence, we shall simply refer to the almost versions of pairwise and mutual independence as almost independence. The following theorem also characterizes the satisfiability of the coalitional law of large numbers for set-valued processes by almost independence.
Theorem 9.5.7 Let $F$ be a set-valued process mapping the Loeb product space $(T \times \Omega, L_{\lambda \otimes P}(\mathcal{T} \otimes \mathcal{A}), (\lambda \otimes P)_L)$ to a Polish space $X$. Assume that both Loeb measures $\lambda_L$ and $P_L$ are atomless.
Then the following are equivalent:
(1) $F$ satisfies the coalitional law;
(2) the correspondences $F_t$ are almost surely pairwise independent;
(3) the correspondences $F_t$ are almost mutually independent.
Proof By Lemma 9.5.4, it is clear that the set-valued process $F$ satisfies the coalitional law if and only if the process $\bar F$ taking values in $\mathcal{F}_X$ satisfies the coalitional law in distribution. By Theorems 8.5.7 and 8.6.5 in Chap. 8 (also Theorem 4 in [102] or Theorem 7.6 in [105]), the latter assertion is equivalent to the fact that for $(\lambda \otimes \lambda)_L$-almost all $(t_1, t_2) \in T \times T$, $\bar F_{t_1}$ and $\bar F_{t_2}$ are independent. Lemma 9.5.3 says that the independence of the correspondences $F_{t_1}$ and $F_{t_2}$ is equivalent to that of the random variables $\bar F_{t_1}$ and $\bar F_{t_2}$. Hence (1) and (2) are equivalent. Next, (3) ⇒ (2) is obvious. If (2) holds, then the above paragraph shows that for $(\lambda \otimes \lambda)_L$-almost all $(t_1, t_2) \in T \times T$, $\bar F_{t_1}$ and $\bar F_{t_2}$ are independent. By Theorem 8.7.4 in Chap. 8 (also Theorem 3 in [106]), we know that for all $n \ge 2$, $\bar F_{t_1}, \bar F_{t_2}, \dots, \bar F_{t_n}$ are independent for $(\lambda^n)_L$-almost all $(t_1, t_2, \dots, t_n) \in T^n$, which certainly implies (3) by taking the inverse images under these point-valued mappings of the sets of the form $E_O$ in $\mathcal{F}_X$.
Next, consider a trivial example. Let $F$ be a set-valued process to $\mathbb{R}$ such that $F(t, \omega) = \mathbb{R}$ for all $t$ and $\omega$. This set-valued process $F$ takes a constant value and thus obviously has independent correspondences. Let $\alpha$ be a nontrivial real-valued random variable on $\Omega$, and define $f(t, \omega) = \alpha(\omega)$ for all $t, \omega$. Then $f$ is a selection of $F$, but its random variables are all perfectly correlated. This means that the almost independence of the correspondences in a set-valued process need not be preserved by its selections; widespread correlations can indeed exist in some selections. The following theorem, however, shows that such widespread correlations can be removed via redistributions.
Theorem 9.5.8 Let $F$ be a set-valued process from $(T \times \Omega, L_{\lambda \otimes P}(\mathcal{T} \otimes \mathcal{A}), (\lambda \otimes P)_L)$ to a Polish space $X$ such that the correspondences $F_t$ are almost independent. Assume that both Loeb measures $\lambda_L$ and $P_L$ are atomless. Let $\mu$ be the distribution of a selection $f$ of $F$ as a random variable on $T \times \Omega$. Then there is a selection $g$ of $F$ such that the random variables $g_t$ are almost independent and the distribution of $g$, viewed as a random variable on $T \times \Omega$, is $\mu$.
The proof of Theorem 9.5.8 involves delicate computations. The main difficulty comes from the atoms of the distribution of $\bar F$. A closed set $D$ in $X$ is an atom for the distribution of $F$ if there is a non-negligible set $R$ in $T \times \Omega$ such that $F(t, \omega) = D$ for all $(t, \omega) \in R$. The events $R_t$ are almost independent. In some sense, one then has to choose a process with almost independence from $R$ to $D$ with a given distribution. The idea is to cut through all the events $R_t$ to obtain subevents with a given common proportion of the probability of the $R_t$, so that almost independence remains valid for those subevents. Eventually, one also has to paste together all the constructions arising from the atoms as well as the atomless part. The interested reader can find the details in [107].
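The trivial example above can be mimicked by a finite Monte Carlo experiment. Finitely many agents and simulated sample points stand in for the Loeb spaces, and $\alpha$ is assumed, for illustration, to be uniform on $\{-1, 1\}$. The correlated selection $f(t, \omega) = \alpha(\omega)$ and a selection $g$ whose random variables are i.i.d. copies of $\alpha$ have the same distribution on $T \times \Omega$. Only $g$, however, has (approximately) uncorrelated random variables and cross-sectional averages close to $E\alpha = 0$. The i.i.d. selection merely illustrates the conclusion of Theorem 9.5.8; it is not the redistribution construction of [107]. All names are hypothetical.

```python
import random
import statistics

def correlation(xs, ys):
    """Sample Pearson correlation of two equally long lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def simulate(n_agents=400, n_samples=1000, seed=0):
    """Compare the correlated selection f(t, w) = alpha(w) with an i.i.d. selection g."""
    rng = random.Random(seed)
    alpha = [rng.choice((-1.0, 1.0)) for _ in range(n_samples)]  # alpha(omega)
    f = [[alpha[w]] * n_agents for w in range(n_samples)]        # rows indexed by omega
    g = [[rng.choice((-1.0, 1.0)) for _ in range(n_agents)] for _ in range(n_samples)]

    def column(h, t):  # the random variable h_t, listed over the sampled omegas
        return [row[t] for row in h]

    def plus_share(h):  # empirical mass of {+1} on T x Omega
        return sum(row.count(1.0) for row in h) / (n_agents * n_samples)

    return {
        "corr_f": correlation(column(f, 0), column(f, 1)),
        "corr_g": correlation(column(g, 0), column(g, 1)),
        "min_abs_mean_f": min(abs(statistics.fmean(row)) for row in f),  # no cancellation
        "max_abs_mean_g": max(abs(statistics.fmean(row)) for row in g),  # near 0
        "plus_share_f": plus_share(f),
        "plus_share_g": plus_share(g),
    }

print(simulate())
```

The cross-sectional average of $f_\omega$ is always $\pm 1$, so the sample functions of $f$ never reproduce the distribution of $f$, whereas those of $g$ approximately do.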
As noted in Remark 9.2.12, the integration theory of correspondences in $\mathbb{R}^n$ follows trivially from the relevant distribution theory. The following corollary is thus clear.
Corollary 9.5.9 Let $F$ be a set-valued process from the atomless Loeb product space to $\mathbb{R}^n$. If the correspondences $F_t$ are almost independent, then
(1) for $P_L$-almost all $\omega \in \Omega$, $\int_T F_\omega\, d\lambda_L = \int_{T \times \Omega} F\, d(\lambda \otimes P)_L$;
(2) for any $x \in \int_{T \times \Omega} F\, d(\lambda \otimes P)_L$, there is a selection $g$ of $F$ such that the random variables $g_t$ are almost independent and $x = \int_{T \times \Omega} g\, d(\lambda \otimes P)_L$.
Remark 9.5.10 It is easy to generalize Corollary 9.5.9 to the case where the set-valued processes take values in the power set of an infinite dimensional space and the integrals are either the strong Bochner or the weak Gelfand integrals (see [107]).
Remark 9.5.11 Let $X$ be a locally compact separable metric space. Then the collection $\mathcal{F}(X)$ of nonempty closed sets in $X$, together with the empty set, is compact and metrizable when the enlarged collection is endowed with the topology of closed convergence ([43], p. 19). As noted in Remark 2.4 (2) in [103], the σ-algebra generated by this topology is the same as $\mathcal{B}(\mathcal{F}_X)$. Thus, as far as measurability, distributions and
independence on $\mathcal{F}_X$ are concerned, it is irrelevant whether the underlying topology on $\mathcal{F}_X$ is the topology of closed convergence or the topology induced by the Hausdorff distance defined from an equivalent totally bounded metric on $X$.
Remark 9.5.12 The well-known Castaing representation theorem says that a closed valued measurable correspondence is the closure of a sequence of its selections. Based on Theorem 9.5.8, Xiaoai Lin obtained an independence version of the Castaing representation in [68]. It is shown there that the structure of an almost independent set-valued process is rather simple; i.e., the set-valued process is simply the closure of a sequence of almost independent point-valued processes. A number of different definitions of independence for correspondences are also unified in [68].
9.6 Competitive Equilibria in Random Economies
We shall first fix some notation for this section. Let $\ell$ be the number of distinguishable commodities in a market. The positive orthant $\mathbb{R}^\ell_+$ of all the vectors in $\mathbb{R}^\ell$ with nonnegative components is the common consumption set; that is, all goods are assumed to be perfectly divisible. A preference relation $\succ$ is a transitive and irreflexive binary relation on $\mathbb{R}^\ell_+$ such that $\succ$ is a relatively open set in $\mathbb{R}^\ell_+ \times \mathbb{R}^\ell_+$ (see [43], p. 86). The set of all preference relations is denoted by $\mathcal{P}$. The topology on $\mathcal{P}$ is the one induced by the topology of closed convergence on the closed sets $\mathbb{R}^\ell_+ \times \mathbb{R}^\ell_+ \setminus \succ$. The space of preferences $\mathcal{P}$ with this topology is a compact and metrizable space (see [43], pp. 19 and 96).
In Sect. 9.3.1, an atomless Loeb probability space $(T, L_\lambda(\mathcal{T}), \lambda_L)$ is used to model the space of player names for games with many players. Here we use $(T, L_\lambda(\mathcal{T}), \lambda_L)$ to represent the space of economic agents. An economy $\alpha = (\succ, e)$ is a measurable mapping from the space of economic agents $(T, L_\lambda(\mathcal{T}), \lambda_L)$ to the space of agents' characteristics $\mathcal{P} \times \mathbb{R}^\ell_+$ such that $\int_T e(t)\, d\lambda_L$ is finite, where $\succ_t$ and $e(t)$ are, respectively, the preference relation and the endowment of agent $t$, and $\int_T e(t)\, d\lambda_L$ is the mean endowment of the economy $\alpha$. A price system $p$ is a vector in $\mathbb{R}^\ell_+$ such that the sum of its components is one. The set of all price systems is denoted by $\Delta$. For an economic agent with preference $\succ$ and endowment $e$, the demand set $\varphi(\succ, e, p)$ under the price system $p$ is the set of maximal elements in the budget set $\{x \in \mathbb{R}^\ell_+ : p \cdot x \le p \cdot e\}$ under the preference $\succ$. Thus the demand set of agent $t$ is $\varphi(\succ_t, e_t, p)$ or $\varphi(\alpha(t), p)$. When $p$ is strictly positive, i.e., positive in every component, the demand set $\varphi(\succ, e, p)$ is always nonempty and compact. Note that the demand relation $\varphi(\succ, e, p)$ has a Borel measurable graph ([43], p. 102).
Hence, under a strictly positive price system $p$, the individual demand relation $\varphi(\alpha(\cdot), p)$ defines a compact valued, measurable correspondence from $T$ to $\mathbb{R}^\ell_+$, which is called the individual demand correspondence. The mean demand correspondence $\Phi(\alpha, p)$, as a mapping of the price system $p$, is simply the integral $\int_T \varphi(\alpha(t), p)\, d\lambda_L$. The mean excess demand $Z(\alpha, p)$ is the difference between the mean demand and the mean endowment; i.e., $Z(\alpha, p) = \Phi(\alpha, p) - \int_T e(t)\, d\lambda_L$. Hereafter, we shall always
regard the various demand correspondences as set-valued mappings on the set of strictly positive prices unless otherwise noted. An allocation $f$ for the economy $\alpha$ is simply an integrable function from $T$ into the consumption set $\mathbb{R}^\ell_+$. An allocation $f$ and a price system $p$ are called a competitive equilibrium for $\alpha$ if $f(t) \in \varphi(\alpha(t), p)$ for $\lambda_L$-almost all $t \in T$, and also $\int_T f\, d\lambda_L = \int_T e\, d\lambda_L$. That is, almost all agents choose optimal consumptions within their budgets, and the market clears. As in [43], we shall also call $f$ a Walras allocation and its distribution a Walras distribution.
Another atomless Loeb probability space $(\Omega, L_P(\mathcal{A}), P_L)$ is used to formalize the uncertainty for all the agents. A random economy $\mathcal{E} = (\succ, e)$ is a measurable mapping, or simply a process, from the Loeb product space $(T \times \Omega, L_{\lambda \otimes P}(\mathcal{T} \otimes \mathcal{A}), (\lambda \otimes P)_L)$ to the space of agents' characteristics $\mathcal{P} \times \mathbb{R}^\ell_+$ such that the vector $\int_{T \times \Omega} e(t, \omega)\, d(\lambda \otimes P)_L$ is finite. Here, $\succ_{(t, \omega)}$ and $e(t, \omega)$ are, respectively, the preference relation and the endowment of agent $t$ at sample realization $\omega$, and $\int_{T \times \Omega} e(t, \omega)\, d(\lambda \otimes P)_L$ is the expected mean endowment of the random economy. For almost all $t \in T$, $\mathcal{E}_t$ is a measurable mapping from the sample space $\Omega$ to $\mathcal{P} \times \mathbb{R}^\ell_+$; here $e_t(\cdot)$ represents the possible random shocks to agent $t$'s endowment, while $\succ_{(t, \cdot)}$ represents the agent's random preferences. We can also view $\mathcal{E}$ as a deterministic economy with the Loeb product space $(T \times \Omega, L_{\lambda \otimes P}(\mathcal{T} \otimes \mathcal{A}), (\lambda \otimes P)_L)$ as the space of economic agents; the economic notions defined in the previous paragraphs can be trivially restated for this case. For almost all $\omega \in \Omega$, $\mathcal{E}_\omega$ is a measurable mapping from the space of economic agents $(T, L_\lambda(\mathcal{T}), \lambda_L)$ to the space of agents' characteristics $\mathcal{P} \times \mathbb{R}^\ell_+$ such that the mean endowment $\int_T e_\omega(t)\, d\lambda_L$ is finite, and hence an exchange economy in the usual sense.
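For a finite economy with two commodities, these definitions can be tried out directly. In the sketch below, uniform weights on finitely many agents stand in for $\lambda_L$. Each agent is assumed, hypothetically, to have a Cobb–Douglas utility $x_1^{a} x_2^{1-a}$, whose demand under $p$ is $(a\, p \cdot e / p_1,\ (1-a)\, p \cdot e / p_2)$. Prices are normalized by $p_1 + p_2 = 1$, as for the price systems in $\Delta$. By Walras' law it then suffices to clear the market for good 1, which is done by bisection because the excess demand for good 1 decreases in $p_1$ for this specification. The specification also admits a closed form, $p_1 = C/(B + C)$, with $C$ the mean of $a\, e_2$ and $B$ the mean of $(1-a)\, e_1$; the usage example checks the bisection against it.

```python
def demand(a, e, p):
    """Cobb-Douglas demand: share a of wealth p.e is spent on good 1, the rest on good 2."""
    wealth = p[0] * e[0] + p[1] * e[1]
    return (a * wealth / p[0], (1 - a) * wealth / p[1])

def mean_excess_demand(agents, p):
    """Z(alpha, p): mean demand minus mean endowment, with uniform weights on the agents."""
    n = len(agents)
    z = [0.0, 0.0]
    for a, e in agents:
        x = demand(a, e, p)
        for c in range(2):
            z[c] += (x[c] - e[c]) / n
    return z

def equilibrium_price(agents, tol=1e-12):
    """Bisection for p_1 in (0, 1), with p_2 = 1 - p_1, clearing the market for good 1."""
    lo, hi = 1e-9, 1 - 1e-9
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mean_excess_demand(agents, (mid, 1 - mid))[0] > 0:
            lo = mid  # good 1 is over-demanded, so the equilibrium price of good 1 is higher
        else:
            hi = mid
    p1 = (lo + hi) / 2
    return (p1, 1 - p1)

# (expenditure share a on good 1, endowment (e1, e2)) for three hypothetical agents
agents = [(0.3, (1.0, 0.0)), (0.5, (0.0, 1.0)), (0.7, (1.0, 1.0))]
p = equilibrium_price(agents)
C = sum(a * e[1] for a, e in agents) / len(agents)
B = sum((1 - a) * e[0] for a, e in agents) / len(agents)
print(p, C / (B + C), mean_excess_demand(agents, p))
```

Here $C = 0.4$ and $B = 1/3$, so both computations give $p_1 = 6/11$, and the mean excess demand vanishes up to rounding.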
The purpose of this section is to illustrate how Theorems 9.5.7 and 9.5.8, together with known results on measure-theoretic economies, can be used to derive results on the probabilistic stability of the random economies $\mathcal{E}_\omega$. We also point out that the term "random economies" is used for both $\mathcal{E}$ and $\mathcal{E}_\omega$. In the following theorem, we show that when the primitive economic data are almost independent across economic agents, the set of equilibrium prices for $\mathcal{E}_\omega$ is nonempty and does not depend on particular sample realizations. By Theorem 9.5.8, we can also find an almost independent equilibrium $g$ for the deterministic economy $\mathcal{E}$ through a redistribution of an equilibrium for $\mathcal{E}$. The almost independence condition then guarantees that $g_\omega$ is an equilibrium for $\mathcal{E}_\omega$ for almost all $\omega$. That is, $g$ also induces an equilibrium essentially per state of nature, and it is thus a “global” equilibrium which, in addition, preserves the microstructure of independence. Moreover, the distribution of every Walras allocation can be achieved by such an independent “global” equilibrium. Here the random economies $\mathcal{E}_\omega$ are compared with a common reference economy $\mathcal{E}$, which is viewed as a deterministic economy with the Loeb product space as the space of agents. Note that when $\mathcal{E}$ is viewed as a deterministic economy, an arbitrary equilibrium may contain widespread correlation due to multiple optimal choices.
Theorem 9.6.1 Let $\mathcal{E}$ be a random economy. Assume that the expected mean endowment $\int_{T \times \Omega} e(t, \omega)\, d(\lambda \otimes P)_L$ is strictly positive, and for $(\lambda \otimes P)_L$-almost all
$(t, \omega) \in T \times \Omega$, $\succ_{(t, \omega)}$ is monotonic; i.e., for $0 \le x \le y$, $x \ne y$ implies that $y \succ_{(t, \omega)} x$. If the $\mathcal{E}_t$'s are almost independent, then we have the following.
(1) There is a competitive equilibrium $(p, g)$ for $\mathcal{E}$, viewed as a deterministic economy, such that the $g_t$'s are almost independent, and for almost all $\omega \in \Omega$, $(p, g_\omega)$ is a competitive equilibrium for the economy $\mathcal{E}_\omega$. Moreover, the distribution of every Walras allocation of $\mathcal{E}$ can be attained by such an independent Walras allocation $g$.
(2) For almost all $\omega \in \Omega$, the economies $\mathcal{E}_\omega$ and $\mathcal{E}$ have the same mean excess demand correspondence and the same nonempty set of strictly positive equilibrium price systems.
(3) For any fixed equilibrium price $p$ of $\mathcal{E}$, essentially all the economies $\mathcal{E}_\omega$ have the same set of distributions for the relevant Walras allocations with equilibrium price $p$ as the economy $\mathcal{E}$, viewed as a deterministic economy on the product space.
Proof We first consider (2). By the exact law of large numbers in distribution as presented in Theorem 8.5.7 of Chap. 8 (also Theorem 4 in [102] or Theorem 5.2 in [105]), the independence assumption on the economy $\mathcal{E}$ implies that for almost all $\omega \in \Omega$, the economy $\mathcal{E}_\omega$ has the same preference–endowment distribution as $\mathcal{E}$. Since both economies are atomless, Proposition 4 on p. 114 in [43] shows that the mean demand correspondences $\Phi(\mathcal{E}_\omega, p)$ and $\Phi(\mathcal{E}, p)$ are the same, and hence so are the relevant mean excess demand correspondences $Z(\mathcal{E}_\omega, p)$ and $Z(\mathcal{E}, p)$. Since $p$ is an equilibrium price for $\mathcal{E}$ (or $\mathcal{E}_\omega$) if and only if $0 \in Z(\mathcal{E}, p)$ (or $0 \in Z(\mathcal{E}_\omega, p)$), it is thus obvious that $\mathcal{E}_\omega$ and $\mathcal{E}$ have the same set of equilibrium prices for almost all $\omega \in \Omega$. Since the expected mean endowment is strictly positive, Theorem 2 on p. 151 in [43] implies that $\mathcal{E}$ indeed has at least one strictly positive equilibrium price. Note that the monotonicity assumption also implies that any equilibrium price must be strictly positive. For (3), fix an equilibrium price $p$.
Since $\varphi(\succ, e, p)$ has a measurable graph, the individual demand correspondence $\varphi(\mathcal{E}(t, \omega), p)$ for the economy $\mathcal{E}$, viewed as a deterministic economy, is measurable. For simplicity, we denote $\varphi(\mathcal{E}(t, \omega), p)$ by $F(t, \omega)$. Note that for any $\omega \in \Omega$, $F_\omega$ is the individual demand correspondence for the economy $\mathcal{E}_\omega$. By the almost independence assumption on the economy $\mathcal{E}$, it is easy to see that the correspondences $F_t$ are almost independent. By Theorem 9.5.7, $D_{F_\omega} = D_F$ for almost all $\omega \in \Omega$. By the exact law of large numbers in Theorem 8.5.7 of Chap. 8 (or simply Corollary 9.5.9 in the point-valued version), we obtain that for almost all $\omega \in \Omega$, $\int_{T \times \Omega} e(t, \omega)\, d(\lambda \otimes P)_L = \int_T e_\omega(t)\, d\lambda_L(t)$. Now it is obvious that the set
$$M_\omega = \Big\{\mu \in D_{F_\omega} : \int_{x \in \mathbb{R}^\ell_+} x\, d\mu = \int_T e_\omega(t)\, d\lambda_L\Big\}$$
is the same as the set
$$M = \Big\{\mu \in D_F : \int_{x \in \mathbb{R}^\ell_+} x\, d\mu = \int_{T \times \Omega} e(t, \omega)\, d(\lambda \otimes P)_L\Big\}$$
for almost all $\omega \in \Omega$. Note that $M_\omega$ (or $M$) is the set of distributions of the Walras allocations of the economy $\mathcal{E}_\omega$ (or $\mathcal{E}$) with equilibrium price $p$. Thus (3) is proven.
Finally, we consider (1). Note that (2) already shows the existence of at least one competitive equilibrium. Let $(p, f)$ be a competitive equilibrium of the economy $\mathcal{E}$, and let $F$ be the individual demand correspondence under the price $p$ defined in the previous paragraph. By Theorem 9.5.8, there is a selection $g$ of $F$ such that $g$ has the same distribution $\mu$ as $f$ and the random variables $g_t$ are almost independent. It is clear that $(p, g)$ is also a competitive equilibrium of the economy $\mathcal{E}$. By the exact law of large numbers in distribution again, it follows that for almost all $\omega \in \Omega$, $g_\omega$ has the same distribution $\mu$ as $g$. For almost all $\omega \in \Omega$, since $\int_{T \times \Omega} e(t, \omega)\, d(\lambda \otimes P)_L = \int_T e_\omega(t)\, d\lambda_L(t)$, we thus have
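$$\int_T g_\omega(t)\, d\lambda_L(t) = \int_{T \times \Omega} g\, d(\lambda \otimes P)_L = \int_{T \times \Omega} e\, d(\lambda \otimes P)_L = \int_T e_\omega(t)\, d\lambda_L(t).$$
This shows that $(p, g_\omega)$ is a competitive equilibrium for essentially every economy $\mathcal{E}_\omega$, and thus $g$ is a “global” equilibrium.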
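Part (2) of Theorem 9.6.1 can be illustrated numerically with a finite stand-in; all specifics below are hypothetical. Agents have fixed Cobb–Douglas shares $a_t$ over two goods, and only the endowment of good 1 is random. For this specification, market clearing in good 1 gives $p_1 = C/(B + C)$, where $C$ is the mean of $a\, e_2$ and $B$ the mean of $(1-a)\, e_1$. With independent idiosyncratic shocks, the equilibrium price barely moves across sampled states. A single economy-wide shock, which violates almost independence, moves it substantially. A finite number of agents gives only approximate, not exact, invariance.

```python
import random

def cd_price(agents):
    """Market-clearing p_1 (with p_2 = 1 - p_1) for Cobb-Douglas agents (a, (e1, e2))."""
    n = len(agents)
    C = sum(a * e[1] for a, e in agents) / n        # mean of a * e2
    B = sum((1 - a) * e[0] for a, e in agents) / n  # mean of (1 - a) * e1
    return C / (B + C)

def sample_prices(n_agents=5000, n_states=20, common_shock=False, seed=1):
    """Equilibrium p_1 in several sampled states omega of a hypothetical random economy."""
    rng = random.Random(seed)
    shares = [rng.uniform(0.2, 0.8) for _ in range(n_agents)]  # a_t, the same in every state
    prices = []
    for _ in range(n_states):
        s = rng.uniform(0.5, 1.5)  # economy-wide shock, used only when common_shock is True
        agents = []
        for a in shares:
            shock = s if common_shock else rng.uniform(0.5, 1.5)  # idiosyncratic otherwise
            agents.append((a, (shock, 1.0)))  # random endowment of good 1, one unit of good 2
        prices.append(cd_price(agents))
    return prices

for common in (False, True):
    prices = sample_prices(common_shock=common)
    print("common shock" if common else "independent shocks", max(prices) - min(prices))
```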
9.7 General Risk Analysis and Asset Pricing
9.7.1 General Risk Analysis for Large Markets
In Chap. 8 and in Sect. 9.5 of this chapter, idiosyncratic risks, which involve randomness at the individual level but not on a large scale, are simply characterized by the almost sure versions of uncorrelatedness or independence. In this section, we study the general structure of risks or uncertainty coming from a large number of random entities (for example, a large number of stocks or consumers, or even a large number of physical or biological objects) that are not assumed to move in an idiosyncratic way. As in Sect. 9.6, one can use an atomless Loeb space $(T, L_\lambda(\mathcal{T}), \lambda_L)$ to index a large number of random entities, and another atomless Loeb space $(\Omega, L_P(\mathcal{A}), P_L)$ to model the uncertainty for all the entities in the market. The random motions of all the entities are modeled by a measurable mapping, or simply a process, $f$ from the Loeb product space $(T \times \Omega, L_{\lambda \otimes P}(\mathcal{T} \otimes \mathcal{A}), (\lambda \otimes P)_L)$ to some relevant space. As in Sect. 8.5 of Chap. 8, let $\mathcal{U}$ denote the usual product σ-algebra $L_\lambda(\mathcal{T}) \otimes L_P(\mathcal{A})$. For an integrable real-valued process $f$ on the Loeb product space, let $E(f \mid \mathcal{U})$ be the conditional expectation of $f$ with respect to $\mathcal{U}$, and set $e = f - E(f \mid \mathcal{U})$. It is obvious that $E(e \mid \mathcal{U}) = 0$, and hence Theorem 8.5.5 implies that the residual process $e$ satisfies the law of large numbers. Thus, it only remains
Y. Sun
to understand the structure of $E(f|\mathcal{U})$. For this purpose, we assume that $f$ is square integrable. Without loss of generality, we also assume that $f$ is a centered process, i.e., $E f_t = 0$ for all $t \in T$ (otherwise, consider the process $f_t - E f_t$ instead). By generalizing the Karhunen-Loève type biorthogonal expansion for continuous-time processes ([79], p. 457) to our setting, we can obtain a biorthogonal expansion for $E(f|\mathcal{U})$. Such an expansion is called a proper orthogonal decomposition in [76] (p. 144); it can also be viewed as the continuous version of the expansion in classical principal components analysis (see [9]). Theorem 9.7.1 below combines the results on $E(f|\mathcal{U})$ and on the residual process $e$. Note that biorthogonality requires orthogonality not only of the functions $\varphi_n$ on $\Omega$ (called factors) but also of the functions $\psi_n$ on $T$ (called loadings).

Theorem 9.7.1 Let $f$ be a real-valued square integrable centered process. Then the structure of $f$ can be expressed as
$$f(t,\omega) = \sum_{n=1}^{\infty} \lambda_n \psi_n(t)\varphi_n(\omega) + e(t,\omega)$$
with the following properties.
(1) $\lambda_n$, for $1 \le n < \infty$, is a nonincreasing sequence of positive numbers; the collection $\{\psi_n : 1 \le n < \infty\}$ is orthonormal, and $\{\varphi_n : 1 \le n < \infty\}$ is a collection of orthonormal and centered random variables.
(2) $E(f|\mathcal{U})(t,\omega) = \sum_{n=1}^{\infty} \lambda_n \psi_n(t)\varphi_n(\omega)$, and $E(e|\mathcal{U}) = 0$.
(3) The random variables $e_t$ are almost surely orthogonal, which is to say that for $(\lambda\otimes\lambda)_L$-almost all $(t_1,t_2) \in T\times T$, $\int_\Omega e_{t_1}(\omega)e_{t_2}(\omega)\,dP_L(\omega) = 0$.
(4) If $p$ is a square integrable real-valued function on $(T, \mathcal{L}_\lambda(\mathcal{T}), \lambda_L)$, then for $P_L$-almost all $\omega \in \Omega$, $\int_T p(t)e_\omega(t)\,d\lambda_L(t) = 0$; i.e., the individual risks as described by the residual process $e$ are canceled by aggregation in various ways, and
$$\int_T p(t)f(t,\omega)\,d\lambda_L(t) = \sum_{n=1}^{\infty} \lambda_n \left(\int_T p(t)\psi_n(t)\,d\lambda_L(t)\right)\varphi_n(\omega).$$
Proof For notational simplicity, denote $E(f|\mathcal{U})$ by $g$ and $f - g$ by $e$. Define a function $R_f$ on $T\times T$ by letting $R_f(t_1,t_2) = \int_\Omega f(t_1,\omega)f(t_2,\omega)\,dP_L(\omega)$ (called the autocorrelation function of the process $f$ in [79], p. 282). We shall first show that $R_f$ is $(\lambda\otimes\lambda)_L$-almost surely equal to the $\mathcal{L}_\lambda(\mathcal{T})\otimes\mathcal{L}_\lambda(\mathcal{T})$-measurable autocorrelation function $R_g$ of the process $g$. By Theorem 8.5.5, the equality $E(e|\mathcal{U}) = 0$ implies that $\int_T e_\omega\,d\lambda_L = 0$ for $P_L$-almost all $\omega \in \Omega$. By symmetry, we obtain $E e_t = 0$ for almost all $t \in T$. Since both $f$ and $g$ are square integrable, Keisler's Fubini theorem implies that there is a measurable subset $C_1$ of $T$ with $\lambda_L(C_1) = 1$ such that for any $t \in C_1$, $f_t$, $g_t$ and $e_t$ are $P_L$-square integrable. Thus, for any fixed $t \in C_1$, $E(f_t e|\mathcal{U}) = f_t E(e|\mathcal{U}) = 0$, which implies that $E(f_t e_{t'}) = 0$ for almost all $t' \in T$. Hence, for $(\lambda\otimes\lambda)_L$-almost all $(t_1,t_2) \in T\times T$, $E(f_{t_1}e_{t_2}) = 0$. Similarly, for $(\lambda\otimes\lambda)_L$-almost all $(t_1,t_2) \in T\times T$, $E(e_{t_1}g_{t_2}) = 0$, whence
9 Nonstandard Analysis in Mathematical Economics
$$E(f_{t_1}f_{t_2}) = E(f_{t_1}g_{t_2}) + E(f_{t_1}e_{t_2}) = E(g_{t_1}g_{t_2}) + E(e_{t_1}g_{t_2}) = E(g_{t_1}g_{t_2}).$$
That is, $R_f$ and $R_g$ are identical $(\lambda\otimes\lambda)_L$-almost surely. Define an integral operator $K$ on the space $L^2(\lambda_L)$ of square integrable functions on $(T, \mathcal{L}_\lambda(\mathcal{T}), \lambda_L)$ by using $R_f$ as a kernel. That is, for each $h \in L^2(\lambda_L)$,
$$K(h)(t_1) = \int_T R_f(t_1,t_2)h(t_2)\,d\lambda_L(t_2) = \int_T R_g(t_1,t_2)h(t_2)\,d\lambda_L(t_2).$$
Since $R_g$ is square integrable on the usual product measure space, $K$ is a Hilbert-Schmidt operator, and thus a compact operator (see [45], p. 170). It is also self-adjoint and positive semi-definite. Let $\gamma_1, \gamma_2, \ldots$ be the nonincreasing sequence of all the positive eigenvalues of $K$, with each eigenvalue repeated according to its multiplicity. Let $\psi_1, \psi_2, \ldots$ be the corresponding eigenfunctions, adjusted to form an orthonormal family. This sequence of functions is called a complete eigensystem for $K$. Now, fix any bounded Loeb measurable functions $\alpha$ on $T$ and $\beta$ on $\Omega$. Let $\psi$ be the projection of $\alpha$ on the closure of the range space of $K$, and let $\phi = \alpha - \psi$. Since the functions $\psi_n$ form a complete orthonormal basis for the closed range space of $K$, it is also clear that
$$\psi(\cdot) = \sum_{n=1}^{\infty}\left(\int_T \alpha(t)\psi_n(t)\,d\lambda_L(t)\right)\psi_n(\cdot).$$
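Note that if there are only $m$ positive eigenvalues for $K$, then we should replace the infinite sum by a finite sum of $m$ terms. Define an operator $F$ by letting $F(h)(\omega) = \int_T f(t,\omega)h(t)\,d\lambda_L(t)$ for each $h \in L^2(\lambda_L)$. It is easy to see that
$$\begin{aligned}
\int_{t_2\in T}\phi(t_2)K(\phi)(t_2)\,d\lambda_L(t_2) &= \int_{t_2\in T}\phi(t_2)\int_{t_1\in T}\phi(t_1)\int_\Omega f(t_1,\omega)f(t_2,\omega)\,dP_L\,d\lambda_L(t_1)\,d\lambda_L(t_2)\\
&= \int_\Omega\int_{t_2\in T}\int_{t_1\in T} f(t_1,\omega)\phi(t_1)\,d\lambda_L(t_1)\,f(t_2,\omega)\phi(t_2)\,d\lambda_L(t_2)\,dP_L\\
&= \int_\Omega\left(\int_T f(t,\omega)\phi(t)\,d\lambda_L(t)\right)^2 dP_L = \int_\Omega (F(\phi))^2\,dP_L.
\end{aligned}$$
Since $\phi$ is orthogonal to every function in the range of the operator $K$, we know that $\int_{t_2\in T}\phi(t_2)K(\phi)(t_2)\,d\lambda_L(t_2) = 0$, and hence $F(\phi) = 0$. Therefore $F(\alpha) = F(\psi)$.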
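Next we compute the following integrals.
$$\begin{aligned}
&\int_{T\times\Omega}\alpha(t)\beta(\omega)\sum_{n=1}^{\infty}\int_T f(t',\omega)\psi_n(t')\,d\lambda_L(t')\,\psi_n(t)\,d(\lambda\otimes P)_L(t,\omega)\\
&\quad= \int_\Omega\beta(\omega)\sum_{n=1}^{\infty}\int_{t'\in T} f(t',\omega)\psi_n(t')\,d\lambda_L(t')\int_T\alpha(t)\psi_n(t)\,d\lambda_L(t)\,dP_L(\omega)\\
&\quad= \int_\Omega\beta(\omega)\int_{t'\in T} f(t',\omega)\sum_{n=1}^{\infty}\left(\int_{t\in T}\alpha(t)\psi_n(t)\,d\lambda_L(t)\right)\psi_n(t')\,d\lambda_L(t')\,dP_L(\omega)\\
&\quad= \int_\Omega\beta(\omega)\int_{t'\in T} f(t',\omega)\psi(t')\,d\lambda_L(t')\,dP_L(\omega) = \int_\Omega\beta(\omega)F(\psi)(\omega)\,dP_L\\
&\quad= \int_\Omega\beta(\omega)F(\alpha)(\omega)\,dP_L = \int_\Omega\beta(\omega)\int_T f(t,\omega)\alpha(t)\,d\lambda_L\,dP_L = \int_{T\times\Omega}\alpha(t)\beta(\omega)f(t,\omega)\,d(\lambda\otimes P)_L.
\end{aligned}$$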
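Since $\sum_{n=1}^{\infty}\int_T f(t',\omega)\psi_n(t')\,d\lambda_L(t')\,\psi_n(t)$ is $\mathcal{L}_\lambda(\mathcal{T})\otimes\mathcal{L}_P(\mathcal{A})$-measurable, we therefore have
$$E(f|\mathcal{U})(t,\omega) = \sum_{n=1}^{\infty}\int_T f(t',\omega)\psi_n(t')\,d\lambda_L(t')\,\psi_n(t)$$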
by the arbitrary choices of $\alpha$ and $\beta$. Let $\lambda_n = \gamma_n^{1/2}$ and $\varphi_n(\omega) = \frac{1}{\lambda_n}\int_T f_\omega(t)\psi_n(t)\,d\lambda_L(t)$. Then $E(f|\mathcal{U})(t,\omega) = \sum_{n=1}^{\infty}\lambda_n\varphi_n(\omega)\psi_n(t)$, and thus Part (2) of the theorem is proved. It is also easy to check that
$$\int_\Omega\int_T f(t_1,\omega)\psi_m(t_1)\,d\lambda_L(t_1)\int_T f(t_2,\omega)\psi_n(t_2)\,d\lambda_L(t_2)\,dP_L(\omega) = \gamma_n\delta_{nm},$$
where $\delta_{nm}$ equals 0 if $m \ne n$ and 1 if $m = n$. Hence, the random variables $\varphi_n$ are orthonormal. By the definition of $\varphi_n$ and the Fubini theorem, we know that $\varphi_n$ has mean zero. Thus Part (1) of the theorem is established. Since $E(e|\mathcal{U}) = 0$, Theorem 8.5.5 implies that $e$ satisfies the coalitional law. Part (3) of the theorem follows from Theorem 8.6.3 and the fact that $e$ is a centered process. For a square integrable real-valued function $p$ on $T$, we still have $E(pe|\mathcal{U}) = 0$, and hence Theorem 8.5.5 implies that for $P_L$-almost all $\omega \in \Omega$, $\int_T p(t)e_\omega(t)\,d\lambda_L(t) = 0$, whence Part (4) of the theorem follows.

Remark 9.7.2 The idiosyncratic risks $e_t$ may also be called unsystematic risks. A major strength of Theorem 9.7.1 is the fact that unsystematic risks completely disappear via broad diversification. Another strength of Theorem 9.7.1 is that the factors $\varphi_n$ are endogenously derived from the observable autocorrelation function
of the original process $f$ by the standard procedure of computing the eigenfunctions of some relevant compact, self-adjoint, positive semi-definite operator (see, for example, [41], p. 281). It is important to note that the loadings $\psi_n$ and factors $\varphi_n$ in the biorthogonal representation of $E(f|\mathcal{U})$ can be determined from $f$ directly. Thus $E(f|\mathcal{U})$ can be approximated before we know what it is. As usual, linear combinations of factors may be called factor or systematic risks. In applications, we may simply take $T$ to be a hyperfinite set and call the structural result in Theorem 9.7.1 the endogenous hyperfinite factor model.

Remark 9.7.3 One can also define the autocorrelation function $S$ of the sample functions $f_\omega$ by letting $S(\omega_1,\omega_2) = \int_T f(t,\omega_1)f(t,\omega_2)\,d\lambda_L(t)$. Let $L$ be the integral operator on $L^2(P_L)$ with $S$ serving as the kernel function. Since $S(\omega_1,\omega_2) = \sum_{n=1}^{\infty}\lambda_n^2\varphi_n(\omega_1)\varphi_n(\omega_2)$, the factors $\varphi_n$ thus form a complete eigensystem for the Hilbert-Schmidt operator $L$.

Theorem 9.7.1 provides a general structure for a real-valued square integrable process on the Loeb product space. We shall now take a different point of view. Consider a general large market whose random motions are modeled by a Loeb product measurable process $f$. The collection $\mathcal{L}_P(\mathcal{A})$ of all the random events in the market may be very large, particularly when the individual entities are also influenced by their own random sources. However, Theorem 9.7.5 below shows that the events related to coalitional randomness form a countably generated σ-algebra. This means that one can generate this σ-algebra with just one random variable, and that the individual entities then move independently conditioned on this single random source (see Remark 8.8.3 for a very special case in terms of exchangeability). An exact version of the conditional law of large numbers is automatically valid.

Definition 9.7.4 Let $f$ be a process from the Loeb product space to a Polish space $X$.
Fix $A \in \mathcal{L}_\lambda(\mathcal{T})$ with $\lambda_L(A) > 0$; the set $A$ is called a coalition. Let $f^A$ be the restriction of $f$ to $A\times\Omega$. For $P_L$-almost all $\omega \in \Omega$, the sample function $f^A_\omega$ is Loeb measurable from $(A, \mathcal{L}_\lambda(\mathcal{T})\cap A, \lambda^A_L)$ to $X$, where $\lambda^A_L$ is the probability measure on $A$ rescaled from $\lambda_L$. The sample distribution $\lambda^A_L(f^A_\omega)^{-1}$ defines a Loeb measurable mapping $\omega \mapsto \lambda^A_L(f^A_\omega)^{-1}$ from the sample space $\Omega$ to the space $M(X)$ of distributions on $X$, endowed with the topology of weak convergence of distributions. Then the randomness on the coalition $A$ is the σ-algebra of events generated by this mapping. Thus, the total randomness at the coalitional level in the whole market is modeled by the σ-algebra $\mathcal{C}$ generated by all the events in these σ-algebras, as $A$ ranges over all coalitions (that is, over all $A \in \mathcal{L}_\lambda(\mathcal{T})$ with $\lambda_L(A) > 0$).

Theorem 9.7.5 Let $\mathcal{C}$ be the σ-algebra summarizing the total coalitional randomness in the market. Then
(1) $\mathcal{C}$ can be generated by a real-valued random variable $\alpha$;
(2) for almost all $(t_1, t_2, \ldots, t_n)$, $f_{t_1}, f_{t_2}, \ldots, f_{t_n}$ are independent conditioned on $\alpha$;
(3) $f$ satisfies the law of large numbers in distribution conditioned on $\alpha$, i.e., the random variable defined by the sample distributions is constant conditioned on $\alpha$.
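Theorem 9.7.1 and Remarks 9.7.2 and 9.7.3 describe how the loadings and factors are obtained from the autocorrelation function by an eigenfunction computation. The following Python sketch carries out the analogous computation on a finite grid, under the assumption (made here for illustration only) that $T$ and $\Omega$ are finite sets carrying uniform probability measures, so that the integral operator $K$ becomes a symmetric matrix. In this finite setting every array is reproduced exactly by the expansion, so the residual $e$ vanishes; the sketch therefore illustrates only the biorthogonality of the expansion and the duality in Remark 9.7.3, not the disappearance of idiosyncratic risk, which is specific to the Loeb-space setting. The function name `biorthogonal_expansion` is hypothetical.

```python
import numpy as np


def biorthogonal_expansion(f: np.ndarray, tol: float = 1e-10):
    """Finite-grid analogue of the expansion in Theorem 9.7.1.

    f[t, w] is a centered process on T = {0, ..., n_t - 1} and
    Omega = {0, ..., n_o - 1}, both carrying the uniform probability measure.
    Returns (lam, psi, phi) with
        f[t, w] = sum_n lam[n] * psi[n, t] * phi[n, w],
    lam nonincreasing, the rows of psi orthonormal in L^2(lambda), and the
    rows of phi orthonormal in L^2(P).
    """
    n_t, n_o = f.shape
    R = f @ f.T / n_o                      # autocorrelation R_f(t1, t2) = E[f_t1 f_t2]
    K = R / n_t                            # integral operator with kernel R_f (uniform lambda)
    gamma, v = np.linalg.eigh(K)           # K is symmetric positive semi-definite
    order = np.argsort(gamma)[::-1]        # nonincreasing eigenvalues gamma_1 >= gamma_2 >= ...
    gamma, v = gamma[order], v[:, order]
    keep = gamma > tol * gamma[0]          # keep only the (numerically) positive eigenvalues
    gamma, v = gamma[keep], v[:, keep]
    lam = np.sqrt(gamma)                   # lambda_n = gamma_n ** (1/2)
    psi = np.sqrt(n_t) * v.T               # loadings: (1/n_t) * sum_t psi_m psi_n = delta_mn
    phi = (psi @ f) / n_t / lam[:, None]   # factors: phi_n(w) = (1/lambda_n) * int f_w psi_n d(lambda)
    return lam, psi, phi


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    f = rng.standard_normal((20, 50))
    f -= f.mean(axis=1, keepdims=True)     # center each f_t
    lam, psi, phi = biorthogonal_expansion(f)
    n_t, n_o = f.shape
    print(np.allclose((lam[:, None] * psi).T @ phi, f))   # reconstruction check
    # Remark 9.7.3: the kernel S on Omega has the same positive eigenvalues gamma_n = lam_n ** 2.
    S = f.T @ f / n_t
    sample_eigs = np.sort(np.linalg.eigvalsh(S / n_o))[::-1][: lam.size]
    print(np.allclose(sample_eigs, lam ** 2))
```

Working with the $T$-side matrix $K$ or the $\Omega$-side matrix built from $S$ gives the same positive eigenvalues; in practice one would diagonalize whichever of the two is smaller.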
9.7.2 The Equivalence of Exact No Arbitrage and APT Pricing
In this subsection, we apply the general framework for risk analysis to the case of asset pricing. Let $x$ be a real-valued square integrable process on the Loeb product space. For each $t \in T$, $x_t$ is the random return of asset $t$, $\mu(t)$ is its expected return, and $f_t = x_t - \mu(t)$ is its net random return. The process $f$ is now centered and models the total risk in the asset market. Theorem 9.7.1 yields the following endogenous factor model:
$$x_t(\omega) - \mu(t) = f_t(\omega) = \sum_{n=1}^{\infty}\lambda_n\psi_n(t)\varphi_n(\omega) + e_t(\omega),$$
with properties (1)–(4). A portfolio $p$ is simply a $\lambda_L$-square integrable function with domain $T$. Its random and expected returns are, respectively, $R_p(\omega) = \int_T p(t)x(t,\omega)\,d\lambda_L(t)$ and $E(p) = \int_T p(t)\mu(t)\,d\lambda_L(t)$. Theorem 9.7.1 says that the unsystematic risk completely disappears in such a diversified portfolio $p$. Thus, the portfolio risk $R_p - E(p)$ has only the systematic risk component $\sum_{n=1}^{\infty}\lambda_n\left(\int_T p(t)\psi_n(t)\,d\lambda_L(t)\right)\varphi_n$, as in Part (4) of Theorem 9.7.1, and the variance $V(p)$ of the portfolio risk is simply the sum $\sum_{n=1}^{\infty}\lambda_n^2\left(\int_T p(t)\psi_n(t)\,d\lambda_L(t)\right)^2$. The cost $C(p)$ of the portfolio is $\int_T p(t)\,d\lambda_L(t)$.

Definition 9.7.6 The asset market is said to satisfy the exact no arbitrage condition if for any portfolio $p$, $V(p) = C(p) = 0 \Longrightarrow E(p) = 0$.

The following theorem characterizes the validity of an APT pricing equation by the exact no arbitrage condition.

Theorem 9.7.7 The market satisfies the exact no arbitrage condition if and only if there is a sequence $\{\gamma_n\}_{n=0}^{\infty}$ of real numbers such that for $\lambda_L$-almost all $t \in T$, $\mu(t) = \gamma_0 + \sum_{n=1}^{\infty}\gamma_n\psi_n(t)$.

Proof For a portfolio $p$, let $p_r$ be the projection of $p$ on the closed subspace spanned by the constant function 1 and all the loadings $\psi_n$. Set $p_s = p - p_r$; let $\mu_r$ and $\mu_s$ be defined accordingly. If $p$ is costless and riskless, i.e., $C(p) = V(p) = 0$, then $p$ is orthogonal to 1 and to all the loadings $\psi_n$. It follows that $p_r = 0$, and hence
$$E(p) = \int_T p_s(t)\mu(t)\,d\lambda_L(t) = \int_T p_s(t)\mu_s(t)\,d\lambda_L(t).$$
Thus, the exact no arbitrage condition is equivalent to the equality $\int_T p_s(t)\mu_s(t)\,d\lambda_L(t) = 0$ holding for every $p_s$ orthogonal to 1 and to all the loadings $\psi_n$. If the APT pricing equation holds, then $\mu_s = 0$, and so the exact no arbitrage condition is satisfied. On the other hand, if the market has exact no arbitrage, then taking $p = \mu_s$ gives $\int_T \mu_s^2(t)\,d\lambda_L(t) = 0$, and thus $\mu_s(t) = 0$ for $\lambda_L$-almost all $t \in T$. By the definition
of $\mu_r$, there is a sequence of real numbers $\{\gamma_n\}_{n=0}^{\infty}$ such that $\mu(t) = \mu_r(t) = \gamma_0 + \sum_{n=1}^{\infty}\gamma_n\psi_n(t)$ for $\lambda_L$-almost all $t \in T$.
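The proof of Theorem 9.7.7 rests on splitting $\mu$ into its projection $\mu_r$ onto the span of $1$ and the loadings and the remainder $\mu_s$: whenever $\mu_s \ne 0$, the portfolio $p = \mu_s$ is costless and riskless yet earns $E(p) = \int_T \mu_s^2\,d\lambda_L > 0$. Below is a minimal Python sketch of this computation, assuming that $T$ is a finite set with the uniform measure (so the $L^2(\lambda_L)$ projection becomes ordinary least squares) and that finitely many loadings are given. The function names and the numbers in the usage block are hypothetical.

```python
import numpy as np


def apt_split(mu: np.ndarray, psi: np.ndarray):
    """Project mu onto span{1, psi_1, ..., psi_k}; return (coefficients, mu_r, mu_s).

    T = {0, ..., n-1} carries the uniform measure, so the L^2(lambda)
    projection is ordinary least squares.  psi has shape (k, n).
    """
    n = mu.shape[0]
    basis = np.column_stack([np.ones(n), psi.T])        # columns: 1, psi_1, ..., psi_k
    coef, *_ = np.linalg.lstsq(basis, mu, rcond=None)   # (gamma_0, gamma_1, ..., gamma_k)
    mu_r = basis @ coef
    return coef, mu_r, mu - mu_r


def cost(p: np.ndarray) -> float:
    return float(p.mean())                               # C(p) = int p d(lambda)


def expected_return(p: np.ndarray, mu: np.ndarray) -> float:
    return float((p * mu).mean())                        # E(p) = int p mu d(lambda)


def variance(p: np.ndarray, lam: np.ndarray, psi: np.ndarray) -> float:
    exposures = psi @ p / p.shape[0]                     # int p psi_n d(lambda)
    return float(np.sum(lam ** 2 * exposures ** 2))      # V(p) = sum_n lam_n^2 (int p psi_n)^2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 200
    q, _ = np.linalg.qr(rng.standard_normal((n, 3)))
    psi = np.sqrt(n) * q.T                               # three loadings, orthonormal in L^2(lambda)
    lam = np.array([0.3, 0.2, 0.1])
    mu = 0.05 + psi.T @ np.array([0.02, -0.01, 0.03])    # satisfies the APT equation exactly
    mu_bad = mu + 0.01 * rng.standard_normal(n)          # violates it
    _, _, mu_s = apt_split(mu_bad, psi)
    p = mu_s                                             # candidate arbitrage portfolio
    print(cost(p), variance(p, lam, psi), expected_return(p, mu_bad))
```

For `mu`, the residual $\mu_s$ is numerically zero and the recovered coefficients are the $\gamma_n$ used to build it; for `mu_bad`, the portfolio $p = \mu_s$ has zero cost and zero variance up to rounding but a strictly positive expected return.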
9.8 Independent Universal Random Matching
Independent random matching is important in the economics and genetics literature. Since a random matching with finitely many agents must induce correlations among the agents, one needs to consider a continuum population of agents in order to work with independent random matching. In this section, we study the very special case of independent universal random matching, in which the agents are matched with each other in a uniform and independent way. With probability one, the proportion of agents from a given set matched with agents from another given set is the product of the proportions of the two given sets of agents. Static and dynamic random matchings with various other properties are considered in [18–21, 99]; nonstandard analysis plays the key role in constructing those random matchings.

Definition 9.8.1 (Independent universal random matching)
1. A full matching $\phi$ is a bijection from $T$ to $T$ such that for each $i \in T$, $\phi(i) \ne i$ and $\phi(\phi(i)) = i$.
2. A random full matching $\pi$ is a mapping from $T\times\Omega$ to $T$ such that $\pi_\omega$ is a full matching for each $\omega \in \Omega$.
3. A random full matching $\pi$ from $T\times\Omega$ to $T$ is said to be a random universal matching if $\pi$ is a measurable mapping from $(T\times\Omega, \mathcal{L}_{\lambda\otimes P}(\mathcal{T}\otimes\mathcal{A}), (\lambda\otimes P)_L)$ to $(T, \mathcal{L}_\lambda(\mathcal{T}), \lambda_L)$ such that
(i) for each $\omega \in \Omega$, $\lambda_L(\pi_\omega^{-1}(A)) = \lambda_L(A)$ for any $A \in \mathcal{L}_\lambda(\mathcal{T})$,
(ii) for each $i \in T$, $P_L(\pi_i^{-1}(A)) = \lambda_L(A)$ for any $A \in \mathcal{L}_\lambda(\mathcal{T})$,
(iii) for any $A_1, A_2 \in \mathcal{L}_\lambda(\mathcal{T})$, $\lambda_L(A_1\cap\pi_\omega^{-1}(A_2)) = \lambda_L(A_1)\lambda_L(A_2)$ holds for $P_L$-almost all $\omega \in \Omega$.
4. A random universal matching $\pi$ is said to be independent if $\pi_i$ and $\pi_j$ are independent for any $i \ne j \in T$.

The following theorem shows the existence of an independent random universal matching.

Theorem 9.8.2 Let $(T, \mathcal{L}_\lambda(\mathcal{T}), \lambda_L)$ be a hyperfinite Loeb counting probability space with $T = \{1, 2, \ldots, N\}$ for some unlimited even hyperfinite integer $N$ in ${}^*\mathbb{N}_\infty$.
Then there exists a hyperfinite Loeb counting probability space $(\Omega, \mathcal{L}_P(\mathcal{A}), P_L)$ and an independent random universal matching $\pi$ from the Loeb product space $(T\times\Omega, \mathcal{L}_{\lambda\otimes P}(\mathcal{T}\otimes\mathcal{A}), (\lambda\otimes P)_L)$ to $(T, \mathcal{L}_\lambda(\mathcal{T}), \lambda_L)$.

Proof We can draw agents from $T$ in pairs without replacement and then match them in these pairs. The procedure can be the following. Take one fixed agent; this agent can be matched with $N-1$ different agents. After the first pair is matched, there are $N-2$ agents left. We can do the same thing to match a second pair with $N-3$ possibilities.
Continue this procedure to produce a total of $1\times3\times\cdots\times(N-3)\times(N-1)$, denoted by $(N-1)!!$, different matchings. Let $\Omega$ be the space of all such matchings, $\mathcal{A}$ the collection of all internal subsets of $\Omega$, and $P$ the internal counting probability measure on $\mathcal{A}$. Let $(\Omega, \mathcal{L}_P(\mathcal{A}), P_L)$ be the Loeb space of the internal probability space $(\Omega, \mathcal{A}, P)$. Let $(T\times\Omega, \mathcal{T}\otimes\mathcal{A}, \lambda\otimes P)$ be the internal product probability space of $(T, \mathcal{T}, \lambda)$ and $(\Omega, \mathcal{A}, P)$. Then $\mathcal{T}\otimes\mathcal{A}$ is actually the collection of all the internal subsets of $T\times\Omega$, and $\lambda\otimes P$ is the internal counting probability measure on $\mathcal{T}\otimes\mathcal{A}$. Let $(T\times\Omega, \mathcal{L}_{\lambda\otimes P}(\mathcal{T}\otimes\mathcal{A}), (\lambda\otimes P)_L)$ be the Loeb product space, which is the Loeb space of the internal product $(T\times\Omega, \mathcal{T}\otimes\mathcal{A}, \lambda\otimes P)$. Now, for a given matching $\omega \in \Omega$ and a given agent $i$, let $\pi(i,\omega)$ be the unique $j$ such that the pair $(i,j)$ is matched under $\omega$. Since $\pi$ is an internal mapping, it is clear that it is a measurable mapping from $(T\times\Omega, \mathcal{L}_{\lambda\otimes P}(\mathcal{T}\otimes\mathcal{A}), (\lambda\otimes P)_L)$ to $(T, \mathcal{L}_\lambda(\mathcal{T}), \lambda_L)$. For each $\omega \in \Omega$, since $\pi_\omega$ is an internal bijection on $T$, it is obvious that $\pi_\omega$ is measure-preserving from the Loeb space $(T, \mathcal{L}_\lambda(\mathcal{T}), \lambda_L)$ to itself. Thus, Property (i) of a random universal matching is shown. Fix any agent $i \in T$. It is obvious that
$$P(\{\omega \in \Omega : \pi_i(\omega) = j\}) = \frac{1}{N-1}$$
for any $j \ne i$; that is, the $i$th agent is matched with equal chance to each of the other agents. Thus, for any internal set $C \in \mathcal{T}$, we obtain that $P(\{\omega \in \Omega : \pi_i(\omega) \in C\})$ is $|C|/(N-1)$ if $i \notin C$, and $(|C|-1)/(N-1)$ if $i \in C$, where $|C|$ is the internal cardinality of $C$. This means that $P(\pi_i^{-1}(C)) \simeq \frac{|C|}{N} = \lambda(C) \simeq \lambda_L(C)$. We can also obtain that
$$(\lambda\otimes P)(\pi^{-1}(C)) = \int_{i\in T} P(\pi_i^{-1}(C))\,d\lambda(i) \simeq \lambda(C) \simeq \lambda_L(C).$$
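For any Loeb measurable set $B \in \mathcal{L}_\lambda(\mathcal{T})$ and for any standard positive real number $\varepsilon$, there are internal sets $C$ and $D$ in $\mathcal{T}$ such that $C \subseteq B \subseteq D$ and $\lambda(D - C) < \varepsilon$. Thus $\pi_i^{-1}(C) \subseteq \pi_i^{-1}(B) \subseteq \pi_i^{-1}(D)$, and $P(\pi_i^{-1}(D) - \pi_i^{-1}(C)) \simeq \lambda(D - C) < \varepsilon$, which implies that $\pi_i^{-1}(B)$ is Loeb measurable in $\mathcal{L}_P(\mathcal{A})$. Also, $\lambda_L(C) \le P_L(\pi_i^{-1}(B)) \le \lambda_L(D)$, and thus $|P_L(\pi_i^{-1}(B)) - \lambda_L(B)| \le \lambda_L(D - C) \le \varepsilon$ for any standard positive real number $\varepsilon$. This means that $P_L(\pi_i^{-1}(B)) = \lambda_L(B)$. Therefore, $\pi_i$ is a measure-preserving mapping from the Loeb space $(\Omega, \mathcal{L}_P(\mathcal{A}), P_L)$ to the Loeb space $(T, \mathcal{L}_\lambda(\mathcal{T}), \lambda_L)$. Thus, Property (ii) of a random universal matching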
is shown. Similarly, we can show that $\pi$ is a measure-preserving mapping from $(T\times\Omega, \mathcal{L}_{\lambda\otimes P}(\mathcal{T}\otimes\mathcal{A}), (\lambda\otimes P)_L)$ to $(T, \mathcal{L}_\lambda(\mathcal{T}), \lambda_L)$. Next, for $i \ne j$, consider the joint event $E = \{\omega \in \Omega : (\pi_i(\omega), \pi_j(\omega)) = (i', j')\}$, that is, the $i$th agent is matched to the $i'$th agent and the $j$th agent is matched to the $j'$th agent. In order to show the measure-preserving property of the mapping $(\pi_i, \pi_j)$ in the following paragraph, we need to know the value of $P(E)$ in three different cases. The first case is $(i', j') = (j, i)$, that is, the $i$th agent is matched to the $j$th agent and the $j$th agent is matched to the $i$th agent. In this case, $P(E) = 1/(N-1)$. The second case is that of $P(E) = 0$, which holds when $i' = i$ or $j' = j$ (the $i$th agent is matched to herself, or the $j$th agent is matched to herself), or when $i' = j$ but $j' \ne i$ (the $i$th agent is matched to the $j$th agent, but the $j$th agent is not matched to the $i$th agent), or when $j' = i$ but $i' \ne j$ (the $j$th agent is matched to the $i$th agent, but the $i$th agent is not matched to the $j$th agent), or when $i' = j'$ (both the $i$th agent and the $j$th agent are matched to the same agent). The third case applies if the indices $i, j$ and $i', j'$ are completely distinct. In this third case, after the pairs $(i, i')$ and $(j, j')$ are drawn, there are $N-4$ agents left, and hence there are $(N-5)!!$ ways to draw the rest of the pairs in order to complete the matching. This means that $P(E) = (N-5)!!/(N-1)!! = 1/((N-1)(N-3))$.

Let $(T\times T, \mathcal{T}\otimes\mathcal{T}, \lambda\otimes\lambda)$ be the internal product of $(T, \mathcal{T}, \lambda)$ with itself, and $(T\times T, \mathcal{L}_{\lambda\otimes\lambda}(\mathcal{T}\otimes\mathcal{T}), (\lambda\otimes\lambda)_L)$ the Loeb space of the internal product. Fix any $i, j \in T$ with $i \ne j$. Let $D$ be the diagonal $\{(i', i') : i' \in T\}$. The third case of the above paragraph implies that for any internal set $G \in \mathcal{T}\otimes\mathcal{T}$,
$$P(\{\omega \in \Omega : (\pi_i(\omega), \pi_j(\omega)) \in G - (D\cup(\{i,j\}\times T)\cup(T\times\{i,j\}))\}) = \frac{|G - (D\cup(\{i,j\}\times T)\cup(T\times\{i,j\}))|}{(N-1)(N-3)} \simeq \frac{|G|}{N^2} = (\lambda\otimes\lambda)(G).$$
By using the formula for $P(E)$ in the first two cases, we can obtain that
$$P(\{\omega \in \Omega : (\pi_i(\omega), \pi_j(\omega)) \in D\cup(\{i,j\}\times T)\cup(T\times\{i,j\})\}) = \frac{1}{N-1} \simeq 0.$$
The above two equations imply that $P(\{\omega \in \Omega : (\pi_i(\omega), \pi_j(\omega)) \in G\}) \simeq (\lambda\otimes\lambda)(G)$. The same proof used to show the measure-preserving property of $\pi_i$ can be adapted to prove that $(\pi_i, \pi_j)$ is a measure-preserving mapping from $(\Omega, \mathcal{L}_P(\mathcal{A}), P_L)$ to $(T\times T, \mathcal{L}_{\lambda\otimes\lambda}(\mathcal{T}\otimes\mathcal{T}), (\lambda\otimes\lambda)_L)$. This implies that the mappings $\pi_i$ and $\pi_j$ are independent. In fact, for any positive integer $r$, and for any distinct $i_1, i_2, \ldots, i_r \in T$,
the same idea of the proof can be used to prove that the mappings $\pi_{i_1}, \pi_{i_2}, \ldots, \pi_{i_r}$ are independent. Finally, take any $A_1, A_2 \in \mathcal{L}_\lambda(\mathcal{T})$, and let $f(i,\omega) = 1_{A_1}(i)1_{A_2}(\pi(i,\omega))$ for all $(i,\omega) \in T\times\Omega$. Then the random variables $f_i$ are pairwise independent. The distribution of $f_i$ has probability 1 on $\{0\}$ if $i \notin A_1$, and has probabilities $\lambda_L(A_2)$ on $\{1\}$ and $1 - \lambda_L(A_2)$ on $\{0\}$ if $i \in A_1$. By applying the exact law of large numbers as in Theorem 8.5.3 to $f^{A_1}$, one has for $P_L$-almost all $\omega \in \Omega$, $\lambda_L(A_1\cap\pi_\omega^{-1}(A_2)) = \lambda_L(A_1)\lambda_L(A_2)$. Hence, Property (iii) of a random universal matching follows.
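The counting arguments in the proof can be checked directly for a small even $N$, and the sequential pairing procedure can be simulated for a large one. The Python sketch below is an illustration with hypothetical function names. It (a) enumerates all full matchings of $N = 8$ agents by the recursive pairing described in the proof, confirming that there are $(N-1)!!$ of them, that $P(\pi_i = j) = 1/(N-1)$, and that $P(\pi_i = i', \pi_j = j') = 1/((N-1)(N-3))$ for distinct $i, j, i', j'$; and (b) draws one uniform random full matching for a large $N$ and compares the share of agents in $A_1$ matched into $A_2$ with $\lambda(A_1)\lambda(A_2)$, taking (as an arbitrary choice) $A_1$ to be the first half of the agents and $A_2$ the multiples of 3. In a finite population, Property (iii) holds only approximately.

```python
import random
from fractions import Fraction


def double_factorial_odd(n: int) -> int:
    """(n-1)!! = 1 * 3 * ... * (n-1) for even n: the number of full matchings of n agents."""
    result = 1
    for k in range(1, n, 2):
        result *= k
    return result


def all_full_matchings(agents):
    """Yield every full matching of an even-sized list as a dict {agent: partner}."""
    if not agents:
        yield {}
        return
    first, rest = agents[0], agents[1:]
    for k, partner in enumerate(rest):                     # the first agent has len(rest) choices
        for sub in all_full_matchings(rest[:k] + rest[k + 1:]):
            matching = dict(sub)
            matching[first], matching[partner] = partner, first
            yield matching


def random_full_matching(n: int, rng: random.Random) -> list[int]:
    """Draw a uniformly random full matching of {0, ..., n-1} by sequential pairing."""
    if n % 2:
        raise ValueError("a full matching needs an even number of agents")
    unmatched = list(range(n))
    partner = [0] * n
    while unmatched:
        i = unmatched.pop()                                # fix one unmatched agent
        k = rng.randrange(len(unmatched))                  # uniform partner among the rest
        unmatched[k], unmatched[-1] = unmatched[-1], unmatched[k]
        j = unmatched.pop()
        partner[i], partner[j] = j, i
    return partner


if __name__ == "__main__":
    n = 8
    ms = list(all_full_matchings(list(range(n))))
    total = len(ms)
    print(total == double_factorial_odd(n))                # (n-1)!! matchings in all
    print(Fraction(sum(m[0] == 3 for m in ms), total))     # P(pi_0 = 3) = 1/(n-1)
    print(Fraction(sum(m[0] == 2 and m[1] == 3 for m in ms), total))  # 1/((n-1)(n-3))

    big = 20_000
    partner = random_full_matching(big, random.Random(0))
    share = sum(1 for i in range(big // 2) if partner[i] % 3 == 0) / big
    target = 0.5 * sum(1 for i in range(big) if i % 3 == 0) / big
    print(share, target)                                   # close, but not exactly equal
```

The swap-and-pop step keeps each draw of the sequential procedure at constant cost, so a matching of $N$ agents is produced in time proportional to $N$.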
9.9 Notes
The results in Sect. 9.2.1 are simplified versions of some selected results in [103]. The purpose of these results is to show some special important properties for distributions of correspondences on Loeb spaces that are not shared by general probability spaces. It is not the aim here to use Loeb spaces to reconsider results that are known to be true on general probability spaces. Distributions of integrably bounded correspondences from a general probability space to $\mathbb{R}^n$ were already considered in [36, 43]. More general correspondences taking compact values in a complete separable metric space were studied in [5]. The focus of these papers was on the relationship between the distribution of a correspondence and the distributions of its selections. The result on selectionable distributions in [5], which is valid for correspondences from a general atomless probability space to a Polish space, was extended to the case of a possibly nonseparable and non-metrizable topological space $X$ in [92], where some hyperfinite Loeb counting space in a $2^{\mathrm{card}(X)}$-saturated model was used. Note that if a result holds on a general probability space, it automatically holds on a Loeb probability space. The point here is to obtain results with the meaningful Polish space structure that are valid on Loeb spaces but are not valid on general probability spaces. Since the existence of a mixed-strategy Nash equilibrium was already shown by Nash in [78] for games with an arbitrary finite number $n$ of players (no matter how large $n$ is), the economic interest of studying games with many players comes from the consideration of pure-strategy Nash equilibria. The general purification procedure in [23–25] naturally leads one to the case of games with finite action spaces [94]. Thus the interesting problem is what happens when the action space is not finite.
For the case that players in a large game choose their best actions in a compact set in $\mathbb{R}^n$ according to the average of others' actions, a solution of the existence problem for pure-strategy equilibria was obtained by Rath [89]. On the other hand, for the case that the players make their decisions based on the distribution of others' actions, the nonexistence result in Sect. 9.3.2 for games whose space of player names is the unit Lebesgue interval was first established in [91], with further exposition in [50]. A more general version of the positive result in Theorem 9.3.2 of Sect. 9.3.2 for large games in the Loeb space setting can be found in Theorem 1 in [60]. Some variations of large games, such as large games with traits and large games with transformed
summary statistics, are considered respectively in [53, 116]. Instead of focusing on the existence or nonexistence of Nash equilibria, [30] provides characterizations of Nash equilibria in large games in several different settings. Theorem 7 of [62] (p. 1792) and Theorem 2 of [54] consider a randomized Nash equilibrium that keeps the equilibrium property after the realization of uncertainty (i.e., the ex post Nash equilibrium property). Hyperfinite models are also used in [96] for the study of equilibrium refinements in finite-player games with infinite action spaces. The nonexistence result in Sect. 9.4.1 for games with incomplete information is taken from [51]. The discrete version presented in Sect. 9.4.2 appears as Example 3 in [60]. Since Lusin's theorem says that a Lebesgue measurable function is continuous when restricted to a closed set of measure arbitrarily close to one, it is thus obvious that the limit of the functions $f_{1n}, f_{2n}$ in Proposition 9.4.6 cannot be found on the unit Lebesgue interval, due to their high oscillation. The point is, however, that they form approximate equilibria for the explicitly constructed sequence of games, and the Lebesgue framework does fail to capture such game-theoretic phenomena. Theorem 9.4.10 in Sect. 9.4.3 on the existence of pure-strategy Nash equilibria in the Loeb space setting is Theorem 3 in [60]. Such an existence result is further extended in Sect. 4 of [73] to incomplete information games with both private and public information, which generalizes the corresponding result in the finite-action case in [29, 31] to the case of compact metric action spaces. Star-finite expansion of information structures is used in [98] for studying games with incomplete information, while hyperfinite sets are used in [97] to provide a direct interpretation of finitistic equilibria in infinite normal form games. Since hyperfinite Loeb spaces have the property of asymptotic implementation, Theorems 9.3.2 and 9.4.10 can be rewritten in asymptotic form as in Sect. 5 of [60].
Asymptotic results are also obtained in [53, 54, 57, 58, 60, 105–107] and [109] by translating the corresponding exact results on Loeb spaces. Several other advantages of the hyperfinite Loeb space over the Lebesgue space are also given in [59, 60] (a brief announcement is in [57]). These general properties provide some evidence for the claim that games on hyperfinite Loeb spaces constitute a viable model for situations where individual players are strategically negligible, or where information is diffused. Proposition 9.2.11 is Theorem 3 in [103]. It shares some similarity with Cutland's nonstandard characterization of standard relaxed controls from $[0,1]$ to $M(A)$ in [15], where $A$ is a compact subset of $\mathbb{R}^n$. In [15], one needs to consider a real interval since it represents the time domain on which some differential system is defined. Thus a relaxed control on the standard unit Lebesgue interval $[0,1]$ corresponds to some internal control on another probability space, the hyperfinite Loeb counting space. In the atomless games considered here, a probability space of player names or sample realizations just represents the meaning of a large number of players or samples. It is not really meaningful to impose a topological structure on player names, and this may indeed cause trouble, as shown in Sect. 9.3.2. Thus, a Loeb probability space is a primitive object to work with. For a given mixed strategy on the Loeb space, it is not meaningful to find a pure strategy on another probability space corresponding to it. This statement is in fact true on any atomless probability space without the Loeb
space restriction. The point is that the equivalent pure strategy must be on the same underlying Loeb probability space, as in [23–25]. Note that the requirement that the pure strategy $f$ must be a selection of the support correspondence of $G$ is essential here. Without it, the rest of the result again holds on any atomless probability space. The concept of a saturated probability space is introduced in [44]. Section 5 of [49] demonstrates a general technique for extending certain types of results from atomless Loeb probability spaces (or even the simplest hyperfinite Loeb counting spaces) to the larger class of saturated probability spaces. Thus, the distribution theory of correspondences as presented in Sect. 9.2.1 can be restated on saturated probability spaces as in Proposition 5.2 of [49]. Theorem 9.3.2 can also be generalized to large games with a saturated agent space as in Proposition 5.3 of [49]. In addition to such positive results, converse results are obtained showing the necessity of saturated probability spaces. One can do similar generalizations and derive necessity results for the integration of correspondences, and for games with many players or incomplete information; see [65, 81, 85, 100, 111]. Based on the concept of setwise coarseness or nowhere equivalence, further extensions are presented in [37–40]. The general procedure of Dvoretzky, Wald and Wolfowitz, as specified in [23–25, 52], for the elimination of randomization only works for finite action spaces. Loeb measure spaces are used in [72] to purify measure-valued mappings on a general compact metric space. Further generalizations to saturated probability spaces are considered in [74, 82]. In the general measure-theoretic framework, the distribution and integration theory of correspondences and the existence theory of pure-strategy equilibria for the games considered here may fail when the relevant target space or action space is uncountable.
If the relevant space is finite, both theories are valid. One might ask what happens if the relevant space is only countable. This question is answered in [55, 56]. The results in Sect. 9.5 are taken from [107]. Theorems 9.5.7 and 9.5.8 are, respectively, Theorems 1 and 2 in [107]. Note that the independence condition used here and in Chap. 8 on Loeb spaces is the usual notion of independence; it is an external condition from the nonstandard point of view. Our purpose has been to study independence from a measure-theoretic point of view in the usual sense. Thus, the external independence condition is crucial. It is important not to confuse it with the ∗-independence condition obtained by transferring the independence condition in the standard model. Any result on a sequence of independent random variables (including the law of large numbers) can be transferred immediately to a result on some ∗-independent hyperfinite process. Though ∗-independence is not relevant for our study of independence via the Loeb product space, it is useful in nonstandard probabilistic arguments in other situations. Economies with a hyperfinite number of agents were considered originally in [11, 12] by Brown et al. Keisler proved in [46] a law of large numbers for certain kinds of weakly interacting particle systems. It could also be interpreted as a dynamic trading process with price adjustment where a randomly chosen agent trades with a central warehouse or market maker at any given time. In the nonstandard formulation,
the random selection of traders involves an internal independence condition (i.e., ∗-independence; see also [1], p. 2202), which corresponds to the usual independence condition in the large finite formulation. Note that internal arguments can often be naturally translated to large finite arguments. Indeed, some further work in [47] and [48] in this direction only involves standard approximate arguments. The model considered by Keisler has some attractive features. The prices are adjusted slowly enough to allow trading at a non-equilibrium price by a few traders but fast enough to avoid prolonged trading at the wrong price. Theorem 9.6.1 in Sect. 9.6 is Theorem 3 in [107]. Similar results for competitive equilibria and the core of economies with indivisible commodities can be found in Theorems 4 and 5 in [107]. The point of this study is to demonstrate how the exact law of large numbers based on Loeb product spaces can be used together with classical results on measure-theoretic economies to derive much stronger results on individual uncertainty than the discrete approach used in [42]. It is crucial to use the external condition of almost independence here. Note that if one just wants to use the ∗-independence condition in this context, one can simply transfer the result in [42] to the hyperfinite setting. Of course, nothing new is obtained this way. The exact law of large numbers is also used in [108–110, 112] for the study of Pareto efficiency, competitive equilibrium, core, and rational expectations equilibrium in private information economies. Theorem 9.7.1 is a combination of the results in Theorems 1–3 in [102], which also appears as Corollary 4.8 in [105]. Theorem 9.7.5 is a special case of Theorem 3 in [84] (which is based on Theorem 1 of [35]). Theorem 9.7.7 is Theorem 1 in [58, 63]. Detailed portfolio analysis for large asset markets is given in [64].
By the same arguments, most of those results can also be restated in the slightly more general setting of a Fubini extension; see [114]. Note that the point of introducing a continuum model is to formulate conditions and conclusions in the exact sense. The exact no-arbitrage condition captures the intuitive notion expressed by the statement “if there is no cost and no risk, then there is no gain”. If one still uses the usual asymptotic no-arbitrage condition in the continuum setting, then either the methods of the classical asymptotic models apply directly or nothing needs to be proven about the pricing relationship. See also [61, 69] for a systematic study of APT pricing under the condition of asymptotic no arbitrage. The APT model is a static asset pricing model. For some applications of nonstandard analysis to the study of equilibrium in a continuous-time securities market, see [3, 4, 87]. Theorem 9.8.2 is taken from Theorem 2.4 of [18]. Based on some delicate constructions in nonstandard analysis, [18] provides a construction of discrete-time Markov independent dynamical systems with random mutation, and with type changes induced by partial or full bilateral random matching. It is assumed in [18, 19, 99] that when a given agent is matched, the partner is drawn uniformly from the population of other agents to be matched. Duffie et al. [20] demonstrate the existence of dynamic directed random searches for counterparties (i.e., with nonlinear matching probabilities) conducted by a hyperfinite number of agents, and characterize the implications, while [21] considers continuous-time random matching.
Y. Sun
References
1. R.M. Anderson, Non-standard analysis with applications to economics, in Handbook of Mathematical Economics IV, ed. by W. Hildenbrand, H. Sonnenschein (North-Holland, New York, 1991), pp. 2145–2208
2. R.M. Anderson, The core in perfectly competitive economies, in Handbook of Game Theory I, ed. by R.J. Aumann, S. Hart (North-Holland, Amsterdam, 1994), pp. 413–457
3. R.M. Anderson, R.C. Raimondo, Market clearing and derivative pricing. Econ. Theory 25, 21–34 (2005)
4. R.M. Anderson, R.C. Raimondo, Equilibrium in continuous-time financial markets: endogenously dynamically complete markets. Econometrica 76, 841–907 (2008)
5. Z. Artstein, Distributions of random sets and random selections. Isr. J. Math. 46, 313–324 (1983)
6. Z. Artstein, R.A. Vitale, A strong law of large numbers for random compact sets. Ann. Probab. 3, 879–882 (1975)
7. J.P. Aubin, H. Frankowska, Set Valued Analysis (Birkhäuser, Boston, 1990)
8. R.J. Aumann, Markets with a continuum of traders. Econometrica 32, 39–50 (1964)
9. A. Basilevsky, Statistical Factor Analysis and Related Methods (Wiley, New York, 1994)
10. P. Billingsley, Convergence of Probability Measures (Wiley, New York, 1968)
11. D.J. Brown, P.A. Loeb, The values of nonstandard exchange economies. Isr. J. Math. 25, 71–86 (1976)
12. D.J. Brown, A. Robinson, Nonstandard exchange economies. Econometrica 43, 41–55 (1975)
13. C. Castaing, M. Valadier, Convex Analysis and Measurable Multifunctions, Lecture Notes in Mathematics, vol. 580 (Springer, New York, 1977)
14. D.J. Cohn, Measure Theory (Birkhäuser, Boston, 1980)
15. N.J. Cutland, Internal controls and relaxed controls. J. Lond. Math. Soc. 27, 130–140 (1983)
16. G. Debreu, Theory of Value (Wiley, New York, 1959)
17. G. Debreu, Integration of correspondences, in Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, vol. 2, Part 1 (1967), pp. 351–372
18. D. Duffie, Y.N. Sun, Existence of independent random matching. Ann. Appl. Probab. 17, 386–419 (2007)
19. D. Duffie, Y.N. Sun, The exact law of large numbers for independent random matching. J. Econ. Theory 147, 1105–1139 (2012)
20. D. Duffie, L. Qiao, Y.N. Sun, Dynamic directed random matching, working paper, Stanford University (2014)
21. D. Duffie, L. Qiao, Y.N. Sun, Continuous-time random matching, preliminary draft (2014)
22. N. Dunford, J. Schwartz, Linear Operators (Interscience, New York, 1958)
23. A. Dvoretzky, A. Wald, J. Wolfowitz, Elimination of randomization in certain problems of statistics and of the theory of games. Proc. Natl. Acad. Sci. USA 36, 256–260 (1950)
24. A. Dvoretzky, A. Wald, J. Wolfowitz, Relations among certain ranges of vector measures. Pac. J. Math. 1, 59–74 (1951)
25. A. Dvoretzky, A. Wald, J. Wolfowitz, Elimination of randomization in certain statistical decision procedures and zero-sum two-person games. Ann. Math. Stat. 22, 1–21 (1951)
26. E.B. Dynkin, V.L. Evstigneev, Regular conditional expectations of correspondences. Theor. Probab. Appl. 21, 325–338 (1976)
27. N. Etemadi, An elementary proof of the strong law of large numbers. Z. Wahrsch. Verw. Gebiete 55, 119–122 (1981)
28. K. Fan, Fixed points and minimax theorems in locally convex linear spaces. Proc. Natl. Acad. Sci. USA 38, 121–126 (1952)
29. H. Fu, Mixed-strategy equilibria and strong purification for games with private and public information. Econ. Theory 37, 521–532 (2008)
30. H. Fu, Y. Xu, L. Zhang, Pure-strategy Nash equilibria in large games: characterization and existence, preprint (2014)
31. H. Fu, Y.N. Sun, N.C. Yannelis, Z. Zhang, Pure strategy equilibria in games with private and public information. J. Math. Econ. 43, 523–531 (2007)
32. D. Fudenberg, J. Tirole, Game Theory (MIT Press, Cambridge, 1991)
33. R. Gibbons, A Primer in Game Theory (Harvester, New York, 1992)
34. I. Glicksberg, A further generalization of Kakutani’s fixed-point theorem with application to Nash equilibrium points. Proc. Am. Math. Soc. 3, 170–174 (1952)
35. P.J. Hammond, Y.N. Sun, Monte Carlo simulation of macroeconomic risk with a continuum of agents: the general case. Econ. Theory 36, 303–325 (2008)
36. S. Hart, E. Kohlberg, Equally distributed correspondences. J. Math. Econ. 1, 167–174 (1974)
37. W. He, X. Sun, On the diffuseness of incomplete information game. J. Math. Econ. 54, 131–137 (2014)
38. W. He, X. Sun, Y.N. Sun, Modeling infinitely many agents, working paper, National University of Singapore (2013)
39. W. He, Y.N. Sun, The necessity of nowhere equivalence, working paper, National University of Singapore (2013)
40. W. He, Y.N. Sun, Conditional expectations of Banach valued correspondences, working paper, National University of Singapore (2013)
41. H.G. Heuser, Functional Analysis (Wiley, New York, 1982)
42. W. Hildenbrand, Random preferences and equilibrium analysis. J. Econ. Theory 3, 414–429 (1971)
43. W. Hildenbrand, Core and Equilibria of a Large Economy (Princeton University Press, Princeton, 1974)
44. D.N. Hoover, H.J. Keisler, Adapted probability distribution. Trans. Am. Math. Soc. 286, 159–201 (1984)
45. R.V. Kadison, J.R. Ringrose, Fundamentals of the Theory of Operator Algebras I (Academic Press, New York, 1983)
46. H.J. Keisler, A law of large numbers for fast price adjustment. Trans. Am. Math. Soc. 332, 1–51 (1992)
47. H.J. Keisler, Approximate tatonnement processes. Econ. Theory 5, 127–173 (1995)
48. H.J. Keisler, Getting to a competitive equilibrium. Econometrica 64, 29–49 (1996)
49. H.J. Keisler, Y.N. Sun, Why saturated probability spaces are necessary. Adv. Math. 221, 1584–1607 (2009)
50. M.A. Khan, K.P. Rath, Y.N. Sun, On the existence of pure strategy equilibria in games with a continuum of players. J. Econ. Theory 76, 13–46 (1997)
51. M.A. Khan, K.P. Rath, Y.N. Sun, On a private information game without pure strategy equilibria. J. Math. Econ. 31, 341–359 (1999)
52. M.A. Khan, K.P. Rath, Y.N. Sun, The Dvoretzky-Wald-Wolfowitz Theorem and purification in atomless finite-action games. Int. J. Game Theory 34, 91–104 (2006)
53. M.A. Khan, K.P. Rath, Y.N. Sun, H.M. Yu, Large games with a bio-social typology. J. Econ. Theory 148, 1122–1149 (2013)
54. M.A. Khan, K.P. Rath, Y.N. Sun, H.M. Yu, Strategic uncertainty and the ex-post Nash property in large games. Theor. Econ. 10, 103–129 (2015)
55. M.A. Khan, Y.N. Sun, Pure strategies in games with private information. J. Math. Econ. 24, 633–653 (1995)
56. M.A. Khan, Y.N. Sun, Integrals of set-valued functions with a countable range. Math. Oper. Res. 21, 946–954 (1996)
57. M.A. Khan, Y.N. Sun, Nonatomic games on Loeb spaces. Proc. Natl. Acad. Sci. USA 93, 15518–15521 (1996)
58. M.A. Khan, Y.N. Sun, The capital-asset-pricing model and arbitrage pricing theory: a unification. Proc. Natl. Acad. Sci. USA 94, 4229–4232 (1997)
59. M.A. Khan, Y.N. Sun, On Loeb measure spaces and their significance for non-cooperative game theory, in Current and Future Directions in Applied Mathematics, ed. by M. Alber, B. Hu, J. Rosenthal (Birkhäuser, Berlin, 1997), pp. 183–218
60. M.A. Khan, Y.N. Sun, Non-cooperative games on hyperfinite Loeb spaces. J. Math. Econ. 31, 455–492 (1999)
61. M.A. Khan, Y.N. Sun, Asymptotic arbitrage and the APT with or without measure-theoretic structures. J. Econ. Theory 101, 222–251 (2001)
62. M.A. Khan, Y.N. Sun, Non-cooperative games with many players, in Handbook of Game Theory, Chapter 46, vol. 3, ed. by R.J. Aumann, S. Hart (North-Holland, Amsterdam, 2002), pp. 1761–1808
63. M.A. Khan, Y.N. Sun, Exact arbitrage, well-diversified portfolios and asset pricing in large markets. J. Econ. Theory 110, 337–373 (2003)
64. M.A. Khan, Y.N. Sun, Exact arbitrage and portfolio analysis in large asset markets. Econ. Theory 22, 495–528 (2003)
65. M.A. Khan, Y.C. Zhang, On the existence of pure-strategy equilibria in games with private information: a complete characterization. J. Math. Econ. 50, 197–202 (2014)
66. E. Klein, A.C. Thompson, Theory of Correspondences (Wiley, New York, 1984)
67. P.K. Kopp, Hyperfinite mathematical finance, in Nonstandard Analysis Theory and Applications, ed. by L.O. Arkeryd, N.J. Cutland, C.W. Henson (Kluwer, Dordrecht, 1997), pp. 279–307
68. X.A. Lin, On the independence of correspondences. Proc. Am. Math. Soc. 129, 1329–1334 (2001)
69. X.A. Lin, X. Liu, Y.N. Sun, The necessity of no asymptotic arbitrage in APT pricing, in Recent Developments in Mathematical Finance, ed. by Jiongmin Yong (World Scientific, Singapore, 2002), pp. 181–189
70. J. Lintner, The valuation of risky assets and the selection of risky investments in stock portfolios and capital budgets. Rev. Econ. Stat. 47, 13–37 (1965)
71. P.A. Loeb, Conversion from nonstandard to standard measure spaces and applications in probability theory. Trans. Am. Math. Soc. 211, 113–122 (1975)
72. P. Loeb, Y.N. Sun, Purification of measure-valued maps. Ill. J. Math. 50, 747–762 (2006)
73. P. Loeb, Y.N. Sun, A general Fatou lemma. Adv. Math. 213, 741–762 (2007)
74. P.A. Loeb, Y. Sun, Purification and saturation. Proc. Am. Math. Soc. 137, 2719–2724 (2009)
75. M. Loève, Probability Theory I, 4th edn. (Springer, New York, 1977)
76. M. Loève, Probability Theory II, 4th edn. (Springer, New York, 1977)
77. P.R. Milgrom, R.J. Weber, Distributional strategies for games with incomplete information. Math. Oper. Res. 10, 619–632 (1985)
78. J.F. Nash, Equilibrium points in N-person games. Proc. Natl. Acad. Sci. USA 36, 48–49 (1950)
79. A. Papoulis, Probability, Random Variables, and Stochastic Processes (McGraw-Hill, New York, 1965)
80. K.R. Parthasarathy, Probability Measures on Metric Spaces (Academic Press, New York, 1967)
81. K. Podczeck, On the convexity and compactness of the integral of a Banach space valued correspondence. J. Math. Econ. 44, 836–852 (2008)
82. K. Podczeck, On purification of measure-valued maps. Econ. Theory 38, 399–418 (2009)
83. D. Pollard, Convergence of Stochastic Processes (Springer, New York, 1984)
84. L. Qiao, Y.N. Sun, Z.X. Zhang, Conditional exact law of large numbers and asymmetric information economies with aggregate uncertainty. Econ. Theory (2014) (published online)
85. L. Qiao, H.M. Yu, On the space of players in idealized limit games. J. Econ. Theory 153, 177–190 (2014)
86. R. Radner, R.W. Rosenthal, Private information and pure-strategy equilibria. Math. Oper. Res. 7, 401–409 (1982)
87. R.C. Raimondo, Market clearing, utility functions, and securities prices. Econ. Theory 25, 265–285 (2005)
88. S. Rashid, Economies with Many Agents (Johns Hopkins University Press, Baltimore, 1987)
89. K.P. Rath, A direct proof of the existence of pure strategy Nash equilibria in games with a continuum of players. Econ. Theory 2, 427–433 (1992)
90. K.P. Rath, On the representation of sets in finite measure spaces. J. Math. Anal. Appl. 200, 506–510 (1996)
91. K.P. Rath, Y.N. Sun, S. Yamashige, The nonexistence of symmetric equilibria in anonymous games with compact action spaces. J. Math. Econ. 24, 331–346 (1995)
92. D. Ross, Selectionable distributions for a random set. Math. Proc. Camb. Philos. Soc. 108, 405–408 (1990)
93. S.A. Ross, The arbitrage theory of capital asset pricing. J. Econ. Theory 13, 341–360 (1976)
94. D. Schmeidler, Equilibrium points of nonatomic games. J. Stat. Phys. 7, 295–300 (1973)
95. W. Sharpe, Capital asset prices: a theory of market equilibrium under conditions of risk. J. Financ. 33, 885–901 (1964)
96. L.K. Simon, M.B. Stinchcombe, Econometrica 63, 1421–1443 (1995)
97. M.B. Stinchcombe, Nash equilibrium and generalized integration for infinite normal form games. Games Econ. Behav. 50, 332–365 (2005)
98. M.B. Stinchcombe, Correlated equilibrium existence for infinite games with type-dependent strategies. J. Econ. Theory 146, 638–655 (2011)
99. X. Sun, Independent random partial matching with general types, working paper, National University of Singapore (2013)
100. X. Sun, Y.C. Zhang, Pure-strategy Nash equilibria in nonatomic games with infinite-dimensional action spaces. Econ. Theory 58, 161–182 (2015)
101. Y.N. Sun, Isomorphisms for convergence structures. Adv. Math. 116, 322–355 (1995)
102. Y.N. Sun, Hyperfinite law of large numbers. Bull. Symb. Log. 2, 189–198 (1996)
103. Y.N. Sun, Distributional properties of correspondences on Loeb spaces. J. Funct. Anal. 139, 68–93 (1996)
104. Y.N. Sun, Integration of correspondences on Loeb spaces. Trans. Am. Math. Soc. 349, 129–153 (1997)
105. Y.N. Sun, A theory of hyperfinite processes: the complete removal of individual uncertainty via exact LLN. J. Math. Econ. 29, 419–503 (1998)
106. Y.N. Sun, The almost equivalence of pairwise and mutual independence and the duality with exchangeability. Probab. Theory Relat. Fields 112, 425–456 (1998)
107. Y.N. Sun, The complete removal of individual uncertainty: multiple optimal choices and random exchange economies. Econ. Theory 14, 507–544 (1999)
108. Y.N. Sun, N.C. Yannelis, Core, equilibria and incentives in large asymmetric information economies. Games Econ. Behav. 61, 131–155 (2007)
109. Y.N. Sun, N.C. Yannelis, Perfect competition in asymmetric information economies: compatibility of efficiency and incentives. J. Econ. Theory 134, 175–194 (2007)
110. Y.N. Sun, N.C. Yannelis, Ex ante efficiency implies incentive compatibility. Econ. Theory 36, 35–55 (2008)
111. Y.N. Sun, N.C. Yannelis, Saturation and the integration of Banach valued correspondences. J. Math. Econ. 44, 861–865 (2008)
112. Y.N. Sun, L. Wu, N.C. Yannelis, Existence, incentive compatibility and efficiency of the rational expectations equilibrium. Games Econ. Behav. 76, 329–339 (2012)
113. J. Von Neumann, O. Morgenstern, Theory of Games and Economic Behavior (Princeton University Press, Princeton, 1944)
114. Y. Xu, Large games and large asset markets. Master thesis, Department of Mathematics, National University of Singapore (2007)
115. N.C. Yannelis, Integration of Banach-valued correspondences, in Equilibrium Theory in Infinite Dimensional Spaces, ed. by M.A. Khan, N.C. Yannelis (Springer, Berlin, 1991)
116. H.M. Yu, W. Zhu, Large games with transformed summary statistics. Econ. Theory 26, 237–241 (2005)
Part VI
Combinatorial Number Theory
Chapter 10
Density Problems and Freiman’s Inverse Problems
Renling Jin
10.1 Introduction
In this chapter we present applications of nonstandard analysis to density problems and Freiman’s inverse problems in combinatorial number theory. The reader is assumed to be familiar with the basic notions and principles of nonstandard analysis, such as nonstandard extensions of the standard universe, the transfer principle, the standard part map, the distinctions among standard, internal, and external sets and arguments, hyperfinite sets, hyperfinite integers, the overspill/underspill principle, countable saturation, etc. What advantages does nonstandard analysis provide when dealing with standard problems in combinatorial number theory? In general, two characteristics are worth mentioning. First, nonstandard analysis allows us to replace some standard asymptotic arguments by simple nonstandard arguments. This can sometimes simplify and speed up derivations. It is not a coincidence that many problems involving densities are natural candidates for such applications. In the first section, we study density problems. Second, hyperfinite integers are available in a nonstandard universe. A hyperfinite integer is infinitely large from a standard point of view and finitely large from a nonstandard point of view. One can often apply continuous techniques from a standard angle and discrete or combinatorial techniques from a nonstandard angle to the same problem under the two interpretations. These two complementary aspects let us combine the advantages of each side. In the second section, on Freiman’s inverse problems, this method will be used frequently. To take advantage of these features, tools such as Loeb probability spaces and additive cuts will be employed. These tools are intrinsically nonstandard. The reader has probably learned about Loeb measure elsewhere; to keep the chapter self-contained, a brief definition of Loeb space is given below. Note that we are interested in the Loeb
R. Jin
Department of Mathematics, College of Charleston, Charleston, SC 29424, USA
© Springer Science+Business Media Dordrecht 2015
P.A. Loeb and M.P.H. Wolff (eds.), Nonstandard Analysis for the Working Mathematician, DOI 10.1007/978-94-017-7327-0_10
R. Jin
measure space generated by a normalized counting measure space on a hyperfinite set only in this chapter. We will work within a fixed countably saturated nonstandard universe. Let $\mathbb{N}$ be the set of all non-negative standard integers, so $0 \in \mathbb{N}$, and let $\mathbb{Z}$ be the set of all standard integers. Given two integers $a, b \in {}^*\mathbb{Z}$, the interval notation $[a, b]$ is used exclusively for the interval of integers $\{x \in {}^*\mathbb{Z} : a \le x \le b\}$.

Definition 10.1.1 Let $H$ be a hyperfinite integer and let $\Omega_0$ be the algebra of all internal subsets of $[0, H]$. For each $A \in \Omega_0$ let $\mu(A) = |A|/(H+1)$. Note that $\mu$ is the normalized counting measure on $([0, H], \Omega_0)$ in the nonstandard universe. For each $X \subseteq [0, H]$, internal or external, let
$$\overline{\mu}(X) := \inf\{\mathrm{st}(\mu(A)) : A \in \Omega_0,\ A \supseteq X\} \quad\text{and}\quad \underline{\mu}(X) := \sup\{\mathrm{st}(\mu(A)) : A \in \Omega_0,\ A \subseteq X\},$$
where sup and inf are operators in the standard universe and st is the standard part map, which maps every real number $r$ in the nonstandard universe with $|r| \le n$ for some standard $n \in \mathbb{N}$ to the unique standard real number $\alpha$ such that $r$ is infinitesimally close to $\alpha$. Let $\Omega := \{X \subseteq [0, H] : \overline{\mu}(X) = \underline{\mu}(X)\}$; $\Omega$ is called the σ-algebra of all Loeb measurable subsets of $[0, H]$. Let $\mu_H(X) := \overline{\mu}(X)$ for every $X \in \Omega$. Then $\mu_H$ is an atomless, complete, countably additive probability measure on the Loeb measurable space $([0, H], \Omega)$ in the standard universe. For any set $X$, not necessarily a subset of $[0, H]$, we write $\mu_H(X)$ for the Loeb measure of $X \cap [0, H]$, provided that $X \cap [0, H]$ is Loeb measurable.

The Loeb measure on $[0, H]$ can be viewed as an enrichment of the Lebesgue measure on the unit interval of standard reals. Thus a Loeb measure is a bridge between the two universes, connecting combinatorial properties of a hyperfinite set in the nonstandard universe with measure-theoretic properties of the set in the standard universe.

We now define additive cuts. For any two sets $A$ and $B$ of integers let $A \pm B := \{x \pm y : x \in A, y \in B\}$. If $a$ is an integer, then $a \pm A := \{a\} \pm A$ and $A \pm a := A \pm \{a\}$.
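As a purely finite illustration (a sketch under the assumption that $H$ is an ordinary large integer, so nothing here is a Loeb measure), the normalized counting measure $|A|/(H+1)$ and the sumset notation can be computed directly. The helper names `counting_measure`, `sumset`, and `diffset` are hypothetical. The example shows the heuristic behind viewing Loeb measure as an enrichment of Lebesgue measure: the integers $x \in [0, H]$ with $x/H \in [1/4, 3/4]$ carry normalized counting measure close to $1/2$.

```python
from fractions import Fraction


def counting_measure(A, H):
    """Normalized counting measure |A ∩ [0, H]| / (H + 1) on the integer interval [0, H]."""
    return Fraction(sum(1 for x in set(A) if 0 <= x <= H), H + 1)


def sumset(A, B):
    """A + B = {x + y : x in A, y in B} for finite sets of integers."""
    return {x + y for x in A for y in B}


def diffset(A, B):
    """A - B = {x - y : x in A, y in B} for finite sets of integers."""
    return {x - y for x in A for y in B}


# Finite stand-in for the Loeb/Lebesgue comparison (H is an ordinary integer here).
H = 10_000
A = [x for x in range(H + 1) if Fraction(1, 4) <= Fraction(x, H) <= Fraction(3, 4)]
print(counting_measure(A, H), float(counting_measure(A, H)))
print(sorted(sumset({0, 2}, {1, 3})), sorted(diffset({0, 2}, {1, 3})))
```

Exact fractions are used so that the finite measure is not blurred by floating-point rounding; for hyperfinite $H$ the standard part of this ratio is what the Loeb construction keeps.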
Definition 10.1.2 Let $U \subseteq {}^*\mathbb{N}$ be an initial segment, i.e., $x \in U$ and $y \le x$ imply $y \in U$ for non-negative integers $x$ and $y$. Then $U$ is called an additive cut if $1 \in U$ and $U + U \subseteq U$. Given a cut $U$, the cofinality of $U$ is the least (set-theoretical) cardinality of a set $X \subseteq U$ such that $X$ is unbounded in $U$.

An additive cut is a semigroup resembling the semigroup $(\mathbb{N}, +)$ of all standard non-negative integers. Suppose that $U$ is an additive cut in a hyperfinite interval
10 Density Problems and Freiman’s Inverse Problems
$[0, N]$ and $A$ is an internal subset of $[0, N]$. It may be possible to apply standard number-theoretic techniques for $\mathbb{N}$ to problems in $U$, in the hope that results about the infinite set $U \cap A$ yield information about the combinatorial properties of the “finite” set $A$ in $[0, N]$. Since only additive cuts are considered in this chapter, the word “additive” will be omitted from now on. The set $^*\mathbb{N}$ itself is a trivial cut. Note that if a cut $U$ is a proper subset of $^*\mathbb{N}$, then $U$ is an external set, because $U$ is bounded but does not possess a maximal element, while every nonempty bounded internal subset of $^*\mathbb{N}$ has one. Note also that the set $\mathbb{N}$ of all standard non-negative integers is a cut. Given a hyperfinite integer $H$, let
$$U_H = \bigcap_{n \in \mathbb{N}} \left[0, \left\lfloor \frac{H}{n+1} \right\rfloor\right]. \tag{10.1}$$
It is easy to see that $U_H \subseteq [0, H]$ and $U_H$ is a cut. Note that $U_H$ is the largest cut and $\mathbb{N}$ is the smallest cut in $[0, H]$. Note also that $\mathbb{N}$ has countable cofinality, while $U_H$ has uncountable cofinality by countable saturation. We will often write $x < U$ for $x \in U$ and $x > U$ for $x \in {}^*\mathbb{N} \setminus U$. For any set $B \subseteq U$, by saying that “there are sufficiently large $x \in B$ in $U$ such that $P(x)$ is true” we mean “for any $y \in B$, there exists $x \in B$ with $y < x < U$ such that $P(x)$ is true.” By saying that “$P(x)$ is true for all sufficiently large $x \in B$ in $U$” we mean “there exists a $y \in U$ such that $P(x)$ is true for all $x \in B$ with $y < x < U$.” For a set $A$ of integers and two integers $a$ and $b$, let $A(a, b) := |A \cap [a, b]|$. If $a = 0$, then $A(b) := A(0, b)$. We use $I$ for the unit interval of real numbers instead of $[0, 1]$ to avoid confusion with the interval notation for integers. The symbol $\lesssim$ means “less than or infinitesimally close to” and $\lnsim$ means “less than but not infinitesimally close to.” Similarly, $\gtrsim$ means “greater than or infinitesimally close to” and $\gnsim$ means “greater than but not infinitesimally close to.” To avoid typing too many fractions, we will employ the following notation:
$$D_{a,b}(A) := \frac{A(a, b)}{b - a + 1} \quad\text{and}\quad D_b(A) := D_{0,b}(A)$$
for any integers $a \le b$ and any standard or internal set $A$. Intuitively, $D_{a,b}(A)$ is the local density of $A$ in $[a, b]$. The Greek letters $\alpha, \beta, \gamma, \epsilon$, etc. are reserved for standard real numbers.
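For a standard set given through a membership test, the counting function $A(a, b)$ and the local density $D_{a,b}(A)$ can be computed on any finite window. The following minimal sketch uses hypothetical function names, handles only windows with $a \le b$, and relies on exact rational arithmetic so that a density such as $1/3$ is reported exactly.

```python
from fractions import Fraction


def count(A, a, b):
    """A(a, b) = |A ∩ [a, b]|; A may be any container supporting `in` (set, range, ...)."""
    return sum(1 for x in range(a, b + 1) if x in A)


def local_density(A, a, b):
    """D_{a,b}(A) = A(a, b) / (b - a + 1), defined for integers a <= b."""
    if a > b:
        raise ValueError("local density needs a <= b")
    return Fraction(count(A, a, b), b - a + 1)


def prefix_density(A, b):
    """D_b(A) = D_{0,b}(A)."""
    return local_density(A, 0, b)


multiples_of_3 = range(0, 10**6, 3)  # a finite piece of the set of multiples of 3
print(local_density(multiples_of_3, 0, 8))   # {0, 3, 6} in [0, 8]  -> 1/3
print(local_density(multiples_of_3, 1, 2))   # nothing in [1, 2]    -> 0
print(prefix_density(multiples_of_3, 99))    # 34 elements in [0, 99] -> 17/50
```

A `range` object is used as the set because membership tests on it take constant time, so long windows stay cheap.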
10.2 Applications to Density Problems
Definition 10.2.1 For a set $A \subseteq \mathbb{N}$ we define the following four densities of $A$.
• Lower asymptotic density of $A$: $\underline{d}(A) = \liminf_{x \to \infty} D_x(A)$;
• Upper asymptotic density of $A$: $\overline{d}(A) = \limsup_{x \to \infty} D_x(A)$;
• Upper Banach density of $A$: $\overline{BD}(A) = \lim_{x \to \infty} \sup_{k \ge 0} D_{k,k+x}(A)$;
• Lower Banach density of $A$: $\underline{BD}(A) = \lim_{x \to \infty} \inf_{k \ge 0} D_{k,k+x}(A)$.

Exercise 10.2.2 Let $A \subseteq \mathbb{N}$. Prove that $\lim_{x \to \infty} \sup_{k \ge 0} D_{k,k+x}(A)$ exists and is equal to $\inf_{x \ge 0} \sup_{k \ge 0} D_{k,k+x}(A)$.

There is another density popular among additive number theorists, called Shnirel’man density. The Shnirel’man density of $A$ is defined by $\sigma(A) = \inf_{x \ge 1} D_{1,x}(A)$. Note that Shnirel’man density does not involve a limit process. A density is a way to measure the “size” of an infinite set of non-negative integers. Note that $\overline{BD}(A) = \alpha$ if and only if $\alpha$ is the greatest real number such that there exists a sequence of intervals $\{[a_n, b_n]\}_{n=1}^{\infty}$ with $\lim_{n \to \infty}(b_n - a_n) = \infty$ and $\lim_{n \to \infty} D_{a_n,b_n}(A) = \alpha$. Similarly, $\underline{BD}(A) = \alpha$ if and only if $\alpha$ is the least real number such that there exists a sequence of intervals $\{[a_n, b_n]\}_{n=1}^{\infty}$ with $\lim_{n \to \infty}(b_n - a_n) = \infty$ and $\lim_{n \to \infty} D_{a_n,b_n}(A) = \alpha$.

The following two propositions establish nonstandard characterizations of the densities in Definition 10.2.1.

Proposition 10.2.3 Let $A \subseteq \mathbb{N}$. Then
1. $\underline{d}(A) \ge \alpha$ if and only if $D_x({}^*A) \gtrsim \alpha$ for all hyperfinite integers $x$;
2. $\overline{d}(A) \ge \alpha$ if and only if $D_x({}^*A) \gtrsim \alpha$ for some hyperfinite integer $x$.

Proof of Part 1: “⇒”: Let $k \in \mathbb{N}$ be a standard positive integer. By the definition of lower asymptotic density there exists a standard $n_k \in \mathbb{N}$ such that $D_x(A) > \alpha - 1/k$ for any standard $x \ge n_k$. Hence, by the transfer principle, $D_x({}^*A) > \alpha - 1/k$ for any $x \ge n_k$ in $^*\mathbb{N}$, in particular for any hyperfinite integer $x$. Suppose $x$ is one particular hyperfinite integer. Then $D_x({}^*A) > \alpha - 1/k$ is true for any standard positive integer $k$. By the overspill principle, $D_x({}^*A) > \alpha - 1/k'$ is true for some hyperfinite integer $k'$. This shows $D_x({}^*A) \gtrsim \alpha$. Now the conclusion follows from the fact that the hyperfinite integer $x$ is arbitrarily chosen.
“⇐”: Given any standard $\epsilon > 0$, it is trivially true that $D_x({}^*A) > \alpha - \epsilon$ for any hyperfinite integer $x$. Hence the internal set of all $n \in {}^*\mathbb{N}$ such that $D_x({}^*A) > \alpha - \epsilon$ for every $x \ge n$ contains every hyperfinite integer, so by the underspill principle there exists a standard $n_\epsilon \in \mathbb{N}$ such that $D_x({}^*A) > \alpha - \epsilon$ for any integer $x \ge n_\epsilon$. By the transfer principle, $D_x(A) > \alpha - \epsilon$ for any standard integer $x \ge n_\epsilon$.
Therefore, $\underline{d}(A) \ge \alpha$ because $\epsilon > 0$ is arbitrary.

Exercise 10.2.4 Prove Part 2 of Proposition 10.2.3.

Proposition 10.2.5 Let $A \subseteq \mathbb{N}$. Then
1. $\overline{BD}(A) \ge \alpha$ if and only if $D_{x,y}({}^*A) \gtrsim \alpha$ for some integers $x, y$ with $y - x$ hyperfinite;
2. $\underline{BD}(A) \ge \alpha$ if and only if $D_{x,y}({}^*A) \gtrsim \alpha$ for all integers $x, y$ with $y - x$ hyperfinite.

Proof of Part 1: “⇒”: Given non-zero $m \in \mathbb{N}$, there exists $n_m \in \mathbb{N}$ with $n_m > m$ such that
$$\sup_{k \ge 0} D_{k,k+n_m}(A) > \alpha - \frac{1}{m}.$$
Hence there exists $k_m \in \mathbb{N}$ such that
$$D_{k_m, k_m + n_m}(A) > \alpha - \frac{1}{m}.$$
Extending the sequences $\langle n_m \rangle$ and $\langle k_m \rangle$ to $^*\mathbb{N}$ and applying the overspill principle, we obtain a hyperfinite integer $M$ such that $n_M > M$ and
$$D_{k_M, k_M + n_M}({}^*A) > \alpha - \frac{1}{M}.$$
Now the conclusion follows because $1/M \approx 0$ and $n_M > M$ is hyperfinite.
“⇐”: Given $m \in \mathbb{N}$, since $D_{x,y}({}^*A) > \alpha - 1/m$ for some $x, y$ with $y - x$ hyperfinite, the sentence “there exists an interval $[x, y]$ with $y - x > m$ such that $D_{x,y}({}^*A) > \alpha - 1/m$” is true in the nonstandard universe. By the transfer principle, for every standard non-zero $m \in \mathbb{N}$ there exist two standard integers $x, y$ such that $y - x > m$ and $D_{x,y}(A) > \alpha - 1/m$. This implies $\overline{BD}(A) \ge \alpha$.

Exercise 10.2.6 Prove Part 2 of Proposition 10.2.5.

The following two exercises show that $^*C$ is uniformly distributed in an interval of hyperfinite length where the local density of $^*C$ is infinitesimally close to the upper (lower) Banach density of $C$.

Exercise 10.2.7 Suppose that $C \subseteq \mathbb{N}$, $\overline{BD}(C) = \alpha$, and $D_{x,x+H}({}^*C) \approx \alpha$, where $H$ is hyperfinite. Let $K$ be a hyperfinite integer such that $K/H \approx 0$. Let $N = \lfloor H/K \rfloor$, let $I_i = [x + iK, x + (i+1)K - 1]$ for $i = 0, 1, \ldots, N-1$, and let $I_N = [x + NK, x + H]$. Then $|{}^*C \cap I_i|/K \lesssim \alpha$ for any $i \in [0, N-1]$ and
$$\mu_N\left(\left\{i \in [0, N] : \frac{|{}^*C \cap I_i|}{K} \approx \alpha\right\}\right) = 1,$$
where $\mu_N$ is the Loeb measure on $[0, N]$. (Hint: Use Proposition 10.2.5.)

Exercise 10.2.8 Suppose that $C \subseteq \mathbb{N}$, $\underline{BD}(C) = \alpha$, and $D_{x,x+H}({}^*C) \approx \alpha$, where $H$ is hyperfinite. Let $K$ be a hyperfinite integer such that $K/H \approx 0$. Let $N = \lfloor H/K \rfloor$, let $I_i = [x + iK, x + (i+1)K - 1]$ for $i = 0, 1, \ldots, N-1$, and let $I_N = [x + NK, x + H]$. Then $|{}^*C \cap I_i|/K \gtrsim \alpha$ for any $i \in [0, N-1]$ and
$$\mu_N\left(\left\{i \in [0, N] : \frac{|{}^*C \cap I_i|}{K} \approx \alpha\right\}\right) = 1.$$
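The densities of Definition 10.2.1 and the Shnirel’man density are limits or infima over infinitely many windows, so a computer can only probe them on finite windows. The sketch below is a heuristic under that assumption, not a computation of the densities themselves; the function names and window parameters are hypothetical. It uses the set $C = \bigcup_{k \ge 1} [k^3, k^3 + k]$ (truncated at $k = 20$), which is thick, so $\overline{BD}(C) = 1$, yet has asymptotic density $0$ and lower Banach density $0$, to show how far apart these notions can be.

```python
from fractions import Fraction


def local_density(A, a, b):
    """D_{a,b}(A) = |A ∩ [a, b]| / (b - a + 1)."""
    return Fraction(sum(1 for n in range(a, b + 1) if n in A), b - a + 1)


def prefix_densities(A, N):
    """[D_0(A), ..., D_N(A)], computed with a running count."""
    out, c = [], 0
    for x in range(N + 1):
        if x in A:
            c += 1
        out.append(Fraction(c, x + 1))
    return out


def asymptotic_window(A, x0, N):
    """(min, max) of D_x(A) over x0 <= x <= N: finite proxies for lower/upper asymptotic density."""
    d = prefix_densities(A, N)[x0:]
    return min(d), max(d)


def banach_window(A, x, k_max):
    """(min, max) of D_{k,k+x}(A) over 0 <= k <= k_max: proxies for inf_k / sup_k in the Banach densities."""
    d = [local_density(A, k, k + x) for k in range(k_max + 1)]
    return min(d), max(d)


def shnirelman_window(A, N):
    """min of D_{1,x}(A) over 1 <= x <= N: a proxy for the Shnirel'man density."""
    return min(local_density(A, 1, x) for x in range(1, N + 1))


# C = union of the blocks [k^3, k^3 + k] for 1 <= k <= 20; thick in the limit but very sparse.
C = {n for k in range(1, 21) for n in range(k**3, k**3 + k + 1)}

low, high = asymptotic_window(C, 1000, 8000)
print(float(low), float(high))       # both small; the asymptotic densities of C are 0
print(banach_window(C, 9, 1000))     # min 0 (e.g. [11, 20]), max 1 (the block [729, 738])
print(shnirelman_window(range(0, 10**4, 3), 100))  # 0, since 1 is not a multiple of 3
```

The window parameters only need to be large enough to exhibit a long gap and a long block; Exercise 10.2.2 is what guarantees that the supremum over $k$ has a limit as the window length grows.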
10.2.1 Sumset Phenomenon
It is a well-known fact in real analysis that if $A$ and $B$ are two sets of real numbers with positive Lebesgue measure, then $A + B$ contains a non-empty open interval of real
numbers. This fact can be easily proven by applying the so-called Lebesgue Density Theorem. It contrasts with the duality between the null ideal and the meager ideal, which says that the real line is the union of two sets $A$ and $B$, where $A$ has Lebesgue measure zero and $B$ is meager, i.e., $B$ is the union of countably many nowhere dense sets. So the smallness of a set of reals in terms of measure is incompatible with its smallness in terms of the order topology. However, if two sets are not small in terms of Lebesgue measure, then their sum cannot be small in terms of the order topology. Does this fact have a discrete version? Note that the order topology of a discrete linear order is the discrete topology, i.e., every singleton is open.

Definition 10.2.9 Let $A \subseteq \mathbb{N}$.
• $A$ is syndetic if there is a $k \in \mathbb{N}$ such that $\mathbb{N} \subseteq A + [-k, k]$;
• $A$ is thick if $A$ contains arbitrarily long sequences of consecutive integers;
• $A$ is piecewise syndetic if there is a $k \in \mathbb{N}$ such that $A + [0, k]$ is thick.

If a syndetic set is viewed as a very dense set in $\mathbb{N}$ in terms of the order structure of $\mathbb{N}$, then a piecewise syndetic set can be viewed as a locally very dense set in $\mathbb{N}$.

Exercise 10.2.10 Let $A \subseteq \mathbb{N}$. Prove that $A$ is piecewise syndetic if and only if there is a standard $m \in \mathbb{N}$ such that $^*A + [0, m]$ contains an interval of hyperfinite length.

In [12], a simple result hints at the relationship between measure and order topology in a discrete setting.

Theorem 10.2.11 (H. Furstenberg) If $A \subseteq \mathbb{N}$ has positive upper Banach density, then $(A - A) \cap \mathbb{N}$ is syndetic.

Let $H$ be a hyperfinite integer. In nonstandard analysis, one often discretizes the unit interval $^*I$ of real numbers by focusing on its hyperfinite subset $\{0, \frac{1}{H}, \frac{2}{H}, \ldots, 1\}$. In fact, this set, or more conveniently $[0, H]$, can be used to retrieve the standard unit interval $I$ of real numbers by taking the quotient modulo a cut.

Definition 10.2.12 Let $U \subseteq [0, H]$ be a cut and $x, y \in [0, H]$.
Let $\sim_U$ be a binary relation on $[0, H]$. Define
• $x \sim_U y$ if $|x - y| \in U$;
• $[x]_U := \{y \in [0, H] : y \sim_U x\}$;
• $[0, H]/U := \{[x]_U : x \in [0, H]\}$;
• $[x]_U \le [y]_U$ if $x < y$ or $x \sim_U y$.
Clearly, $\sim_U$ is an equivalence relation. $[x]_U$ is called the $U$-monad of $x$. The quotient space $[0, H]/U$, with the order defined above among $U$-monads, is a linearly ordered self-dense set, i.e., there exists a $U$-monad between any two distinct $U$-monads.
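The notions of Definition 10.2.9 and the conclusion of Theorem 10.2.11 can be probed numerically on a finite window. The sketch below is illustrative only: a finite window can never certify syndeticity or thickness, which are properties of infinite sets, and `syndetic_constant` and `longest_run` are hypothetical helper names. For $A = \{n : n \bmod 7 \in \{0, 3\}\}$, which has density $2/7$, the difference set $(A - A) \cap \mathbb{N}$ consists of the $n$ with $n \bmod 7 \in \{0, 3, 4\}$, and every integer lies within distance $1$ of it, in line with Furstenberg’s theorem.

```python
from bisect import bisect_left


def syndetic_constant(S, lo, hi):
    """Smallest k such that every n in [lo, hi] lies in S + [-k, k] (S finite, non-empty)."""
    pts = sorted(S)
    worst = 0
    for n in range(lo, hi + 1):
        i = bisect_left(pts, n)
        # the nearest element of S is pts[i - 1] or pts[i], whichever exists
        dist = min(abs(n - pts[j]) for j in (i - 1, i) if 0 <= j < len(pts))
        worst = max(worst, dist)
    return worst


def longest_run(S, lo, hi):
    """Length of the longest block of consecutive integers in [lo, hi] contained in S."""
    best = run = 0
    for n in range(lo, hi + 1):
        run = run + 1 if n in S else 0
        best = max(best, run)
    return best


A = [n for n in range(700) if n % 7 in (0, 3)]      # finite piece of A
D = {a - b for a in A for b in A if a >= b}          # (A - A) ∩ N computed from that piece
print(syndetic_constant(D, 0, 600))                  # every n <= 600 is within 1 of D

C = {n for k in range(1, 11) for n in range(k**3, k**3 + k + 1)}
print(longest_run(C, 0, 1100))                       # length of the block [1000, 1010]
```

The window $[0, 600]$ is kept well below the largest difference available from the truncated set, so the truncation does not create spurious gaps near the right end.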
Exercise 10.2.13 Let $U = U_H$. Prove that $[0, H]/U$ is order-isomorphic to the unit interval $I$ of standard real numbers. (Hint: Prove that $\varphi([x]_U) = \mathrm{st}(x/H)$ is an order-preserving bijection from $[0, H]/U$ to $I$.)

Note that the order topology on $[0, H]/U$ induces a quotient topology on $[0, H]$, called the $U$-topology. The $U$-topology can also be defined as follows.

Definition 10.2.14 A set $O \subseteq [0, H]$ is called $U$-open if for any $x \in O$, there is an $r > U$ such that $[x - r, x + r] \cap [0, H] \subseteq O$. All $U$-open sets form the $U$-topology. A set $X \subseteq [0, H]$ is $U$-nowhere dense if for any interval $[a, b] \subseteq [0, H]$ with $b - a > U$, there exists an interval $[c, d] \subseteq [a, b]$ with $d - c > U$ such that $[c, d] \cap X = \emptyset$. A set $X \subseteq [0, H]$ is $U$-meager if $X$ is the union of at most countably many $U$-nowhere dense subsets of $[0, H]$.

Exercise 10.2.15 Prove that the $U$-topology defined above is the quotient topology on $[0, H]$ induced by the quotient map $f(x) = [x]_U$ from $[0, H]$ to $[0, H]/U$, where $[0, H]/U$ has the usual order topology.

Although the $U$-topology cannot separate two points in the same $U$-monad, it behaves like an order topology. It is shown in [22] that $[0, H]$ is the union of a set of Loeb measure zero and a $U$-meager set for any cut $U$. Hence smallness in terms of Loeb measure and smallness in terms of the $U$-topology are incompatible. By contrast, if $A$ and $B$ are not small in terms of Loeb measure in $[0, H]$, then $A + B$ cannot be small in terms of the $U$-topology. The following is the main theorem of this subsection. It was first proven in [14] to answer a question posed in [22].

Theorem 10.2.16 Let $H$ be a hyperfinite integer and $U$ be a cut in $[0, H]$. Let $\mu_H$ be the Loeb measure on $[0, H]$. Let $A, B$ be any two Loeb measurable subsets of $[0, H]$. If $\mu_H(A) > 0$ and $\mu_H(B) > 0$, then $A + B$ is not $U$-meager in $[0, 2H]$.
Proof First, the sets $A$ and $B$ can be assumed to be internal, because otherwise we can replace $A$ and $B$ by internal subsets of positive Loeb measure. Second, it suffices to prove that $A + B$ is not $U$-nowhere dense, by the following argument. Suppose that $A + B$ is $U$-meager but not $U$-nowhere dense. Let $A + B = \bigcup_{n \in \mathbb{N}} F_n$, where the $F_n$’s are $U$-nowhere dense subsets of $[0, 2H]$. Since $A + B$ is not $U$-nowhere dense, there exists an interval $[a, b] \subseteq [0, 2H]$ with $b - a > U$ such that every subinterval $[c, d] \subseteq [a, b]$ with $d - c > U$ contains elements of $A + B$. So we can find a nested sequence of intervals $[c_n, d_n] \subseteq [a, b]$ with $d_n - c_n > U$ such that $[c_n, d_n] \cap F_n = \emptyset$ and $X_n = [c_n, d_n] \cap (A + B) \neq \emptyset$. Since each $X_n$ is internal, by countable saturation $X = \bigcap_{n \in \mathbb{N}} X_n$ is a non-empty subset of $A + B$ that is disjoint from every $F_n$, a contradiction.

Suppose that the theorem is not true. Let
$$\alpha = \sup\{\mu_{H'}(A) : \exists H' > U,\ A \subseteq [0, H'],\ \exists B \subseteq [0, H'],\ A, B \text{ internal},\ \mu_{H'}(B) > 0,\ A + B \text{ is } U\text{-nowhere dense in } [0, 2H']\}.$$
410
R. Jin
Clearly, α > 0. Let β = sup {μ H (B) : ∃H > U, B ⊆ [0, H ], ∃A ⊆ [0, H ], A, B internal, μ H (A) > 0.9·α, A + B is U -nowhere dense in [0, 2H ]} . Clearly, 0 < β ≤ α. Fix a hyperfinite integer H and internal sets A, B ⊆ [0, H ] such that μ H (A) > 0.9α, μ H (B) > 0.9β, and A + B is U -nowhere dense in [0, 2H ]. If 0.9α + 0.9β > 1, then A + B is not U -nowhere dense because for each x ∈ H ± U , A ∩ (x − B) = ∅ due to the fact that μ H (A) + μ H (x − B) > 1, and hence H ± U ⊆ A + B, which implies [H − m, H + m] ⊆ A + B for some m > U by the overspill principle. So we can assume that 0.9α + 0.9β ≤ 1. Since β ≤ α, we have that β ≤ 1/1.8 < 0.56. For each m ∈ U , (A + B + [0, m]) ∩ [0, 2H ] is U -nowhere dense because A + B is U -nowhere dense. By the maximality of β we have that μ H (B + [0, m]) ≤ β. By the overspill principle there is M > U such that D H (B + [0, m]) < 1.1β for any m ∈ [0, M]. Let K > U be such that (K +1)/(H +1) < 0.01, 2K +2 < M, and Dx,x+K (A) > 0.9α for some x ∈ [0, H − K ]. Partition [0, H ] into a collection I of intervals of length K + 1 except the last interval with a length less than or equal to K + 1. If at least two-third of intervals in I contain elements of B, then μ H (B + [0, M]) ≥ 2/3 − 0.01 > 1.1β, a contradiction. So we can assume that more than one-third of the intervals in I are disjoint from B. Since B is distributed in at most two-third of the intervals in I, there exists one interval [y, y + K ] ∈ I such that D y,y+K (B) 23 β. Let A = A ∩ [x, x + K ] − x and B = B ∩ [y, y + K ] − y. Then K > U is hyperfinite, μ K (A ) > 0.9α, μ K (B ) ≥ 23 β, and A + B is U -nowhere dense. We now have a contradiction to the maximality of β. This completes the proof of the theorem. Theorem 10.2.16 can also be proven by a Lebesgue density theorem for the quotient space of [0, H ] modulo the cut U (see [7]). The following are two corollaries of Theorem 10.2.16. 
Corollary 10.2.17 If A, B are two sets of standard real numbers with positive Lebesgue measure, then A + B contains a non-empty open interval of standard real numbers. Proof Without loss of generality we can assume that A, B ⊆ I where I is the unit interval of real numbers. Let H be a hyperfinite integer and let ϕ : [0, 2H ] → 2I be such that x
ϕ(x) = st . H Let A = ϕ−1 (A) and B = ϕ−1 (B). Then A and B are Loeb measurable sets in [0, H ], μ H (A ) > 0, μ H (B ) > 0, and ϕ(A + B ) = A + B. Let U = U H be the largest cut in [0, H ]. By Theorem 10.2.16 there exists an interval [a, b] ⊆ [0, 2H ] such that b − a > U and every subinterval [c, d] ⊆ [a, b] with d − c > U must
10 Density Problems and Freiman’s Inverse Problems
411
contain some elements of A + B . So every U -monad in [a, b] has non-empty intersection with A + B . It is easy to check that ϕ(a) < ϕ(b) and every standard real number between ϕ(a) and ϕ(b) is in A + B. Corollary 10.2.18 If A, B are two sets of non-negative integers with positive upper Banach density, then A + B are piecewise syndetic. Proof Let BD(A) = α and BD(B) = β. By Proposition 10.2.5 there exist intervals [x, x + H ] and [y, y + H ] such that Dx,x+H (∗A) ≈ α and D y,y+K (∗B) ≈ β. Let A = ∗A ∩ [x, x + H ] − x and B = ∗B ∩ [y, y + H ] − y. Then μ H (A ) = α > 0 and μ H (B ) = β > 0. Let U = N. By Theorem 10.2.16 there exists an interval [a, b] ⊆ [0, H ] of hyperfinite length such that every subinterval [c, d] ⊆ [a, b] of hyperfinite length contains some elements of A + B . Since A + B is an internal set, the maximal length of a gap of A + B in [a, b] exists. Let m be the maximal length of gaps of A + B inside [a, b]. Clearly, m ∈ N. Then A + B + [0, m] covers the interval [a + m, b] of hyperfinite length. So ∗A + ∗B + [0, m] covers the interval x + y +[a +m, b] of hyperfinite length. This implies that A + B is piecewise syndetic by Exercise 10.2.10. Theorem 10.2.16 is a nonstandard result which unites a continuous version and a discrete version of the so called sumset phenomenon, i.e., if two sets A and B are not small in terms of “measure”, then A + B is not small in terms of “order-topology.” Corollary 10.2.18 has been generalized to amenable groups in [3]. The corollary has also been improved in [4] to that if A and B are sets of integers having positive upper Banach density, then A + B is a piecewise Bohr set. Note that a piecewise Bohr set is piecewise syndetic. There are different ways to prove Corollary 10.2.18. For example, an ultrafilter proof in [2] and an elementary proof with some quantitative information in [6]. Recently, some asymptotic versions of Theorem 10.2.18 are developed in [7].
10.2.2 Plünnecke Type of Inequalities for Densities Recall that σ(A) is the Shnirel’man density of A. A set B ⊆ N is called an essential component if σ(A + B) > σ(A) for any set A ⊆ N with 0 < σ(A) < 1. Note that if 0 ∈ / A ∪ B, then 1 ∈ / A + B. Hence σ(A + B) = 0. So it is natural to assume that an essential component contains 0. If 0 ∈ B and σ(B) > 0, then B is an essential component by Shnirel’man Theorem [13, page 3]. There are essential component B with σ(B) = 0. A set B ⊆ N is called a basis of order h for some positive integer h if h B = B + B + · · · + B = N. h
The following theorem of Erd˝os [13, page 10] shows that a basis is an essential component although a basis could have Shnirel’man density 0.
412
R. Jin
Theorem 10.2.19 (P. Erd˝os) If B is a basis of order h, then σ(A + B) ≥ σ(A) +
1 σ(A) (1 − σ(A)) 2h
for any set A ⊆ N. Example 10.2.20 Let S = {n 2 :√n ∈ N}. Then S is a basis of order 4 by so called Lagrange Theorem. Since S(x)/ x = O(1) we have that σ(S) = 0. There is an improvement of Erd˝os Theorem by Plünnecke [25]. The proof of the following theorem can also be found in [24]. Theorem 10.2.21 (H. Plünnecke) If B is a basis of order h, then 1
σ(A + B) ≥ σ(A)1− h for any set A ⊆ N.
In this subsection we discuss whether Theorem 10.2.21 has asymptotic versions with respect to other densities defined in Definition 10.2.1. Theorem 10.2.21 is a significant improvement of Theorem 10.2.19: Let f (x) = −1/ h −1− 1 (1−x) for 0 < x ≤ 1. Then f (1) = 0 and f (x) = − 1 x − h1 −1 + 1 < 0 x h h h for 0 < x < 1. Hence f (x) is strictly decreasing for 0 < x < 1, which implies that 1 f (x) > f (1) = 0 for all 0 < x < 1. Since x f (x) = x 1− h − x − h1 x(1 − x) > 0 1
for 0 < x < 1, we have that σ(A)1− h > σ(A) + h1 σ(A)(1 − σ(A)) whenever 1 0 < σ(A) < 1. Note that 2h in Theorem 10.2.19 is replaced by h1 . Besides an improvement of Erd˝os’ theorem, more importantly, Theorem 10.2.21 is proved by using a powerful graph-theoretic inequality developed by Plünnecke. For simplicity, we introduce only a consequence of the Plünnecke’s inequality here. The interested reader should consult [24] where Plünnecke’s theory is well presented including the proof of the following theorem. Theorem 10.2.22 (H. Plünnecke) Let A, B ⊆ N, n, h ∈ N with h > 0, and A(0, n) > 0. There exists A ⊆ A with A (0, n) > 0 such that |(A + B) ∩ [0, n]| ≥ A(0, n)
|(A + h B) ∩ [0, n]| A (0, n)
1/ h .
Note that if h > 1, then B is a basis of order h if and only if σ(h B) = 1. Definition 10.2.23 Let B ⊆ N and h > 0. • B is a lower asymptotic basis of order h if d(h B) = 1; • B is an upper asymptotic basis of order h if d(h B) = 1;
10 Density Problems and Freiman’s Inverse Problems
413
• B is an upper Banach basis of order h if BD(h B) = 1; • B is an lower Banach basis of order h if BD(h B) = 1. A set B ⊆ N is often called an asymptotic basis of order h if h B is a cofinite subset of N. So a basis is an asymptotic basis and an asymptotic basis is a lower asymptotic basis. Example 10.2.24 Let P be the set of all prime numbers. Then P is a lower asymptotic basis of order 3. The above example follows from the “almost” version of Goldbach Conjecture which was independently proved by Estermann, Chudakov, and van der Corput. The set P is an asymptotic basis of order 4 (or 3 if Goldbach Conjecture is true). Example 10.2.25 Let C be the set of all cubes of non-negative integers. Then C is a lower asymptotic basis of order 4. See [5] for the proof of above example. Note that C is an asymptotic basis of order 7. Next is the main theorem in this subsection. Theorem 10.2.26 Let A and B be two sets of non-negative integers and h be a positive integer. Then 1
1
1. d(A + B) ≥ d(A)1− h d(h B) h ; 1
1
1
1
2. BD(A + B) ≥ BD(A)1− h BD(h B) h ; 3. BD(A + B) ≥ BD(A)1− h BD(h B) h . We will prove each of these three parts of the theorem. Slightly less general forms of Part 1 and Part 2 of Theorem 10.2.26 are proved in [15] by nonstandard methods. Part 1 and Part 2 are reproved and Part 3 is proved for the fist time without nonstandard methods in [16]. However, the standard proofs are longer and, in the authors’ opinion, less intuitive. Proof of Theorem 10.2.26, Part 1: Let d(A) = α and d(h B) = β. Let H be a hyperfinite integer. It suffices to prove that 1 1 |∗(A + B) ∩ [0, H ]| α1− h β h H +1
by Proposition 10.2.3. Choose a hyperfinite integer K < H such that H −K is hyperfinite and K /H ≈ 1. Let C0 = ∗A ∩ [0, K ]. We define an internal sequence of sets C0 ⊇ C1 ⊇ · · · ⊇ C H inductively by Ck+1 =
if Ck (H − k, H ) ≤ α(k + 1) Ck Ck {H − k} if Ck (H − k, H ) > α(k + 1).
414
R. Jin
Let A0 = C H . We verify that (i) A0 (0, H )/(H + 1) α and (ii) A0 (z, H )/(H − z + 1) α for any z ∈ [0, H ]. Suppose (i) is not true. Let be a standard positive real number such that A0 (0, H )/(H + 1) < α − . Let k0 = max{k ∈ [0, H ] : Ck+1 = Ck }. The number k0 exists because otherwise A0 = C0 , which contradicts A0 (0, H )/(H +1) < α−. Note that H − k0 < K , A0 = Ck0 +1 , A0 ∩ [0, H − k0 − 1] = C0 ∩ [0, H − k0 − 1], and Ck0 (H − k0 , H ) > α(k0 + 1). If H − k0 is finite, then A0 (0, H ) A0 (0, H − k0 − 1) A0 (H − k0 , H ) = + H +1 H +1 H +1 Ck (H − k0 , H ) − 1 k0 + 1 · α. 0+ 0 k0 + 1 H +1 If H − k0 is hyperfinite, then A0 (0, H ) A0 (0, H − k0 − 1) A0 (H − k0 , H ) = + H +1 H +1 H +1 Ck0 (H − k0 , H ) − 1 k0 + 1 C0 (0, H − k0 − 1) H − k0 + · · H − k0 H +1 k0 + 1 H +1 α·
k0 + 1 H − k0 +α· = α. H +1 H +1
Each of these two cases above contradicts the assumption that A0 (0, H )/(H + 1) < α − . So (i) is verified. Suppose that (ii) is not true. Let z 0 = max{z ∈ [0, H ] : A0 (z, H ) > α(H − z + 1)}. Note that z 0 ≤ K because A0 (K + 1, H ) ≤ C0 (K + 1, H ) = 0. Since / C H −z 0 +1 by C H −z 0 (z 0 , H ) ≥ A0 (z 0 , H ) > α(H − z 0 + 1), we have that z 0 ∈ / A0 . This implies that the definition of C H −z 0 +1 . Hence z 0 ∈ A0 (z 0 + 1, H ) A0 (z 0 , H ) A0 (z 0 , H ) > α, = > H − z0 H − z0 H − z0 + 1 which contradicts the maximality of z 0 . So (ii) is verified. By Theorem 10.2.22 we can find a non-empty internal set A ⊆ A0 with z = min A such that
10 Density Problems and Freiman’s Inverse Problems
415
|∗(A + B) ∩ [0, H ]| |(A0 + ∗B) ∩ [0, H ]| A0 (0, H ) ≥ · H +1 A0 (0, H ) H +1 h1 1 ∗ A0 (0, H ) |(A + B) ∩ [0, H ]| |(z + h ∗B) ∩ [z, H ]| h · ·α A (0, H ) H +1 A (z, H ) 1 1 ∗ 1 1 β h (h B)(0, H − z)/(H − z + 1) h ·α · α = α1− h β h . A0 (z, H )/(H − z + 1) α 1
1
Thus the conclusion d(A + B) ≥ d(A)1− h d(h B) h holds by Proposition 10.2.3. Note that (h ∗B)(0, H − z)/(H − z + 1) β because z ≤ K and hence H − z is hyperfinite. Part 1 of Theorem 10.2.26 is mentioned in [26]. In the proof of Part 2 the facts in the following two exercises about the upper Banach density are needed. Exercise 10.2.27 Suppose that BD(C) = α (or BD(A) = α) and K 1 < K 2 are two hyperfinite integers. Prove that there exist hyperfinite intervals [a, a + N1 ] and [b, b + N2 ] such that Da,a+N1 (∗C) ≈ α, Db,b+N2 (∗C) ≈ α, a + N1 < K 1 , and N2 > K 2 . Exercise 10.2.28 Suppose that BD(C) = α (or BD(A) = α) and Dx, x+2K (∗C) ≈ α for some hyperfinite integer K . Show that if [y, y + K ] ⊆ [x, x + 2K ], then D y,y+K (∗C) ≈ α. Proof of Theorem 10.2.26, Part 2: Let BD(A) = α and BD(h B) = β. Let [x, x + H ] be a hyperfinite interval such that Dx,x+H (∗A) ≈ α and [y, y + 2K ] be a hyperfinite interval such that D y,y+2K (h ∗B) ≈ β. By Exercise 10.2.27 we can require that (y + 2K )/H ≈ 0 . Note that Dt,t+K (h ∗B) ≈ β for any t ∈ [y, y + K ] by Exercise 10.2.28. We prove that 1 1 |∗(A + B) ∩ [x, x + H ]| α1− h β h . H +1
Let A0 = ∗A ∩ [x, x + H − K − y] − x. Clearly, |∗(A + B) ∩ [x, x + H ]| |(A0 + ∗B) ∩ [0, H ]| ≥ . H +1 H +1 Note that A0 (0, H )/(H + 1) ≈ α because (y + 2K )/H ≈ 0. It now suffices to show that 1 1 |(A0 + ∗B) ∩ [0, H ]| α1− h β h H +1 by Proposition 10.2.5. Let N = H/K and I = {Ii : i = 0, 1, . . . , N } where Ii = [i K , (i + 1)K − 1] for i = 0, 1, . . . , N − 1 and I N = [N K , H ]. By Theorem
416
R. Jin
10.2.22 there exists a non-empty set A ⊆ A0 such that |(A0 + ∗B) ∩ [0, H ]| ≥ A0 (0, H )
|(A + h ∗B) ∩ [0, H ]| A (0, H )
h1
.
Let J = {I ∈ I : I ∩ A = ∅}. Note that if I = [i K , (i + 1)K − 1] ∈ J and a ∈ I ∩ A , then A + h ∗B ⊇ a + (h ∗B) ∩ [y + K − (a − i K ), y + 2K − (a − i K ) − 1]. Note also that for any I = [i K , (i + 1)K − 1] ∈ J , by Exercise 10.2.28 we have a + (h ∗B) ∩ [y + K − (a − i K ), y + 2K − (a − i K ) − 1] ⊆ y + [(i + 1)K , (i + 2)K − 1],
1 |(h ∗B) ∩ [y + K − (a − i K ), y + 2K − (a − i K ) − 1]| ≈ β, K and
A0 (i K , (i + 1)K − 1) A (i K , (i + 1)K − 1) α. K K
Thus
|(A + h ∗B) ∩ [0, H ]| |J |β K , A (0, H ) |J |αK
which implies that |(A0 + ∗B) ∩ [0, H ]| A0 (0, H )
1 β h . α
Therefore, |(A0 + ∗B) ∩ [0, H ]| |(A0 + ∗B) ∩ [0, H ]| A0 (0, H ) = · H +1 A0 (0, H ) H +1
1 1 1 β h α = α1− h β h . α
Proof of Theorem 10.2.26, Part 3: The proof of this part is similar to the proof of Part 2. Let BD(A) = α and BD(h B) = β. Fix an arbitrary hyperfinite interval [x, x + H ] of integers and choose [y, y + 2K ] to be a hyperfinite interval of integers such that D y,y+2K (h ∗B) ≈ β and (y + 2K )/H ≈ 0. Note that Dx,x+H (∗A) α. Let A0 = ∗A ∩ [x, x + H ] − x. It suffices to show that 1 1 |(A0 + ∗B) ∩ [0, H ]| α1− h β h . H +1
10 Density Problems and Freiman’s Inverse Problems
417
Let N = H/K and I = {Ii : i ∈ [0, N ]} where Ii = [i K , (i + 1)K − 1] for i < N and I N = [N K , H ]. Note that |A0 ∩ Ii |/|Ii | α for any i ∈ [0, N − 1]. Let A1 ⊆ A0 be internal such that |A1 ∩ Ii |/|Ii | ≈ α for each i ∈ [0, N − 1]. Note that A1 can be obtained by simply deleting enough many elements from A0 ∩ Ii if |A0 ∩ Ii |/|Ii | α. Let A ⊆ A1 be non-empty such that |(A1 + ∗B) ∩ [0, H ]| ≥ A1 (0, H )
|(A + h ∗B) ∩ [0, H ]| A (0, H )
h1
.
The set A exists by Theorem 10.2.21. Let J = {I ∈ I : I ∩ A = ∅}. Now |(A1 + ∗B) ∩ [0, H ]| |∗(A + B) ∩ [x, x + H ]| ≥ H +1 H +1 h1 1 ∗ 1 1 |J |β K h |(A + h B) ∩ [0, H ]| A1 (0, H ) α ≈ α1− h β h . A (0, H ) H +1 |J |αK This completes the proof because [x, x + H ] is arbitrarily chosen and because of Proposition 10.2.5. Corollary 10.2.29 Let P be the set of all prime numbers and C be the set of all cubes of non-negative integers. Then d(A + P) ≥ d(A)2/3 and d(A + C) ≥ d(A)3/4 for any A ⊆ N. The reader might be curious about whether one can have a Plünnecke’s type of inequality for upper asymptotic density. The answer is no by an example in [15] where two sets A and B of non-negative integers are constructed such that d(2B) = 1, d(A + B) = d(A) = 21 . Exercise 10.2.30 Prove that if d(h B) = 1 for some positive integer h and 0 < d(A) < 1, then d(A + B) > d(A).
10.3 Applications to Freiman’s Inverse Problems Definition 10.3.1 A set A of integers is called an arithmetic progression, or a.p. for short, if there are integers a ≥ 0 and d > 0 such that A = {a, a + d, . . . , a + kd} for some non-negative integer k or A = {a, a + d, a + 2d, . . .} without a maximal element. The number d is called the difference of the arithmetic progression. A set A of integers is called a bi-arithmetic progression, or b.p. for short, if A = I0 ∪I1 where I0 and I1 are two arithmetic progressions of the same difference and
418
R. Jin
2I0 , I0 + I1 , 2I1 are pairwise disjoint. The pair (I0 , I1 ) is called a b.p. decomposition of A. We use the notation A = I0 I1 to indicate that (I0 , I1 ) is a b.p. decomposition of A. When we say that A is a subset of a b.p. I0 I1 , we always assume that A ∩ Ii = ∅ for i = 0, 1. By saying that A is a tight subset of a b.p. I0 I1 we mean that if A is a subset of any other b.p. J0 J1 , then I0 ∪ I1 ⊆ J0 ∪ J1 . For a set A ⊆ N and h ∈ N let h ∗ A := {ha : a ∈ A}. Note the difference between h ∗ A and h A. Recall that h A is the h-fold sum of A. Example 10.3.2 {0, 1} + 3 ∗ N is a b.p. and {0, 2} + 4 ∗ N is not a b.p. Freiman’s inverse phenomenon is the following statement: If A + B is “small”, then A and B must have some arithmetic structural properties. The first evidence of Freiman’s inverse phenomenon is that if A and B are two finite sets of non-negative integers and |A + B| = |A| + |B| − 1, then A and B must be two a.p.’s of the same difference. Note that |A + B| ≥ |A| + |B| − 1 is true for any non-empty finite sets A and B of integers. In [8] two types of Freiman’s theorems are introduced: the celebrated theorem of Freiman for large doubling constants and less famous theorems of Freiman for small doubling constants. The theorem for large doubling constants says roughly that for any positive constant C, there is a positive constant c such that for every sufficiently large finite set A of integers, if |2 A| ≤ C|A|, then A must be a subset of a C − 1 -dimensional arithmetic progression P with |A| ≥ c|P|. The interested reader should consult [8] or [24] for the proof of the theorem and the definition of multi-dimensional arithmetic progression. In this chapter we focus only on Freiman’s theorems for small doubling constants. 
Freiman’s theorem for large doubling constants has a weaker condition, i.e., a more general upper bound for |2 A|, but is less precise on the description of structural properties of A while Freiman’s theorems for small doubling constants have stronger conditions on the upper bound of |2 A| but is more precise on the description of the structural properties of A. Theorem 10.3.3 (G.A. Freiman) Let A be a finite set of integers such that |2 A| = 2|A| − 1 + b. 1. If |A| > 2 and 0 ≤ b < |A| − 2, then A must be a subset of an a.p. of length at most |A| + b. 2. If |A| > 6 and b = |A| − 2, then either A is a subset of an a.p. of length at most 2|A| − 1 or A is a b.p. There are some generalizations of Theorem 10.3.3 for the addition of two different sets or for more detailed descriptions of the structural properties of the involved sets or for both, see [1, 9, 10, 17, 23, 27]. We would like to state one of the generalizations in [23] which is needed in the proofs in this section.
10 Density Problems and Freiman’s Inverse Problems
419
Theorem 10.3.4 (V. Lev and P.Y. Smeliansky) Let A and B be two finite sets of non-negative integers such that 0 ∈ A ∩ B, |A|, |B| > 1, gcd(A) = 1, m = max A, and n = max B ≤ m. If m = n, then |A + B| ≥ min{m + |B|, |A| + 2|B| − 3}. If m > n, then |A + B| ≥ min{m + |B|, |A| + 2|B| − 2}. The main goal of this section is to generalize Theorem 10.3.3 to the case when |2 A| > 3|A| − 3. Freiman in fact posed a conjecture in [11] for the case of |2 A| > 3|A| − 3. Conjecture 10.3.5 (G.A. Freiman) There exists a natural number K such that for any finite set of integers A with |A| > K and |2 A| = 3|A| − 3 + b where 0 ≤ b < 1 3 |A| − 2, either A is a subset of an a.p. of length at most 2|A| − 1 + 2b or A is a subset of a b.p. of length at most |A| + b. It is proved by Freiman that if A is already known to be a subset of a b.p., then Conjecture 10.3.5 is true for A. The proof of the following theorem can be found in [8]. Theorem 10.3.6 (G.A. Freiman) If A is a tight subset of a b.p. I0 I1 and |2 A| = 3|A| − 3 + b for some 0 ≤ b < |A| − 3, then |I0 | + |I1 | ≤ |A| + b. In the next subsection, we prove a theorem on Freiman’s inverse problem for cuts, which is an important ingredient in the second subsection where a weak version of Conjecture 10.3.5 is presented.
10.3.1 Freiman’s Inverse Problem for Cuts Recall that a cut U is an infinite initial segment of ∗ N such that 2U ⊆ U . The cofinality of a cut is the least cardinality of a set which is unbounded in U . For example, the cut U H for some hyperfinite integer H defined in (10.1) has an uncountable cofinality by countable saturation. The inverse theorem for cuts is motivated by a theorem of Kneser and its corollary. Theorem 10.3.7 (M. Kneser) Let A, B ⊆ N. If d(A + B) < d(A) + d(B), then there exist positive integer g and G ⊆ [0, g − 1] such that (a) d(A + B) ≥ d(A) + d(B) − g1 , (b) A + B ⊆ G + g ∗ N, and (c) (G + g ∗ N) (A + B) is finite. Corollary 10.3.8 Suppose A, B ⊆ N and d(A + B) < d(A) + d(B). Then there exist positive integer g and F, F ⊆ [0, g − 1] such that A ⊆ F + g ∗ N, B ⊆ F + g ∗ N, and |F| + |F | 1 d(A) + d(B) > − . g g
420
R. Jin
The proof of a more general version of Theorem 10.3.7 can be found in [13]. Note that Theorem 10.3.7 and Corollary 10.3.8 are actually equivalent. See [21, Lemma 2.2] for a proof. Roughly speaking, Corollary 10.3.8 indicates that if the lower asymptotic density of A + B is small, then A and B are large subsets of the union of |F| and |F |, respectively, a.p.’s of difference g. Next we generalize some part of Theorem 10.3.7 to sets in cuts. So we need an analogy of lower asymptotic density of a set in a cut. Definition 10.3.9 Let U be a cut. A set A ⊆ U is called U -internal if A = A ∩ U for some internal set A ⊆ ∗ N. For a U -internal set A, let d U (A) = sup {inf {st(Dx (A)) : x ∈ U [0, m]} : m ∈ U } .
(10.2)
We call d U (A) the lower U -density of A. When U = N, the lower U -density coincides with the usual lower asymptotic density. The set A is assumed to be U -internal to insure that Dx (A) is well defined for all x ∈ U . Note that if A ⊆ U and A is an internal set, then A is bounded in U and is trivially U -internal. A U -internal set may not be internal. For example, U itself is U -internal but not internal if U = ∗ N. Note that d U (A) = d U (A [0, a]) for any a ∈ U . If we allow A to contain negative integers for convenience, we always assume d U (A) := d U (A ∩ U ). It is easy to see that d U (A) = d U (A + a) for every integer a with |a| < U . Exercise 10.3.10 Let U be a cut with an uncountable cofinality and A is U -internal. Prove that for all sufficiently large x ∈ U we have Dx (A) d U (A) and there exists sufficiently large x ∈ U such that d U (A) ≈ Dx (A). In this subsection we present a slightly weaker version of the main theorem in [18] so that the length of the proof can be shortened. However, the weaker version is sufficient for the proof of Theorem 10.3.20. Let A ⊆ U . The set A is called a U -truly unbounded subset of a b.p. {a1 , a2 } + g ∗ U if A ⊆ {a1 , a2 } + g ∗ U and A ∩ (ai + g ∗ U ) is unbounded in U for i = 1, 2. Theorem 10.3.11 Let U be a cut with uncountable cofinality and A0 ⊆ U be U – internal. Suppose 0 ∈ A0 and 0 < d U (A0 ) = α ≤ 35 . If D2x (2 A0 )
5 Dx (A0 ) 3
(10.3)
for all sufficiently large x ∈ A0 , then (a) either A0 is a subset of an a.p. g ∗ U for some g > 1 (b) or A0 is a U -truly unbounded subset of a b.p. (g ∗ U ) (a + g ∗ U ) with some g > 2 and a ∈ [1, g − 1].
10 Density Problems and Freiman’s Inverse Problems
421
The assumption that 0 ∈ A0 in Theorem 10.3.11 is for convenience only. The assumption allows the description of Part (b) to be short. It is easy to convert Theorem 10.3.11 to a theorem without the assumption. In the proof of Theorem 10.3.11 we need an important tool called e -transforms [24, page 42] (or τ –transformations in [13].) Definition 10.3.12 Let A, B ⊆ U ∪ [−n, 0] for some n ∈ U . An ec –transform of (A, B) is the pair (A , B ) = ec (A, B) such that c = a − b for some a ∈ A and b ∈ B, A = (A ∪ (B + c)), and B = (B ∩ (A − c)). Let E = Ec1 ,c2 ,...,ck = ec1 ◦ ec2 ◦ · · · ◦ eck represent a finite sequence of successive applications of e-transforms. If ec is one of the e -transforms in E, we say that ec occurs in E. We call E an E-transform. Note that if b ∈ B, a ∈ A, c = a − b, and (A , B ) = ec (A, B), then b ∈ B . When we apply an ec -transform to (A, B) we always assume that c = a − b for some a ∈ A and b ∈ B. Hence B = ∅ implies B = ∅ where (A , B ) is an E-transform of (A, B). Proposition 10.3.13 The following are important properties of ec –transforms. Let A, B be U –internal and (A , B ) = ec (A, B). Then A and B are U –internal and (a) A ⊇ A and B ⊆ B, (b) A + B ⊆ A + B, (c) if x ∈ U and c/x ≈ 0, then for every y with x < y < U we have D y (A) + D y (B) ≈ D y (A ) + D y (B ). Exercise 10.3.14 Prove Proposition 10.3.13. Note that if (A , B ) = ec (A, B) is replaced by (A , B ) = E(A, B) in Proposition 10.3.13, (a) and (b) are still true. Also (c) is true if ci /x ≈ 0 for all eci occurring in E. For simplifying the derivation process, we will cite the following two theorems. The proofs of Theorems 10.3.15 and 10.3.16 can be found in [24] and in [13, page 18], respectively. Theorem 10.3.15 (M. Kneser) Let G be an abelian group and A, B be finite subsets of G. Let S = {g ∈ G : g + A + B = A + B} be the stabilizer of A + B. If |A + B| < |A| + |B|, then |A + B| = |A + S| + |B + S| − |S|. 
Note that the stabilizer S of A + B is always a subgroup of G and A + B is a union of S-cosets.
422
R. Jin
Theorem 10.3.16 (van der Corput) If 0 ∈ A ∩ B, n ∈ N, and β is a non-negative real number such that A(0, m) + B(1, m) ≥ β(m + 1) for m = 1, 2, . . . , n, then Dm (A + B) ≥ β for m = 0, 1, 2, . . . , n. Proof of Theorem 10.3.11: The reader is warned that the length of this proof is substantial. First we can assume that gcd(A0 ) = 1 because otherwise Part (a) is true. Second, it suffices to show that A0 is a subset of a b.p. {0, a} + g ∗ U because if A0 is a subset of a b.p. but not a U -truly unbounded subset of the b.p., then D2x (2 A0 ) 2D2x (A0 ) 2α ≈ 2Dx (A0 )
5 Dx (A0 ) 3
for some sufficiently large x ∈ U , which contradicts (10.3). If 21 < α ≤ 35 , then d U (2 A0 ) = 1 ≥ 53 α by the fact that U [0, m] ⊆ 2 A0 for some m ∈ U because A0 (x) > 21 (x + 1) implies x ∈ 2 A0 . This contradicts (10.3). So we can assume that α ≤ 21 . For a U -internal set B with b¯ = min B, define f (B) by f (B) = min{b − b : b, b ∈ B, b < b } if there exist two elements b < b in B with b − b ∈ N and f (B) = ∞ otherwise, and define g(B) by ¯ g(B) = gcd(B − b) ¯ ∈ N and g(B) = ∞ otherwise. Clearly, g(B) ≤ f (B) and (A , B ) = if gcd(B − b) E(A, B) implies f (B ) ≥ f (B) and g(B ) ≥ g(B). Suppose that (A, B) is an E-transform of (A0 , A0 ) and Dx (B) 13 Dx (A0 ) for some sufficiently large x ∈ U . Let x = min{y > x : y ∈ B}. Then D2x (2 A0 ) D2x (A + B) ¯ x }) Dx (A) ≈ 2Dx (A0 ) − Dx (B) D2x (A + {b, x x 5 2Dx (A0 ) − Dx (B) 2Dx (A0 ) − Dx (A0 ) Dx (A0 ), x 3x 3 which contradicts (10.3). Hence we can assume that Dx (B)
1 Dx (A0 ) 3
for any E-transform (A, B) of (A0 , A0 ) and for all sufficiently large x ∈ U .
(10.4)
10 Density Problems and Freiman’s Inverse Problems
423
If f (B) ≥ 3/α, then Dx (B) 1/ f (B) ≤ α/3 for all sufficiently large x ∈ U , which contradicts (10.4). Hence we can assume that f (B) < 3/α. In particular, f (B) is bounded in N. As a consequence g(B) is also bounded in N. Fix an E-transform (A, B) of (A0 , A0 ) and constant positive integers f and g such that f (B ) = f and g(B ) = g
(10.5)
for any E-transform (A , B ) of (A, B). Note that B ⊆ b¯ + g ∗ U . Let FC = {c ∈ [0, g − 1] : ∃x ∈ C, x ≡ c (mod g)} for any set C. If y ∈ FC , let C y := C ∩ (y + g ∗ ∗ Z). Let k = |FA | and k0 = |FA0 |. Let a y = min A y for y ∈ FA . We denote b¯ for min B and b¯ for min B , etc. By applying a few more e-transforms, we can assume, without loss of generality, that a y − b¯ + B ⊆ A y for every y ∈ FA . Let FB = {b0 }. So b0 ≡ b¯ ( mod g). Note that 2Dx (A0 ) ≈ Dx (A) + Dx (B) (|FA | + 1)Dx (B)
1 (|FA | + 1)Dx (A0 ) 3
for all sufficiently large x ∈ U . So we have that k = |FA | ≤ 4. Note that FA0 ⊕ FA0 ⊇ FA ⊕ b0 because A + B ⊆ A0 + A0 where ⊕ is the addition modulo g and FA ⊕ b0 means FA ⊕ {b0 }. Note that it is possible that |FA0 ⊕ FA0 | > |FA |. Example 10.3.17 Let A0 = 4 ∗ U ∪ (1 + 8 ∗ U ) and (A, B) = e4 (A0 , A0 ). Then A = {0, 1} + 4 ∗ U , B = 4 ∗ U , g = 4, k = 2, and k0 = 2. Also FA0 = {0, 1} and |FA0 ⊕ FA0 | = 3 > |FA | = 2. In the case of k ≥ 2, we can prove that Dx (A y ) + Dx (B) g1 for all y ∈ FA and all sufficiently large x ∈ U by the following argument. Suppose, in contrary, that Dx (A y1 ) + Dx (B) g1 for some y1 ∈ FA and for some sufficiently large x ∈ U . Note that Dx (B) 1 because a y1 − b¯ + B ⊆ A y1 2g
and 3Dx (B) Dx (B) + Dx (A) 2Dx (A0 ). Let y2 ∈ FA and y2 = y1 . Let x = min{z ∈ B : z > x}. Since 2Dx (A0 ) ≈ Dx (A) + Dx (B) (k + 1)Dx (B) ¯ = g, 3Dx (B), we have Dx (B) 23 Dx (A0 ). By Theorem 10.3.4 and gcd(B − b) we have ¯ x ]| ≥ 2 A y1 (a y , x ) + B(b, ¯ x ) − q |A y1 ∩ [a y , x ] + B ∩ [b, with q/x ≈ 0 for some sufficiently large x ∈ B. Hence D2x (2 A0 ) D2x (A + B) D2x (A y + B) D2x (A y1 + B) + y= y1
1 Dx (A y ) Dx (A y1 ) + Dx (B) + 2 y= y1
424
R. Jin
1 1 Dx (A) + Dx (B) 2Dx (A0 ) − Dx (B) 2 2 1 5 2Dx (A0 ) − Dx (A0 ) = Dx (A0 ) 3 3 for some sufficiently large x ∈ U , which contradicts (10.3). As a consequence of Dx (A y ) + Dx (B) g1 , when k ≥ 2, we have that (FA ⊕ b0 + g ∗ U ) [0, m] ⊆ (A + B) ∩ U ⊆ FA ⊕ b0 + g ∗ U for some m ∈ U and Dx (A + B) ≈ gk for all sufficiently large x ∈ U . Next we divide the proof into cases according to the possible values of k = |FA |. Recall that |FA | ≤ 4. Case 1: k = 4. The possible values of k0 = |FA0 | are 3 and 4. If |FA0 ⊕ FA0 | = 4, then FA0 ⊕ FA0 = FA ⊕ b0 and 4 = |FA0 ⊕ FA0 | = 2|FA0 ⊕ S| − |S| for a non-trivial stabilizer S = d of FA0 ⊕ FA0 in Z/gZ. Note that |S| is a factor of 4 because FA0 ⊕ FA0 is a union of S-cosets. It is impossible for |S| = 2 because otherwise |FA0 ⊕ S| = 3, which contradicts that FA0 ⊕ S is a union of S-cosets. Hence |S| = 4 and FA ⊕ b0 = FA0 ⊕ FA0 = S, which implies that d = 1 because of gcd(A0 ) = 1 and hence g = 4. Now D2x (2 A0 ) D2x (A + B) ≈ 1 53 α ≈ 53 Dx (A0 ) for some sufficiently large x ∈ U , a contradiction to (10.3). If k0 = 3 and |FA0 ⊕ FA0 | = 5, then there is an a ∈ FA0 ⊕ FA0 FA ⊕ b0 . Let y1 , y2 ∈ FA0 = {y1 , y2 , y3 } with y1 ⊕ y2 = a and let p2x = D2x ((2 A0 )a ). Since 4 5 5 25 + p2x D2x (2 A0 ) Dx (A0 ) ≈ (Dx (A) + Dx (B)) , g 3 6 6g we have that Dx (A0 ) y y A01 or A02 is thin, i.e.,
12 5g
and p2x
1 6g
for all sufficiently large x ∈ U . Hence
y
y
min{Dx (A01 ), Dx (A02 )}
y
Without loss of generality, let Dx (A01 )
1 6g .
1 . 6g
So
2 12 1 67 2 1 y y Dx (A02 ) + Dx (A03 ) Dx (A0 ) − − = , g 6g 5g 6g 30g g which is absurd. If k0 = 3 and |FA0 ⊕ FA0 | = 6, then clearly, D2x (2 A0 ) 2Dx (A0 ) 53 Dx (A0 ) for all sufficiently large x ∈ U , which contradicts (10.3). If k0 = 4 and |FA0 ⊕FA0 | = 5, then 5 = 2|FA0 ⊕S|−|S| for a non-trivial stabilizer S = d of FA0 ⊕ FA0 . Clearly, |S| = 5 because 5 is a prime number and FA0 ⊆ S. Since gcd(A0 ) = 1, we have that d = 1 and g = 5. Let a ∈ FA0 ⊕ FA0 FA ⊕ b0 , FA0 = {y1 , y2 , y3 , y4 }, and a = y3 ⊕ y4 where y3 = y4 . Since 45 ≈ D2x (A + B)
10 Density Problems and Freiman’s Inverse Problems
425
D2x (2 A0 ) 53 Dx (A0 ) , we have that Dx (A0 ) 12 25 for all sufficiently large x ∈ U . Since 12 2 2 2 y y − = , Dx (A03 ) + Dx (A04 ) Dx (A0 ) − 5 25 5 25 we have that D2x ((2 A0 )a ) D2x (2 A0 )
1 25 .
Hence
4 4 1 21 5 5 5 + D2x ((2 A0 )a ) + = ≥ α ≈ Dx (A0 ) 5 5 25 25 6 3 3
for some sufficiently large x ∈ U , which contradicts (10.3). If k0 = 4 and |FA0 ⊕ FA0 | ≥ 6, let FA0 ⊕ FA0 = FA ⊕ b0 ∪ {a1 , a2 , . . .} where 5 / FA ⊕ b0 . Since 2Dx (A0 ) ≈ Dx (A) + Dx (B) g5 , we have Dx (A0 ) 2g . a1 , a2 ∈ a i Let pi = D2x ((2 A0 ) ) for i = 1, 2 and p = p1 + p2 . Since 5 4 + p ≈ D2x (A + B) + p D2x (2 A0 ) Dx (A0 ), g 3 we have that Dx (A0 )
12 5g
+ 35 p and p
y1 , y2 ∈ FA0 such that y1 ⊕ y2 =
1 6g . Let i ∈ {1, 2}. If there are two distinct y y 1 ai , then Dx (A01 ) + Dx (A02 ) 2 pi 3g . Hence
2 1 2 12 y y y y Dx (A03 ) + Dx (A04 ) ≈ Dx (A0 ) − Dx (A01 ) − Dx (A02 ) − g 5g 3g g for all sufficiently large x ∈ U where {y3 , y4 } = FA0 {y1 , y2 }. This is absurd. So we y y can assume that yi ⊕yi = ai where yi ∈ FA0 for i = 1, 2. Thus Dx (A01 )+Dx (A02 ) 1 1 . Hence again we have the contradiction g2 Dx (A0 ) − 3g g2 p1 + p2 = p 3g for all sufficiently large x ∈ U . Case 2: k = 3. Clearly, g > 2. The only possible values of k0 are 2 and 3. If k0 = 2 , then |FA0 ⊕ FA0 | = 3. Hence A0 is a subset of the b.p. {0, a} + g ∗U where FA0 = {0, a}. So Part (b) is true. Suppose that k0 = 3. If FA0 ⊕ FA0 = FA ⊕ b0 , then the stabilizer S = d of FA0 ⊕ FA0 has three elements. Hence FA0 = S, which implies d = 1 and D2x (2 A0 ) ≈ 1 53 α ≈ 53 Dx (A0 ) for some sufficiently large x ∈ U . If |FA0 ⊕ FA0 | = 4, then the stabilizer S = d of FA0 ⊕ FA0 has four elements because |S| = 2 is impossible for the equality 4 = 2|FA0 ⊕ S| − |S|. Now |S| = 4 implies d = 1. Let a ∈ FA0 ⊕ FA0 FA ⊕ b0 and p = D2x ((2 A0 )a ). Then 3 5 5 + p ≈ D2x (A + B) + D2x ((2 A0 )a ) D2x (2 A0 ) Dx (A0 ) 4 3 6 1 for some sufficiently large x ∈ U with Dx (A0 ) ≈ α 21 , which implies p 12 . 5 3 9 Since 3 Dx (A0 ) D2x (2 A0 ) 4 , we have that Dx (A0 ) 20 . In Z/4Z, there are
R. Jin
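two distinct $y_1, y_2 \in F_A$ such that either $y_1 \oplus y_2 = a$ or $y_i \oplus y_i = a$ for $i = 1, 2$. In each case, $D_x(A_0^{y_1}) + D_x(A_0^{y_2}) \lesssim \frac{1}{6}$. Hence
$$\frac{1}{4} \gtrsim D_x(A_0^{y_3}) \gtrsim D_x(A_0) - \frac{1}{6} \gtrsim \frac{9}{20} - \frac{1}{6} = \frac{27 - 10}{60} > \frac{1}{4}$$
for some sufficiently large $x \in U$, which is absurd. If $|F_{A_0} \oplus F_{A_0}| \ge 5$, then $F_{A_0} \oplus F_{A_0} = (F_A \oplus b_0) \cup \{a_1, a_2, \ldots\}$. Let $p_i = D_{2x}((2A_0)^{a_i})$ for $i = 1, 2$ and $p = p_1 + p_2$. Then
$$\frac{3}{g} + p \lesssim D_{2x}(2A_0) \lesssim \frac{5}{3} D_x(A_0) \approx \frac{5}{6}\big(D_x(A) + D_x(B)\big) \lesssim \frac{10}{3g}.$$
Hence $D_x(A_0) \gtrsim \frac{9}{5g}$ and $p \lesssim \frac{1}{3g}$. Let $F_{A_0} = \{y_1, y_2, y_3\}$ be such that either $y_1 \oplus y_2 = a_i$ for some $i \in \{1, 2\}$ with $y_1 \ne y_2$, or $y_i \oplus y_i = a_i$ for $i = 1, 2$. In each case we have that $D_x(A_0^{y_1}) + D_x(A_0^{y_2}) \lesssim 2p$. Hence
$$\frac{1}{g} \gtrsim D_x(A_0^{y_3}) \gtrsim D_x(A_0) - 2p \gtrsim \frac{9}{5g} - \frac{2}{3g} > \frac{1}{g}$$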
for some sufficiently large $x \in U$, which is absurd.

Case 3: $k = 2$. Clearly, the only possible value for $k_0$ is 2. Let $F_{A_0} = \{0, a\}$. If $2a = g$, then $a = \gcd(A_0) = 1$, so we have that $2a = g = 2$. Hence $D_{2x}(2A_0) \approx 1 \gg \frac{5}{3}\alpha \approx \frac{5}{3} D_x(A_0)$ for some sufficiently large $x \in U$, which contradicts (10.3). If $2a \ne g$, then $A_0$ is a subset of a b.p. $\{0, a\} + g * U$. So Part (b) is true.

Case 4: $k = 1$. This case is the hardest to prove. We aim to derive a contradiction to (10.3). Since $A_0 \subseteq A \subseteq g * {}^*\mathbb{N}$ and $\gcd(A_0) = 1$, we have that $g = 1$.

Claim 1: If for every $n \in \mathbb{N}$ there is an E-transform $(A', B')$ of $(A, B)$ such that $A'$ contains $n$ consecutive numbers, then $D_{2x}(2A_0) \gtrsim 2D_x(A_0)$ for some sufficiently large $x \in U$.

Proof of Claim 1: Suppose that $D_{2x}(2A_0) \ll 2D_x(A_0)$ for all sufficiently large $x \in U$. Then there exists a standard positive real number $\beta < 2$ such that $D_{2x}(2A_0) \lesssim \beta D_x(A_0)$ for all sufficiently large $x \in U$. In particular, $d_U(2A_0) < \beta\alpha$. Let $\beta'$ be a positive standard real number with $\beta < \beta' < 2$. Let $n \in \mathbb{N}$ be such that $n \ge 1/((\beta' - \beta)\alpha)$. Without loss of generality we can assume that $A$ contains $n$ consecutive numbers $a, a + 1, \ldots, a + n - 1$. We can also assume that if $b \in B$, then $b - \bar{b} + a, b - \bar{b} + a + 1, \ldots, b - \bar{b} + a + n - 1 \in A$, because otherwise we can replace $(A, B)$ by $(A', B') = E_{a+n-1-\bar{b}, \ldots, a-\bar{b}}(A, B)$. Let $\bar{B} = B - \bar{b}$ and $\bar{A} = A - a$. Then $0, 1, \ldots, n - 1 \in \bar{A}$, and if $b \in \bar{B}$, then $b, b + 1, \ldots, b + n - 1 \in \bar{A}$. Let
$$x_0 = \max\big(\{0\} \cup \{x \in U : D_{x-1}(\bar{A}) + D_{x-1}(\bar{B}) < \beta'\alpha\}\big).$$
10 Density Problems and Freiman’s Inverse Problems
The number $x_0$ exists because otherwise we have $2\alpha \le \beta'\alpha$, which is absurd. By the definition of $x_0$ we have that $D_{x_0, x_0+m}(\bar{A}) + D_{x_0, x_0+m}(\bar{B}) \ge \beta'\alpha$ for any $m \in U$. Let $x_1 = \min(\bar{B} \cap (x_0 + U))$ and $\bar{B}_0 = \bar{B} - (x_1 - x_0)$. So $x_0 \in \bar{A} \cap \bar{B}_0$ and $\bar{B} \cap [x_0, x_1 - 1] = \emptyset$. We now want to verify the following condition:
$$\bar{A}(x_0, x_0 + m) + \bar{B}_0(x_0 + 1, x_0 + m) \ge \beta\alpha(m + 1) \tag{10.6}$$
for every $m \in U$, so that Theorem 10.3.16 can be applied. If $0 \le m < x_1 - x_0$, then
$$\bar{A}(x_0, x_0 + m) + \bar{B}_0(x_0 + 1, x_0 + m) \ge \bar{A}(x_0, x_0 + m) = \bar{A}(x_0, x_0 + m) + \bar{B}(x_0, x_0 + m) \ge \beta'\alpha(m + 1) > \beta\alpha(m + 1),$$
which implies (10.6). So we can assume that $m \ge x_1 - x_0$. If $m < x_1 - x_0 + n$, then $x_1 \le x_0 + m < x_1 + n$ and $x_1, x_1 + 1, \ldots, x_0 + m \in \bar{A}$. Hence
$$\bar{A}(x_0, x_0 + m) = \bar{A}(x_0, x_1 - 1) + (x_0 + m - x_1 + 1) \ge \beta\alpha(x_1 - x_0) + \beta\alpha(x_0 + m - x_1 + 1) = \beta\alpha(m + 1),$$
which again implies (10.6). So we can assume that $m \ge x_1 - x_0 + n \ge n$. Now
$$\begin{aligned} \bar{A}(x_0, x_0 + m) + \bar{B}_0(x_0 + 1, x_0 + m) &\ge \bar{A}(x_0, x_0 + m) + \bar{B}(x_1 + 1, x_0 + m) \\ &= \bar{A}(x_0, x_0 + m) + \bar{B}(x_0, x_0 + m) - 1 \\ &\ge \Big(\beta'\alpha - \frac{1}{m + 1}\Big)(m + 1) \ge \Big(\beta'\alpha - \frac{1}{n}\Big)(m + 1) \ge \beta\alpha(m + 1) \end{aligned}$$
by the choice of $n$. So (10.6) is true. By Theorem 10.3.16 we conclude that $D_{2x_0, 2x_0+m}(\bar{A} + \bar{B}_0) \ge \beta\alpha$ for every $m \in U$, which implies $d_U(A + B) = d_U(\bar{A} + \bar{B}_0) \ge \beta\alpha$, a contradiction to $d_U(A + B) \le d_U(2A_0) < \beta\alpha$. This completes the proof of Claim 1.

Claim 2: If there exist sufficiently large $x \in U$ such that $D_x(A) + D_x(B) \gg 1$, then for every $n \in \mathbb{N}$ there exists an E-transform $(A', B')$ of $(A, B)$ such that $A'$ contains $n$ consecutive integers.

Proof of Claim 2: Given any E-transform $(A', B')$ of $(A, B)$, we have that $D_x(A') \gg \frac{1}{2}$ because $D_x(B') \lesssim D_x(A')$ for some sufficiently large $x \in U$. Hence $A'$ contains two consecutive integers. Suppose that $m$ is the maximal integer such that some $A'$ contains $m$ consecutive integers. Note that $D_x(A'') \lesssim \frac{m}{m+1}$ for all sufficiently large $x \in U$, where $(A'', B'')$ is any E-transform of $(A', B')$, because every interval of length $m + 1$ can contain at most $m$ elements of $A''$. Hence $D_x(B'') \gg \frac{1}{m+1}$ for all $x \in U$ with $D_x(A'') + D_x(B'') \gg 1$. In particular, $B''$ contains two integers $b_1 < b_2$ such that $b_2 - b_1 \le m$. We now construct a pair $(A'', B'')$ as follows. Let $a + i \in A'$, $\bar{b}' = \min B'$, and $c_i = a + i - \bar{b}'$ for $i = 0, 1, \ldots, n - 1$, and let
$(A'', B'') = E_{c_0, c_1, \ldots, c_{n-1}}(A', B')$. Then $B''$ contains two elements $b_1, b_2$ such that $0 < b_2 - b_1 \le m$. Clearly,
$$\{a + b_1 - \bar{b}', a + 1 + b_1 - \bar{b}', \ldots, a + m - 1 + b_1 - \bar{b}'\} \cup \{a + b_2 - \bar{b}', a + 1 + b_2 - \bar{b}', \ldots, a + m - 1 + b_2 - \bar{b}'\}$$
is a set of at least $m + 1$ consecutive integers in $A''$, a contradiction to the maximality of $m$. This completes the proof of Claim 2.

Let $(A', B')$ be any E-transform of $(A, B)$. By Claim 1 and Claim 2, we can assume that $D_x(A') + D_x(B') \lesssim 1$ for all sufficiently large $x \in U$. Note that for all sufficiently large $x \in B'$ with $\gcd(B' \cap [0, x]) = 1$ it is true that
$$\frac{5}{3} D_x(A_0) \gg D_{2x}(A' + B') \gtrsim D_x(A') + \frac{1}{2} D_x(B') \approx D_x(A_0) + \frac{1}{2} D_x(A')$$
by Theorem 10.3.4. So we have $D_x(A') \ll \frac{4}{3} D_x(A_0)$, $D_x(B') \gg \frac{2}{3} D_x(A_0)$, and $D_x(A') \ll 2D_x(B')$ for all sufficiently large $x \in B'$. For each $U$-internal set $C$ define $h(C)$ by
$$h(C) := \max\{C(x, x + f - 1) : x \in U\}, \tag{10.7}$$
where $f = f(B)$ is the minimal distance between two distinct elements of $B$. Suppose that $h(A') \ge 2$ for some E-transform $(A', B')$ of $(A, B)$. Let $a_1, a_2 \in A' \cap [x, x + f - 1]$ be distinct and $(A'', B'') = E_{a_1 - \bar{b}', a_2 - \bar{b}'}(A', B')$. Then $a_1 - \bar{b}' + B''$ and $a_2 - \bar{b}' + B''$ are disjoint subsets of $A''$. This contradicts the fact that $D_x(A'') \ll 2D_x(B'')$ for all sufficiently large $x \in B''$. Hence we can assume $h(A') = 1$ for any E-transform $(A', B')$ of $(A, B)$. Without loss of generality, we assume that $h(A) = 1$.

Claim 3: If $\{x_j : j = 0, 1, 2, \ldots\}$ is a $U$-internal sequence unbounded in $U$ with $x_{j+1} - x_j$ hyperfinite such that
$$D_{2x_j, 2x_{j+1}}(A + B) \gtrsim \frac{5}{6}\big(D_{x_j, x_{j+1}}(A) + D_{x_j, x_{j+1}}(B)\big)$$
for every $j$, then $D_{2x_j}(A + B) \gtrsim \frac{5}{6}\big(D_{x_j}(A) + D_{x_j}(B)\big)$ for all sufficiently large $x_j$ in $U$.

Exercise 10.3.18 Prove Claim 3.

Claim 4: For every $b \in B$ we have $\gcd((B - b) \cap U) = 1$.

Proof of Claim 4: Suppose that $\gcd((B - b) \cap U) = d > 1$ for some $b \in B$. Then $B \setminus [\bar{b}, b - 1] \subseteq b + d * U$ and there exists $a \in A$ such that $a \not\equiv 0 \pmod{d}$. Let $(A', B') = E_{0-\bar{b}, a-\bar{b}}(A, B)$ and $B_0 = B \setminus [\bar{b}, b - 1]$. Then $B_0 - \bar{b}$ and $B_0 + a - \bar{b}$ are disjoint subsets of $A'$, and so $D_x(A') \gtrsim 2D_x(B_0) \approx 2D_x(B')$ for all sufficiently large $x \in U$, which contradicts the fact that $D_x(A') \ll 2D_x(B')$. This completes the proof of Claim 4.

Claim 5: If $v$ is hyperfinite and $C, D \subseteq [0, v]$ are internal such that $D_v(C) + D_v(D) > \frac{v+2}{v+1}$, then for every $x \in [0, v]$, either $x \in C + D$ or $v + x \in C + D$.

Proof of Claim 5: Suppose the claim is not true. Then $C \cap (x - D) \cap [0, x] = \emptyset$ and $C \cap (x + v - D) \cap [x, v] = \emptyset$. Hence $C(0, x) + D(0, x) \le x + 1$ and
$C(x, v) + D(x, v) \le v - x + 1$. Thus $|C| + |D| \le v + 2$, which contradicts $D_v(C) + D_v(D) > \frac{v+2}{v+1}$.

Claim 6: Let $n \in \mathbb{N}$, $m \in U$, and $a, a + m, c_1, c_2 \in A$. Then there exists an E-transform $(A', B')$ of $(A, B)$ such that $c_1, c_1 + m, c_1 + 2m, \ldots, c_1 + nm, c_2 \in A'$.

Proof of Claim 6: Let $i < n$ and let $(A_1, B_1)$ be an E-transform of $(A, B)$ such that $c_1, c_1 + m, \ldots, c_1 + im, c_2 \in A_1$. Note that $a, a + m \in A \subseteq A_1$. Let $(A_1', B_1') = E_{a - \bar{b}_1, a + m - \bar{b}_1}(A_1, B_1)$. Then $a - \bar{b}_1 + B_1'$ and $a + m - \bar{b}_1 + B_1'$ are two subsets of $A_1'$. If $(a - \bar{b}_1 + B_1') \cap (a + m - \bar{b}_1 + B_1') = \emptyset$, then $D_x(A_1') \gtrsim 2D_x(B_1')$ for all sufficiently large $x \in U$, which contradicts the fact that $D_x(A_1') \ll 2D_x(B_1')$. Hence there is a $b \in (a - \bar{b}_1 + B_1') \cap (a + m - \bar{b}_1 + B_1')$. Let $a' = b - a - m + \bar{b}_1$. Then $a', a' + m \in B_1'$. Let $(A_2, B_2) = e_{c_1 + im - a'}(A_1', B_1')$. Then $(c_1 + im - a') + (a' + m) = c_1 + (i + 1)m \in A_2$. So $c_1, c_1 + m, \ldots, c_1 + im, c_1 + (i + 1)m, c_2 \in A_2$. Now the claim follows by induction.

We now continue the proof of Theorem 10.3.11 under Case 4, where $k = 1$ and $g = 1$. We also have $h(A') = 1$ for every E-transform $(A', B')$ of $(A, B)$, where $h(A')$ is defined in (10.7). If $f = 1$, then $A$ contains two consecutive integers. By Claim 6, for every $n \in \mathbb{N}$ there exists an E-transform $(A', B')$ of $(A, B)$ such that $A'$ contains a sequence of $n$ consecutive integers. Hence $D_{2x}(2A_0) \gtrsim 2D_x(A_0) \gg \frac{5}{3} D_x(A_0)$ for some sufficiently large $x \in U$ by Claim 1, which contradicts (10.3). Thus we can assume that $f > 1$. It now suffices to construct a $U$-internal sequence $\{x_0, x_1, \ldots\} \subseteq B$ unbounded in $U$ such that $x_{j+1} - x_j$ is hyperfinite for every $j$ and
$$D_{2x_j, 2x_{j+1}}(A + B) \gtrsim \frac{5}{6}\big(D_{x_j, x_{j+1}}(A) + D_{x_j, x_{j+1}}(B)\big), \tag{10.8}$$
which leads to a contradiction to (10.3) by Claim 3. Let $x_0 = \min(B \cap U)$. Suppose we have found the required $x_i$ for every $i \le j$. Let
$$y' = \min\{y \in A : y > x_j \text{ and } y \not\equiv x_j \pmod{f}\}, \quad x' = \max(B \cap [x_j, y' - 1]), \quad x_{j+1} = \min\{x \in B : x \ge y'\}.$$
The numbers $y'$ and $x_{j+1}$ are well defined by Claim 4 and the assumption that $f > 1$. The way of defining $x_{j+1}$ guarantees that the sequence $\{x_j\}$ is $U$-internal. If $y' - x_j$ is finite, let $n \in \mathbb{N}$ be such that $x_j + nf < y' < x_j + (n + 1)f$; then there is an E-transform $(A', B')$ of $(A, B)$ such that $x_j + nf, y' \in A'$ by Claim 6. This implies that $h(A') \ge 2$, a contradiction to $h(A') = 1$ for every E-transform $(A', B')$ of $(A, B)$. Thus $y' - x_j$ as well as $x_{j+1} - x_j$ must be hyperfinite. If $D_{x_j, x_{j+1}}(B) \approx 0$, then (10.8) is trivially true. So we can assume that $D_{x_j, x_{j+1}}(B) \gg 0$. This implies that
$$\frac{x' - x_j}{y' - x_j} \ge \frac{x' - x_j}{x_{j+1} - x_j} \gg 0.$$
Hence there is an $n \in \mathbb{N}$ such that $x_j + n(x' - x_j) < y' < x_j + (n + 1)(x' - x_j)$. Let $d = \gcd(B \cap [x_j, x'] - x_j)$. Clearly, $d = qf$ for some $q \in \mathbb{N}$. Let $A^0 = A \cap [x_j, x'] \cap (x_j + qf * U)$, $A^1 = (A \cap [x_j, x']) \setminus A^0$, and $A^2 = A \cap [x' + 1, x_{j+1}]$.
Let B 0 = B ∩ [x j , x ]. If Dx j ,x (A0 ) + Dx j ,x (B 0 )
1 qf
, then D2x j ,2x (B 0 + A0 )
1 0 0 0 0 2 Dx j ,x (B ) + Dx j ,x (A ) by Theorem 10.3.4. Note that the sets A + B , B 0 , y + B 0 , (A2 + x ) {y + x }, A2 + x j+1 are pairwise disjoint. Hence
A1 +
1 D2x j ,2x j+1 (A + B) Dx j ,x j+1 (A0 ) + Dx j ,x j+1 (B 0 ) 2 1 1 0 + Dx j ,x j+1 (A ) + Dx j ,x j+1 (B ) + Dx j ,x j+1 (A2 ) 2 5 ≈ Dx j ,x j+1 (A) + Dx j ,x j+1 (B) (Dx j ,x j+1 (A) + Dx j ,x j+1 (B)). 6 So we can assume that Dx j ,x (A0 ) + Dx j ,x (B 0 )
1 qf
. Let
$m = \max\{i : x_j + n(x' - x_j) + iqf < y'\}$. Then $mqf < x' - x_j$ by the definition of $n$. By Claim 5, there exist $x_j + a \in A$ and $x_j + b \in B$ such that $a + b = mqf$ or $a + b = mqf + (x' - x_j)$. Let $\delta \in \{0, 1\}$ be such that $a + b = \delta(x' - x_j) + mqf$. Let $(A_1, B_1) = e_a(A, B)$. Note that $e_a$ is well defined because $a = (x_j + a) - x_j \in A - B$. Then $x_j + b + a \in B + a \subseteq A_1$. Since $x_j, x_j + (x' - x_j), x_j + a + b, y' \in A_1$, there is an E-transform $(A_2, B_2)$ of $(A_1, B_1)$ such that $x'' = x_j + a + b + (n - \delta)(x' - x_j)$ and $y'$ belong to $A_2$ by Claim 6. Note that $0 < y' - x'' < qf$. Let $m' = \max\{i : x'' + if < y'\}$. Then $m' < q$. Since $a', a' + f \in A_2$ for some $a'$ by the definition of $f$, there is an E-transform $(A_3, B_3)$ of $(A_2, B_2)$ such that $x'' + m'f, y' \in A_3$ by Claim 6 again. This implies that $h(A_3) \ge 2$, which contradicts the assumption that $h(A') = 1$ for any E-transform $(A', B')$ of $(A, B)$. This completes the proof of Theorem 10.3.11.

We now strengthen Theorem 10.3.11 by adding more quantitative information about the structure of $A_0$.

Theorem 10.3.19 Let $U$ be a cut with uncountable cofinality and let $A_0 \subseteq U$ be $U$-internal. Suppose $0 \in A_0$ and $0 < d_U(A_0) = \alpha \le \frac{3}{5}$. If
$$D_{2x}(2A_0) \ll \frac{5}{3} D_x(A_0) \tag{10.9}$$
for all sufficiently large $x \in A_0$, then one of the following must be true:
(a) $A_0 \subseteq g * U$ for some $g > 1$ and $\alpha \ge \frac{3}{5g}$;
(b) $A_0 \subseteq \{0, a\} + g * U$ for some $g > 2$ and $a \in [1, g - 1]$ with $2a \ne g$, such that $\alpha \ge \frac{2}{(1+\beta)g}$, where
$$\beta = \inf\Big\{\gamma \ge 0 : \exists y \in U\ \forall x \in U\ \Big(x > y \Rightarrow D_{2x}(2A_0) \lesssim \frac{3+\gamma}{2} D_x(A_0)\Big)\Big\}.$$
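Part (b) is attained in the extremal case $\beta = 0$. As an informal finite check (not part of the original argument), the sketch below replaces the $U$-internal set by the full b.p. $\{0, a\} + g * \mathbb{N}$ truncated at $2x$, counts with $D_x(S) = |S \cap [0, x]|/(x + 1)$, and compares $D_{2x}(2A_0)$ with $D_x(A_0)$. When $2a \ne g$, the ratio should come out close to $3/2$, so the infimum defining $\beta$ is 0 and $\alpha = 2/g$ meets the bound $\alpha \ge 2/((1+\beta)g)$ with equality. When $2a = g$, the set collapses to an a.p. and the ratio is close to 1 instead. The helper names are hypothetical and chosen only for this illustration.

```python
from fractions import Fraction


def density_upto(S, x):
    """D_x(S) = |S ∩ [0, x]| / (x + 1)."""
    return Fraction(sum(1 for s in S if s <= x), x + 1)


def full_bp(g, a, limit):
    """Elements of the b.p. {0, a} + g*N that do not exceed `limit` (hypothetical helper)."""
    return {r + g * k for r in (0, a) for k in range(limit // g + 1) if r + g * k <= limit}


def doubling_ratio(g, a, x):
    """D_{2x}(2A_0) / D_x(A_0) for A_0 = {0, a} + g*N, truncated at 2x."""
    A0 = full_bp(g, a, 2 * x)
    # Only sums up to 2x matter for D_{2x}, so elements of A_0 above 2x are never needed.
    twoA0 = {s + t for s in A0 for t in A0 if s + t <= 2 * x}
    return density_upto(twoA0, 2 * x) / density_upto(A0, x)


for g, a in [(5, 1), (7, 5), (10, 3), (6, 3)]:
    print(g, a, float(doubling_ratio(g, a, 600)))
```

The first three pairs satisfy $2a \ne g$ and give ratios near $1.5$. The last pair has $2a = g$, so $\{0, 3\} + 6 * \mathbb{N} = 3 * \mathbb{N}$ and the ratio is near 1. This is consistent with the requirement $2a \ne g$ in Part (b).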
Note that $0 \le \beta < \frac{1}{3}$ by (10.9). If $\beta = 0$, then $A_0$ is a full subset of $\{0, a\} + g * U$ in Part (b), i.e., the $U$-density of $A_0$ equals the $U$-density of $\{0, a\} + g * U$. As $\beta$ increases from 0, the proportion of $A_0$ in $\{0, a\} + g * U$ may decrease.

Proof of Theorem 10.3.19: Assume that $\gcd(A_0) = 1$. We show that Part (b) of the theorem is true. Note that $A_0$ is a $U$-truly unbounded subset of a b.p. $\{0, a\} + g * U$ by Theorem 10.3.11. Let $A_0^0 = A_0 \cap g * U$ and $A_0^a = A_0 \cap (a + g * U)$. We can assume that $g = \gcd(A_0^0 \cup (A_0^a - a))$. If $\gcd(A_0^0) = g_0 > g$ and $\gcd(A_0^a - \min A_0^a) = g_a > g$, then $\gcd(g_0, g_a) = g$. Hence $D_{2x}(A_0^0 + A_0^a) \gtrsim \frac{3}{4} D_x(A_0)$ for all sufficiently large $x \in U$, because
$$|(A_0^0 \cap [0, x]) + (A_0^a \cap [0, x])| \ge A_0^0(0, x) + A_0^a(0, x) + \max\{A_0^0(0, x), A_0^a(0, x)\} - q$$
for some $q \in \mathbb{N}$. So we have
$$D_{2x}(2A_0) \gtrsim D_{2x}\big((2A_0^0) \cup (A_0^0 + A_0^a) \cup (2A_0^a)\big) \gtrsim D_x(A_0) + \frac{3}{4} D_x(A_0) \approx \frac{7}{4} D_x(A_0) \gg \frac{5}{3} D_x(A_0)$$
for all sufficiently large $x \in U$, which contradicts (10.9). Suppose that $g_0 = g$ and $g_a = gd > g$. If $D_x(A_0^a) \gtrsim \frac{1}{2} D_x(A_0^0)$ for some sufficiently large $x \in U$, then $D_x(A_0^a) \gtrsim \frac{1}{3} D_x(A_0)$ and
$$D_{2x}(A_0^0 + A_0^a) \gtrsim \frac{1}{2} D_x(A_0^0) + D_x(A_0^a) \gtrsim \frac{1}{2} D_x(A_0) + \frac{1}{6} D_x(A_0) = \frac{2}{3} D_x(A_0)$$
for some sufficiently large $x \in U$. Hence again
$$D_{2x}(2A_0) \gtrsim D_x(A_0) + \frac{2}{3} D_x(A_0) = \frac{5}{3} D_x(A_0)$$
for some sufficiently large $x \in U$, which contradicts (10.9). So we can assume that $D_x(A_0^a) \ll \frac{1}{2} D_x(A_0^0)$ for all sufficiently large $x \in U$. Now we have $D_x(A_0^0) \gg \frac{2}{3} D_x(A_0)$. For each $x \in U$ let $x' = \min\{y \in A_0^a : y \ge x\}$. Then $D_{2x'}(A_0^0 + A_0^a) \gtrsim D_{x'}(A_0^0) \gtrsim \frac{2}{3} D_{x'}(A_0)$. Hence
$$D_{2x'}(2A_0) \gtrsim D_{2x'}(2A_0^0) + D_{2x'}(A_0^0 + A_0^a) + D_{2x'}(2A_0^a) \gtrsim D_{x'}(A_0) + \frac{2}{3} D_{x'}(A_0) = \frac{5}{3} D_{x'}(A_0)$$
for all sufficiently large $x' \in A_0^a$, which again contradicts (10.9). By the same reasoning, we can derive a contradiction if $g_0 > g$ and $g_a = g$. Thus we can assume that $g_0 = g_a = g$.
If $D_x(A_0^0) + D_x(A_0^a) \lesssim \frac{1}{g}$ for some sufficiently large $x \in U$, then $D_{2x}(A_0^0 + A_0^a) \gtrsim \frac{3}{4} D_x(A_0)$ by Theorem 10.3.4, which implies $D_{2x}(2A_0) \gtrsim \frac{7}{4} D_x(A_0)$ for some sufficiently large $x \in U$, contradicting (10.9). Thus we can assume that $D_x(A_0) = D_x(A_0^0) + D_x(A_0^a) \gg \frac{1}{g}$ for all sufficiently large $x \in U$. In particular, $\alpha > \frac{1}{g}$.

Let $x \in A_0$ be sufficiently large such that $D_x(A_0) \approx \alpha$. Without loss of generality let $x \in A_0^0$. Let $x' = \min\{y \in A_0^a : y \ge x\}$. If $x/x' \ll 1$, then
$$D_{x'}(A_0) \approx \frac{x}{x'} D_x(A_0) + \frac{x' - x}{x'} D_{x,x'}(A_0) \lesssim \frac{x}{x'}\alpha + \frac{x' - x}{gx'} \ll \alpha,$$
which contradicts the minimality of $\alpha$. So we can assume that $x/x' \approx 1$. Since $D_{2x}(2A_0) \lesssim \frac{3+\beta}{2} D_x(A_0)$, where $\beta < \frac{1}{3}$ is defined in Part (b), there exists a term $t$ with $t/x \approx 0$ such that
$$|2(A_0 \cap [0, x'])| \le 3A_0(0, x') - 3 + \beta A_0(0, x') + t.$$
Note that $\beta A_0(0, x') + t < \frac{1}{3} A_0(0, x')$. By Theorem 10.3.6 we have that
$$\frac{2}{g} \approx \frac{x' - \min A_0^a}{gx'} + \frac{x' - \min A_0^0}{gx'} \lesssim \frac{A_0(0, x')}{x'} + \frac{\beta A_0(0, x') + t}{x'} \approx D_x(A_0)(1 + \beta) \approx \alpha(1 + \beta).$$
Therefore, $\alpha \ge \frac{2}{(1+\beta)g}$.

Assume $\gcd(A_0) = g' > 1$. Suppose Part (a) is not true. Then $g'\alpha \le \frac{3}{5}$. Let $A_0' = \{a/g' : a \in A_0\}$. Then $\gcd(A_0') = 1$, $0 < d_U(A_0') = g'\alpha \le \frac{3}{5}$, and $D_{2x}(2A_0') \ll \frac{5}{3} D_x(A_0')$ for all sufficiently large $x \in U$. Hence $A_0'$ is a $U$-truly unbounded subset of a b.p. $\{0, a'\} + g * U$ and $g'\alpha \ge \frac{2}{(1+\beta)g}$. Therefore, $A_0$ is a $U$-truly unbounded subset of the b.p. $\{0, g'a'\} + g'g * U$ and $\alpha \ge \frac{2}{(1+\beta)g'g}$, which implies Part (b) of Theorem 10.3.19. This completes the proof of Theorem 10.3.19.
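The proof of Theorem 10.3.11 relies on two finite combinatorial facts that can be spot-checked by brute force. The sketch below assumes that the e-transform is Dyson's, in the form $e_c(A, B) = (A \cup (B + c), B \cap (A - c))$. This matches how $e_c$ is used in the proof of Claim 6, where $B + c$ lands inside the new first set, but the precise definition appears earlier in the chapter, so it is treated as an assumption here. The helper names are hypothetical. The sketch checks (i) that an e-transform with $c \in A - B$ preserves $|A| + |B|$, keeps the second set nonempty, and does not enlarge $A + B$, and (ii) Claim 5 for every pair $C, D \subseteq [0, v]$ with small standard $v$. With $D_v(C) = |C|/(v + 1)$, the hypothesis $D_v(C) + D_v(D) > \frac{v+2}{v+1}$ is the same as $|C| + |D| > v + 2$.

```python
import itertools
import random


def e_transform(A, B, c):
    """Dyson-type e-transform (assumed form): returns (A ∪ (B + c), B ∩ (A - c))."""
    A_new = A | {b + c for b in B}
    B_new = {b for b in B if b + c in A}
    return A_new, B_new


def sumset(X, Y):
    return {x + y for x in X for y in Y}


def check_e_transform(trials=200, seed=0):
    """Random check of the two standard invariants of an e-transform."""
    rng = random.Random(seed)
    for _ in range(trials):
        A = set(rng.sample(range(40), rng.randint(1, 20)))
        B = set(rng.sample(range(40), rng.randint(1, 20)))
        c = rng.choice([a - b for a in A for b in B])  # c in A - B
        A2, B2 = e_transform(A, B, c)
        assert len(A2) + len(B2) == len(A) + len(B)
        assert sumset(A2, B2) <= sumset(A, B)
        assert B2  # nonempty, because c = a - b puts b into B ∩ (A - c)
    return True


def claim5_holds(v):
    """Exhaustive check of Claim 5 for all C, D ⊆ [0, v]."""
    universe = range(v + 1)
    subsets = [set(s) for r in range(v + 2) for s in itertools.combinations(universe, r)]
    for C in subsets:
        for D in subsets:
            if len(C) + len(D) > v + 2:
                CD = sumset(C, D)
                if any(x not in CD and v + x not in CD for x in universe):
                    return False
    return True


print(check_e_transform(), all(claim5_holds(v) for v in range(7)))
```

The exhaustive check is only feasible for small $v$. It is meant as a sanity check of the pigeonhole argument in the proof of Claim 5, not as a substitute for it.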
10.3.2 Freiman’s $3|A| - 3 + b$ Conjecture

In this subsection we introduce a weak solution to Conjecture 10.3.5. Since the complete proof of the theorem is too long to be included here, we supply only a skeletal sketch. The reader should consult [19] for the details. The theorem is the following.

Theorem 10.3.20 There exist a positive real number $\epsilon$ and a natural number $K$ such that for every finite set $A$ of natural numbers, if $|A| > K$ and $|2A| = 3|A| - 3 + b$ for $0 \le b \le \epsilon|A|$, then $A$ is either a subset of an a.p. of length at most $2|A| - 1 + 2b$ or a subset of a b.p. of length at most $|A| + b$.
As pointed out by Freiman, the upper bounds $2|A| - 1 + 2b$ and $|A| + b$ for the lengths of the a.p. and the b.p. containing $A$, respectively, in Theorem 10.3.20 are optimal. We provide two examples to show this fact.

Example 10.3.21 For sufficiently large $k$ let $A = [0, k - 3] \cup \{k + 10, 2k + 20\}$. Then $|A| = k$ and $|2A| = 3k - 3 + 11$. The shortest a.p. containing $A$ has length $2k - 1 + 2 \times 11$, and $A$ is not a subset of a b.p.

Example 10.3.22 For sufficiently large $k$ let $A = [0, k - 3] \cup \{3k, 3k + 12\}$. Then $|A| = k$ and $|2A| = 3k - 3 + 11$. The shortest b.p. containing $A$ has length $k + 11$, and $A$ is not a subset of an a.p. of length $2k - 1 + 2 \times 11$.

The first step in proving Theorem 10.3.20 is to convert it to a nonstandard version.

Theorem 10.3.23 Let $H$ be a hyperfinite integer and $A \subseteq [0, H]$ be an internal set. Suppose $0 = \min A$, $H = \max A$, $D_H(A) \gg 0$, $\gcd(A) = 1$, and $|2A| = 3|A| - 3 + b$ for some $b \ge 0$ with
$$\frac{b}{H} \approx 0. \tag{10.10}$$
Then either $H + 1 \le 2|A| - 1 + 2b$ or $A$ is a subset of a b.p. of length at most $|A| + b$.

Exercise 10.3.24 Prove that Theorem 10.3.23 implies Theorem 10.3.20.

Hint for Exercise 10.3.24: Suppose that Theorem 10.3.20 is false. By the overspill principle, Theorem 10.3.20 has a hyperfinite counterexample $A \subseteq [0, H]$ with $0, H \in A$, $\gcd(A) = 1$, and $\epsilon \approx 0$. If $D_H(A) \gg 0$, then $A$ is a counterexample to Theorem 10.3.23. Suppose $D_H(A) \approx 0$. By Freiman's theorem for large doubling constants, the set $A$ can be considered as a subset of a two-dimensional arithmetic progression $P$ with $|A|/|P| \gg 0$ in ${}^*\mathbb{Z}^2$. So $A$ is not a subset of a straight line. If $A$ cannot be covered by two parallel lines, one can directly verify that $|2A|/|A| \gg 3$. So $A$ must be a subset of a b.p. Now apply Theorem 10.3.6.

Sketch of the proof of Theorem 10.3.23: The letter $b$ in this proof is reserved for the value in (10.10), i.e., $b = |2A| - 3|A| + 3$. Recall that $0 \le b/H \approx 0$. Assume that the theorem is not true. We derive a contradiction by showing that either $D_{2H}(2A) \gg \frac{3}{2} D_H(A)$, or $H + 1 \le 2|A| - 1 + 2b$, or $A$ is a subset of a b.p., which implies that the length of the b.p. can be taken to be at most $|A| + b$ by Theorem 10.3.6. Let
$$\alpha = \inf\{\operatorname{st}(D_H(A)) : H \text{ is hyperfinite and } A \subseteq [0, H] \text{ is a counterexample to Theorem 10.3.23}\}.$$
Fix a hyperfinite counterexample $A \subseteq [0, H]$ to Theorem 10.3.23 such that $\operatorname{st}(D_H(A)) = \alpha$. Such a counterexample exists by countable saturation. Clearly, $D_H(A) \lesssim \frac{1}{2}$, because otherwise $H + 1 < 2|A| - 1 \le 2|A| - 1 + 2b$. Let $U = U_H$ be defined as in (10.1). Note that $U$ has uncountable cofinality. Note also that $D_{2H}(2A) \approx \frac{3}{2} D_H(A)$ because $b/H \approx 0$. For convenience we give a name to each of the following configurations of sets.
Definition 10.3.25 Let $u, v \in {}^*\mathbb{N}$ be such that $v - u$ is hyperfinite.
• An internal set $F \subseteq [u, v]$ is called a forward triangle in $[u, v]$ if $D_{u,v}(F) \approx \frac{1}{2}$ and $D_{u,x}(F) \gg \frac{1}{2}$ for every $x \in [u, v]$ with $0 \ll (x - u)/(v - u) \ll 1$.
• An internal set $F \subseteq [u, v]$ is called a backward triangle in $[u, v]$ if $u + v - F$ is a forward triangle in $[u, v]$.

Exercise 10.3.26 Suppose that $F$ is a forward triangle in $[0, v] \subseteq [0, H]$ with $v/H \gg 0$, $c > v$ in $[0, H]$, and $F' = F \cup \{c\}$. Let $a = \max F$. Prove that
(a) $D_{2H}(2F') \gtrsim 3D_{2H}(F)$;
(b) if $\min\{c, 2a\}/H \gg v/H$, then $D_{2H}(2F') \gg 3D_{2H}(F)$;
(c) if $w = \min\{c, 2a\}$, then $D_{2H}(2F') \gtrsim 3D_{2H}(F) + (w - v)/(4H)$.
Note that Part (b) is a consequence of Part (c) in the exercise above. It is easy to formulate a parallel exercise for a backward triangle.

Let $B$ be a hyperfinite set. We say that $A$ is a full subset of $B$ if $A \subseteq B$ and $|A|/|B| \approx 1$.

The following list of twenty-three items consists of lemmas in [19]. Many of these lemmas require long verifications, which are omitted here. Note that later items may depend on earlier ones. In many of these lemmas we assume that $A$ has some structural properties, and with these properties we can derive the contradiction. We then apply Theorem 10.3.19 to obtain these structural properties of $A$.

1. If there are $u, v \in A$ such that $0 \ll u/H \le v/H \ll 1$ and $A = F \cup P$, where $F$ is a forward triangle in $[0, u]$ and $P$ is a subset of an a.p. of difference $d > 1$ in $[v, H]$ with $|P|/H \gg 0$, then either $A$ is a subset of a b.p. of difference 3 or $D_{2H}(2A) \gg \frac{3}{2} D_H(A)$.
2. Suppose there are $u, v \in [0, H]$ such that $0 \ll u/H < v/H \ll 1$ and $A = F \cup B$, where $F$ is a forward triangle in $[0, u]$ and $B$ is a backward triangle in $[v, H]$. If $D_{2H}(2A) \approx \frac{3}{2} D_H(A)$, then $(\max F)/H \approx u/(2H)$ and $(\min B)/H \approx (v + H)/(2H)$. Hence $A$ is a full subset of the b.p. $[0, \max F] \cup [\min B, H]$ of difference 1.
3. Suppose there are $u, v \in [0, H]$ such that $0 \ll u/H \ll v/H \ll 1$ and $A = F \cup C$, where $F$ is a forward triangle in $[0, u]$ and $C \subseteq [v, H]$ with $v \in C$, $D_{v,H}(C) \approx \frac{1}{2}$, and $\gcd(C - v) = 1$. Then $D_{2H}(2A) \gg \frac{3}{2} D_H(A)$.
4. Suppose there are $u, v, c \in [0, H]$ such that $0 \ll u/H < v/H \ll c/H \ll 1$ and $A = F \cup B \cup C$, where $F$ is a forward triangle in $[0, u]$, $B$ is a backward triangle in $[v, c]$, and $C \subseteq [c + 1, H]$ with $D_{c,H}(C) \lesssim \frac{1}{2}$. Then $D_{2H}(2A) \gg \frac{3}{2} D_H(A)$.
5. Suppose there is a $u$ with $0 \ll u/H \ll 1$ such that $F = A \cap [0, u]$ is a forward triangle in $[0, u]$ and $D_{u,H}(A) \lesssim \frac{1}{2}$. If $D_{2H}(2A) \approx \frac{3}{2} D_H(A)$, then $A$ is a full subset of a b.p. of difference 3 or a full subset of a b.p. of difference 1.
6. Suppose there are $u, v \in [0, H]$ with $0 \ll u/H \ll v/H \ll 1$ such that $D_u(A) \lesssim \frac{1}{2}$, $D_{v,H}(A) \lesssim \frac{1}{2}$, $A \cap [u, v]$ is a forward triangle in $[u, v]$, and $A \cap [u, H]$ is not a subset of a b.p. of difference 3. Then $D_{2H}(2A) \gg \frac{3}{2} D_H(A)$.
7. Suppose there are $u, v \in [0, H]$ with $0 \ll u/H \ll v/H \ll 1$ such that $D_u(A) \lesssim \frac{1}{2}$, $D_{v,H}(A) \lesssim \frac{1}{2}$, $A \cap [u, v]$ is a forward triangle in $[u, v]$, and $A \cap [v, H]$ is not a subset of an a.p. of difference 3. Then $D_{2H}(2A) \gg \frac{3}{2} D_H(A)$.
8. Assume $D_H(A) \ll \frac{1}{2}$, $d_U(A \cap U) > 0$, and $A \cap U$ is a subset of an a.p. of difference $d > 1$. If $A$ is not a subset of a b.p., then $D_{2H}(2A) \gg \frac{3}{2} D_H(A)$.
9. Assume $D_H(A) \ll \frac{1}{2}$, $0 < d_U(A \cap U) \le \frac{1}{2}$, and $A \cap U$ is a $U$-truly unbounded subset of a b.p. of difference $d$. If $A$ is not a subset of a b.p., then $D_{2H}(2A) \gg \frac{3}{2} D_H(A)$.
10. Let $z \in [0, H] \setminus A$ and $A' = A \cup \{z\}$. Suppose $|(2A') \setminus (2A)| \le 2$ and $|2A'| = 3|A'| - 3 + b'$. If $H + 1 > 2|A| - 1 + 2b$, then $0 \le b' \le b - 1$, $|A'| \le \frac{1}{2}(H + 1)$, and $H + 1 > 2|A'| - 1 + 2b'$.

Remark: Item 10 is a standard combinatorial argument. It shows that if increasing the size of $A$ by 1 in $[0, H]$ increases the size of $2A$ by at most 2, then the fact that $A$ is a counterexample to Theorem 10.3.23 implies that $A \cup \{z\}$ is also a counterexample to Theorem 10.3.23. Hence we can assume that $A$ is maximal in the sense that adding any single element of $[0, H] \setminus A$ to $A$ increases the size of $2A$ by more than 2. With this property, it becomes easier to show that $A$ cannot be a counterexample to Theorem 10.3.23, which contradicts the assumption that the original set $A$ is a counterexample to Theorem 10.3.23.

In the following lemmas it is always assumed that $D_H(A) \approx \frac{1}{2}$ and $D_{2H}(2A) \approx \frac{3}{2} D_H(A)$.

11. If there is an $a \in U$ such that $A \cap [a + 1, H]$ is a subset of a b.p. of difference 3, then either $A$ is a subset of a b.p. or $H + 1 \le 2|A| - 1 + 2b$.
12. Let $A_i = \{z \in A : z \equiv i \pmod{3}\}$ for $i = 0, 1, 2$. If $A_i \ne \emptyset$ for $i = 0, 1, 2$ and there is an $i \in [0, 2]$ such that $(\max A_i - \min A_i)/H \approx 0$, then $H + 1 \le 2|A| - 1 + 2b$.
13. Suppose there are $a, c \in [0, H]$ with $0 \ll a/H \approx c/H \ll 1$ such that (a) $A \cap [0, a]$ is a backward triangle and also a subset of a b.p. of difference 3, and (b) $A \cap [c, H]$ is a forward triangle and also a subset of a b.p. of difference 3. Then either $A$ is a subset of a b.p. of difference 3 or $H + 1 \le 2|A| - 1 + 2b$.
14. If there is an $a \in A$ with $a/H \approx 0$ such that $\gcd(A \cap [a, H] - a) = d > 1$, then $H + 1 \le 2|A| - 1 + 2b$.
15. Let $A = A_e \cup A_o$, where $A_e$ is the set of all even numbers in $A$ and $A_o$ is the set of all odd numbers in $A$. Let $u_i = \max A_i$ for $i = e, o$ and $l_o = \min A_o$. Note that $l_e = 0$. If (a) $u_e = H$ and $(u_o - l_o)/H \approx 0$, or (b) $u_o = H$, $0 \ll l_o/H < u_e/H \ll 1$, and $(u_e - l_o)/H \approx 0$, then $H + 1 \le 2|A| - 1 + 2b$.
16. If $A$ is a forward triangle in $[0, H]$, then either $A$ is a subset of a b.p. of difference 3 or $H + 1 \le 2|A| - 1 + 2b$.
17. Suppose $0 \ll a/H \ll 1$ is such that $A \cap [0, a]$ is a backward triangle in $[0, a]$ and $A \cap [a + 1, H]$ is a forward triangle in $[a + 1, H]$. Then either $A$ is a subset of a b.p. of difference 3 or $H + 1 \le 2|A| - 1 + 2b$.
18. Suppose $0 \ll a/H \ll 1$ is such that $A \cap [0, a]$ is a backward triangle and $A \cap [a + 1, H]$ is a subset of an a.p. of difference $d > 1$. Then $H + 1 \le 2|A| - 1 + 2b$.
19. If $d_U(A) > \frac{1}{2}$, then either $A$ is a subset of a b.p. of difference 3 or $H + 1 \le 2|A| - 1 + 2b$.
20. Let $d_U(A) = \frac{1}{2}$. If there is an $a \in A$ with $a/H \gg 0$ such that $\gcd(A \cap [a, H] - a) = 1$, $A$ is not a subset of a b.p., and $H + 1 > 2|A| - 1 + 2b$, then $A \cap U$ is either a subset of an a.p. of difference $d > 1$ or a $U$-truly unbounded subset of a b.p.
21. Let $d_U(A) = \frac{1}{2}$. If $A \cap U$ is a $U$-truly unbounded subset of a b.p., then $H + 1 \le 2|A| - 1 + 2b$.
22. Let $d_U(A) = \frac{1}{2}$. If $A \cap U$ is a subset of an a.p. of difference $> 1$, then either $A$ is a subset of a b.p. of difference 2 or $H + 1 \le 2|A| - 1 + 2b$.
23. Suppose $d_U(A) < \frac{1}{2}$. Then either $A$ is a subset of a b.p. of difference 3 or $H + 1 \le 2|A| - 1 + 2b$.

We continue the proof of Theorem 10.3.23 by using all items in the list above. The remaining proof is divided into two parts, for the cases $D_H(A) \ll \frac{1}{2}$ and $D_H(A) \approx \frac{1}{2}$.

Part 1: $D_H(A) \ll \frac{1}{2}$. Since $b/H \approx 0$ implies that $H + 1$ is significantly greater than $2|A| - 1 + 2b$, the main goal of this part is to derive a contradiction by showing that $A$ is a subset of a b.p. We subdivide the proof of Part 1 into three cases according to the value of $d_U(A \cap U)$.

Case 1.1: $d_U(A \cap U) > \frac{1}{2}$. Let $a \in [0, H]$ be such that $A \cap [0, a]$ is a forward triangle. Then the contradiction follows from Item 5 in the list above.

Case 1.2: $0 < d_U(A \cap U) \le \frac{1}{2}$. If $D_{2x}((2A) \cap U) \gg \frac{3}{2} D_x(A \cap U)$ for some sufficiently large $x \in U$, then there exist arbitrarily small $a > U$ with $a \in A$ such that $D_{2a,2H}(2A') \ll \frac{3}{2} D_{a,H}(A')$, where $A' = A \cap [a, H]$. By Part 1 of Theorem 10.3.3, either $D_{a,H}(A) \gtrsim \frac{1}{2}$, which is impossible because $a > U$ can be arbitrarily small and $\alpha < \frac{1}{2}$, or $A'$ is a subset of an a.p. of difference $> 1$, which leads to a contradiction by Item 8 for $H - A$. If $D_{2x}((2A) \cap U) \lesssim \frac{3}{2} D_x(A \cap U)$ for all sufficiently large $x \in U$, then by Theorem 10.3.19, $A \cap U$ is either a subset of an a.p. of difference $g > 1$ with $d_U(A \cap U) \ge \frac{3}{5g}$ or a subset of a b.p. of difference $g > 3$ with $\frac{1}{2} \ge d_U(A \cap U) = \frac{2}{g}$. Hence there is an $a \in [0, H]$ with $a/H \gg 0$ such that $A \cap [0, a]$ is either a subset of an a.p. of difference $> 1$ or a subset of a b.p. of difference $d > 2$, which leads to a contradiction by either Item 8 or Item 9 in the list above.

Case 1.3: $d_U(A \cap U) = 0$. Suppose that there is an $a \in A$ such that $a/H \gg 0$ and $D_a(A) \approx 0$. If $D_{a,H}(A) \gtrsim \frac{1}{2}$, then we can find $0 \ll c'/H \ll a/H \ll c/H \le 1$ such that $D_{c,H}(A) \approx \frac{1}{2}$, $c, c + 1 \in A$, and $A \cap [c', c]$ is a backward triangle in $[c', c]$. If $c/H \approx 1$, then a contradiction follows from Item 5. Suppose $c/H \ll 1$. Note that
$A \cap [0, c + 1]$ cannot be a subset of a b.p. of difference 3, because otherwise $D_{2H}(2A) \approx \frac{3}{2} D_H(A)$ and $D_{2c,2H}(2A) \gtrsim \frac{3}{2} D_{c,H}(A)$ imply $D_{2(c+1)}(2A) \approx \frac{3}{2} D_{c+1}(A)$, which implies that $A \cap [0, c + 1]$ is a full subset of a b.p. of difference 3 by Theorem 10.3.6. Hence $d_U(A \cap U) \ge \frac{1}{3}$, which contradicts the assumption of this case. Thus a contradiction now follows from Item 6 for $H - A$. If $D_{a,H}(A) \ll \frac{1}{2}$ and $A \cap [a, H]$ is a subset of an a.p. of difference $> 1$, then a contradiction follows from Item 8. If $D_{a,H}(A) \ll \frac{1}{2}$ and $A \cap [a, H]$ is not a subset of an a.p. of difference $> 1$, then
$$D_{2H}(2A) \gtrsim D_{2H}(a + A \cap [a, 2a]) + D_{2a,2H}(2A') \gg \frac{3}{2} D_H(A') \approx \frac{3}{2} D_H(A),$$
where $A' = A \cap [a, H]$,
which contradicts the assumption that b/H ≈ 0. Suppose that for every x ∈ A with x/H 0 we have Dx (A) 0. Note that A cannot be a subset of a b.p. of difference 1 because otherwise Da (A) ≈ 0 for some a > U . By symmetry we can assume that d U (H − A) = 0 and Dx (H − A) 0 for every x ∈ A with x/H 0. Choose a ∈ A with 0 a/H 1 such that Da (A) ≈ α and Dx (A) α for every x ∈ [0, a] with 0 x/H a/H . Without loss of generality we can assume that a ∈ A because otherwise replace a by max A ∩ [0, a]. If gcd(A∩[0, a]) = d1 > 1 and gcd(A∩[a, H ]−a) = d2 > 1, then gcd(d1 , d2 ) = 1 because otherwise A is a subset of a b.p. Let c := min{x ∈ A : x ≡ 0 (mod d1 )} and a := max{x ∈ A ∩ [0, c − 1] : x ≡ c }. Then the intersection of the two sets 2(A ∩ [0, c ]) and (a + A ∩ [c , H ]) ∪ (2(A ∩ [c , H ])) contains only one element a + c . Hence 3|A| − 3 + b ≥ |2 A| ≥ 3A(0, c ) + 3A(c , H ) − 6. This implies D2a (2(A ∩ [0, a ])) ≈ 2Da (A) and D2c ,2H (2(A ∩ [c , H ])) ≈ 2Dc ,H (A). By Theorem 10.3.3 we have that A ∩ [0, c ] is a full subset of an a.p. of difference d1 and A ∩ [c , H ] is a full subset of an a.p. of difference d2 . So d U (A) ≥ d11 , which contradicts the assumption that d U (A) = 0. If d1 > 1 and d2 = 1, then D2a,2H (2(A ∩ [a, H ])) 23 Da,H (A). So again A ∩ [0, a ] is a full subset of an a.p. of difference d1 , which implies d U (A) ≥ d11 . If d1 = 1 and d2 > 1, then we have d U (H − A) ≥ d12 by symmetry. Therefore, we can now assume that d1 = 1 and d2 = 1. Note that we now have D2a (2 A) ≈ 23 Da (A) and D2a,2H (2 A) ≈ 23 Da,H (A). Note also that d U (a − A ∩ [0, a]) ≥ α by the choice of a. So the set a − A ∩ [0, a] satisfies the assumption of Case 1.1 or Case 1.2. So we can conclude that A ∩ [0, a] is a full subset of a b.p. of difference d > 1, which implies d U (A) > 0.
Part 2: $D_H(A) \approx \frac{1}{2}$. If $\gcd(A \cap [a, H] - a) > 1$ for some $a \in A$ with $a/H \approx 0$, then a contradiction follows from Item 14. So we can assume that there is an $a \in A$ with $a/H \gg 0$ such that $\gcd(A \cap [a, H] - a) = 1$.

Case 2.1: $d_U(A) > \frac{1}{2}$. A contradiction follows from Item 19.

Case 2.2: $d_U(A) < \frac{1}{2}$. A contradiction follows from Item 23.

Case 2.3: $d_U(A) = \frac{1}{2}$. If $A \cap U$ is a subset of an a.p. of difference $> 1$, then a contradiction follows from Item 22. If $A \cap U$ is a $U$-truly unbounded subset of a b.p., then a contradiction follows from Item 21. Therefore, by Theorem 10.3.19, we can conclude that there are sufficiently large $x \in U$ such that $D_{2x}((2A) \cap U) \gtrsim \frac{5}{3} D_x(A \cap U)$. This implies that for every $x > U$ there is an $a \in A$ with $0 \ll a/H \le x/H$ such that $D_{2a}(2A) \gtrsim \frac{19}{12} D_a(A)$ and $D_a(A) \gtrsim \frac{12}{25}$. Fix such an $a$ such that $A \cap [a, H]$ is not a subset of an a.p. of difference $> 1$. Since $D_{2H}(2A) \approx \frac{3}{2} D_H(A)$, we have that $D_{2a,2H}(2A) \ll \frac{3}{2} D_{a,H}(A)$. Let $A' = A \cap [0, a]$ and $A'' = A \cap [a, H]$. Let $|2A'| = 3|A'| - 3 + b'$. Note that $b'/a \gtrsim \frac{1}{6} D_a(A')$. Since $|2A'| + |2A''| - 1 \le |2A| = 3|A| - 3 + b$, we have that
$$|2A''| \le 3|A| - 3 + b - 3|A'| + 3 - b' + 1 = 3|A''| - 3 + b - b' + 1 = 2|A''| - 1 + (|A''| - 1 + b - b').$$
Note that $(b' - b)/H \gg 0$, which implies that $|A''| - 1 + b - b' < |A''| - 2$. By Theorem 10.3.3 we have that $H - a + 1 \le |A''| + (|A''| - 1 + b - b')$. So
$$H + 1 \le 2|A''| + a - 1 + b - b' < 2|A''| + \frac{25}{12}|A'| - 2 + b - b' < 2|A| - 1 + \frac{1}{12}|A'| + 1 + b - \frac{1}{6}|A'| < 2|A| - 1 + 2b,$$
because $D_H(A') \gg 0$ and $b/H \approx 0$. This contradicts the assumption that $A$ is a counterexample to Theorem 10.3.23.

Summary of the strategy: If a significant part of $A$ is well structured and $D_{2H}(2A) \approx \frac{3}{2} D_H(A)$, one can show that $A$ itself is well structured. To obtain the structural information on a significant part of $A$, one checks various cases according to the values of $d_U(A \cap U)$ and $D_{2x}(2A)$ for $x \in U$. Roughly speaking, if
$d_U(A \cap U) > \frac{1}{2}$, then $A \cap [0, a]$ is a forward triangle in $[0, a]$, and if $d_U(A \cap U) < \frac{1}{2}$, then $A \cap [0, a]$ may be a backward triangle in $[0, a]$, for some $a$ with $a/H \gg 0$. The structure of $A \cap [0, a]$ then strongly influences the structure of $A \cap [a, H]$. Suppose $d_U(A \cap U) = \frac{1}{2}$. If $D_{2x}(2A) \ll \frac{5}{3} D_x(A)$ for all sufficiently large $x \in U$, then by Theorem 10.3.19, $A \cap U$ is either a subset of an a.p. of difference $> 1$ or a $U$-truly unbounded subset of a b.p. So, by the overspill principle, $A$ has nice structure in $[0, a]$ for some $a$ with $a/H \gg 0$. If $D_{2x}(2A) \gtrsim \frac{5}{3} D_x(A)$ for some sufficiently large $x \in U$, then there exists an $a \in A$ with $a/H \gg 0$ such that $D_{2a,2H}(2A) \ll \frac{3}{2} D_{a,H}(A)$. Hence $A \cap [a, H]$ has nice structure. Passing from the structure of a part of $A$ to the structure of the whole set $A$ requires verifying a long list of lemmas.
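The optimality examples 10.3.21 and 10.3.22 given after Theorem 10.3.20 are finite statements for each $k$, so they can be checked directly. The sketch below computes $|A|$ and $|2A|$ for a range of $k$. It uses the fact that $0, 1 \in A$ forces every a.p. containing $A$ to have difference 1, so the shortest such a.p. is $[\min A, \max A]$. For Example 10.3.22 it uses the b.p. $[0, k - 3] \cup [3k, 3k + 12]$ of difference 1, and it takes the length of a b.p. to be the total number of its terms. That convention is an assumption about the definition given earlier in the chapter. The function names are hypothetical.

```python
def sumset_size(A):
    """|2A| = |{x + y : x, y in A}|."""
    return len({x + y for x in A for y in A})


def example_10_3_21(k):
    return set(range(k - 2)) | {k + 10, 2 * k + 20}  # [0, k-3] ∪ {k+10, 2k+20}


def example_10_3_22(k):
    return set(range(k - 2)) | {3 * k, 3 * k + 12}  # [0, k-3] ∪ {3k, 3k+12}


def check(k, b=11):
    A = example_10_3_21(k)
    assert len(A) == k and sumset_size(A) == 3 * k - 3 + b
    # 0, 1 ∈ A, so the shortest a.p. containing A is [min A, max A]
    assert max(A) - min(A) + 1 == 2 * k - 1 + 2 * b

    A = example_10_3_22(k)
    assert len(A) == k and sumset_size(A) == 3 * k - 3 + b
    # b.p. of difference 1 made of [0, k-3] and [3k, 3k+12]
    bp = set(range(k - 2)) | set(range(3 * k, 3 * k + 13))
    assert A <= bp and len(bp) == k + b
    # the shortest a.p. containing A is too long
    assert max(A) - min(A) + 1 > 2 * k - 1 + 2 * b
    return True


print(all(check(k) for k in range(15, 80)))
```

The phrase "for sufficiently large $k$" matters. For $k = 14$, for instance, the set of Example 10.3.21 has $|2A| = 49$ rather than $3k - 3 + 11 = 50$, because the blocks $[0, 22]$ and $[24, 35]$ of $2A$ no longer overlap.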
10.3.3 Freiman’s Inverse Problem for Upper Asymptotic Density
In this subsection we present an application of Theorem 10.3.23 to the inverse problem for upper asymptotic density. The following theorem first appeared in [20].
Theorem 10.3.27 Let $A \subseteq \mathbb{N}$ be such that $0 \in A$, $0 < \overline{d}(A) = \alpha < \frac{1}{2}$, and $\gcd(A) = 1$. If $\overline{d}(2A) = \frac{3}{2}\alpha$, then
(a) either there exist $g > 4$ and $a \in [1, g - 1]$ with $2a \neq g$ such that $A \subseteq \{0, a\} + g * \mathbb{N}$ and $\alpha = \frac{2}{g}$,
(b) or for every increasing sequence $\langle h_n : n \in \mathbb{N} \rangle$ with $\lim_{n\to\infty} D_{h_n}(A) = \alpha$, there exist two sequences $0 \le c_n \le b_n \le h_n$ such that $A \cap [c_n + 1, b_n - 1] = \emptyset$ for each $n \in \mathbb{N}$, $\lim_{n\to\infty} \frac{c_n}{h_n} = 0$, and $\lim_{n\to\infty} D_{b_n,h_n}(A) = 1$.
Remarks (1) The assumption $0 \in A$ is for convenience only. If $0 \notin A$, one can consider $A - \min A$ instead. (2) If the condition $g = 1$, where $g = \gcd(A)$, is removed, one can easily convert Theorem 10.3.27 into a theorem about $A$ in an a.p. of difference $g$. (3) Part (b) above implies $\lim_{n\to\infty} (h_n - b_n)/h_n = \alpha$. (4) If $\gcd(A - \min A) = 1$ and $\overline{d}(A) \le \frac{1}{2}$, then $\overline{d}(2A) \ge \frac{3}{2}\overline{d}(A)$ by Theorem 10.3.3. So $\frac{3}{2}\overline{d}(A)$ is the least value $\overline{d}(2A)$ can achieve. The following two examples indicate that the conclusion of Theorem 10.3.27 cannot be improved.
Example 10.3.28 For every real number $0 \le \alpha \le 1$, let
$$A = \bigcup_{n=1}^{\infty} \left[ \lfloor (1 - \alpha) 2^{2^n} \rfloor, 2^{2^n} \right].$$
Then $\overline{d}(A) = \alpha$ and $\overline{d}(2A) = \min\left\{ \frac{1+\alpha}{2}, \frac{3}{2}\alpha \right\}$.
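The densities claimed in Example 10.3.28 can be checked numerically on a truncation of $A$. The sketch below assumes the normalisation $D_x(A) = |A \cap [0, x]|/(x + 1)$, reads the bracket in the definition of $A$ as a floor, and evaluates at $x = h = 2^{2^N}$ and $x = 2h$, where the lim sup is (heuristically) approached; the helper names are hypothetical. Since $A$ is a finite union of integer intervals, $2A$ is computed interval-wise via $[a, b] + [c, d] = [a + c, b + d]$ rather than element by element.

```python
from math import floor


def merge(intervals):
    """Merge closed integer intervals (lo, hi) into sorted, pairwise disjoint ones."""
    out = []
    for lo, hi in sorted(intervals):
        if out and lo <= out[-1][1] + 1:          # overlapping or adjacent: extend
            out[-1][1] = max(out[-1][1], hi)
        else:
            out.append([lo, hi])
    return [tuple(iv) for iv in out]


def sumset(intervals):
    """2A for A a finite union of integer intervals, using [a,b] + [c,d] = [a+c, b+d]."""
    return merge([(a + c, b + d) for a, b in intervals for c, d in intervals])


def density(intervals, x):
    """D_x(A) = |A ∩ [0, x]| / (x + 1) for disjoint intervals (assumed normalisation)."""
    return sum(min(hi, x) - lo + 1 for lo, hi in intervals if lo <= x) / (x + 1)


def example_10_3_28(alpha, N):
    """The blocks n = 1..N of A = U_n [floor((1 - alpha) 2^(2^n)), 2^(2^n)]."""
    return merge([(floor((1 - alpha) * 2 ** (2 ** n)), 2 ** (2 ** n))
                  for n in range(1, N + 1)])


for alpha in (0.3, 0.8):
    A = example_10_3_28(alpha, 4)
    h = 2 ** 16                                    # h = 2^(2^4), right end of the last block
    print(alpha, density(A, h), density(sumset(A), 2 * h),
          min((1 + alpha) / 2, 1.5 * alpha))
```

With $N = 4$ the printed densities lie within about 0.01 of $\alpha$ and of $\min\{(1+\alpha)/2, 3\alpha/2\}$; the discrepancy comes mainly from the earlier, much shorter blocks. For these values of $\alpha$ the next block starts far beyond $2h$, so the truncation computes $2A \cap [0, 2h]$ exactly.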
R. Jin
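Example 10.3.29 Let $g \in \mathbb{N}$ be such that $g > 4$, and let $a \in [1, g - 1]$ be such that $2a \neq g$. Let $A = \{0, a\} + g * \mathbb{N}$. Then $\overline{d}(A) = \frac{2}{g} = \alpha < \frac{1}{2}$ and $\overline{d}(2A) = \frac{3}{g} = \frac{3}{2}\alpha$.
The following example shows the necessity of the condition $\alpha < \frac{1}{2}$.
Example 10.3.30 Let $A \subseteq \mathbb{N}$ be defined by
$$A := \{0\} \cup \bigcup_{n=2}^{\infty} \left( \left[ \tfrac{1}{4} \cdot 2^{2^n}, \tfrac{1}{4} \cdot 2^{2^n} + n \right] \cup \left[ \tfrac{1}{2} \cdot 2^{2^n}, 2^{2^n} \right] \right).$$
Then $0 \in A$, $\gcd(A) = 1$, $\alpha = \overline{d}(A) = \frac{1}{2}$, and $\overline{d}(2A) = \frac{3}{4} = \frac{3}{2}\alpha$. But $A$ has neither the structure stated in Part (a) nor the structure stated in Part (b) of Theorem 10.3.27. We first present a nonstandard version of Theorem 10.3.27. Then we prove the nonstandard version.
Theorem 10.3.31 Let $A \subseteq [0, H]$ be internal, $0, H \in A$, and $0 < \alpha < \frac{1}{2}$ such that
1. $\gcd(A \cap [0, x]) = 1$ for any hyperfinite $x \le H$,
2. $D_H(A) \approx \alpha$,
3. for every hyperfinite $x \le H$, $D_x(A) \lesssim \alpha$,
4. for every hyperfinite $x \le H$, $D_{2x}(2A) \lesssim \frac{3}{2}\alpha$.
Then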
a. either there are $a, g \in \mathbb{N}$ such that $a \in [1, g - 1]$, $2a \neq g$, $\alpha = \frac{2}{g}$, and $A \subseteq [0, H] \cap (\{0, a\} + g * {}^*\mathbb{N})$,
b. or there are $0 \le u < v \le H$ such that $u/H \approx 0$, $[u + 1, v - 1] \cap A = \emptyset$, and $D_{v,H}(A) \approx 1$.
Exercise 10.3.32 Prove that Theorem 10.3.31 implies Theorem 10.3.27.
Proof of Theorem 10.3.31: If $|2A| < 3|A| - 3$, then $H + 1 \le 2|A| - 1$. Hence $D_H(A) \gtrsim \frac{1}{2} > \alpha$, which contradicts condition 2. So we can assume that $|2A| = 3|A| - 3 + b$ for some $b$ with $0 \le b/H \approx 0$. If $H + 1 \le 2|A| - 1 + 2b$, then again we have $D_H(A) \gtrsim \frac{1}{2} > \alpha$. Hence $H + 1 > 2|A| - 1 + 2b$, and by Theorem 10.3.23 we conclude that $A$ is a full subset of a b.p. of difference $g$. If $g > 1$, then $A \subseteq [0, H] \cap (\{0, a\} + g * {}^*\mathbb{N})$, which implies Part (a) of the theorem. If $g = 1$, then $A \subseteq [0, u] \cup [v, H]$. If $u/H \gg 0$, then $D_u(A) \approx 1 > \alpha$, which contradicts condition 3. Hence $u/H \approx 0$ and $D_{v,H}(A) \approx 1$.
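The same interval arithmetic gives a quick check of the densities in Examples 10.3.29 and 10.3.30 on finite pieces. As before, $D_x(A) = |A \cap [0, x]|/(x + 1)$ is an assumed normalisation and the function names are ours; for Example 10.3.29 we take the sample values $g = 7$, $a = 3$ (any $g > 4$ with $2a \neq g$ would do).

```python
def merge(intervals):
    """Merge closed integer intervals (lo, hi) into sorted, pairwise disjoint ones."""
    out = []
    for lo, hi in sorted(intervals):
        if out and lo <= out[-1][1] + 1:          # overlapping or adjacent: extend
            out[-1][1] = max(out[-1][1], hi)
        else:
            out.append([lo, hi])
    return [tuple(iv) for iv in out]


def sumset(intervals):
    """2A for A a finite union of integer intervals: [a,b] + [c,d] = [a+c, b+d]."""
    return merge([(a + c, b + d) for a, b in intervals for c, d in intervals])


def density(intervals, x):
    """D_x(A) = |A ∩ [0, x]| / (x + 1) for disjoint intervals (assumed normalisation)."""
    return sum(min(hi, x) - lo + 1 for lo, hi in intervals if lo <= x) / (x + 1)


def example_10_3_29(g, a, X):
    """A ∩ [0, X] for A = {0, a} + g*N, stored as (merged) singleton intervals."""
    return merge([(n, n) for n in range(X + 1) if n % g in (0, a)])


def example_10_3_30(N):
    """{0} together with the blocks n = 2..N of Example 10.3.30."""
    blocks = [(0, 0)]
    for n in range(2, N + 1):
        h = 2 ** (2 ** n)
        blocks += [(h // 4, h // 4 + n), (h // 2, h)]
    return merge(blocks)


# Example 10.3.29 with g = 7, a = 3: densities near 2/7 and 3/7.
A = example_10_3_29(7, 3, 999)
print(density(A, 999), density(sumset(A), 999))

# Example 10.3.30: densities near 1/2 at x = h and 3/4 at x = 2h.
h = 2 ** 16
B = example_10_3_30(4)
print(density(B, h), density(sumset(B), 2 * h))
```

Because all elements are non-negative, $2(A \cap [0, X]) \cap [0, X] = 2A \cap [0, X]$, and in Example 10.3.30 the blocks with $n \ge 5$ begin beyond $2h = 2^{17}$; so both computations are exact on the ranges used, and the printed values come out close to $2/7$ and $3/7$, and to $1/2$ and $3/4$, respectively.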
References
1. I. Bardaji, D.J. Grynkiewicz, Long arithmetic progressions in small sumsets. Integers 10 (2010)
2. M. Beiglböck, An ultrafilter approach to Jin’s theorem. Isr. J. Math. 185, 369–374 (2011)
3. M. Beiglböck, V. Bergelson, A. Fish, Sumset phenomenon in countable amenable groups. Adv. Math. 223(2), 416–432 (2010)
4. V. Bergelson, H. Furstenberg, B. Weiss, Piecewise-Bohr sets of integers and combinatorial number theory. Topics in Discrete Mathematics (Dedicated to Jarik Nesetril on the occasion of his 60th birthday) (Springer, Heidelberg, 2006), pp. 13–37
5. H. Davenport, On Waring’s problem for cubes. Acta Math. 71, 123–143 (1939)
6. M. Di Nasso, An elementary proof of Jin’s Theorem with a bound. Electron. J. Comb. 21(2) (2014)
7. M. Di Nasso, I. Goldbring, R. Jin, S. Leth, M. Lupini, K. Mahlburg, High density syndeticity of sumsets. Adv. Math. 278, 1–33 (2015)
8. G.A. Freiman, Foundations of a structural theory of set addition. Translated from the Russian. Translations of Mathematical Monographs, vol. 37 (American Mathematical Society, Providence, 1973)
9. G.A. Freiman, Inverse additive number theory, XI. Long arithmetic progressions in sets with small sumsets. Acta Arith. 137(4), 325–331 (2009)
10. G.A. Freiman, On the detailed structure of sets with small additive property, in Combinatorial Number Theory and Additive Group Theory. Advanced Courses in Mathematics (CRM Barcelona, Birkhäuser Verlag AG, Basel, 2009), pp. 233–239
11. G.A. Freiman, Structure theory of set addition. II. Results and problems, in Paul Erdős and his Mathematics, I (Budapest, 1999), pp. 243–260 (Bolyai Soc. Math. Stud., vol. 11, János Bolyai Math. Soc., Budapest, 2002)
12. H. Furstenberg, Recurrence in Ergodic Theory and Combinatorial Number Theory (Princeton University Press, Princeton, 1981)
13. H. Halberstam, K.F. Roth, Sequences (Oxford University Press, Oxford, 1966)
14. R. Jin, Sumset phenomenon. Proc. Am. Math. Soc. 130(3), 855–861 (2002)
15. R. Jin, Plünnecke’s theorem for other densities. Trans. Am. Math. Soc. 363, 5059–5070 (2011)
16. R. Jin, Density versions of Plünnecke inequality–epsilon-delta approach, in Combinatorial and Additive Number Theory. CANT 2011–2012, Springer Proceedings in Mathematics and Statistics, vol. 101, ed. by M. Nathanson (2014)
17. R. Jin, Detailed structure for Freiman’s 3k − 3 theorem. INTEGERS: J. Electron. Comb. Number Theory (to appear)
18. R. Jin, Inverse problem for cuts. Log. Anal. 1(1), 61–89 (2007). http://logicandanalysis.org/index.php/jla/issue/view/10
19. R. Jin, Freiman’s inverse problem with small doubling property. Adv. Math. 216(2), 711–752 (2007)
20. R. Jin, Solution to the inverse problem for upper asymptotic density. J. für die Reine und Angewandte Mathematik (Crelle’s Journal) 595, 121–166 (2006)
21. R. Jin, Characterizing the structure of A + B when A + B has small upper Banach density. J. Number Theory 130, 1785–1800 (2010)
22. H.J. Keisler, S.C. Leth, Meager sets on the hyperfinite time line. J. Symb. Log. 56, 71–102 (1991)
23. V. Lev, P.Y. Smeliansky, On addition of two distinct sets of integers. Acta Arith. 70(1), 85–91 (1995)
24. M.B. Nathanson, Additive Number Theory–Inverse Problems and the Geometry of Sumsets (Springer, New York, 1996)
25. H. Plünnecke, Eine zahlentheoretische Anwendung der Graphentheorie. J. für die Reine und Angewandte Mathematik 234, 171–183 (1970)
26. I.Z. Ruzsa, Sumsets and structure, in Combinatorial Number Theory and Additive Group Theory. Advanced Courses in Mathematics (CRM Barcelona, Birkhäuser, 2009), pp. 87–210
27. Y.V. Stanchescu, On addition of two distinct sets of integers. Acta Arith. 75(2), 191–194 (1996)
Chapter 11
Hypernatural Numbers as Ultrafilters Mauro Di Nasso
11.1 Introduction
Ultrafilters are really peculiar and multifaceted mathematical objects, whose study has turned out to be a fascinating and often elusive subject. Researchers may have diverse intuitions about ultrafilters, but they seem to agree on the subtlety of this concept; e.g., read the following quotations: “The space βω is a monster having three heads” (J. van Mill [43]); “… the somewhat esoteric, but fascinating and very useful object βN” (V. Bergelson [5]). The notion of ultrafilter, introduced by H. Cartan [11] in 1937, can be formulated in diverse languages of mathematics: in set theory, ultrafilters are maximal families of sets that are closed under supersets and intersections; in measure theory, they are described as {0, 1}-valued finitely additive measures defined on the family of all subsets of a given space; in algebra, they exactly correspond to maximal ideals in rings of functions F^I where I is a set and F is a field. Ultrafilters and the corresponding construction of ultraproduct are a common tool in mathematical logic, but they also have many applications in other fields of mathematics, most notably in topology (the notion of limit along an ultrafilter, the Stone–Čech compactification βX of a discrete space X, etc.), and in Banach spaces (the so-called ultraproduct technique). In 1975, F. Galvin and S. Glazer found a beautiful ultrafilter proof of Hindman’s theorem, namely the property that for every finite partition of the natural numbers N = C1 ∪ · · · ∪ Cr, there exist an infinite set X and a piece Ci such that all finite sums of distinct elements from X belong to Ci. Since then, ultrafilters on N have been successfully used also in combinatorial number theory and in Ramsey theory. The key fact is that the compact space βN of ultrafilters on N can be equipped with a pseudo-sum operation, so that the resulting structure (βN, ⊕) is a compact topological left semigroup. Such a space satisfies really intriguing properties that
M. Di Nasso, Dipartimento di Matematica, Università di Pisa, Pisa, Italy © Springer Science+Business Media Dordrecht 2015 P.A. Loeb and M.P.H. Wolff (eds.), Nonstandard Analysis for the Working Mathematician, DOI 10.1007/978-94-017-7327-0_11
M. Di Nasso
have direct applications in the study of structural properties of sets of integers. (See the monograph [29], where the extensive research originating from that approach is surveyed.) Nonstandard analysis and ultrafilters are intimately connected. In one direction, ultrapowers have been the basic ingredient of the usual constructions of models of nonstandard analysis since W.A.J. Luxemburg’s lecture notes [41] of 1962. Actually, by a classic result of H.J. Keisler, the models of nonstandard analysis are characterized up to isomorphism as limit ultrapowers, a class of elementary submodels of ultrapowers which correspond to direct limits of ultrapowers (see [34] and [12, Sect. 4.4]). In the other direction, the idea that elements of a nonstandard extension ∗X correspond to ultrafilters on X goes back to the golden years of nonstandard analysis, starting from the seminal paper [42] by W.A.J. Luxemburg, which appeared in 1969. This idea was then systematically pursued by C. Puritz in [45] and by G. Cherlin and J. Hirschfeld in [13]. In those papers, as well as in Puritz’ follow-up [46], new results about the Rudin-Keisler ordering were proved by nonstandard methods, along with new characterizations of special ultrafilters, such as P-points and selective ultrafilters. (See also [44], where the study of properties similar to those in Puritz’ papers was continued.) In [7], A. Blass pushed that approach further and provided a comprehensive treatment of ultrafilter properties as reflected by the nonstandard numbers of the associated ultrapowers. Several years later, J. Hirschfeld [30] showed that hypernatural numbers can also be used as a convenient tool to investigate certain Ramsey-like properties. In recent years, a new nonstandard technique based on the use of iterated hyper-extensions has been developed to study partition regularity of equations (see [19, 38]).
This paper aims at providing a self-contained introduction to a nonstandard theory of ultrafilters; several examples are also included to illustrate the use of such a theory in applications. For gentle introductions to ultrafilters, see the papers [23, 25, 35]; a comprehensive reference is the monograph [15]. Recent surveys on applications of ultrafilters across mathematics can be found in the book [1]. As for nonstandard analysis, a short but rigorous presentation can be found in [12, Sect. 4.4]; organic expositions covering virtually all aspects of nonstandard methods are provided in the books [1, 17, 37]. We remark that here we adopt the so-called external approach, based on the existence of a star-map ∗ that associates a hyper-extension (or nonstandard extension) ∗A to each object A under study, and satisfies the transfer principle. This is to be contrasted with the internal viewpoint as formalized by E. Nelson’s Internal Set Theory IST or by K. Hrbáček’s Nonstandard Set Theories. (See [33] for a thorough treatise of nonstandard set theories.) Let us recall here the saturation property. A family F has the finite intersection property (FIP for short) if A1 ∩ · · · ∩ An ≠ ∅ for any choice of finitely many elements Ai ∈ F.
11 Hypernatural Numbers as Ultrafilters
Definition 11.1.1 Let κ be an infinite cardinal. A model of nonstandard analysis is κ-saturated if it satisfies the property:
• Let F be a family of internal sets with cardinality |F| < κ. If F has the FIP then $\bigcap_{A \in F} A \neq \emptyset$.
When κ-saturation holds, every infinite internal set A has cardinality |A| ≥ κ. Indeed, the family of internal sets {A \ {a} | a ∈ A} has the FIP, and has the same cardinality as A. If by contradiction |A| < κ, then by κ-saturation we would obtain $\bigcap_{a \in A} (A \setminus \{a\}) \neq \emptyset$, which is absurd. With the exception of Sects. 11.3 and 11.4, throughout this paper we will work in a fixed c+-saturated model of nonstandard analysis, where c is the cardinality of the continuum. (We recall that κ+ denotes the successor cardinal of κ. So, κ+-saturation applies to families with |F| ≤ κ.) Consequently, our hypernatural numbers will have cardinality |∗N| ≥ c+.
11.2 The u-equivalence
There is a canonical way of associating an ultrafilter on N to each hypernatural number.
Definition 11.2.1 The ultrafilter generated by a hypernatural number α ∈ ∗N is the family Uα = {X ⊆ N | α ∈ ∗X}.
It is easily verified that Uα actually satisfies the properties of an ultrafilter. Notice that Uα is principal if and only if α ∈ N is finite.
Definition 11.2.2 We say that α, β ∈ ∗N are u-equivalent, and write α ∼u β, if they generate the same ultrafilter, i.e. Uα = Uβ. The equivalence classes u(α) = {β | β ∼u α} are called u-monads.
Notice that α and β are u-equivalent if and only if they cannot be separated by any hyper-extension, i.e. if α ∈ ∗X ⇔ β ∈ ∗X for every X ⊆ N. In consequence, the equivalence classes u(α) are characterized as follows:
$$u(\alpha) = \bigcap \{ {}^*X \mid X \in \mathcal{U}_\alpha \}.$$
(The notion of filter monad $\mu(\mathcal{F}) = \bigcap \{ {}^*F \mid F \in \mathcal{F} \}$ of a filter 𝓕 was first introduced by W.A.J. Luxemburg in [42].) For every ultrafilter U on N, the family {∗X | X ∈ U} is a family of cardinality c with the FIP and so, by c+-saturation, there exist hypernatural numbers α ∈ ∗N such that Uα = U. (Actually, the c+-enlargement property suffices: see Definition 11.4.1.) Therefore, βN = {Uα | α ∈ ∗N}.
Thus one can identify βN with the quotient set ∗N/∼u of the u-monads.
Example 11.2.3 Let f : N → R be bounded. If α ∼u β are u-equivalent, then ∗f(α) ≈ ∗f(β) are at infinitesimal distance. To see this, for every real number r ∈ R consider the set Λ(r) = {n ∈ N | f(n) < r}. Then, by the hypothesis, one has α ∈ ∗Λ(r) ⇔ β ∈ ∗Λ(r), i.e. ∗f(α) < r ⇔ ∗f(β) < r. As this holds for all r ∈ R, it follows that the bounded hyperreal numbers ∗f(α) ≈ ∗f(β) are infinitely close. (This example was suggested to the author by E. Gordon.)
Proposition 11.2.4 For every f : N → N and α ∈ ∗N, $^*f(u(\alpha)) = u({}^*f(\alpha))$. Indeed:
1. If α ∼u β then ∗f(α) ∼u ∗f(β).
2. If ∗f(α) ∼u γ then γ = ∗f(β) for some β ∼u α.
Proof (1) For every A ⊆ N, one has the following chain of equivalences:
$${}^*f(\alpha) \in {}^*A \Leftrightarrow \alpha \in {}^*\{n \mid f(n) \in A\} \Leftrightarrow \beta \in {}^*\{n \mid f(n) \in A\} \Leftrightarrow {}^*f(\beta) \in {}^*A.$$
(2) For every A ⊆ N, α ∈ ∗A ⇒ ∗f(α) ∈ ∗(f(A)) ⇔ γ ∈ ∗(f(A)), i.e. γ = ∗f(β) for some β ∈ ∗A. But then the family of internal sets {(∗f)⁻¹(γ) ∩ ∗A | α ∈ ∗A} has the finite intersection property. By c+-saturation, there exists an element β in the intersection of that family. Clearly, ∗f(β) = γ and β ∼u α. Before starting to develop our nonstandard theory, let us consider a well-known combinatorial property which constitutes a fundamental preliminary step in the theory of ultrafilters. The proof given below consists of two steps: we first show a finite version of the desired property, and then use a non-principal ultrafilter to obtain the global version. Although the result is well known, this particular argument seems to be new in the literature.
Lemma 11.2.5 Let f : N → N be such that f(n) ≠ n for all n. Then there exists a 3-coloring χ : N → {1, 2, 3} such that χ(n) ≠ χ(f(n)) for all n.
Proof We begin by showing the following “finite approximation” to the desired result: For every finite F ⊂ N there exists χF : F → {1, 2, 3} such that χF(x) ≠ χF(f(x)) whenever both x and f(x) belong to F.
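The “finite approximation” is constructive, and the induction proving it amounts to a greedy procedure: repeatedly remove an element that has at most one preimage among the remaining ones (the pigeonhole step), then put the elements back in reverse order, giving each one a color different from that of its image and of its unique remaining preimage. A minimal Python sketch follows; `three_coloring` and `is_proper` are hypothetical names, f is any function without fixed points, and only its values on F are used.

```python
from collections import Counter


def three_coloring(F, f):
    """Greedy form of the induction in the proof of Lemma 11.2.5.

    F: a finite set; f: a function with f(x) != x for every x in F.
    Returns chi: F -> {1, 2, 3} with chi[x] != chi[f(x)] whenever x, f(x) are both in F.
    """
    F = set(F)
    assert all(f(x) != x for x in F), "f must have no fixed points"
    remaining, order = set(F), []
    while remaining:
        # Pigeonhole: the numbers of preimages inside `remaining` add up to at most
        # |remaining|, so some element has at most one preimage there.
        hits = Counter(f(y) for y in remaining if f(y) in remaining)
        x = next(z for z in remaining if hits[z] <= 1)
        order.append(x)
        remaining.remove(x)
    chi = {}
    # Re-insert in reverse order: when x is added, chi is defined exactly on the set
    # F' = F \ {x} of the corresponding inductive step, so at most two colors are forbidden.
    for x in reversed(order):
        forbidden = {chi[f(x)]} if f(x) in chi else set()
        forbidden |= {chi[y] for y in chi if f(y) == x}
        chi[x] = min({1, 2, 3} - forbidden)
    return chi


def is_proper(chi, f):
    """Check chi[x] != chi[f(x)] for all x with x and f(x) in the domain of chi."""
    return all(chi[x] != chi[f(x)] for x in chi if f(x) in chi)
```

For example, `three_coloring(range(50), lambda n: (7 * n + 3) % 50)` returns a proper coloring. For the 3-cycle 0 → 1 → 2 → 0 all three colors are forced, so two colors would not suffice in general.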
We proceed by induction on the cardinality of F. The basis is trivial, because if |F| = 1 then it is never the case that both x, f(x) ∈ F. For the inductive step, notice that by the pigeonhole principle there must be at least one element x ∈ F which is the image under f of at most one element in F, i.e. |{y ∈ F | f(y) = x}| ≤ 1. Now let F′ = F \ {x} and let χ′ : F′ → {1, 2, 3} be a 3-coloring as given by the inductive hypothesis. We want to extend χ′ to a 3-coloring χ of F. To this end, define χ(x) in such a way that χ(x) ≠ χ′(f(x)) if f(x) ∈ F′, and χ(x) ≠ χ′(y) if f(y) = x. This is always possible because there is at most one such element y, and because we have 3 colors at our disposal. We now have to glue together the finite 3-colorings so as to obtain a 3-coloring of the whole set N. (Of course, this cannot be done directly, because two 3-colorings do not necessarily agree on the intersection of their domains.) One possible way is the following. For every n ∈ N, fix a 3-coloring χn : {1, . . . , n} → {1, 2, 3} such that χn(x) ≠ χn(f(x)) whenever both x, f(x) ∈ {1, . . . , n}. Then pick any non-principal ultrafilter U on N and define the map χ : N → {1, 2, 3} by putting
χ(k) = i ⇐⇒ Γi(k) = {n ≥ k | χn(k) = i} ∈ U.
The definition is well-posed because for every k the disjoint union Γ1(k) ∪ Γ2(k) ∪ Γ3(k) = {n ∈ N | n ≥ k} ∈ U, and so exactly one set Γi(k) belongs to U. The function χ is the desired 3-coloring. In fact, if by contradiction χ(k) = χ(f(k)) = i for some k, then we could pick n ∈ Γi(k) ∩ Γi(f(k)) ∈ U and have χn(k) = χn(f(k)), against the hypothesis on χn. (The same argument could be used to extend this lemma to functions f : I → I over arbitrary infinite sets I.)
Remark 11.2.6 The second part of the above proof could also be easily carried out by using nonstandard methods. Indeed, by saturation one can pick a hyperfinite set H ⊂ ∗N containing all (finite) natural numbers.
By transfer from the “finite approximation” result proved above, there exists an internal 3-coloring Φ : H → {1, 2, 3} such that Φ(ξ) ≠ Φ(∗f(ξ)) whenever both ξ, ∗f(ξ) ∈ H. Then the restriction χ = Φ↾N : N → {1, 2, 3} gives the desired 3-coloring. As a corollary, we obtain the following theorem.
Theorem 11.2.7 Let f : N → N and α ∈ ∗N. If ∗f(α) ∼u α then ∗f(α) = α.
Proof If ∗f(α) ≠ α, then α ∈ ∗A where A = {n | f(n) ≠ n}. Pick any function g : N → N that agrees with f on A and such that g(n) ≠ n for all n ∈ N. Since α ∈ ∗A ⊆ ∗{n | g(n) = f(n)}, we have that ∗g(α) = ∗f(α). Apply Lemma 11.2.5 to g and pick a 3-coloring χ : N → {1, 2, 3} such that χ(n) ≠ χ(g(n)) for all n. Then ∗χ(∗f(α)) = ∗χ(∗g(α)) ≠ ∗χ(α). Now let X = {n ∈ N | χ(n) = i} where i = ∗χ(α). Clearly, α ∈ ∗X but ∗f(α) ∉ ∗X, and hence ∗f(α) ≁u α. Two important properties of u-equivalence are the following.
Proposition 11.2.8 Let α ∈ ∗A, and let f be 1–1 when restricted to A. Then
1. There exists a bijection ϕ such that ∗f(α) = ∗ϕ(α);
2. For every g : N → N, ∗f(α) ∼u ∗g(α) ⇒ ∗f(α) = ∗g(α).
Proof (1) We can assume that α ∈ ∗N \ N is infinite, as otherwise the claim is trivial. Then α ∈ ∗A implies that A is infinite, and so we can partition A = B ∪ C into two disjoint infinite sets B and C where, say, α ∈ ∗B. Since f is 1–1, we can pick a bijection ϕ that agrees with f on B, so that ∗ϕ(α) = ∗f(α) as desired. (2) By the previous point, ∗f(α) = ∗ϕ(α) for some bijection ϕ. Then
$${}^*g(\alpha) \sim_u {}^*\varphi(\alpha) \Rightarrow {}^*\varphi^{-1}({}^*g(\alpha)) \sim_u {}^*\varphi^{-1}({}^*\varphi(\alpha)) = \alpha \Rightarrow {}^*\varphi^{-1}({}^*g(\alpha)) = \alpha,$$
and hence ∗g(α) = ∗ϕ(α) = ∗f(α). We remark that property (2) of the above proposition does not hold if we drop the hypothesis that f is 1–1. (In Sect. 11.3 we shall address the question of the existence of infinite points α ∈ ∗N with the property that ∗f(α) ∼u ∗g(α) ⇒ ∗f(α) = ∗g(α) for all f, g : N → N.)
Proposition 11.2.9 If ∗f(α) ∼u β and ∗g(β) ∼u α for suitable f and g, then ∗ϕ(α) ∼u β for some bijection ϕ.
Proof By the hypotheses, ∗g(∗f(α)) ∼u ∗g(β) ∼u α and so ∗g(∗f(α)) = α. If A = {n | g(f(n)) = n}, then α ∈ ∗A and f is 1–1 on A. By the previous proposition, there exists a bijection ϕ such that ∗f(α) = ∗ϕ(α), and hence ∗ϕ(α) ∼u β. We recall that the image of an ultrafilter U under a function f : N → N is the ultrafilter f(U) = {A ⊆ N | f⁻¹(A) ∈ U}. Notice that if f ≡U g, i.e. if {n | f(n) = g(n)} ∈ U, then f(U) = g(U).
Proposition 11.2.10 For every f : N → N and α ∈ ∗N, the image ultrafilter f(Uα) = U∗f(α).
Proof For every A ⊆ N, one has the chain of equivalences:
$$A \in \mathcal{U}_{{}^*f(\alpha)} \Leftrightarrow {}^*f(\alpha) \in {}^*A \Leftrightarrow \alpha \in {}^*(f^{-1}(A)) \Leftrightarrow f^{-1}(A) \in \mathcal{U}_\alpha \Leftrightarrow A \in f(\mathcal{U}_\alpha).$$
Let us now show how the above results about u-equivalence are just reformulations, in a nonstandard context, of fundamental properties of ultrafilter theory.
Theorem 11.2.11 Let f : N → N and let U be an ultrafilter on N. If f(U) = U then {n | f(n) = n} ∈ U.
Proof Let α ∈ ∗N be such that U = Uα. By the hypothesis, Uα = f(Uα) = U∗f(α), i.e. α ∼u ∗f(α), and so, by Theorem 11.2.7, ∗f(α) = α. But then {n | f(n) = n} ∈ U because α ∈ ∗{n | f(n) = n}. Recall the Rudin-Keisler pre-ordering ≤RK on ultrafilters:
V ≤RK U ⇐⇒ f(U) = V for some function f.
In this case, we say that V is Rudin-Keisler below U (or U is Rudin-Keisler above V). It is readily verified that g(f(U)) = (g ◦ f)(U), so ≤RK satisfies the transitivity property, and ≤RK is actually a pre-ordering. Notice that Uα ≤RK Uβ means that ∗f(β) ∼u α for some function f.
Proposition 11.2.12 U ≤RK V and V ≤RK U if and only if U ≅ V are isomorphic, i.e. there exists a bijection ϕ : N → N such that ϕ(U) = V.
Proof Let U = Uα and V = Uβ. If U ≤RK V and V ≤RK U, then there exist functions f, g : N → N such that ∗f(α) ∼u β and ∗g(β) ∼u α. But then, by Proposition 11.2.9, there exists a bijection ϕ : N → N such that ∗ϕ(α) ∼u β, and hence ϕ(U) = U∗ϕ(α) = Uβ = V, as desired. The other implication is trivial. We close this section by showing that all infinite numbers α have “large” and “spaced” u-monads, in the sense that u(α) is both a left and a right unbounded subset of the infinite numbers ∗N \ N, and that different elements of u(α) are placed at infinite distance. (The property of c+-saturation is essential here.)
Theorem 11.2.13 ([45, 46]) Let α ∈ ∗N \ N be infinite. Then:
1. For every ξ ∈ ∗N, there exists an internal 1–1 map ϕ : ∗N → u(α) ∩ (ξ, +∞). Consequently, the set u(α) ∩ (ξ, +∞) contains |∗N|-many elements and it is unbounded in ∗N.
2. For every infinite ξ ∈ ∗N \ N, the set u(α) ∩ [0, ξ) contains at least c+-many elements. Consequently, u(α) is unbounded leftward in ∗N \ N.
3. If α ∼u β and α ≠ β, then the distance |α − β| ∈ ∗N \ N is infinite.
Proof (1) Since α is infinite, every X ∈ Uα is an infinite set and so for each k ∈ N there exists a 1–1 function f : N → X ∩ (k, +∞). By transfer, for every ξ ∈ ∗N the following internal set is non-empty: Θ(X) = {ϕ : ∗N → ∗X ∩ (ξ, +∞) | ϕ internal 1–1}. Notice that Θ(X1) ∩ · · · ∩ Θ(Xn) = Θ(X1 ∩ · · · ∩ Xn), and hence the family {Θ(X) | X ∈ Uα} has the finite intersection property. By c+-saturation, we can pick $\varphi \in \bigcap_{X \in \mathcal{U}_\alpha} \Theta(X)$. Clearly, range(ϕ) is an internal subset of u(α) ∩ (ξ, +∞) with the same cardinality as ∗N. Since range(ϕ) is internal and hyperinfinite, it is necessarily unbounded in ∗N.
(2) For any given ξ ∈ ∗N \ N, the family {∗X ∩ [0, ξ) | X ∈ Uα} is closed under finite intersections, and all its elements are non-empty. So, by c+-saturation, there exists
$$\zeta \in \bigcap_{X \in \mathcal{U}_\alpha} \left( {}^*X \cap [0, \xi) \right).$$
Clearly ζ ∈ u(α) ∩ [0, ξ), and this shows that u(α) is unbounded leftward in ∗N \ N. Now fix ξ infinite. By what we have just proved, the family of open intervals
G = {(k, ζ) | k ∈ N and ζ ∈ u(α) ∩ [0, ξ)}
has empty intersection. Since G satisfies the finite intersection property, and c+-saturation holds, we must have |G| ≥ c+, and hence also |u(α) ∩ [0, ξ)| ≥ c+. (3) For every n ≥ 2, let kn be the remainder of the Euclidean division of α by n, and consider the set Xn = {x · n + kn | x ∈ N}. Then α ∈ ∗Xn, and α ∼u β implies that also β ∈ ∗Xn, so α − β is a multiple of n. Since β ≠ α, we must have |α − β| ≥ n. As this is true for all n ≥ 2, we conclude that α and β have infinite distance.
11.3 Hausdorff S-topologies and Hausdorff Ultrafilters
It is natural to ask about properties of the ultrafilter map U : ∗N → βN where U : α ↦ Uα. We already noticed that if one assumes c+-saturation then U is onto βN, i.e. every ultrafilter on N is of the form Uα for a suitable α ∈ ∗N. However, in this section no saturation property will be assumed. As a first (negative) result, let us show that the ultrafilter map is never a bijection.
Proposition 11.3.1 In any model of nonstandard analysis, if the ultrafilter map U : ∗N → βN is onto then, for every non-principal U ∈ βN, the set {α ∈ ∗N | Uα = U} contains at least c-many elements.
Proof Given a non-principal ultrafilter U on N, for X ∈ U and k ∈ N let Γ(X, k) = {F ∈ Fin(N) | F ⊂ X & |F| ≥ k}, where Fin(N) = {F ⊂ N | F is finite}. Notice that the family of sets 𝓕 = {Γ(X, k) | X ∈ U, k ∈ N} has the finite intersection property. Indeed, Γ(X1, k1) ∩ · · · ∩ Γ(Xm, km) = Γ(X, k) where X = X1 ∩ · · · ∩ Xm ∈ U and k = max{k1, . . . , km}; and every set Γ(X, k) ≠ ∅ since all X ∈ U are infinite. Now fix a bijection Ψ : Fin(N) → N, and let Γ̂(X, k) = {Ψ(F) | F ∈ Γ(X, k)}. Then also the family {Γ̂(X, k) | X ∈ U, k ∈ N} ⊆ P(N) has the FIP, and so we can extend it to an ultrafilter V on N. By the hypothesis on the ultrafilter map, there exists β ∈ ∗N such that Uβ = V; in particular, $\beta \in \bigcap_{X,k} {}^*\hat{\Gamma}(X, k)$, and so β = ∗Ψ(G) for a suitable $G \in \bigcap_{X,k} {}^*\Gamma(X, k)$. Then G ⊆ ∗X for all X ∈ U, and hence Uγ = U for all γ ∈ G. Moreover, |G| ≥ k for all k ∈ N, and so G is an infinite internal set. Finally, we use the following general fact: “Every infinite internal set has at least the cardinality of the continuum”. To prove this last property, notice that if A is infinite and internal then there exists an (internal) 1–1 map f : {1, . . . , ν} → A for some infinite ν ∈ ∗N \ N. Now, consider the unit real interval [0, 1] and define θ : [0, 1] → {1, . . . , ν} by putting θ(r) = min{1 ≤ i ≤ ν | r ≤ i/ν}. The map θ is 1–1 because θ(r) = θ(r′) ⇒ |r − r′| ≤ 1/ν ≈ 0 ⇒ r = r′, and so we conclude that c = |[0, 1]| ≤ |{1, . . . , ν}| ≤ |A|, as desired. (When c+-saturation holds, then |{α ∈ ∗N | Uα = U}| ≥ c+ by Theorem 11.2.13.) We now show that the ultrafilter map is tied up with a topology that is naturally considered in a nonstandard setting. (The notion of S-topology was introduced by A. Robinson himself, the “inventor” of nonstandard analysis.)
Definition 11.3.2 For every set X, the S-topology on ∗X is the topology having the family {∗A | A ⊆ X} as a basis of open sets.
The capital letter “S” stands for “standard”, and in fact hyper-extensions ∗A are often called standard sets in the literature of nonstandard analysis. The adjective “standard” originated from the distinction between a standard universe and a nonstandard universe, according to the most widely used approaches to nonstandard analysis. However, such a distinction is not needed, and indeed one can adopt a foundational framework where there is a single mathematical universe, and take hyper-extensions of any object under study (see, e.g., [2]).
Every basic open set ∗A is also closed because ∗X \ ∗A = ∗(X \ A), and so the S-topologies are totally disconnected. A first relationship between S-topology and ultrafilter map is the following.
Proposition 11.3.3 The S-topology on ∗N is compact if and only if the ultrafilter map U : ∗N → βN is onto.
Proof According to one of the equivalent definitions of compactness, the S-topology is compact if and only if every non-empty family C of closed sets with the FIP has non-empty intersection $\bigcap_{C \in \mathcal{C}} C \neq \emptyset$. Without loss of generality, one can assume that C is a family of hyper-extensions. Notice that C = {∗Ai | i ∈ I} has the FIP if and only if C′ = {Ai | i ∈ I} ⊂ P(N) has the FIP, and so we can extend C′ to an ultrafilter V on N. If the ultrafilter map is onto βN, then V = Uα for a suitable α, and therefore $\alpha \in \bigcap_{i \in I} {}^*A_i \neq \emptyset$. Conversely, if U is an ultrafilter on N, then C = {∗X | X ∈ U} is a family of closed sets with the FIP. If α is any element in the intersection of C, then Uα = U.
In consequence of the above proposition, the S-topology on ∗N is compact when the c+-saturation property holds. More generally, κ-saturation implies that the S-topology is compact on every hyper-extension ∗X where $2^{|X|} < \kappa$. (Actually, the κ-enlarging property suffices: see Definition 11.4.1.) A natural question that one may ask is whether the S-topologies are Hausdorff or not. This depends on the model under consideration, and giving a complete answer turns out to be a difficult issue involving deep set-theoretic matters, which will be briefly discussed below. So, it is not surprising that this simple question was not addressed explicitly in the early literature on nonstandard analysis, despite the fact that the S-topology was a common object of study. As a first remark, notice that having a Hausdorff S-topology on ∗X is preserved when passing to lower cardinalities.
Proposition 11.3.4 If the S-topology is Hausdorff on ∗X and |Y| ≤ |X|, then the S-topology is Hausdorff on ∗Y as well.
Proof Fix a 1–1 map f : Y → X. Given ξ ≠ η in ∗Y, consider ∗f(ξ) ≠ ∗f(η) in ∗X. By the hypothesis, we can pick disjoint sets A, B ⊆ X with ∗f(ξ) ∈ ∗A and ∗f(η) ∈ ∗B. Then C = f⁻¹(A) and D = f⁻¹(B) are disjoint subsets of Y such that ξ ∈ ∗C and η ∈ ∗D.
Recall a notion that was introduced in [31]: A model of nonstandard analysis is κ-constrained if the following property holds:
∀X ∀ξ ∈ ∗X ∃A ⊆ X such that |A| ≤ κ and ξ ∈ ∗A.
We remark that any ultrapower model of nonstandard analysis constructed by means of an ultrafilter over a set of cardinality κ is κ-constrained. In the countable case, the notion of constrainedness already appeared at the beginnings of nonstandard analysis, under the name of σ-quasi standardness (see W.A.J. Luxemburg’s lecture notes [41]). The existence of nonstandard universes which are not κ-constrained for any κ is problematic and appears to be closely related to the existence of large cardinals. The reader interested in this foundational issue is referred to [31].
Proposition 11.3.5 Assume that our model of nonstandard analysis is κ-constrained. Then the S-topology is Hausdorff on ∗κ if and only if the S-topology is Hausdorff on every hyper-extension ∗X.
Proof Let distinct ξ, η ∈ ∗X be given. Since the model is κ-constrained, we can pick sets A, B ⊆ X with ξ ∈ ∗A, η ∈ ∗B and |A|, |B| ≤ κ. Since |A ∪ B| ≤ κ, by the previous Proposition 11.3.4, the S-topology on ∗(A ∪ B) is Hausdorff. Pick disjoint subsets C, D ⊆ A ∪ B with ξ ∈ ∗C and η ∈ ∗D. Then ξ ∈ ∗(C ∩ X) and η ∈ ∗(D ∩ X) where C ∩ X and D ∩ X are disjoint subsets of X. So, in any ultrapower model of nonstandard analysis determined by an ultrafilter on N, if the S-topology is Hausdorff on ∗N then it is Hausdorff on all hyper-extensions ∗X.
Proposition 11.3.6 The S-topology on ∗X is Hausdorff if and only if the ultrafilter map U : ∗X → βX is 1–1.
Proof By definition, the S-topology on ∗X is Hausdorff if and only if for every pair of elements ξ ≠ η in ∗X there exist basic open sets ∗A, ∗B ⊆ ∗X such that ξ ∈ ∗A, η ∈ ∗B and ∗A ∩ ∗B = ∅. But this means that A ∈ Uξ and B ∈ Uη for suitable disjoint sets A ∩ B = ∅. We obtain the claim by noticing that this last property holds if and only if the ultrafilters Uξ and Uη are different. Hausdorff S-topologies are tied up with special ultrafilters.
Proposition 11.3.7 Let U be a non-principal ultrafilter on the set I. Then the following are equivalent:
1. In the ultrapower model of nonstandard analysis determined by U, the S-topology on ∗I is Hausdorff.
2. In any model of nonstandard analysis, if U = Uα is generated by a point α ∈ ∗I, then: ∗f(α) ∼u ∗g(α) ⇒ ∗f(α) = ∗g(α).
3. For every f, g : I → I, f(U) = g(U) ⇒ f ≡U g, i.e. {i ∈ I | f(i) = g(i)} ∈ U.
Proof (1) ⇔ (3). Notice first that if ξ = [f]U is the element of ∗I given by the U-equivalence class of the function f : I → I, then f(U) = Uξ. Indeed, for every A ⊆ I, one has A ∈ f(U) ⇔ f⁻¹(A) ∈ U ⇔ {i ∈ I | f(i) ∈ A} ∈ U ⇔ ξ = [f]U ∈ ∗A ⇔ A ∈ Uξ. Now let ξ = [f]U and η = [g]U be arbitrary elements of ∗I = I^I/U. By definition, ξ = η ⇔ f ≡U g; besides, by what we have just seen, f(U) = g(U) ⇔ Uξ = Uη. Now, by the previous proposition, the S-topology on ∗I is Hausdorff if and only if the ultrafilter map on ∗I is 1–1, and hence the claim follows. (2) ⇔ (3). Notice that ∗f(α) = ∗g(α) ⇔ α ∈ ∗{i ∈ I | f(i) = g(i)} ⇔ f ≡Uα g. The claim follows by recalling that U∗f(α) = f(Uα) and U∗g(α) = g(Uα). Because of the above equivalences, non-principal ultrafilters that satisfy property (3) were named Hausdorff in [21]. To the author’s knowledge, the problem of existence of such ultrafilters was first explicitly considered by A.
Connes in his paper [16] of 1970, where he needed special ultrafilters U with the property that the maps [ϕ]U → ϕ(U) defined on ultrapowers K N /U (K a field) be injective into βN. He noticed that such a property was satisfied by selective ultrafilters, introduced three years before by G. Choquet [14] under the name of ultrafiltres absolus. In consequence, Hausdorff ultrafilters are consistent. Indeed, selective ultrafilters exist under the continuum hypothesis (this was already proved by G. Choquet [14] in 1968). However, we remark that the existence of selective ultrafilters cannot be proved in ZFC, as first shown by K. Kunen [36]. Independently, in their 1972 paper [13], G. Cherlin and J. Hirschfeld proved that non-principal ultrafilters exist which are not
M. Di Nasso
Hausdorff, and asked whether Hausdorff ultrafilters exist at all in ZFC. It is worth remarking that this problem is still open to this day (see [3, 21]).

We close this section by mentioning another result, proved in [20], that connects Hausdorff ultrafilters and nonstandard analysis.

Theorem 11.3.8 Assume that 𝒩 ⊆ βN is a set of ultrafilters on N such that:
• 𝒩 ⊋ N, i.e. 𝒩 properly contains all principal ultrafilters;
• Every non-principal U ∈ 𝒩 is Hausdorff;
• 𝒩 is RK-downward closed, i.e. U ∈ 𝒩 implies that f(U) ∈ 𝒩 for every f : N → N;
• 𝒩 is "strongly" RK-filtered in the following sense: for every U, V ∈ 𝒩 there exist W ∈ 𝒩 and f, g : N → N such that f(W) = U and g(W) = V.
Then 𝒩 is a set of hypernatural numbers of nonstandard analysis where:
• For A ⊆ N^k, the hyper-extension ∗A ⊆ 𝒩^k is defined by letting, for every U ∈ 𝒩 and for every f1, ..., fk : N → N:
(f1(U), ..., fk(U)) ∈ ∗A ⇔ {n ∈ N | (f1(n), ..., fk(n)) ∈ A} ∈ U;
• For F : N^k → N, the hyper-extension ∗F : 𝒩^k → 𝒩 is defined by letting, for every U ∈ 𝒩 and for every f1, ..., fk : N → N:
∗F(f1(U), ..., fk(U)) = (F ∘ (f1, ..., fk))(U).
(Here F ∘ (f1, ..., fk) : N → N is the function n ↦ F(f1(n), ..., fk(n)).)
11.4 Regular and Good Ultrafilters

A fundamental notion used in the theory of ultrafilters is that of regularity. We recall that an ultrafilter U on an infinite set I is called regular if there exists a family {Ai | i ∈ I} ⊆ U such that ⋂_{i∈I0} Ai = ∅ for every infinite I0 ⊆ I. When I is countable, it is easily seen that U is regular if and only if it is non-principal, but in general regularity is a stronger condition.

A simple nonstandard characterization holds. Recall the following weakened version of saturation, where only families of hyper-extensions are considered (compare with Definition 11.1.1).

Definition 11.4.1 Let κ be an infinite cardinal. A model of nonstandard analysis is a κ-enlargement if it satisfies the following property:
• Let G be a family of sets with cardinality |G| < κ. If G has the FIP, then ⋂_{B∈G} ∗B ≠ ∅.
Proposition 11.4.2 Let U be an ultrafilter over the infinite set I. Then the following are equivalent:
1. U is regular.
2. In the ultrapower model of nonstandard analysis determined by U, the |I|⁺-enlarging property holds.

Proof (1) ⇒ (2). Pick a family {Cx | x ∈ I} ⊆ U such that ⋂_{x∈Λ} Cx = ∅ whenever Λ ⊆ I is infinite. Given a family {Ax | x ∈ I} with the FIP, for every y ∈ I pick an element ϕ(y) ∈ ⋂_{x : y∈Cx} Ax (this is possible because {x | y ∈ Cx} is finite). If ξ = [ϕ]U is the element given by the U-equivalence class of the function ϕ, then ξ ∈ ∗Ax for every x ∈ I, since {y ∈ I | ϕ(y) ∈ Ax} ⊇ {y ∈ I | y ∈ Cx} = Cx ∈ U.
(2) ⇒ (1). Let Fin(I) be the set of finite parts of I, and for every x ∈ I, let Γx = {a ∈ Fin(I) | x ∈ a}. The family {Γx | x ∈ I} satisfies the FIP and so, by the κ⁺-enlarging property where κ = |I|, there exists A ∈ ⋂_{x∈I} ∗Γx. Now pick a function ϕ : I → Fin(I) such that A = [ϕ]U is the U-equivalence class of ϕ. Then for every x ∈ I, one has that A ∈ ∗Γx if and only if Cx = {y ∈ I | ϕ(y) ∈ Γx} = {y ∈ I | x ∈ ϕ(y)} ∈ U. The family {Cx | x ∈ I} ⊆ U witnesses the regularity of U because for every infinite Λ ⊆ I, the intersection ⋂_{x∈Λ} Cx = {y ∈ I | Λ ⊆ ϕ(y)} = ∅, as every ϕ(y) is finite.

One can also easily characterize points that generate regular ultrafilters.

Theorem 11.4.3 Given any model of nonstandard analysis, let α ∈ ∗I. Then the following are equivalent:
1. The ultrafilter Uα on I generated by α is regular;
2. There exists a function ϕ : I → Fin(I) such that ∗x ∈ ∗ϕ(α) for all x ∈ I.

Proof (1) ⇒ (2). Given a family {Cx | x ∈ I} that witnesses the regularity of Uα, let ϕ : I → Fin(I) be the function defined by setting ϕ(x) = {y ∈ I | x ∈ Cy} (this set is finite because x belongs to only finitely many Cy). If ϑ is the function y ↦ Cy, then we obtain the thesis by the following equivalences, which hold for any x ∈ I:
Cx ∈ Uα ⇔ α ∈ ∗(Cx) = ∗ϑ(∗x) ⇔ ∗x ∈ {ξ ∈ ∗I | α ∈ ∗ϑ(ξ)} = ∗ϕ(α).
(2) ⇒ (1). Pick ϕ as in the hypothesis, and let Cx = {y ∈ I | x ∈ ϕ(y)}. Notice that ∗x ∈ ∗ϕ(α) ⇔ α ∈ ∗(Cx) ⇔ Cx ∈ Uα. Moreover, for X ⊆ I, whenever it is possible to pick an element y ∈ ⋂_{x∈X} Cx, one has that X ⊆ ϕ(y), and so X is finite.
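The remark that every non-principal ultrafilter on a countable set is regular can be made concrete with the family C_x = {y ∈ N | y ≥ x}. Each C_x is cofinite, so it belongs to every non-principal ultrafilter on N. Each y lies in only finitely many C_x, so any intersection of infinitely many of them is empty.

The sketch below checks, on a finite truncation, that the function ϕ(y) = {x | y ∈ C_x} from the proof of Theorem 11.4.3 takes finite values. The bound `LIMIT` and the helper names are illustrative assumptions. No ultrafilter is actually computed.

```python
# Finite sanity check of a regularity witness on I = N (truncated at LIMIT).
LIMIT = 50  # hypothetical truncation bound; the actual family {C_x} is infinite


def in_C(x: int, y: int) -> bool:
    """Membership y ∈ C_x, where C_x = {y ∈ N | y >= x} is a cofinite set."""
    return y >= x


def phi(y: int) -> set[int]:
    """phi(y) = {x | y ∈ C_x}, restricted to x < LIMIT; here it equals {0, ..., y}."""
    return {x for x in range(LIMIT) if in_C(x, y)}


# Each y lies in exactly y + 1 of the sets C_x, so phi takes finite values.
for y in range(LIMIT):
    assert phi(y) == set(range(y + 1))

# If y lies in every C_x with x in X, then X ⊆ phi(y); so only finite X
# can have a non-empty intersection of the sets C_x.
X = {3, 7, 20}
common = [y for y in range(LIMIT) if all(in_C(x, y) for x in X)]
assert all(X <= phi(y) for y in common)
print(min(common))  # 20, the least y belonging to C_3, C_7 and C_20
```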
As a corollary of the above characterizations of regularity, we can now give a nonstandard proof of a limiting result about the existence of Hausdorff ultrafilters. We recall that the ultrafilter number u denotes the minimum cardinality of any (non-principal) ultrafilter base on N. (A family B is an ultrafilter base on N if {A ⊆ N | ∃B ∈ B, B ⊆ A} is an ultrafilter.) It is not hard to show that ℵ0 < u ≤ c (see [8, Sect. 9]).

Theorem 11.4.4 ([21]) If U is a regular ultrafilter on a cardinal κ ≥ u, then U is not Hausdorff.

Proof By contradiction, pick a regular Hausdorff ultrafilter U on κ, and consider the corresponding ultrapower model of nonstandard analysis. Then the S-topology on ∗κ, and hence on ∗N, is Hausdorff. Moreover, by the characterization of Proposition 11.4.2, the κ⁺-enlarging property holds. Now pick G ⊂ P(N) a non-principal ultrafilter base of cardinality u, and for every A ∈ G, let Θ(A) = {X ⊆ A | X is infinite}. Clearly the family {Θ(A) | A ∈ G} has the FIP, and so, since u ≤ κ, by the enlarging property there exists H ∈ ⋂_{A∈G} ∗Θ(A). Then H ⊂ ∗N is hyperinfinite and Uξ = V for every ξ ∈ H, where V is the ultrafilter on N generated by G (indeed, ξ ∈ H ⊆ ∗A for every A ∈ G). This shows that the ultrafilter map U : ∗N → βN is not 1–1, and hence the S-topology on ∗N is not Hausdorff, a contradiction.

We recall that an ultrafilter U is countably incomplete if it is not closed under countable intersections, i.e. if there exists a countable family {An} of elements of U such that ⋂n An ∉ U (equivalently, one may ask for that intersection to be empty). We remark that an ultrapower modulo U determines a model of nonstandard analysis if and only if U is countably incomplete, as otherwise one would have ∗N = N. Indeed, if ξ = [f]U is an infinite element in the ultrapower ∗N = N^I/U, then for every n ∈ N, the set Γn = {i ∈ I | f(i) ≠ n} ∈ U, but the countable intersection ⋂n Γn = ∅.
An ultrafilter U on an infinite set I is good if for every antimonotonic function f : Fin(I) → U there exists an antiadditive g : Fin(I) → U such that g(a) ⊆ f(a) for all a ∈ Fin(I). Here f is called antimonotonic if f(a) ⊇ f(b) whenever a ⊆ b, and g is called antiadditive if g(a ∪ b) = g(a) ∩ g(b). (See, e.g., [12, Sect. 6.1].)

Theorem 11.4.5 Let U be a countably incomplete ultrafilter over the infinite set I. Then the following are equivalent:
1. U is good.
2. In the ultrapower model of nonstandard analysis determined by U, the κ⁺-saturation property holds, where κ = |I|.

Proof (1) ⇒ (2). Let {Ax | x ∈ I} be a family of internal subsets of a hyper-extension ∗Y with the FIP. For every x ∈ I, pick a function ϕx : I → P(Y) such that Ax = [ϕx]U. By countable incompleteness, we can fix a sequence of sets I1 ⊇ I2 ⊇ · · · ⊇ In ⊇ In+1 ⊇ . . .
such that In ∈ U for all n and ⋂n In = ∅. Given a ∈ Fin(I) of cardinality n, define f(a) = {i ∈ In | ⋂_{x∈a} ϕx(i) ≠ ∅}. By the hypothesis, the finite intersection ⋂_{x∈a} Ax ≠ ∅, and so f(a) ∈ U. Since f is antimonotonic (if a ⊆ b then I_{|b|} ⊆ I_{|a|}), by the goodness property of U we can pick an antiadditive g : Fin(I) → U such that g(a) ⊆ f(a) for all a ∈ Fin(I). For j ∈ I, let Λj = {x | j ∈ g({x})}. If there exist distinct elements y1, ..., yk ∈ Λj, then j ∈ g({y1}) ∩ · · · ∩ g({yk}) = g({y1, ..., yk}) ⊆ f({y1, ..., yk}) ⊆ Ik. By recalling that ⋂k Ik = ∅, we can conclude that every Λj must be finite. Notice that j ∈ ⋂_{x∈Λj} g({x}) = g(Λj) ⊆ f(Λj), and so we can pick an element h(j) ∈ ⋂_{x∈Λj} ϕx(j) ≠ ∅. In this way, for every x ∈ I, we have j ∈ g({x}) ⇔ x ∈ Λj ⇒ h(j) ∈ ϕx(j). If ξ = [h]U ∈ ∗Y is the element in the ultrapower model that corresponds to the function h : I → Y, then ξ ∈ ⋂_{x∈I} Ax, because for every x one has the inclusion {j ∈ I | h(j) ∈ ϕx(j)} ⊇ g({x}) ∈ U.
(2) ⇒ (1). Let f : Fin(I) → U be antimonotonic. For a ∈ Fin(I), let Ga : I → P(Fin(I)) be the function where Ga(i) = {b ⊇ a | i ∈ f(b)}. Notice that Ga(i) ∩ Gb(i) = Ga∪b(i). For every x ∈ I, Ax = [G{x}]U is an internal subset of ∗Fin(I), and the family {Ax | x ∈ I} has the FIP because ∗a ∈ ⋂_{x∈a} Ax for every a ∈ Fin(I). Indeed, a ∈ ⋂_{x∈a} G{x}(i) = Ga(i) ⇔ i ∈ f(a), and f(a) ∈ U. By κ⁺-saturation, we can pick a function ϑ : I → Fin(I) such that the corresponding element in the ultrapower model satisfies ξ = [ϑ]U ∈ ⋂_{x∈I} Ax. Finally, define g(a) = {i ∈ I | ϑ(i) ∈ Ga(i)}. Clearly, g(a) ∈ U because ξ ∈ ⋂_{x∈a} Ax. Moreover, i ∈ g(a) ⇔ ϑ(i) ∈ Ga(i) ⇔ ϑ(i) ⊇ a and i ∈ f(ϑ(i)), and so i ∈ f(ϑ(i)) ⊆ f(a). This shows that g(a) ⊆ f(a). Besides, g is antiadditive because i ∈ g(a) ∩ g(b) ⇔ ϑ(i) ∈ Ga(i) and ϑ(i) ∈ Gb(i) ⇔ ϑ(i) ∈ Ga∪b(i) ⇔ i ∈ g(a ∪ b).

Points that generate good ultrafilters are characterized as follows.
Theorem 11.4.6 Given a model of nonstandard analysis, let α ∈ ∗I be such that Uα is countably incomplete. Then the following are equivalent:
1. Uα is a good ultrafilter;
2. If F : I → P(Fin(I)) has the property that for every finite a ⊂ I there exists A ∈ ∗F(α) with ∗a ⊆ A, then there exists a function ϑ : I → Fin(I) such that ∗x ∈ ∗ϑ(α) ∈ ∗F(α) for all x ∈ I.

Proof The proof is similar to the one of the previous theorem.
(1) ⇒ (2). Fix a sequence I1 ⊇ I2 ⊇ · · · ⊇ In ⊇ In+1 ⊇ . . . of sets in Uα such that ⋂n In = ∅. Now let F satisfy the conditions in (2). For every finite a = {x1, ..., xn} ⊂ I with n-many elements, put:
f(a) = {i ∈ In | ∃b ∈ F(i) such that a ⊆ b}.
By the hypothesis, there exists A ∈ ∗F(α) with ∗x1, ..., ∗xn ∈ A. But then α is an element of ∗In such that ∗a ⊆ A for a suitable A ∈ ∗F(α), and this means that
α ∈ ∗(f(a)), i.e. f(a) ∈ Uα. Moreover, it directly follows from the definition that f : Fin(I) → Uα is antimonotonic. So, we can apply the hypothesis and pick an antiadditive function g : Fin(I) → Uα such that g(a) ⊆ f(a) for all a. For i ∈ I, put Λ(i) = {x | i ∈ g({x})}. Notice that if a ⊆ Λ(i) has cardinality n, then i ∈ ⋂_{x∈a} g({x}) = g(a) ⊆ f(a) ⊆ In; and since ⋂n In = ∅, this shows that Λ(i) must be finite. Finally, pick any function ϑ : I → Fin(I) such that Λ(i) ⊆ ϑ(i) ∈ F(i) for all i ∈ f(Λ(i)). Since α ∈ ∗f(∗Λ(α)), we have ∗Λ(α) ⊆ ∗ϑ(α) ∈ ∗F(α). Besides, for every x ∈ I we have that α ∈ ∗(g({x})) = ∗g({∗x}), so ∗x ∈ ∗Λ(α), and hence ∗x ∈ ∗ϑ(α).
(2) ⇒ (1). Given an antimonotonic function f : Fin(I) → Uα, define F : I → P(Fin(I)) by putting F(i) = {a ∈ Fin(I) | i ∈ f(a)}. For every a = {x1, ..., xn} ∈ Fin(I), α ∈ ∗(f(a)) = ∗{i ∈ I | a ∈ F(i)}, and so ∗a = {∗x1, ..., ∗xn} ∈ ∗F(α). Then, by the hypothesis there exists a function ϑ : I → Fin(I) such that ∗ϑ(α) ∈ ∗F(α) and ∗x ∈ ∗ϑ(α) for every x ∈ I. Finally, put g(a) = {i ∈ I | a ⊆ ϑ(i) ∈ F(i)}. Notice that α ∈ ∗(g(a)), because ∗a ⊆ ∗ϑ(α) ∈ ∗F(α). It is readily verified that g is antiadditive. Moreover, notice that ϑ(i) ∈ F(i) ⇔ i ∈ f(ϑ(i)), so i ∈ g(a) ⇒ a ⊆ ϑ(i) and i ∈ f(ϑ(i)) ⇒ i ∈ f(a), and the desired inclusion g(a) ⊆ f(a) is also verified.

Remark 11.4.7 A much simplified proof of the above theorem could also be obtained by employing a known property from nonstandard set theory. Precisely, let M be a given model of nonstandard analysis, and let α ∈ ∗I be such that the generated ultrafilter Uα is countably incomplete. Then the model of nonstandard analysis determined by Uα is isomorphic to the elementary submodel M[α] ≺ M whose universe is given by the elements that are standard relative to α, i.e. by the elements of the form ∗f(α) where f is any function defined on I (see [31, Sect. 6] and references therein).
By working inside M[α], one can directly use the equivalence of Theorem 11.4.5 to derive Theorem 11.4.6.
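The notions of antimonotonic and antiadditive function from the definition of good ultrafilters can be checked mechanically on small examples. The sketch below is an illustration, not part of the proofs above. It uses the following choices:

- I = N, with the cofinite sets I_n = {i | i ≥ n} playing the role of the decreasing sequence with empty intersection used in Theorems 11.4.5 and 11.4.6.
- f(a) = I_{|a|}, which is antimonotonic but not antiadditive.
- g(a) = I_{max(a)+1}, with g(∅) = N. This is antiadditive and satisfies g(a) ⊆ f(a), because a set of |a| natural numbers has maximum at least |a| − 1.

Sets are truncated to a hypothetical bound `M`, and Fin(I) is replaced by the subsets of {0, ..., 5}.

```python
from itertools import combinations

M = 12  # hypothetical truncation: subsets of I = N are cut to range(M)
UNIVERSE = frozenset(range(M))


def tail(n: int) -> frozenset[int]:
    """Truncation of the cofinite set I_n = {i ∈ N | i >= n}."""
    return frozenset(i for i in UNIVERSE if i >= n)


def f(a: frozenset[int]) -> frozenset[int]:
    """f(a) = I_{|a|}: antimonotonic but not antiadditive."""
    return tail(len(a))


def g(a: frozenset[int]) -> frozenset[int]:
    """g(a) = I_{max(a)+1}, with g(∅) = N: antiadditive and below f."""
    return tail(max(a) + 1) if a else UNIVERSE


# All subsets of {0, ..., 5}, standing in for Fin(I).
K = 6
FIN = [frozenset(c) for r in range(K + 1) for c in combinations(range(K), r)]


def is_antimonotonic(h) -> bool:
    """a ⊆ b implies h(a) ⊇ h(b)."""
    return all(h(a) >= h(b) for a in FIN for b in FIN if a <= b)


def is_antiadditive(h) -> bool:
    """h(a ∪ b) = h(a) ∩ h(b)."""
    return all(h(a | b) == h(a) & h(b) for a in FIN for b in FIN)


print(is_antimonotonic(f), is_antiadditive(f))  # True False
print(is_antimonotonic(g), is_antiadditive(g))  # True True
print(all(g(a) <= f(a) for a in FIN))           # True
```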
11.5 Ultrafilters Generated by Pairs

As in Sect. 11.2, we now work in a fixed c⁺-saturated model of nonstandard analysis, and extend the u-equivalence to pairs.

Definition 11.5.1 The ultrafilter generated by an ordered pair (α, β) ∈ ∗N × ∗N is the family U(α,β) = {X ⊆ N × N | (α, β) ∈ ∗X}.

The c⁺-saturation property guarantees that every ultrafilter on N × N is generated by some pair (α, β) ∈ ∗N × ∗N. The u-equivalence relation ∼u on ∗N × ∗N is defined exactly in the same way as it was defined on ∗N, that is, we set (α, β) ∼u (α′, β′) when U(α,β) = U(α′,β′). We recall that the Cartesian product of filters U × V = {A × B | A ∈ U, B ∈ V}
generates a filter on N × N, which we also denote by U × V; however, if U and V are non-principal ultrafilters, then U × V is not an ultrafilter, and indeed there may be plenty of ultrafilters W ⊃ U × V (see Remark 11.5.5 below). A canonical class of ultrafilters on the Cartesian product is given by the so-called tensor products: if U and V are ultrafilters on N, the tensor product U ⊗ V is the ultrafilter on N × N defined by setting:
X ∈ U ⊗ V ⇔ {n | {m | (n, m) ∈ X} ∈ V} ∈ U.
It is easily seen that U ⊗ V ⊇ U × V. We recall that tensor products are not commutative in all non-trivial cases; indeed, if we denote by Δ+ = {(n, m) | n < m}, then Δ+ ∈ U ⊗ V but Δ+ ∉ σ(V ⊗ U) for all non-principal U and V, where σ(n, m) = (m, n) is the map that permutes the coordinates.

A first easy observation about pairs is the following.

Proposition 11.5.2 U(α,β) ⊇ Uα × Uβ.

Proof If A ∈ Uα and B ∈ Uβ, then trivially (α, β) ∈ ∗A × ∗B = ∗(A × B), i.e., A × B ∈ U(α,β).

It is also easy to improve on the above property, and characterize those products of ultrafilters that are contained in a given ultrafilter on N × N.

Proposition 11.5.3 U(α,β) ⊇ Uγ × Uδ if and only if α ∼u γ and β ∼u δ. In consequence, (α, β) ∼u (γ, δ) ⇒ α ∼u γ and β ∼u δ.

Proof If α ∼u γ and β ∼u δ, then U(α,β) ⊇ Uα × Uβ = Uγ × Uδ. Conversely, assume that α ≁u γ or β ≁u δ, say α ≁u γ, and pick A ⊆ N such that α ∈ ∗A and γ ∉ ∗A. Then A^c × N ∈ Uγ × Uδ because γ ∈ ∗(A^c) and trivially δ ∈ ∗N, but A^c × N ∉ U(α,β) because (α, β) ∉ ∗(A^c × N). Finally, if (α, β) ∼u (γ, δ) then U(α,β) = U(γ,δ) ⊇ Uγ × Uδ, and so, by what we have just proved, it must be α ∼u γ and β ∼u δ.

The above proposition can be reformulated in "standard" terms as follows:

Proposition 11.5.4 Let U, V be ultrafilters on N and let W be an ultrafilter on N × N. Then W ⊇ U × V if and only if U = π1(W) and V = π2(W), where π1 and π2 are the canonical projections on the first and second coordinate, respectively.

Proof Pick α, β, γ, δ ∈ ∗N such that W = U(α,β), U = Uγ and V = Uδ. By Proposition 11.2.4, π1(W) = U∗π1(α,β) = Uα and similarly π2(W) = Uβ.
Then apply the previous proposition.

Remark 11.5.5 The implication (α, β) ∼u (γ, δ) ⇒ α ∼u γ and β ∼u δ cannot be reversed. Indeed, it is well known that for all non-principal ultrafilters U, V on N there are at least two different ultrafilters W ⊃ U × V, namely U ⊗ V and σ(V ⊗ U), where σ(n, m) = (m, n) is the map that permutes the coordinates. Actually, provided there are no P-points RK-below U or V, there exist infinitely many W ⊃ U × V (see [10] and references therein).
About the existence of pairs of hypernatural numbers that generate ultrafilters W ⊇ U × V, the following result holds.

Proposition 11.5.6 Let U, V be ultrafilters on N, and let X ∈ ∗U and Y ∈ ∗V. Then for every ultrafilter W ⊃ U × V, there exist α ∈ X and β ∈ Y such that U(α,β) = W.

Proof Since W ⊃ U × V, the intersection (A × B) ∩ C is non-empty for every A ∈ U, B ∈ V, and C ∈ W. Then, by transfer, (X × Y) ∩ Z ≠ ∅ for all Z ∈ ∗W. In particular, the family {(X × Y) ∩ ∗C | C ∈ W} has the finite intersection property. By c⁺-saturation, we can pick a pair (α, β) ∈ X × Y such that (α, β) ∈ ∗C for all C ∈ W, so that U(α,β) = W. (This proof was communicated to the author by Karel Hrbáček, and it is reproduced here with his permission.)

Now let us fix some useful notation. For X ⊆ N × N, n ∈ N, and ξ ∈ ∗N:
• Xn = {m ∈ N | (n, m) ∈ X} is the vertical n-fiber of X;
• ∗Xξ = {ζ ∈ ∗N | (ξ, ζ) ∈ ∗X} is the vertical ξ-fiber of ∗X.
Notice that ∗Xξ = ∗χ(ξ) is the value taken at ξ by the hyper-extension of the sequence χ(n) = Xn. Notice also that for finite k ∈ N, one has ∗Xk = ∗(Xk). It directly follows from the definitions that
X ∈ Uα ⊗ Uβ ⇔ α ∈ ∗{n | Xn ∈ Uβ} ⇔ ∗Xα ∈ ∗Uβ.

Theorem 11.5.7 Let α, β ∈ ∗N be infinite numbers. Then the following properties are equivalent:
1. U(α,β) = Uα ⊗ Uβ;
2. (α, β) generates a tensor product;
3. For every F : N → P(N), if β ∈ ∗F(α), then ∗F(α) ∈ ∗Uβ;
4. For every F : N → P(N), if ∗F(α) ∈ ∗Uβ, then β ∈ ∗F(α);
5. For every X ⊆ N × N, if (n, β) ∈ ∗X for all n ∈ N, then (α, β) ∈ ∗X;
6. For every X ⊆ N × N, if (α, β) ∈ ∗X, then (n, β) ∈ ∗X for some n ∈ N;
7. For every f : N → N, if ∗f(β) ∉ N, then ∗f(β) > α.
Proof (1) ⇔ (2). One implication is trivial. Conversely, let us assume that U(α,β) = Uγ ⊗ Uδ for some γ, δ. We have seen in Proposition 11.5.2 that U(α,β) extends Uα × Uβ; moreover, the tensor product Uγ ⊗ Uδ extends Uγ × Uδ. But then it must be Uα = Uγ and Uβ = Uδ, as otherwise there would be disjoint sets in U(α,β).
(1) ⇔ (3) ⇔ (4). Given a function F : N → P(N), let Γ(F) = {(n, m) ∈ N × N | m ∈ F(n)} be the set of pairs whose vertical n-fibers are Γ(F)n = F(n). Then we have Γ(F) ∈ Uα ⊗ Uβ ⇔ ∗Γ(F)α = ∗F(α) ∈ ∗Uβ. Moreover, β ∈ ∗F(α) ⇔ (α, β) ∈ ∗Γ(F) ⇔ Γ(F) ∈ U(α,β). Now notice that for every X ⊆ N × N, one has X = Γ(F) where F(n) = Xn is the sequence of the n-fibers of X. In consequence, properties (3)
and (4) are equivalent to the conditions U(α,β) ⊆ Uα ⊗ Uβ and Uα ⊗ Uβ ⊆ U(α,β), respectively. The proof is complete because inclusions between ultrafilters imply equalities.
(5) ⇔ (6). This is straightforward, because property (5) for a set X is the contrapositive of property (6) for the complement X^c.
(1) ⇒ (5). Notice that (n, β) ∈ ∗X ⇔ β ∈ ∗(Xn) ⇔ Xn ∈ Uβ. So, by the hypothesis, the set {n | Xn ∈ Uβ} = N. As trivially N ∈ Uα, we conclude that X ∈ Uα ⊗ Uβ = U(α,β), and hence (α, β) ∈ ∗X.
(5) ⇒ (7). Let X = {(n, m) | n < f(m)}. Since ∗f(β) is infinite, we have that (n, β) ∈ ∗X for all finite n ∈ N. But then (α, β) ∈ ∗X, i.e. α < ∗f(β).
(7) ⇒ (1) ([46], Theorem 3.4). The set Δ+ = {(n, m) | n < m} belongs to Uα ⊗ Uβ; moreover, since α < β (apply (7) to the identity function), it is also Δ+ ∈ U(α,β). Thus, it suffices to show that the implication X ∈ Uα ⊗ Uβ ⇒ X ∈ U(α,β) holds for all X ⊆ Δ+. By definition, X ∈ Uα ⊗ Uβ ⇔ α ∈ ∗{n | Xn ∈ Uβ}, and so Xn ∈ Uβ for infinitely many n. In consequence, for every m one can always find n > m, and hence (n, m) ∉ X, such that Xn ∈ Uβ. This means that
F(m) = {n | Xn ∈ Uβ & (n, m) ∉ X} ≠ ∅.
If f : N → N is the function f(m) = min F(m), then the number ∗f(β) is infinite. Indeed, if by contradiction ∗f(β) = k ∈ N, then we would have k ∈ ∗F(β), that is (∗X)k = ∗(Xk) ∈ ∗Uβ and (k, β) ∉ ∗X, and hence Xk ∈ Uβ and Xk ∉ Uβ, a contradiction. Now, by (7) it follows that α < ∗f(β) = min ∗F(β), and so α ∉ ∗F(β). This means that it is not the case that both ∗Xα ∈ ∗Uβ and (α, β) ∉ ∗X. We reach the thesis (α, β) ∈ ∗X by recalling that ∗Xα ∈ ∗Uβ ⇔ X ∈ Uα ⊗ Uβ.

Definition 11.5.8 We say that (α, β) ∈ ∗N × ∗N is a tensor pair if it satisfies all the equivalent conditions in the previous theorem.

Notice that if n ∈ N is finite, then all pairs (n, β) are trivially tensor pairs. The property of generating tensor products is preserved under a special class of maps.
Proposition 11.5.9 If (α, β) is a tensor pair, then for every f, g : N → N also (∗f(α), ∗g(β)) is a tensor pair. In "standard" terms, this means that if (f, g) is the map (n, m) ↦ (f(n), g(m)), then for all ultrafilters U and V the image (f, g)(U ⊗ V) = f(U) ⊗ g(V).

Proof We use the characterization of tensor pairs given by property (6) of Theorem 11.5.7. Notice that ∗(f, g) = (∗f, ∗g), and so (∗f(α), ∗g(β)) ∈ ∗X ⇔ (α, β) ∈ (∗f, ∗g)⁻¹(∗X) = ∗[(f, g)⁻¹(X)]. By the hypothesis, we can pick n ∈ N such that (n, β) ∈ ∗[(f, g)⁻¹(X)], that is (∗f(n), ∗g(β)) ∈ ∗X, where ∗f(n) = f(n) ∈ N.

As we already noticed in Remark 11.5.5, both the tensor product U ⊗ U and its image σ(U ⊗ U) under the map σ(n, m) = (m, n) extend the Cartesian product
U × U. However, there is another canonical ultrafilter that extends the product of an ultrafilter U by itself, namely the diagonal ultrafilter determined by U:
UΔ = {X ⊆ N × N | {n | (n, n) ∈ X} ∈ U}.
Clearly UΔ ≅ U. Notice that the diagonal Δ = {(n, n) | n ∈ N} ∈ UΔ, but Δ ∉ U ⊗ U and Δ ∉ σ(U ⊗ U) whenever U is non-principal. Notice also that if U = Uα, then UΔ = U(α,α).

Proposition 11.5.10 Let U be non-principal, and let W be an ultrafilter that extends the product filter U × U. Then W ≤RK U if and only if W = UΔ.

Proof Let U = Uα. If h is the function such that h(n) = (n, n) for all n, then h(U) = U∗h(α) = U(α,α) = UΔ, and so UΔ ≤RK U. Conversely, let W ⊃ U × U and assume that F(Uα) = W for a suitable function F : N → N × N, say F(n) = (f(n), g(n)). By Proposition 11.5.3, we can pick β and γ with β ∼u α and γ ∼u α such that W = U(β,γ). Then U(β,γ) = F(Uα) = U∗F(α) = U(∗f(α),∗g(α)). Thus, ∗f(α) ∼u β ∼u α and ∗g(α) ∼u γ ∼u α and so, by Theorem 11.2.7, ∗f(α) = α = ∗g(α). We conclude that W = U(α,α) = UΔ.
Corollary 11.5.11 If U is a non-principal ultrafilter, then U ⊗ U ≰RK U (and hence U ⊗ U ≇ U).

We close this section by showing that tensor pairs are found in abundance.

Theorem 11.5.12 Let α, β ∈ ∗N be infinite. Then:
1. The set Rα,β = {β′ | β′ ∼u β & (α, β′) tensor pair} contains |∗N|-many elements and it is unbounded in ∗N;
2. The set Lα,β = {α′ | α′ ∼u α & (α′, β) tensor pair} is unbounded leftward in ∗N \ N, and hence it contains at least c⁺-many elements.
11.6 Hyper-Shifts

The following notion was introduced by M. Beiglböck in [4]:

Definition 11.6.1 Let A ⊆ N and let U be an ultrafilter on N. The ultrafilter-shift of A by U is defined as the set A − U = {n ∈ N | A − n ∈ U}.

We now introduce a class of subsets of N, which are found as segments of hyper-extensions, and that precisely correspond to ultrafilter-shifts.
Definition 11.6.2 The hyper-shift of a set A ⊆ N by a number γ ∈ ∗N is the following set:
Aγ = (∗A − γ) ∩ N = {n ∈ N | γ + n ∈ ∗A}.
It is readily seen that hyper-shifts are coherent with respect to finite shifts, intersections, unions, and complements.

Proposition 11.6.3 For every A, B ⊆ N, for every n ∈ N, and for every γ ∈ ∗N, the following equalities hold:
1. (A − n)γ = Aγ − n;
2. (A ∩ B)γ = Aγ ∩ Bγ;
3. (A ∪ B)γ = Aγ ∪ Bγ;
4. (A^c)γ = (Aγ)^c.
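When γ = k is a finite number, ∗A = A and the hyper-shift Ak is just the leftward shift A − k = {n | n + k ∈ A}. For the principal ultrafilter Uk it also coincides with the ultrafilter-shift A − Uk of Definition 11.6.1. The identities of Proposition 11.6.3 can therefore be spot-checked in this finite case.

The sketch below represents subsets of N by membership functions. This is a design choice that avoids truncation artifacts for complements. Sets are compared on an initial segment. The random test sets and helper names are illustrative assumptions, and infinite γ is of course out of reach.

```python
import random
from typing import Callable

Pred = Callable[[int], bool]  # a subset of N, given by its membership test


def shift(A: Pred, k: int) -> Pred:
    """A_k = A - k = {n | n + k ∈ A}: the hyper-shift by a finite k."""
    return lambda n: A(n + k)


def inter(A: Pred, B: Pred) -> Pred:
    return lambda n: A(n) and B(n)


def union(A: Pred, B: Pred) -> Pred:
    return lambda n: A(n) or B(n)


def compl(A: Pred) -> Pred:
    return lambda n: not A(n)


def same(A: Pred, B: Pred, window: int = 200) -> bool:
    """Compare two sets on the initial segment [0, window)."""
    return all(A(n) == B(n) for n in range(window))


random.seed(0)
bits_a = [random.random() < 0.5 for _ in range(1000)]
bits_b = [random.random() < 0.3 for _ in range(1000)]


def A(n: int) -> bool:
    return bits_a[n]


def B(n: int) -> bool:
    return bits_b[n]


for k in range(0, 50, 7):       # finite stand-ins for gamma
    for m in range(5):          # the finite shift n of item 1
        assert same(shift(shift(A, m), k), shift(shift(A, k), m))
    assert same(shift(inter(A, B), k), inter(shift(A, k), shift(B, k)))
    assert same(shift(union(A, B), k), union(shift(A, k), shift(B, k)))
    assert same(shift(compl(A), k), compl(shift(A, k)))
print("identities 1-4 hold on the sampled window")
```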
Proposition 11.6.4 For every A ⊆ N and for every γ ∈ ∗N, Aγ = A − Uγ.

Proof The following chain of equivalences is directly obtained from the definitions:
n ∈ Aγ ⇔ γ + n ∈ ∗A ⇔ γ ∈ ∗A − n = ∗(A − n) ⇔ A − n ∈ Uγ ⇔ n ∈ A − Uγ.

Ultrafilter-shifts (and their nonstandard counterparts, the hyper-shifts) have a precise combinatorial meaning corresponding to a notion of embeddability.

Definition 11.6.5 Let A, B ⊆ N. We say that A is exactly embedded in B, and write A ≤e B, if for every finite interval I there exists x such that x + (A ∩ I) = B ∩ (x + I).

A similar relation between sets of natural numbers, named "finite embeddability", has been considered in [18, Sect. 4] (see also [9], where that notion was extended to ultrafilters). The difference is that "A is finitely embedded in B" only requires the inclusion x + (A ∩ I) ⊆ B ∩ (x + I). With respect to finite configurations, A ≤e B tells that B is at least as combinatorially rich as A. For example, if A contains arbitrarily long arithmetic progressions and A ≤e B, then B also contains arbitrarily long arithmetic progressions.

Proposition 11.6.6 For A, B ⊆ N, the following are equivalent:
1. A ≤e B;
2. A is an ultrafilter-shift of B;
3. A = Bγ for some γ ∈ ∗N.

Proof (2) ⇔ (3) is given by Proposition 11.6.4.
(1) ⇒ (3). Given n ∈ N, let us consider the sets
Γn = {x ∈ N | x + (A ∩ [1, n]) = B ∩ [x + 1, x + n]}.
By the hypothesis, Γn ≠ ∅ for all n ∈ N and so, by overspill, we can pick γ ∈ ∗Γν for some infinite ν ∈ ∗N. (We denote by ∗Γν = ∗F(ν) the value taken at ν by the hyper-extension of the function F(n) = Γn.) Then γ + (∗A ∩ [1, ν]) = ∗B ∩ [γ + 1, γ + ν], which implies A = (∗B − γ) ∩ N = Bγ.
(3) ⇒ (1). If A = Bγ, then for every interval I ⊂ N we have ∗A ∩ I = A ∩ I = Bγ ∩ I = (∗B − γ) ∩ I. So, γ is a witness of the following property: "∃ξ ∈ ∗N, ξ + (∗A ∩ I) = ∗B ∩ (ξ + I)." By transfer we obtain the thesis: "∃x ∈ N, x + (A ∩ I) = B ∩ (x + I)."

Let us now see how ultrafilter-shifts and hyper-shifts can be used to characterize density. Recall the following notions.

Definition 11.6.7 The Schnirelmann density of A ⊆ N is defined as
$$\sigma(A) = \inf_{n \geq 1} \frac{|A \cap [1, n]|}{n}.$$
The asymptotic density of A ⊆ N is defined as
$$d(A) = \lim_{n \to \infty} \frac{|A \cap [1, n]|}{n},$$
when the limit exists. Otherwise, one defines the upper density d̄(A) and the lower density d̲(A) by taking the limit superior or the limit inferior of the above sequence, respectively. Another notion that is used in combinatorial number theory is the following uniform version of the asymptotic density.

Definition 11.6.8 The Banach density of A ⊆ N is defined as
$$\mathrm{BD}(A) = \lim_{n \to \infty} \frac{\max_{k \in \mathbb{N}} |A \cap [k + 1, k + n]|}{n}.$$
Notice that the sequence a_n = max_k |A ∩ [k + 1, k + n]| is subadditive, i.e. it satisfies a_{n+m} ≤ a_n + a_m. In consequence, one can show that lim_n a_n/n actually exists, and in fact lim_n a_n/n = inf_n a_n/n. It is readily checked that for every A ⊆ N:
σ(A) ≤ d̲(A) ≤ d̄(A) ≤ BD(A).
On the other hand, there exist sets where all the above inequalities are strict. Positive Banach densities are preserved under exact embeddings (but the same property holds neither for Schnirelmann nor for asymptotic densities).
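For a finite set, the numbers a_n above are easy to compute. Doing so makes both the subadditivity a_{n+m} ≤ a_n + a_m and the gap between Schnirelmann and Banach density visible.

The sketch below uses, as an illustrative assumption, the set made of the blocks [4^j, 2·4^j), truncated at a bound `L`. Windows are restricted to [1, L], so all quantities are finite-range approximations. For the infinite union of these blocks, the Banach density is 1, since A contains intervals of every length 4^j. Its Schnirelmann density is 1/3, attained at n = 3, 15, 63, 255, ...

```python
from fractions import Fraction

L = 600  # hypothetical truncation bound: we only look at A ∩ [1, L]

# Example set: blocks [4^j, 2*4^j) for j = 0, 1, 2, ..., truncated at L.
A = {x for j in range(6) for x in range(4**j, 2 * 4**j) if x <= L}

# prefix[x] = |A ∩ [1, x]|, so window counts cost O(1).
prefix = [0] * (L + 1)
for x in range(1, L + 1):
    prefix[x] = prefix[x - 1] + (x in A)


def count(lo: int, hi: int) -> int:
    """|A ∩ [lo, hi]| for 1 <= lo <= hi <= L."""
    return prefix[hi] - prefix[lo - 1]


# a[n] = max_k |A ∩ [k+1, k+n]|, over windows lying inside [1, L].
a = [0] + [max(count(k + 1, k + n) for k in range(L - n + 1))
           for n in range(1, L + 1)]

# Subadditivity a_{n+m} <= a_n + a_m (the key step behind lim a_n/n = inf a_n/n).
assert all(a[n + m] <= a[n] + a[m]
           for n in range(1, L) for m in range(1, L - n + 1))

# Finite-range approximation of the Schnirelmann density inf_n |A ∩ [1, n]| / n.
sigma = min(Fraction(count(1, n), n) for n in range(1, L + 1))

print(a[256], sigma)  # 256 1/3: a full block of length 256, yet sigma is 1/3
```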
Proposition 11.6.9 If B ≤e A then BD(B) ≤ BD(A). In consequence:
1. BD(A − U) ≤ BD(A) for all ultrafilters U on N;
2. BD(Aγ) ≤ BD(A) for all γ ∈ ∗N.

Proof Let ⟨In | n ∈ N⟩ be a sequence of intervals with length |In| = n and such that lim_n |B ∩ In|/n = BD(B). By the hypothesis, for every n there exists xn such that xn + (B ∩ In) = A ∩ Jn, where Jn = xn + In is the interval of length n obtained by shifting In by xn. Then BD(A) ≥ lim_n |A ∩ Jn|/n = lim_n |B ∩ In|/n = BD(B). Items 1 and 2 follow because, by Proposition 11.6.6, both A − U and Aγ are exactly embedded in A.

The Banach density of a set equals the maximum density of its ultrafilter-shifts.

Theorem 11.6.10 For every A ⊆ N there exists a hyper-shift Aγ such that σ(Aγ) = d(Aγ) = BD(Aγ) = BD(A). Equivalently, there exists an ultrafilter U = Uγ on N such that σ(A − U) = d(A − U) = BD(A − U) = BD(A).

Proof By the nonstandard characterization of limit, given any infinite N, we can pick an interval I = [Ω + 1, Ω + N] ⊂ ∗N of length N such that ‖∗A ∩ I‖/N = a ≈ BD(A). (We use the delimiters ‖X‖ to denote the internal cardinality of an internal set X, and ξ ≈ η to mean that ξ and η are infinitely close, that is, ξ − η is infinitesimal.) Now fix an infinite ν such that ν/N ≈ 0.
Claim. There exists γ such that for every 1 ≤ i ≤ ν:
$$\frac{\|{}^*A \cap [\gamma + 1, \gamma + i]\|}{i} \geq a - \frac{\nu}{N}.$$
Notice that the above claim yields the thesis. Indeed, for every finite n ∈ N, trivially n ≤ ν, and so
$$\frac{|A_\gamma \cap [1, n]|}{n} = \frac{\|{}^*A \cap [\gamma + 1, \gamma + n]\|}{n} \geq a - \frac{\nu}{N} \approx \mathrm{BD}(A).$$
This implies that σ(Aγ) ≥ BD(A), and the thesis follows because σ(Aγ) ≤ d̲(Aγ) ≤ d̄(Aγ) ≤ BD(Aγ) ≤ BD(A).
To prove the claim, let us proceed by contradiction (we may assume a − ν/N > 0, as otherwise the claim is trivial), and for every γ let
$$\psi(\gamma) = \min\left\{ i \in [1, \nu] \;\middle|\; \frac{\|{}^*A \cap [\gamma + 1, \gamma + i]\|}{i} < a - \frac{\nu}{N} \right\},$$
which is well defined by the contrary assumption. Put ξ1 = Ω and ξj+1 = ξj + ψ(ξj), and let μ be the least index such that ξμ > Ω + N − ν. Since ψ(ξj) ≤ ν, we have ξμ ≤ Ω + N, and so ‖∗A ∩ [Ω + 1, ξμ]‖ ≥ ‖∗A ∩ I‖ − ν = aN − ν. Therefore
$$a - \frac{\nu}{N} \leq \frac{\|{}^*A \cap [\Omega + 1, \xi_\mu]\|}{N} = \frac{1}{N} \sum_{j=1}^{\mu-1} \|{}^*A \cap [\xi_j + 1, \xi_{j+1}]\| < \frac{1}{N} \sum_{j=1}^{\mu-1} \psi(\xi_j)\left(a - \frac{\nu}{N}\right) = \frac{\xi_\mu - \xi_1}{N}\left(a - \frac{\nu}{N}\right) \leq a - \frac{\nu}{N},$$
which gives the desired contradiction.

A series of relevant results in additive combinatorics have been recently obtained by R. Jin by nonstandard analysis. In some of them, nonstandard properties of Banach density like the one stated in the above theorem play an important role (see, e.g., the survey [32] and references therein).
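Definition 11.6.5 quantifies over all finite intervals, so it cannot be decided by a finite computation, but its finite approximations can. The sketch below follows the sets Γn from the proof of Proposition 11.6.6. It searches for a witness x for the interval [1, n] only. An x that works for [1, n] also works for every subinterval I ⊆ [1, n] (intersect both sides with x + I).

In the example, the multiples of 3 sit inside the residue class 1 mod 3 via x = 1 for every n. No x works for the squares once n ≥ 6, because x + 3 and x + 6 would be squares differing by 3, and the only such squares are 1 and 4. The search bound `x_max`, the example sets and the helper names are illustrative assumptions.

```python
import math
from typing import Callable, Optional

Pred = Callable[[int], bool]  # a subset of N, given by its membership test


def witness(A: Pred, B: Pred, n: int, x_max: int) -> Optional[int]:
    """Least x < x_max with x + (A ∩ [1, n]) = B ∩ [x + 1, x + n], or None.

    Both sides are subsets of [x + 1, x + n], so the equality amounts to
    A(i) == B(x + i) for every i in [1, n].
    """
    for x in range(x_max):
        if all(A(i) == B(x + i) for i in range(1, n + 1)):
            return x
    return None


def multiple_of_3(m: int) -> bool:
    return m % 3 == 0


def one_mod_3(m: int) -> bool:
    return m % 3 == 1


def is_square(m: int) -> bool:
    r = math.isqrt(m)
    return r * r == m


# Multiples of 3 sit inside the class 1 mod 3 via x = 1, for every n.
print([witness(multiple_of_3, one_mod_3, n, 100) for n in (1, 5, 50)])  # [1, 1, 1]
# Squares: a copy of A ∩ [1, 5] exists, but not of A ∩ [1, 6].
print(witness(multiple_of_3, is_square, 5, 10_000),
      witness(multiple_of_3, is_square, 6, 10_000))  # 1 None
```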
11.7 Nonstandard Characterizations in the Space (βN, ⊕)

The space of ultrafilters βN can be equipped with a "pseudo-sum" operation ⊕ that extends the usual sum between natural numbers. The resulting space (βN, ⊕) and its generalizations have been extensively studied during the last decades, producing plenty of interesting applications in Ramsey theory and combinatorics of numbers (see the monograph [29]).

Definition 11.7.1 The pseudo-sum U ⊕ V of two ultrafilters U, V on N is the image S(U ⊗ V) of the tensor product U ⊗ V under the sum function S(n, m) = n + m. Equivalently, for every A ⊆ N:
A ∈ U ⊕ V ⇔ {n | A − n ∈ V} ∈ U,
where A − n = {m ∈ N | m + n ∈ A} is the leftward n-shift of A.

Notice that ⊕ actually extends the sum on N; indeed, if Un and Um are the principal ultrafilters determined by n and m respectively, then Un ⊕ Um = Un+m is the principal ultrafilter determined by n + m. The following property is a straightforward consequence of the definitions.

Proposition 11.7.2 U ⊕ V = W if and only if there exists a tensor pair (α, β) such that Uα = U, Uβ = V and Uα+β = W. In consequence:
{U ⊕ V | U, V ∈ βN} = {Uα+β | (α, β) tensor pair}.

Proof Given W = Uξ ⊕ Uη, pick a pair (α, β) with U(α,β) = Uξ ⊗ Uη. Then (α, β) is a tensor pair such that Uα = Uξ, Uβ = Uη, and W = S(Uξ ⊗ Uη) = S(U(α,β)) = Uα+β. Conversely, if (α, β) is a tensor pair, then Uα ⊕ Uβ = S(Uα ⊗ Uβ) = S(U(α,β)) = Uα+β.
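Definition 11.7.1 can be read as a recipe on membership oracles. Represent an ultrafilter U by the function deciding, for a set A given by its membership test, whether A ∈ U. Only principal ultrafilters Un can be written down in this way, since non-principal ultrafilters cannot be exhibited explicitly. So the sketch below merely checks two things on a sample of sets:

- The two descriptions of U ⊕ V in Definition 11.7.1 agree: the image S(U ⊗ V), and the shift formula.
- Un ⊕ Um = Un+m.

The oracle representation and all names are illustrative assumptions.

```python
from typing import Callable

SetN = Callable[[int], bool]          # a subset of N
SetNN = Callable[[int, int], bool]    # a subset of N x N
Ultra = Callable[[SetN], bool]        # membership oracle: A -> (A in U)
Ultra2 = Callable[[SetNN], bool]      # same, for ultrafilters on N x N


def principal(n: int) -> Ultra:
    """U_n = {A | n ∈ A}."""
    return lambda A: A(n)


def tensor(U: Ultra, V: Ultra) -> Ultra2:
    """X ∈ U ⊗ V  iff  {n | {m | (n, m) ∈ X} ∈ V} ∈ U."""
    return lambda X: U(lambda n: V(lambda m: X(n, m)))


def image_under_sum(W: Ultra2) -> Ultra:
    """S(W) for S(n, m) = n + m:  A ∈ S(W)  iff  S^{-1}(A) ∈ W."""
    return lambda A: W(lambda n, m: A(n + m))


def pseudo_sum(U: Ultra, V: Ultra) -> Ultra:
    """A ∈ U ⊕ V  iff  {n | A - n ∈ V} ∈ U, with A - n = {m | m + n ∈ A}."""
    return lambda A: U(lambda n: V(lambda m: A(m + n)))


# Sample test sets: multiples of k, and tails [t, ∞).
tests: list[SetN] = [lambda x, k=k: x % k == 0 for k in range(1, 8)]
tests += [lambda x, t=t: x >= t for t in range(0, 30, 3)]

for n in range(10):
    for m in range(10):
        U, V, W = principal(n), principal(m), principal(n + m)
        for A in tests:
            assert pseudo_sum(U, V)(A) == image_under_sum(tensor(U, V))(A) == W(A)
print("U_n ⊕ U_m = U_{n+m} on all sampled sets")
```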
We recall that the space βN of ultrafilters on N is endowed with the topology determined by the following family of basic (cl)open sets, for A ⊆ N:
O_A = {U | A ∈ U}.
In this way, βN is the Stone–Čech compactification of the discrete space N. Indeed, βN is Hausdorff and compact, and if one identifies each n ∈ N with the corresponding principal ultrafilter, then N is dense in βN. Moreover, one can prove that every function f : N → K into a compact Hausdorff space K has a (unique) continuous extension βf : βN → K. When equipped with the pseudo-sum operation, βN has the structure of a compact topological left semigroup, because for any fixed V, the "product on the left" ψV : U ↦ U ⊕ V is a continuous function. We remark that the pseudo-sum operation is associative, but it fails badly to be commutative (see Theorem 11.7.4).

Connections between pseudo-sums and hyper-shifts are established in the next proposition.

Proposition 11.7.3 Let α, β, γ ∈ ∗N. Then:
1. A ∈ Uα ⊕ Uβ ⇔ Aβ ∈ Uα ⇔ α ∈ ∗(Aβ).
2. For every n ∈ N, (A − n)β = Aβ − n.
3. For every n ∈ N, A − n ∈ Uα ⊕ Uβ ⇔ n ∈ (Aβ)α.
4. Uγ = Uα ⊕ Uβ ⇔ Aγ = (Aβ)α for every A ⊆ N.
5. If (α, β) is a tensor pair, then Aα+β = (Aβ)α for every A ⊆ N.
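Proof Equivalences in (1) directly follow from the definitions. (2) is proved by the chain of equivalences: $k \in (A - n)_\beta \Leftrightarrow k + \beta \in {}^*(A - n) = {}^*A - n \Leftrightarrow k + \beta + n \in {}^*A \Leftrightarrow k + n \in A_\beta \Leftrightarrow k \in A_\beta - n$. (3) By using the previous properties, we obtain $A - n \in \mathcal{U}_\alpha \oplus \mathcal{U}_\beta \Leftrightarrow \alpha \in {}^*[(A - n)_\beta] = {}^*(A_\beta - n) = {}^*(A_\beta) - n \Leftrightarrow \alpha + n \in {}^*(A_\beta) \Leftrightarrow n \in (A_\beta)_\alpha$. (4) Assume first $\mathcal{U}_\gamma = \mathcal{U}_\alpha \oplus \mathcal{U}_\beta$. For every $n \in \mathbb{N}$, by (3) one has that $n \in (A_\beta)_\alpha \Leftrightarrow A - n \in \mathcal{U}_\alpha \oplus \mathcal{U}_\beta \Leftrightarrow A - n \in \mathcal{U}_\gamma \Leftrightarrow \gamma \in {}^*(A - n) = {}^*A - n \Leftrightarrow n \in A_\gamma$, and so $(A_\beta)_\alpha = A_\gamma$. Conversely, assume by contradiction that we can pick $A \in \mathcal{U}_\alpha \oplus \mathcal{U}_\beta$ with $A \notin \mathcal{U}_\gamma$. Then $\alpha \in {}^*(A_\beta)$ and $\gamma \notin {}^*A$, and hence $(A_\beta)_\alpha \neq A_\gamma$ because $0 \in (A_\beta)_\alpha$ but $0 \notin A_\gamma$. (5) By Proposition 11.7.2, $\mathcal{U}_\alpha \oplus \mathcal{U}_\beta = \mathcal{U}_{\alpha+\beta}$, and so the thesis directly follows from (4).

As a first example of use of hyper-shifts in $(\beta\mathbb{N}, \oplus)$, let us prove the continuity of the "product on the left" functions. This is easily done by showing that $\psi_{\mathcal{U}_\beta}^{-1}(\mathcal{O}_A) = \mathcal{O}_{A_\beta}$ for every $\beta \in {}^*\mathbb{N}$ and for every $A \subseteq \mathbb{N}$; indeed, for every $\alpha \in {}^*\mathbb{N}$ one has:
$$\mathcal{U}_\alpha \in \psi_{\mathcal{U}_\beta}^{-1}(\mathcal{O}_A) \Leftrightarrow \mathcal{U}_\alpha \oplus \mathcal{U}_\beta \in \mathcal{O}_A \Leftrightarrow A \in \mathcal{U}_\alpha \oplus \mathcal{U}_\beta \Leftrightarrow A_\beta \in \mathcal{U}_\alpha \Leftrightarrow \mathcal{U}_\alpha \in \mathcal{O}_{A_\beta}.$$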
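Only principal ultrafilters can be represented by a finite program, so the following Python sketch is merely a sanity check of the definitions, not a model of $\beta\mathbb{N}$. It assumes a hypothetical encoding in which a subset of $\mathbb{N}$ is a membership predicate and an ultrafilter is a membership test on such predicates; the pseudo-sum is computed by the rule $A \in \mathcal{U} \oplus \mathcal{V} \Leftrightarrow \{n \mid A - n \in \mathcal{V}\} \in \mathcal{U}$ recalled in Definition 11.8.1. The names `principal`, `shift_left`, `pseudo_sum` and `agree` are illustrative. For a standard $k$ the hyper-shift $A_k$ is simply $A - k$, so item (1) of Proposition 11.7.3 reads $A \in \mathcal{U}_m \oplus \mathcal{U}_k \Leftrightarrow A - k \in \mathcal{U}_m$, and one recovers $\mathcal{U}_m \oplus \mathcal{U}_k = \mathcal{U}_{m+k}$.

```python
import math
from typing import Callable

# Hypothetical encoding: a subset of N is a membership predicate.
Subset = Callable[[int], bool]
# An ultrafilter is modelled by its membership test on subsets.
Ultrafilter = Callable[[Subset], bool]


def principal(m: int) -> Ultrafilter:
    """Principal ultrafilter U_m = {A | m in A}."""
    return lambda A: A(m)


def shift_left(A: Subset, n: int) -> Subset:
    """Leftward shift A - n = {k | k + n in A}."""
    return lambda k: A(k + n)


def pseudo_sum(U: Ultrafilter, V: Ultrafilter) -> Ultrafilter:
    """A in U (+) V  iff  {n | A - n in V} in U."""
    return lambda A: U(lambda n: V(shift_left(A, n)))


# A few test sets, all given as predicates on nonnegative integers.
TEST_SETS: list[Subset] = [
    lambda x: x % 2 == 0,               # even numbers
    lambda x: x % 3 == 1,               # residue 1 mod 3
    lambda x: math.isqrt(x) ** 2 == x,  # perfect squares
    lambda x: x > 10,                   # a cofinite set
]


def agree(U: Ultrafilter, W: Ultrafilter) -> bool:
    """Check that U and W contain the same sets among TEST_SETS."""
    return all(U(A) == W(A) for A in TEST_SETS)


if __name__ == "__main__":
    is_square = TEST_SETS[2]
    # 3 + 6 = 9 is a square, so the set of squares belongs to U_3 (+) U_6.
    print(pseudo_sum(principal(3), principal(6))(is_square))
    # U_m (+) U_k agrees with U_{m+k} on every test set.
    print(all(agree(pseudo_sum(principal(m), principal(k)), principal(m + k))
              for m in range(20) for k in range(20)))
```

Both lines print `True`. The check is, of course, only as strong as the finite family `TEST_SETS`. Nothing comparable can be written for non-principal ultrafilters, which is where the hyper-shift description becomes essential.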
M. Di Nasso
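As one can readily verify, $\mathcal{U}_n \oplus \mathcal{V} = \mathcal{V} \oplus \mathcal{U}_n$ for every principal ultrafilter $\mathcal{U}_n$. We now use hyper-shifts to show that the center of $(\beta\mathbb{N}, \oplus)$ actually contains only principal ultrafilters. Theorem 11.7.4 For every non-principal ultrafilter $\mathcal{U}$ there exists a non-principal ultrafilter $\mathcal{V}$ such that $\mathcal{U} \oplus \mathcal{V} \neq \mathcal{V} \oplus \mathcal{U}$. Proof Pick an infinite $\gamma$ such that $\mathcal{U} = \mathcal{U}_\gamma$, and let $\nu$ be such that $\nu^2 \leq \gamma < (\nu + 1)^2$. Let us assume that $\nu$ is even (the case $\nu$ odd is entirely similar), and let
$$A = \bigcup_{n \text{ even}} [n^2, (n + 1)^2).$$
We distinguish two cases. If $(\nu + 1)^2 - \gamma$ is infinite, let $\beta = (\nu + 1)^2$. Notice that $A_\gamma = \mathbb{N}$ because $\gamma + n \in {}^*A$ for all $n$, and $A_\beta = \emptyset$ because $\beta + n \notin {}^*A$ for all $n$. Then $A \in \mathcal{U}_\beta \oplus \mathcal{U}_\gamma$ because trivially $A_\gamma \in \mathcal{U}_\beta$, and $A \notin \mathcal{U}_\gamma \oplus \mathcal{U}_\beta$ because trivially $A_\beta \notin \mathcal{U}_\gamma$. If $(\nu + 1)^2 - \gamma$ is finite, let $\beta = \nu^2$. In this case $A_\gamma$ is finite, and $A_\beta = \mathbb{N}$. Then $A \notin \mathcal{U}_\beta \oplus \mathcal{U}_\gamma$ because $A_\gamma \notin \mathcal{U}_\beta$, and $A \in \mathcal{U}_\gamma \oplus \mathcal{U}_\beta$ because $A_\beta \in \mathcal{U}_\gamma$. In both cases $\beta$ is infinite, so $\mathcal{V} = \mathcal{U}_\beta$ is non-principal, as required.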
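The set $A$ used in this proof can be explored numerically. The following Python sketch is only a finite analogy, under the assumption that "infinite" is imitated by "larger than a fixed window of shifts". With $\nu = 100$ playing the role of the even index, it tests which of $\gamma + n$ and $\beta + n$, for $0 \le n < 100$, lie in $A$, for both choices of $\beta$ in the proof. The helper names `in_A` and `shift_window` are hypothetical.

```python
import math


def in_A(x: int) -> bool:
    """x lies in A = union over even n of [n^2, (n+1)^2) iff floor(sqrt(x)) is even."""
    return math.isqrt(x) % 2 == 0


def shift_window(start: int, width: int) -> list[bool]:
    """Membership of start + n in A for 0 <= n < width (a finite stand-in for A_start)."""
    return [in_A(start + n) for n in range(width)]


nu, width = 100, 100  # nu is even; width imitates the "finite" shifts n

# Case 1: (nu+1)^2 - gamma exceeds the window; take beta = (nu+1)^2.
gamma1, beta1 = nu**2 + 50, (nu + 1) ** 2
# Case 2: (nu+1)^2 - gamma is small; take beta = nu^2.
gamma2, beta2 = (nu + 1) ** 2 - 5, nu**2

if __name__ == "__main__":
    print(all(shift_window(gamma1, width)))  # every gamma1 + n lies in A
    print(any(shift_window(beta1, width)))   # no beta1 + n lies in A
    print(sum(shift_window(gamma2, width)))  # only 5 shifts of gamma2 lie in A
    print(all(shift_window(beta2, width)))   # every beta2 + n lies in A
```

The four lines print `True`, `False`, `5` and `True`, mirroring $A_\gamma = \mathbb{N}$, $A_\beta = \emptyset$ in the first case and $A_\gamma$ finite, $A_\beta = \mathbb{N}$ in the second.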
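11.8 Idempotent Ultrafilters
Ultrafilters that are idempotent with respect to pseudo-sums play an instrumental role in applications. Definition 11.8.1 An ultrafilter $\mathcal{U}$ on $\mathbb{N}$ is called idempotent if $\mathcal{U} \oplus \mathcal{U} = \mathcal{U}$, i.e. if $A \in \mathcal{U} \Leftrightarrow \{n \mid A - n \in \mathcal{U}\} \in \mathcal{U}$. Theorem 11.8.2 Let $\alpha \in {}^*\mathbb{N}$. The following properties are equivalent:
1. $\mathcal{U}_\alpha$ is idempotent;
2. There exists a tensor pair $(\alpha, \beta)$ such that $\alpha \sim_u \beta \sim_u \alpha + \beta$;
3. For every $A \subseteq \mathbb{N}$, $A_\alpha = (A_\alpha)_\alpha$;
4. For every $A \subseteq \mathbb{N}$, $(A \cap A_\alpha)_\alpha = A_\alpha$;
5. If $\alpha \in {}^*A$ then $\alpha \in {}^*(A_\alpha)$;
6. If $\alpha \in {}^*A$ then $A \cap A_\alpha \neq \emptyset$;
7. If $\alpha \in {}^*A$ then there exists $B \subseteq A \cap B_\alpha$ such that $\alpha \in {}^*B$;
8. For every $A \in \mathcal{U}_\alpha$ there exists $a \in A$ such that $A - a \in \mathcal{U}_\alpha$;
9. For every $A \in \mathcal{U}_\alpha$ there exists $B \subseteq A$ such that $B \in \mathcal{U}_\alpha$ and $B - b \in \mathcal{U}_\alpha$ for all $b \in B$.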
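Proof (1) ⇔ (2). By Theorem 11.5.12, we can pick $\beta$ such that $(\alpha, \beta)$ is a tensor pair and $\alpha \sim_u \beta$. If $\mathcal{U}_\alpha$ is idempotent, then $\mathcal{U}_\beta = \mathcal{U}_\alpha = \mathcal{U}_\alpha \oplus \mathcal{U}_\alpha = S(\mathcal{U}_\alpha \otimes \mathcal{U}_\alpha) = S(\mathcal{U}_{(\alpha,\beta)}) = \mathcal{U}_{\alpha+\beta}$,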
11 Hypernatural Numbers as Ultrafilters
where $S$ denotes the sum function. Conversely, by Proposition 11.7.2, $\mathcal{U}_\alpha \oplus \mathcal{U}_\alpha = \mathcal{U}_\alpha \oplus \mathcal{U}_\beta = \mathcal{U}_{\alpha+\beta} = \mathcal{U}_\alpha$. (1) ⇔ (3) is given by property (4) of Proposition 11.7.3. (3) ⇔ (4). Notice that $(A \cap A_\alpha)_\alpha = A_\alpha \cap (A_\alpha)_\alpha$, so property (4) is equivalent to $A_\alpha \subseteq (A_\alpha)_\alpha$ for every $A \subseteq \mathbb{N}$, and one implication is trivial. The converse implication $(A_\alpha)_\alpha \subseteq A_\alpha$ is proved by considering complements: $(A_\alpha)^c = (A^c)_\alpha \subseteq [(A^c)_\alpha]_\alpha = [(A_\alpha)^c]_\alpha = [(A_\alpha)_\alpha]^c$. (1) ⇔ (5). We recall that $\alpha \in {}^*(A_\alpha) \Leftrightarrow A \in \mathcal{U}_\alpha \oplus \mathcal{U}_\alpha$. So (5) states the inclusion between ultrafilters $\mathcal{U}_\alpha \subseteq \mathcal{U}_\alpha \oplus \mathcal{U}_\alpha$, which is equivalent to the equality $\mathcal{U}_\alpha = \mathcal{U}_\alpha \oplus \mathcal{U}_\alpha$. (5) ⇒ (6). Since $\alpha \in {}^*A \cap {}^*(A_\alpha) = {}^*(A \cap A_\alpha)$, by transfer we obtain the thesis. (6) ⇒ (5). Assume by contradiction that $\alpha \in {}^*A$ but $\alpha \notin {}^*(A_\alpha)$. Then $\alpha \in {}^*B$ where $B = A \cap (A_\alpha)^c$. This is against the hypothesis because $B \cap B_\alpha = (A \cap (A_\alpha)^c) \cap (A_\alpha \cap [(A_\alpha)_\alpha]^c) = \emptyset$. (3) & (5) ⇒ (7). Let $B = A \cap A_\alpha$. Then $\alpha \in {}^*A \cap {}^*(A_\alpha) = {}^*B$. Moreover, trivially $B \subseteq A$, and also $B \subseteq A_\alpha = A_\alpha \cap (A_\alpha)_\alpha = (A \cap A_\alpha)_\alpha = B_\alpha$. (7) ⇒ (6). Given $\alpha \in {}^*A$, pick $B$ as in the hypothesis. Notice that $B \subseteq A$; moreover $B \subseteq B_\alpha \subseteq (A \cap B_\alpha)_\alpha = A_\alpha \cap (B_\alpha)_\alpha \subseteq A_\alpha$, and hence $B \subseteq A \cap A_\alpha$. We conclude by noticing that $\alpha \in {}^*B$, and so, by transfer, $B \neq \emptyset$. Finally, it is easily verified that properties (8) and (9) are the "standard" counterparts of properties (6) and (7), respectively.

We remark that the existence of idempotent ultrafilters is not a trivial matter: it is proved as an application of a general result by R. Ellis about the existence of idempotents in every compact Hausdorff topological left semigroup (see, e.g., [29]). Indeed, $(\beta\mathbb{N}, \oplus)$ is a compact Hausdorff topological left semigroup. Historically, the first application of idempotent ultrafilters in combinatorics of numbers was a short and elegant proof of Hindman's theorem found by F. Galvin and S. Glazer. The original argument used by N. Hindman in his proof [27] was very intricate.
Actually, Hindman himself wrote in [28]: “If the reader has a graduate student that she wants to punish, she should make him read and understand that original proof”. A detailed report about the discovery of the ultrafilter proof can be found in [28]. Theorem 11.8.3 (Hindman) For every finite partition $\mathbb{N} = C_1 \cup \cdots \cup C_r$ of the natural numbers, there exists an infinite set $X$ such that every (finite) sum of distinct elements from $X$ belongs to the same piece $C_i$.
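Hindman's theorem concerns an infinite set $X$, which no program can produce, but its conclusion is easy to test on finite sets. The Python sketch below computes all sums of distinct elements of a finite list and checks whether they fall into a single piece of a given two-piece partition. The partition by the parity of the exponent of 2 in $n$, and all function names, are illustrative assumptions that do not come from the text. For instance, every finite sum of distinct powers of 4 has an even exponent of 2, so all such sums lie in the same piece.

```python
from itertools import combinations


def finite_sums(X: list[int]) -> set[int]:
    """All sums of nonempty sets of distinct elements (by position) of the finite list X."""
    return {sum(combo) for r in range(1, len(X) + 1) for combo in combinations(X, r)}


def v2(n: int) -> int:
    """Exponent of the largest power of 2 dividing n (n >= 1)."""
    k = 0
    while n % 2 == 0:
        n //= 2
        k += 1
    return k


def piece(n: int) -> int:
    """Hypothetical 2-piece partition of N: piece 0 = even v2(n), piece 1 = odd v2(n)."""
    return v2(n) % 2


def sums_in_one_piece(X: list[int]) -> bool:
    """True if every finite sum of distinct elements of X lies in the same piece."""
    return len({piece(s) for s in finite_sums(X)}) == 1


if __name__ == "__main__":
    print(sums_in_one_piece([1, 4, 16, 64]))  # powers of 4: every sum has even v2
    print(sums_in_one_piece([1, 2]))          # 1 and 2 already lie in different pieces
```

The first call prints `True` and the second prints `False`. A finite check like this says nothing about the existence of an infinite $X$, which is exactly what the idempotent ultrafilter provides.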
Proof Pick $\alpha$ such that $\mathcal{U}_\alpha$ is idempotent, and let $C_i$ be the piece of the partition such that $\alpha \in {}^*C_i$. By (7) of Theorem 11.8.2, we can fix a set $B \subseteq C_i \cap B_\alpha$ with $\alpha \in {}^*B$. Notice that $x \in B \Rightarrow x \in B_\alpha \Leftrightarrow \alpha + x \in {}^*B \Leftrightarrow \alpha \in {}^*(B - x)$. Now pick any $x_1 \in B$. Then $\alpha$ witnesses the existence in ${}^*B$ of elements larger than $x_1$ that belong to ${}^*(B - x_1)$. By transfer, we obtain the existence in $B$ of an element $x_2 > x_1$ such that $x_2 \in B - x_1$. Notice that $x_1, x_2, x_1 + x_2 \in B$, and hence $\alpha \in {}^*(B - x_2)$ and $\alpha \in {}^*(B - x_1 - x_2)$. Similarly as above, $\alpha$ witnesses the existence in ${}^*B$ of elements larger than $x_2$ that belong to ${}^*[(B - x_1) \cap (B - x_2) \cap (B - x_1 - x_2)]$. Again by using transfer, we get the existence in $B$ of an element $x_3 > x_2$ such that $x_3 \in (B - x_1) \cap (B - x_2) \cap (B - x_1 - x_2)$, and so we have that $x_1, x_2, x_3, x_1 + x_2, x_1 + x_3, x_2 + x_3, x_1 + x_2 + x_3 \in B$. By iterating the process, we eventually obtain a set $X = \{x_1 < x_2 < \cdots < x_n < \cdots\}$ such that every sum of distinct elements from $X$ belongs to $B$, and hence to the same piece $C_i$ of the partition, as desired.

We recall that in the definition of the pseudo-sum $\mathcal{U} \oplus \mathcal{V}$, one considers leftward shifts $A - n = \{m \mid m + n \in A\}$. By considering instead rightward shifts $A + n = \{a + n \mid a \in A\}$, one obtains the following operation. Definition 11.8.4 Let $\mathcal{U}, \mathcal{V}$ be ultrafilters on $\mathbb{N}$, where $\mathcal{V}$ is non-principal. The ultrafilter $\mathcal{U} \ominus \mathcal{V}$ is defined by setting for every $A \subseteq \mathbb{N}$: $A \in \mathcal{U} \ominus \mathcal{V} \Leftrightarrow \{n \mid A + n \in \mathcal{V}\} \in \mathcal{U}$. Notice that one can identify $\mathcal{U} \ominus \mathcal{V}$ with the image $D(\mathcal{U} \otimes \mathcal{V})$ of the tensor product $\mathcal{U} \otimes \mathcal{V}$ under the difference function $D(n, m) = m - n$. Indeed, although $D$ takes values in $\mathbb{Z}$, one has that $\mathbb{N} \in D(\mathcal{U} \otimes \mathcal{V})$ whenever $\mathcal{V}$ is non-principal and so, in this case, one can restrict to subsets of $\mathbb{N}$. In a similar way as done in Theorem 11.8.2, one can prove several nonstandard characterizations of ultrafilters that are idempotent with respect to $\ominus$.
In particular, corresponding to item (2) in Theorem 11.8.2, it is shown that $\mathcal{U}_\alpha \ominus \mathcal{U}_\alpha = \mathcal{U}_\alpha$ if and only if there exists a tensor pair $(\alpha, \beta)$ such that $\alpha \sim_u \beta \sim_u \beta - \alpha$. The problem is that there can be no such pair! Theorem 11.8.5 There exist pairs $(\alpha, \beta)$ such that $\alpha \sim_u \beta \sim_u \beta - \alpha$, but none of them is a tensor pair. In consequence, there exist no ultrafilters $\mathcal{U}$ such that $\mathcal{U} \ominus \mathcal{U} = \mathcal{U}$. Proof For every $A \subseteq \mathbb{N}$, let us consider the set $\Gamma(A) = \{(a, b) \in \mathbb{N} \times \mathbb{N} \mid \text{either } a, b, b - a \in A \text{ or } a, b, b - a \in A^c\}$. We want to show that the family $\{\Gamma(A) \mid A \subseteq \mathbb{N}\}$ has the finite intersection property. Once this is proved, by $\mathfrak{c}^+$-saturation one can pick a pair $(\alpha, \beta) \in \bigcap_{A \subseteq \mathbb{N}} {}^*\Gamma(A)$. It is then easily verified that $\alpha \sim_u \beta \sim_u \beta - \alpha$. Given $A_1, \ldots, A_n$, pick a finite partition $\mathbb{N} = C_1 \cup \cdots \cup C_r$ such that, for every piece $C_i$ and every set $A_j$, one has that either $C_i \subseteq A_j$ or $C_i \subseteq A_j^c$. Now, by Rado's theorem, the equation $X - Y = Z$ is partition regular on $\mathbb{N}$, and so we can pick elements $x, y, z$ in one piece $C_i$ of the partition that satisfy $x - y = z$; in consequence, $(y, x) \in \Gamma(C_i) \subseteq \bigcap_{j=1}^n \Gamma(A_j)$, and so this intersection is nonempty.

We recall that an equation $f(X_1, \ldots, X_n) = 0$ is called partition regular on $\mathbb{N}$ when for every finite partition of $\mathbb{N}$ there exist $x_1, \ldots, x_n$ in the same piece of the partition that solve the equation, i.e. $f(x_1, \ldots, x_n) = 0$. Rado's theorem states that a linear equation $c_1X_1 + \cdots + c_nX_n = 0$ (where all $c_i \neq 0$) is partition regular on $\mathbb{N}$ if and only if some nonempty set of the coefficients $c_i$ has sum zero (see [24, Chap. 3]).

Let us now turn to the negative result, and assume $\alpha \sim_u \beta \sim_u \beta - \alpha$. Notice that both $\alpha$ and $\beta$ must be multiples of every (finite) natural number. Indeed, given $n \geq 2$, the u-equivalence of $\alpha$ and $\beta$ implies that $\alpha \equiv \beta \bmod n$, and hence $\beta - \alpha \equiv 0 \bmod n$. But $\beta - \alpha$ is u-equivalent to both $\alpha$ and $\beta$, and so $\alpha \equiv \beta \equiv \beta - \alpha \equiv 0 \bmod n$. Now, let us consider the functions $f, g : \mathbb{N} \to \mathbb{N} \cup \{0\}$ where $f(n)$ is the greatest exponent such that $3^{f(n)}$ divides $n$, and $g(n) = n/3^{f(n)}$. By what is proved above, both ${}^*f(\alpha)$ and ${}^*f(\beta)$ are infinite. Since $\alpha \sim_u \beta$, we have ${}^*g(\alpha) \sim_u {}^*g(\beta)$, and so ${}^*g(\alpha) \equiv {}^*g(\beta) \equiv j \bmod 3$, where either $j = 1$ or $j = 2$. Now assume to obtain a contradiction that $(\alpha, \beta)$ is a tensor pair. By Proposition 11.5.9, also $({}^*f(\alpha), {}^*f(\beta))$ is a tensor pair, and since both components are infinite, ${}^*f(\alpha) < {}^*f(\beta)$. Then we have:
$$\beta - \alpha = 3^{{}^*f(\beta)} \cdot {}^*g(\beta) - 3^{{}^*f(\alpha)} \cdot {}^*g(\alpha) = 3^{{}^*f(\alpha)} \cdot \left(3^{\nu} \cdot {}^*g(\beta) - {}^*g(\alpha)\right)$$
where $\nu = {}^*f(\beta) - {}^*f(\alpha) > 0$. Consequently, ${}^*f(\beta - \alpha) = {}^*f(\alpha)$ and ${}^*g(\beta - \alpha) = 3^{\nu} \cdot {}^*g(\beta) - {}^*g(\alpha) \equiv -{}^*g(\alpha) \equiv -j \bmod 3$, while $\beta - \alpha \sim_u \alpha \Rightarrow {}^*g(\beta - \alpha) \sim_u {}^*g(\alpha) \Rightarrow {}^*g(\beta - \alpha) \equiv {}^*g(\alpha) \equiv j \bmod 3$. We must conclude that $-j \equiv j \bmod 3$, and hence $3$ divides $j$, a contradiction. (The last part of this argument is essentially the same as the one used by Hindman in [26, Sect. 4] to prove the non-existence of ultrafilters $\mathcal{U} = \mathcal{U} \ominus \mathcal{U}$.)
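Some finite ingredients of this proof can be checked mechanically. The Python sketch below (all function names hypothetical) first tests Rado's criterion for a single linear equation as stated above. It then confirms by brute force the smallest nontrivial instance of the partition regularity of $X - Y = Z$ used for the finite intersection property: every 2-coloring of $\{1, \ldots, 5\}$ contains $x, y, z$ of one color with $x - y = z$ (repetitions allowed), while $\{1, \ldots, 4\}$ admits a coloring that avoids this. Finally, it verifies on an initial segment the standard arithmetic fact behind the last step: if $a < b$ and $f(a) < f(b)$, then $f(b - a) = f(a)$ and $g(b - a) \equiv -g(a) \bmod 3$, with $f, g$ as defined in the proof. The nonstandard part of the argument is not something a program can check.

```python
from itertools import combinations, product


def rado_single_equation(coeffs: list[int]) -> bool:
    """Rado's criterion for c_1 X_1 + ... + c_n X_n = 0 with all c_i != 0:
    partition regular on N iff some nonempty set of the c_i sums to 0."""
    return any(sum(sub) == 0
               for r in range(1, len(coeffs) + 1)
               for sub in combinations(coeffs, r))


def every_coloring_has_solution(N: int, r: int) -> bool:
    """True if every r-coloring of {1,...,N} has x, y, z of one color with x - y = z
    (repetitions allowed); brute force over all r**N colorings."""
    for colors in product(range(r), repeat=N):
        # y, z range over {1,...,N} with x = y + z <= N.
        found = any(colors[y - 1] == colors[z - 1] == colors[y + z - 1]
                    for y in range(1, N + 1)
                    for z in range(1, N - y + 1))
        if not found:
            return False
    return True


def f3(n: int) -> int:
    """f(n): greatest e such that 3**e divides n (n >= 1)."""
    e = 0
    while n % 3 == 0:
        n //= 3
        e += 1
    return e


def g3(n: int) -> int:
    """g(n) = n / 3**f(n), which is never divisible by 3."""
    return n // 3 ** f3(n)


def check_3adic_step(limit: int) -> bool:
    """For 1 <= a < b <= limit with f(a) < f(b), check that
    f(b - a) = f(a) and g(b - a) = -g(a) (mod 3)."""
    for a in range(1, limit + 1):
        for b in range(a + 1, limit + 1):
            if f3(a) < f3(b):
                d = b - a
                if f3(d) != f3(a) or (g3(d) + g3(a)) % 3 != 0:
                    return False
    return True


if __name__ == "__main__":
    print(rado_single_equation([1, -1, -1]))  # X - Y - Z = 0, i.e. X - Y = Z
    print(rado_single_equation([1, 1, -3]))   # X + Y = 3Z: no subset sums to 0
    print(every_coloring_has_solution(4, 2), every_coloring_has_solution(5, 2))
    print(check_3adic_step(300))
```

The printed lines are `True`, `False`, `False True` and `True`. The brute-force search grows like $r^N$, so it is only meant for tiny instances.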
11.9 Final Remarks and Open Questions
Since a first draft of this paper was written in 2009, several applications of the presented nonstandard approach to the use of ultrafilters have appeared in the literature. In [19], iterated nonstandard extensions were used to characterize idempotent ultrafilters along the lines of Theorem 11.8.2, and a new proof of a version of Radó's Theorem was given by using suitable linear combinations of idempotent ultrafilters. The paper [38] studies the partition regularity of (nonlinear) polynomial equations by nonstandard methods. In [9], a notion of finite embeddability between sets and between ultrafilters is investigated, also with the use of the hyper-shifts of Sect. 5. The papers [39, 40] continue that line of research: the nonstandard approach is exploited to further investigate the relationships between finite embeddability relations, algebraic properties in $(\beta\mathbb{N}, \oplus)$, and the combinatorial structure of sets of natural numbers.
We would like to close this paper with some remarks about idempotent ultrafilters. To this day, basically the only known proof of their existence is grounded on an old result by R. Ellis, namely the fact that every compact Hausdorff topological left semigroup has idempotents (see [22]). Since idempotent ultrafilters are widely used in applications, it seems desirable to better understand them; to this end, a solution to the following problem would be valuable.
• Open problem #1: Find an alternative, nonstandard proof of the existence of idempotent ultrafilters.
Our notions of u-equivalence and of tensor pair, and hence of idempotent ultrafilter, can be generalized to models $M$ of any first-order theory $T \supseteq \mathrm{PA}$ that extends Peano Arithmetic. (By this we mean that $T$ is a collection of sentences in a first-order language $L$ that extends the language of PA.) We recall that the type of an element $a \in M$ is the set of all formulas with one free variable that are satisfied by $a$ in $M$: $\mathrm{tp}(a) = \{\varphi(x) \mid M \models \varphi(a)\}$. Another notion that makes sense for models $M$ of theories $T \supseteq \mathrm{PA}$ is the following. Call a pair $(a, b) \in M \times M$ independent when for every formula $\varphi(x, y)$, if $M \models \varphi(a, b)$ then $M \models \varphi(k, b)$ for some $k \in \mathbb{N}$. (This definition corresponds to the notion of heir of a type, as used in stable theories.) If $\mathrm{Th}(\mathbb{N})$ is the first-order theory of the natural numbers in the full language that contains a symbol for every relation, function and constant of $\mathbb{N}$, then $M \models \mathrm{Th}(\mathbb{N})$ means that $M = {}^*\mathbb{N}$ is the set of hypernatural numbers of a model of nonstandard analysis. In this case, trivially every subset $A \subseteq \mathbb{N}$ is definable, and hence $\mathrm{tp}(a) = \mathrm{tp}(b)$ if and only if $a \sim_u b$. Moreover, $(a, b)$ is independent exactly when $(a, b)$ is a tensor pair (see (6) of Theorem 11.5.7). Thus, by using property (2) of Theorem 11.8.2, one could propose the following generalization of idempotent ultrafilter. Definition 11.9.1 Let $M \models T \supseteq \mathrm{PA}$. We say that an element $\alpha \in M$ is idempotent if there exists an independent pair $(\alpha, \beta)$ such that $\mathrm{tp}(\alpha) = \mathrm{tp}(\beta) = \mathrm{tp}(\alpha + \beta)$.
• Open problem #2: Given a first-order theory $T \supseteq \mathrm{PA}$, find sufficient conditions for models $M \models T$ to contain idempotent elements.
We recall that in any $\mathfrak{c}^+$-saturated model of nonstandard analysis, every ultrafilter on $\mathbb{N}$ is generated by some element $\alpha \in {}^*\mathbb{N}$. In consequence, all $\mathfrak{c}^+$-saturated models of $\mathrm{Th}(\mathbb{N})$ contain idempotent elements. Isolating model-theoretic properties that guarantee the existence of idempotent elements would probably also be useful to attack the previous open problem.
References
1. L.O. Arkeryd, N.J. Cutland, C.W. Henson (eds.), Nonstandard Analysis: Theory and Applications. NATO ASI Series C, vol. 493 (Kluwer A.P., Dordrecht, 1997)
2. D. Ballard, K. Hrbáček, Standard foundations for nonstandard analysis. J. Symb. Log. 57, 471–478 (1992)
3. T. Bartoszynski, S. Shelah, On the density of Hausdorff ultrafilters, in Logic Colloquium 2004. Lecture Notes in Logic, vol. 29, ed. by A. Andretta, K. Kearnes, D. Zambella (A.S.L., Cambridge University Press, Cambridge, 2008)
4. M. Beiglböck, An ultrafilter approach to Jin’s theorem. Isr. J. Math. 185, 369–374 (2011)
5. V. Bergelson, Ergodic Ramsey Theory – an update, in Ergodic Theory of Z^d-Actions (Warwick 1993–94), London Mathematical Society Lecture Note Series, vol. 228 (Cambridge University Press, Cambridge, 1996), pp. 1–61
6. V. Bergelson, A. Blass, M. Di Nasso, R. Jin (eds.), Ultrafilters across Mathematics, Contemporary Mathematics, vol. 530 (American Mathematical Society, 2010)
7. A. Blass, A model-theoretic view of some special ultrafilters, in Logic Colloquium ’77, ed. by A. Macintyre, L. Pacholski, J. Paris (North-Holland, 1978), pp. 79–90
8. A. Blass, Combinatorial characteristics of the continuum, in Handbook of Set Theory, ed. by M. Foreman, A. Kanamori (Springer, Berlin, 2010), pp. 395–489
9. A. Blass, M. Di Nasso, Finite embeddability of sets and ultrafilters, arXiv:1405.2841, submitted
10. A. Blass, G. Moche, Finite preimages under the natural map from β(N×N). Topol. Proc. 26, 407–432 (2001–2002)
11. H. Cartan, Théorie des filtres. C. R. Acad. Sci. Paris 205, 595–598 (1937)
12. C.C. Chang, H.J. Keisler, Model Theory, 3rd edn. (North-Holland, Amsterdam, 1990)
13. G. Cherlin, J. Hirschfeld, Ultrafilters and ultraproducts in non-standard analysis, in Contributions to Non-Standard Analysis, ed. by W.A.J. Luxemburg, A. Robinson (North Holland, Amsterdam, 1972), pp. 261–279
14. G. Choquet, Deux classes remarquables d’ultrafiltres sur N. Bull. Sci. Math. 92, 143–153 (1968)
15. W.W. Comfort, S. Negrepontis, The Theory of Ultrafilters (Springer, Berlin, 1974)
16. A. Connes, Ultrapuissances et applications dans le cadre de l’analyse non standard. Séminaire Choquet, Initiation à l’analyse, Tome 9(1) (1969–1970)
17. N.J. Cutland, M. Di Nasso, D.A. Ross (eds.), Nonstandard Methods and Applications in Mathematics. Lecture Notes in Logic, vol. 25 (A.S.L., A.K. Peters, 2006)
18. M. Di Nasso, Embeddability properties of difference sets. Integers 14, A27 (2014)
19. M. Di Nasso, Iterated hyper-extensions and an idempotent ultrafilter proof of Rado’s theorem. Proc. Am. Math. Soc. 143, 1749–1761 (2015)
20. M. Di Nasso, M. Forti, Ultrafilter semirings and nonstandard submodels of the Stone-Čech compactification of the natural numbers, in Logic and Its Applications. AMS Contemporary Mathematics, vol. 380, ed. by A. Blass, Y. Zhang (2005), pp. 45–51
21. M. Di Nasso, M. Forti, Hausdorff ultrafilters. Proc. Am. Math. Soc. 134, 1809–1818 (2006)
22. R. Ellis, Lectures on Topological Dynamics (Benjamin, New York, 1969)
23. D. Galvin, Ultrafilters, with applications to analysis, social choice and combinatorics, unpublished notes (2009)
24. R. Graham, B. Rothschild, J. Spencer, Ramsey Theory, 2nd edn. (Wiley, New York, 1990)
25. L. Haddad, Un outil incomparable: l’ultrafiltre. Tatra Mt. Math. Publ. 31(2), 131–176 (2005)
26. N. Hindman, The existence of certain ultrafilters on N and a conjecture of Graham and Rothschild. Proc. Am. Math. Soc. 36, 341–346 (1972)
27. N. Hindman, Finite sums from sequences within cells of a partition of N. J. Comb. Theory (Series A) 17, 1–11 (1974)
28. N. Hindman, Algebra in the Stone-Čech compactification and its applications to Ramsey Theory. Sci. Math. Jpn. 62, 321–329 (2005)
29. N. Hindman, D. Strauss, Algebra in the Stone-Čech Compactification, Theory and Applications, 2nd edn. (W. de Gruyter, Berlin, 2011)
30. J. Hirshfeld, Nonstandard combinatorics. Studia Logica 47, 221–232 (1988)
31. K. Hrbáček, Realism, nonstandard set theory, and large cardinals. Ann. Pure Appl. Log. 109, 15–48 (2001)
32. R. Jin, Ultrapower of N and density problems, in [6], pp. 147–161
33. V. Kanovei, M. Reeken, Nonstandard Analysis, Axiomatically (Springer, Berlin, 2004)
34. H.J. Keisler, Limit ultrapowers. Trans. Am. Math. Soc. 107, 382–408 (1963)
35. P. Komjáth, V. Totik, Ultrafilters. Am. Math. Mon. 115, 33–44 (2009)
36. K. Kunen, Some points in βN. Math. Proc. Camb. Philos. Soc. 80, 385–398 (1976)
37. P.A. Loeb, M. Wolff (eds.), Nonstandard Analysis for the Working Mathematician (Kluwer A.P., Dordrecht, 2000)
38. L. Luperi Baglini, Partition regularity of nonlinear polynomials: a nonstandard approach. Integers 14, A30 (2014)
39. L. Luperi Baglini, Ultrafilters maximal for finite embeddability. J. Log. Anal. 6, 6 (2014)
40. L. Luperi Baglini, F-finite embeddabilities of sets and ultrafilters, arXiv:1401.6518, submitted
41. W.A.J. Luxemburg, Nonstandard Analysis. Lecture Notes, Department of Mathematics, California Institute of Technology (1962)
42. W.A.J. Luxemburg, A general theory of monads, in Applications of Model Theory to Algebra, Analysis and Probability, ed. by W.A.J. Luxemburg (Holt, Rinehart & Winston, 1969), pp. 18–86
43. J. van Mill, An introduction to βω, in Handbook of Set-theoretic Topology, ed. by K. Kunen, J.E. Vaughan (North-Holland, Amsterdam, 1984)
44. S. Ng, H. Render, The Puritz order and its relationship to the Rudin-Keisler order, in Reuniting the Antipodes: Constructive and Nonstandard Views of the Continuum. Synthese Library, vol. 306, ed. by P. Schuster, U. Berger, H. Osswald (Kluwer A.P., Dordrecht, 2001), pp. 157–166
45. C. Puritz, Ultrafilters and standard functions in nonstandard analysis. Proc. Lond. Math. Soc. 22, 706–733 (1971)
46. C. Puritz, Skies, constellations and monads, in Contributions to Non-Standard Analysis, ed. by W.A.J. Luxemburg, A. Robinson (North Holland, Amsterdam, 1972), pp. 215–243
Index
A Accumulation point, 26 Adapted process, 245 Additive cut, 404 Adjoint operator, 115 ℵ1 -saturation, 59 Andersen-Jessen theorem, 100 Anderson, 6 Antibase operator, 94 Approximate fixed point sequence, 155 Approximation, 183 Approximation scheme, 148 Arithmetic progression, 417 Asymptotic basis, 413 Atkinson theorem, 136 Axioms of countability, 85
B Backward triangle, 434 Banach algebra, 118 C ∗ –algebra, 118 involutive, 118 unital, 118 Banach density, 464 Banach lattice, 116 finitely λ–lattice representable, 130 Banach limit, 57 Banach space FFP, 155 fixed point property, 155 internal, 108 nonstandard hull, 108 representable, 126 superreflexive, 128
Banach–Alaoglu theorem, 124 Base for a topology, 84 Base generating function, 94, 95 Base operator, 94 strong, 96 Bi-arithmetic progression, 417 Bleidtner, J., 93 Bolzano-Weierstrass theorem, 25 Bounded approximation property, 134 Bounded ultrapower, 45 Brownian motion, 3 b-topology, 96 monad, 96
C Calkin algebra, 135 Calkin space, 135 Carathéodory extension theorem, 98 Cauchy-Peano existence theorem, 34 Chain elementary, 73 Choquet theory, 101 Closed set, 26, 82 Closure of a set, 26 Cluster point, 83, 84 Cofinality, 404 Coin tossing, 97 Compact, 27, 165, 351, 355, 357, 359, 362, 392 locally, 379 operator, 385 Compact set, 85 Compactification, 93, 165 Competitive equilibrium, 353, 381 Complement, 170 Complete regularity, 166
© Springer Science+Business Media Dordrecht 2015 P.A. Loeb and M.P.H. Wolff (eds.), Nonstandard Analysis for the Working Mathematician, DOI 10.1007/978-94-017-7327-0
Component, 170 Comprehensive monomorphism, 55 Concurrent relation, 51 Connected, 166 Continuity, 29, 83 Continuous in probability, 264 Continuous iterated integral, 260 Continuous map, 166 Continuous surjection, 168 Continuum, 166 Contraction, 155 Convergence, 83, 84 Convergent in measure, 201 Convexity, 350, 352, 357, 359, 361, 362 Core of an operator, 149 Correspondence, 349, 356 closed valued, 376 compact valued, 356, 359 convex valued, 375 distribution of, 357, 364 independent, 376 individual demand, 380 integration, 361 mean demand, 380 mean excess demand, 353, 382 measurable, 356, 361, 376 sample, 377 Critical for Lévy functionals, 289 C ∗ –algebra, 118
D D-compact, 150 uniformly, 150 De Bruijn, 52 Deep family of sets, 69 Dense set, 82 Density topology, 94 abstract, 96 Denumerably comprehensive, 55 Derivative, 30 continuous, 31 Directed set, 84 Discrete approximation, 148 Discrete convergence, 148 Discretely compact, 150 Distance, 109, 356 between sets, 147 , 355 Hausdorff, 356, 375 infinitesimal, 355 Distribution
on space of measures, 326 sample, 324 Distributions empirical, 329 finite dimensional, 338 of random variables, 326 Doob measurability problem, 323, 324 Downward transfer, 168 Downward Transfer Principle, 42, 44 d(x, A), 109 Dynkin’s π − λ theorem, 326, 331, 337
E Eigenvalue, 138 approximate, 138 Eigenvector, 138 Embed, 170 Embedding, 170 elementary, 66 End, 171 End compactification, 171 Enlargement, 50, 79 Equivalence of L^p-spaces, 192 Equivalence class, 165 Equivalence relation, 165 Erdős, P., 52, 412 Exchangeability, 324 almost, 336 de Finetti’s theorem, 338 duality with independence, 336 pairwise and multiple versions, 337 External entity, 53 External sets, 322 Extreme value theorem, 29
F FFP, 155 Filter, 7 Filter base, 168 Filtration internal, 195 right continuous, 196 standard, 195 standard part of a, 196 Fine topology, 94 Finite number, 9 Finitely λ–representable, 126 Finitely close, 21 Formula of language, 39
Forward triangle, 434 Free, 64 Freiman’s conjecture, 419 Freiman’s inverse phenomenon, 418 Freiman, G.A., 418, 419 From approximate to limit, 355 From limit to approximate, 354 From limit to limit, 355 Full subset, 434 Function, 9 domain, 9, 15 nonstandard extension, 19 range, 9, 14 Fundamental Theorem of Calculus, 34 Furstenberg, H., 408
G Galaxy, 21 Generated by sets, 182 Gödel, 4 Graph, 52 Graph-theoretical ends, 166
H Hausdorff space, 85, 170 Heine-Borel theorem, 27, 86 Henson–Moore theorem, 131 Hindman’s theorem, 469 holding in S, 13 Hole, 166 Homeomorphic spaces, 83 Hurd, 5, 37 Hyper-shift, 463 Hyperfinite, 5 approach, 321 computations, 321 population, 335 Hyperfinite coin toss, 6 Hyperfinite set, 50 Hyperfinite time line, 189 hypernatural numbers, ∗ N, 22 Hyperreal numbers, 8
I Imbed, 170 Imbedding, 170 Independence, 322, 351 almost, 335, 378, 381 almost sure pairwise, 325 approximate, 335 approximately mutual, 341 ∗-independence, 327 asymptotic pairwise, 340 coalitional law in distribution, 332 conditional, 353, 387 continuum setting, 323, 324 discrete setting, 323 duality with exchangeability, 336 external, 394 internal, 395 multiple versions, 322 multiplicative properties, 339 mutual, 332, 378 pairwise, 322, 378 Infinite number, 9 Infinitely close, 21 between tensor products, 218 Infinitesimal, 171 Infinitesimal number, 9 Infinitesimally close, 21 Inside, 172 Integral for Lévy processes, 292 for tensor products, 247 Interior, 170 Interior of a set, 82 Intermediate value theorem, 29 Internal iterated integral, 253 iterated integral process, 259 iterated integral with parameters, 252 stochastic integral, 237 Internal entity, 53 Internal set theory, vii interpretable in S, 12 Intersection monad, 52 Iterated integral, 256 with parameters for Brownian functionals, 256 Iterated integral of order k for Lévy processes, 295 with parameters for Lévy processes, 295 Iterated integral process of order k, 296
J James theorem, 128
K Kakutani theorem, 117 κ-saturated, 167 κ-saturation, 58 κ-small, 58
Karhunen-Loève expansion, 353, 384 Keisler’s Fubini theorem, 323, 328 Keisler’s infinite sum theorem, 34 Keisler’s internal definition principle, 54 Keisler, H.J., 5, 32 Kernels of Brownian functionals, 267 of Lévy functionals, 305 Kneser, M., 419, 421
L λ–embeddable, 126 Language for superstructures, 38 Lattice homomorphism, 117 Law of large numbers, 378, 383, 394 asymptotic version, 342 coalitional, 330 conditional, 387 converse, 323 discrete, 323 exact, 323, 327 for subsystems, 323 higher order, 336 in distribution, 329 transferred, 327 Leibniz, 3 Lifting, 186, 321, 342 for processes, 207 of a standard function, 190 Limit, 28 elementary, 74 from approximate to, 321 from limit to, 321 Limited number, 9 Locally connected, 166 Locally S-integrable, 206 Loeb measurable, 191 Loeb measure, 185 Loeb product space, 322, 351, 377 richness of, 326 Loeb space, 185, 350, 358 adapted, 196 Loeb spaces, 97 Łoś theorem, 47 Lower asymptotic basis, 412 Lower asymptotic density, 405 Lower Banach basis, 413 Lower Banach density, 406 Lower U -density, 420 Luxemburg’s compact image theorem, 86
M Malliavin derivative for Brownian functionals, 276 for Lévy functionals, 305 Malliavin differentiable for Brownian functionals, 276 for Lévy functionals, 306 Martingale convergence theorem, 99 Measurability Borel, 325 continuum product, 324 joint, 323, 325 Lebesgue, 324 Loeb product, 327 usual product, 327 Measurable, 186 internally, 186 Metric, 356, 358 Prohorov, 356, 365 totally bounded, 375 Metric space, 167 compact, 351, 357 complete separable, 350, 375 uncountable compact, 365, 367 Metrically convex, 155 Model, 62 over, 62 weak, 61 Models closed under definition, 65 Monad, 21, 26, 80, 81, 165 Monomorphism, 41, 49 Mostowski Collapsing Function, 45 Multiple integral process for Lévy processes, 298 with parameters for Lévy processes, 297 Multiplicative properties, 322 for characteristic functions, 340 for generating functions, 339 for maximum of random variables, 340 unification of, 338 Mushroom space, 88
N n-ary relation, 9 complement, 9 Nash equilibrium, 350, 363, 369, 373 approximate, 370 mixed-strategy, 350, 367 nonexistence, 365, 368 pure-strategy, 350, 363, 392 Near-standard point, 27, 81
Nearstandard, 165 in tensor products, 218 Neighborhood, 167 Neighborhood filter base, 167 Neighborhood filter base at a point, 80 Nelson, vii Neocompact, 321 NES, 171 Nested nets, 166 Net, 84, 173 Neumann series, 137 Newton, 3 No arbitrage, 354, 388 Non end-splitting, 171 Non-completely regular, 170 Non-infinitesimal point, 172 Noncompact, 166 Nonexpansive, 155 Nonstandard analysis, 4 Nonstandard extension, 167, 169 Nonstandard hull, 92 locally convex space, 123 of a Banach space, 108 of a closed operator, 143 of an operator, 114 Nonstandard integers, 22 nonstandard natural numbers, ∗ N, 22 Nonstandard numbers, 8 Norm, 80 order continuous, 130 Not equivalence class splitting, 168 Nullset, 182
O One point compactification, 166 Open, 168 Open base, 81 Open covering, 85 Open mapping, 174 Open set, 26, 80, 81, 168 Operator closed densely defined, 142 compact, 133 Fredholm, 135 index, 136 internal, 114 S–continuous, 114 linear continuous, 114 of finite rank, 133 positive, 117 stable, 152 superergodic, 152
superstable, 152 uniformly stable, 152 Volterra, 151 Operator norm, 114 Order continuous norm, 130 Order on topologies, 84 ordered n-tuple, 38 Ordered pair, 38 Outside, 172
P P-integrable, 222 Parameters, 63 Piecewise syndetic, 408 Plünnecke’s inequality, 412 Plünnecke, H., 412 Point of closure, 82, 168 Pointwise convergence, 80, 81 Polish space, 350, 356, 357, 361, 375, 377, 378, 387 Polysaturated, 58, 108 Pre-model, 69 Predictable rectangle, 245 Problem incompatibility, 323 measurability, 323 Product space, 88, 166 Projection, 174 Projection mapping, 174 Property Sm , 156 Property holding a.e., 8 Pseudoresolvent, 142 approximate point spectrum, 144 pointspectrum, 144 singular set, 143 Purification, 350, 361, 392 Pushing-down, 321, 342
Q Q-topology, 85 Quadratic variation, 222
R Randomness coalitional, 353, 387 Rank, 37 Ray at infinity, 166 Rectangle measurable, 188 Regular cardinal regular, 76
Regular space, 86 Relation, 38 characteristic function, 10 ∗-transform, 10 Relative topology, 89, 169 Remainder, 170 Remote, 165 Remote point, 111 Remote set, 170 Resolvent, 137, 142 Resolvent set, 137, 142 Restricted, 298, 312 Reverse inclusion, 173 Riemann integral, 33 Riesz point, 138 S–Riesz point, 140 Riesz-Herglotz theorem, 101 Robinson, 4, 41 Robinson’s Compactness Criterion, 86 Robinson’s Sequential Lemma, 57 R-point, 167 Rudin-Keisler pre-ordering, 449
S S, 11 S-continuous, 114 uniformly, 114 S-integrability, 98, 197 internal, 197 S-point, 167 Saturated, 167 Saturated nonstandard extension, 167 Schauder theorem, 135 Selection, 349, 356, 357, 359 Semigroup generator, 145 strongly continuous, 145 Semimetric, 80 Seminorm, 80, 119 Sentence, 39 atomic, 12 compound, 12 simple, 12 transform, 15 Separable space, 82 Separates points, 170 Separating, 120 Sequence approximate fixed point, 155 bounded, 24 Cauchy, 24 limit, 24
limit point, 25 Shnirel’man density, 406 Simple, 186 Simple sentence transform, 16 Singular set of a pseudoresolvent, 143 Skolem function, 14 Skorokhod integral for Brownian functionals, 284 for Lévy processes, 310 Spectral radius, 137 Spectrum, 137 approximate point spectrum, 138, 142 point spectrum, 138, 142 Spectrum , 142 Spillover Principle, 56 Standard, 8 Standard entity, 50 Standard index, 174 Standard part, 21 in tensor products, 218 of a martingale, 229 right continuous, 226 Standard part map, 21, 85 Standard point, 165 ∗-finite set, 50 ∗-independence, 394 ∗-mapping, 41 Stone-Čech compactification, 166 S-topology, 85, 167, 451 Stopping time, 223 Strongly adapted, 245 Submartingale, 222 Subspace topology, 171 Sumset phenomenon, 407 Super property, 152 Superstructure, 37 Surjection, 168 Symmetric, 194, 220 Syndetic, 408
T Tensor pair, 461 Tensor product, 219 Term constant terms, 12 transform, 16 Terms, 11, 40 Thick, 408 Tight subset, 418 Topological component, 170
Topological ends, 166 Topological properties, 168 Topology, 81 locally convex, 120 pointwise convergence, 121 strong operator, 121 weak, 120 weak operator, 121 weak*, 121 Transfer, 168, 322, 341 backwards, 322 Transfer Principle, 16, 42 Transform of formula, 41 Two point compactification, 166 Tychonoff product theorem, 88
U U-equivalence, 445 U -internal, 420 Ultrafilter, 7 generated, 445, 458 good, 456 idempotent, 468 pseudo-sum, 466 regular, 454 Ultrafilter map, 450 Ultrapower construction, 8, 44 U -meager set, 409 U -monad, 408, 445 Uncorrelatedness, 323 almost sure, 328 characterization by coalitional law, 331 in m-tuple, 333 Uniform continuity, 30, 89 Uniform convergence on compact subsets, 80 Uniformity monad, 120 seminorms, 120
Uniqueness of measures, 330, 332 of Radon-Nikodym derivatives, 326 Unit Lebesgue interval, 349, 357, 365, 368, 392 Unlimited number, 9 U -nowhere dense set, 409 U -open set, 409 Upper asymptotic basis, 412 Upper asymptotic density, 405 Upper Banach basis, 413 Upper Banach density, 405 Upper semicontinuity, 350, 357, 360, 361
V Vakil, vii Valid in S, 13 van der Corput, 422 ε–pseudospectrum, 138, 142 Variable bound, 12, 39 free, 39 Vectorspace locally convex, 120 Version, 229, 262 von Neumann algebra , 132
W Weak convergence of measures, 99 Weak dependence, 335 Weak topology, 120 Wiener integral, 248 Wiener-Lévy integral, 295
Z Zakon, 41